05 - Building Images with Dockerfile
What this session is
About an hour. You'll learn to build your own images using a Dockerfile - the recipe text file that tells Docker how to construct an image step by step.
A first Dockerfile
Create a folder myimage/. Inside, create a file named exactly Dockerfile (no extension):
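A minimal sketch of such a Dockerfile, written here as shell commands that create the folder and the file. The alpine:3.20 base image and the apk step that installs curl are assumptions chosen to match the run commands in this section; any small base image with curl available would do. You can equally paste the three lines between the EOF markers into an editor.

```shell
# Create the folder and write the Dockerfile into it.
# Assumption: alpine:3.20 as the base image, curl installed via apk.
mkdir -p myimage
cat > myimage/Dockerfile <<'EOF'
FROM alpine:3.20
RUN apk add --no-cache curl
CMD ["echo", "hello from my image"]
EOF
```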
Build it:
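A build command along these lines should work, assuming you run it from inside myimage/ and want the name and tag myimage:1.0 that the run step uses. The two variables are only there to make the name:tag structure visible.

```shell
# Assumes the current directory is myimage/ and contains the Dockerfile.
IMAGE=myimage   # image name
TAG=1.0         # image tag
# -t sets name:tag; the trailing "." is the build context.
docker build -t "${IMAGE}:${TAG}" .
```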
The . at the end means "use the current directory as the build context."
Run:
docker run --rm myimage:1.0
# hello from my image
docker run --rm myimage:1.0 curl --version
# (prints curl's version because we override the default CMD)
You just built a custom image.
The instructions you'll use most
| Instruction | What it does |
|---|---|
| FROM image:tag | Base image to start from. The first instruction in the file (only comments, parser directives, and ARG may come before it). |
| RUN command | Run a shell command at build time (e.g. install packages). |
| COPY src dest | Copy files from the build context into the image. |
| ADD src dest | Like COPY, but also fetches URLs and unpacks local tarballs. Prefer COPY. |
| WORKDIR path | Set the working directory (created if missing) for the instructions that follow, such as RUN, CMD, ENTRYPOINT, COPY, and ADD. |
| ENV KEY=value | Set an environment variable, available for the rest of the build and at runtime. |
| EXPOSE port | Documentation only: declares that the container listens on this port. Does NOT publish it. |
| CMD ["a", "b"] | Default command when the container starts. Overridable at docker run. |
| ENTRYPOINT ["a", "b"] | Command that runs on every start unless replaced with --entrypoint. Args from CMD or docker run are appended. |
| USER name-or-uid | Switch to this user for subsequent instructions and at runtime. |
| ARG name=default | Build-time variable. Override the default with --build-arg. |
A realistic Dockerfile (Python app)
Suppose you have a small Python script app.py:
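import http.server, socketserver, os
# The port comes from the PORT environment variable, defaulting to 8000.
port = int(os.environ.get("PORT", "8000"))
with socketserver.TCPServer(("", port), http.server.SimpleHTTPRequestHandler) as httpd:
    # flush=True so the message shows up in docker logs right away
    print(f"serving on {port}", flush=True)
    httpd.serve_forever()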
And a requirements.txt (empty for this example, but typically lists pip packages).
Your Dockerfile:
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first (separate from app code for cache reuse)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application
COPY app.py .
EXPOSE 8000
ENV PORT=8000
CMD ["python", "app.py"]
Build and run:
docker build -t pyapp:1.0 .
docker run -d --rm --name pyapp -p 8000:8000 pyapp:1.0
curl http://localhost:8000/
docker stop pyapp   # stop the detached container; --rm then removes it
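Why the line order matters: layer caching
Each Dockerfile instruction is a build step, and the steps that change the filesystem (RUN, COPY, ADD) add a layer to the image. Docker caches the result of each step and reuses it on rebuilds if the instruction (and its inputs) haven't changed. Once one step is invalidated, every step after it runs again too.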
Order things so the most-frequently-changing things come last:
# Bad: every code change invalidates the pip install layer
FROM python:3.12-slim
WORKDIR /app
# Any file change invalidates this step and everything after it
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

# Good: dependencies cached separately
FROM python:3.12-slim
WORKDIR /app
# Changes rarely
COPY requirements.txt .
# Cached unless requirements.txt changed
RUN pip install -r requirements.txt
# Changes often
COPY . .
CMD ["python", "app.py"]
The second form rebuilds in seconds when only your code changed, vs minutes when pip re-runs.
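One way to watch the cache at work is to build, touch only the application code, and build again. This sketch assumes the "Good" Dockerfile, app.py, and requirements.txt sit in the current directory, and that BuildKit (the default builder in current Docker releases) is in use.

```shell
# First build: every step runs (base image pull, pip install, copy).
docker build -t pyapp:1.0 .
# Change only application code; requirements.txt stays untouched.
echo '# trivial change' >> app.py
# Rebuild: the steps up to and including pip install come from the cache
# (BuildKit marks them CACHED); only the final COPY runs again.
docker build -t pyapp:1.0 .
```

If you edit requirements.txt instead, the pip install step and everything after it reruns.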
ENTRYPOINT vs CMD
This topic confuses many people. Quick answers:
- CMD is the default command. Easily overridden by passing arguments: docker run image arg1 arg2.
- ENTRYPOINT is what always runs (unless you replace it with --entrypoint). CMD (and docker run args) are passed as arguments to ENTRYPOINT.
Common patterns:
Pattern 1 - CMD only (most common): the Dockerfile ends with CMD ["python", "app.py"].
docker run image runs python app.py. docker run image bash runs bash (overrides CMD).
Pattern 2 - ENTRYPOINT + CMD (for wrapper apps): the Dockerfile sets ENTRYPOINT ["python", "app.py"] followed by CMD ["--default-arg"].
docker run image runs python app.py --default-arg. docker run image --other-arg runs python app.py --other-arg (CMD overridden, ENTRYPOINT kept).
For your own images: start with just CMD. Reach for ENTRYPOINT only when you have a clear use case.
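To see the difference without a Python app, the following sketch builds a throwaway image whose ENTRYPOINT is echo. The folder and image name entrydemo and the alpine:3.20 base are hypothetical choices for illustration only.

```shell
# Write a tiny Dockerfile that combines ENTRYPOINT and CMD.
mkdir -p entrydemo
cat > entrydemo/Dockerfile <<'EOF'
FROM alpine:3.20
ENTRYPOINT ["echo", "hello"]
CMD ["world"]
EOF
docker build -t entrydemo:1.0 entrydemo

# ENTRYPOINT plus the default CMD: prints "hello world"
docker run --rm entrydemo:1.0
# Arguments replace CMD, ENTRYPOINT stays: prints "hello there"
docker run --rm entrydemo:1.0 there
# --entrypoint replaces ENTRYPOINT; the trailing "/" replaces CMD, so this runs "ls /"
docker run --rm --entrypoint ls entrydemo:1.0 /
```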
Use a non-root user
By default, containers run as root inside the container. Even though the container is isolated, running as root means if there's a container-escape bug, the attacker is root on the host (assuming user namespaces aren't configured).
Add a user:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app
CMD ["python", "app.py"]
Now the container's main process runs as app, not root. Many production environments require this. Page 10 covers more security basics.
.dockerignore: keep junk out of the build context
The . in docker build . sends everything in the current directory to the Docker daemon. If your folder has .git, node_modules, target/, build artifacts, those bloat the build context.
Create .dockerignore (same folder as the Dockerfile):
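A starting point might look like this, written from the shell. The entries beyond .git, node_modules, and target/ (the Python cache pattern and the .env file name) are illustrative assumptions; adjust the list to your project.

```shell
# Write a .dockerignore next to the Dockerfile.
cat > .dockerignore <<'EOF'
# Version control metadata
.git
# Dependency and build output directories
node_modules
target/
# Python bytecode caches at any depth
**/__pycache__
# Local secrets (hypothetical file name)
.env
EOF
```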
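The syntax is close to .gitignore, with one notable difference: patterns are matched relative to the root of the build context, so use **/name to match at any depth. A good .dockerignore speeds up builds, reduces image size, and helps keep secrets from ending up in images by accident.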
Build arguments and labels
FROM alpine:3.20
ARG VERSION=unknown
LABEL org.opencontainers.image.version=$VERSION
LABEL org.opencontainers.image.source="https://github.com/example/repo"
RUN echo "building version $VERSION"
Build with:
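A sketch of the build, assuming the Dockerfile above is in the current directory. The image name myimage and the version number 1.2.3 are hypothetical.

```shell
VERSION=1.2.3   # hypothetical version number
# --build-arg overrides the ARG default ("unknown") for this build only.
docker build --build-arg VERSION="$VERSION" -t myimage:"$VERSION" .
# Read the label back from the finished image.
docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.version" }}' myimage:"$VERSION"
```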
ARG is build-time only (gone at runtime). LABEL stays on the image, queryable with docker inspect. The org.opencontainers.image.* labels are a convention - many tools (Docker Hub, GitHub Container Registry) read them.
Tag the build
Use semantic versions for releases, and let :latest point at the newest one. (Don't depend on :latest in production; pin specific versions.)
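Tagging could look like the following; the image name myimage and the release number are hypothetical, and a Dockerfile is assumed in the current directory.

```shell
RELEASE=1.2.3   # hypothetical release number
# One build can carry several tags: a pinned version and a moving "latest".
docker build -t myimage:"$RELEASE" -t myimage:latest .
# An existing image can also get an extra tag without rebuilding.
docker tag myimage:"$RELEASE" myimage:1.2
```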
Inspect what you built
docker history shows you each layer's size. Useful for figuring out where the bloat is.
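A few commands that are commonly used for this, assuming an image tagged myimage:1.0 exists locally (substitute any image you have built):

```shell
IMAGE=myimage:1.0   # any locally built image works here
# One row per build step, with the size each step added.
docker history "$IMAGE"
# Total size and tags of matching local images.
docker image ls myimage
# Full metadata as JSON (Cmd, Env, Labels, and so on).
docker inspect "$IMAGE"
```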
Exercise
- Build the Python example above (pyapp:1.0). Run it, curl localhost:8000, and see the directory listing it serves.
- Make a small change to app.py (change the printed message) and rebuild. Notice that the layers BEFORE the COPY were cached.
- Add a non-root user to the Dockerfile, rebuild, and confirm that whoami inside the container reports app, not root.
- Create a .dockerignore that excludes __pycache__ and .git. Rebuild; note any difference in build context size (Docker reports it at the start of a build).
- Build with a version arg passed through --build-arg.
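For the non-root exercise, a check along these lines should work, assuming the rebuilt image is still tagged pyapp:1.0 and its Dockerfile contains USER app:

```shell
IMAGE=pyapp:1.0
# Overrides the default CMD with whoami; expected to print: app
docker run --rm "$IMAGE" whoami
# The numeric user ID should be non-zero (0 would mean root).
docker run --rm "$IMAGE" id -u
```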
What you might wonder
"Why RUN pip install --no-cache-dir?"
pip caches downloads in ~/.cache/pip. That cache is useless inside an image (the packages are already installed); it only bloats the layer. --no-cache-dir skips it.
"Why COPY . . and not ADD . .?"
COPY does exactly what it says - copy files. ADD also extracts tarballs and fetches URLs, which is more magic than you usually want. Prefer COPY; use ADD only for those specific features.
"What's a build context?"
The directory you pass to docker build (the . at the end). Everything in it is sent to the Docker daemon - that's what the COPY commands draw from. The Dockerfile itself isn't special; it's just one file in the context.
"Can I have multiple Dockerfiles?"
Yes - docker build -f Dockerfile.dev -t foo:dev . uses a non-default-named one. Useful for "Dockerfile" + "Dockerfile.prod" + "Dockerfile.test" variants.
Done
- Write a Dockerfile from scratch.
- Use FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, CMD, USER.
- Order instructions for cache friendliness.
- Distinguish CMD from ENTRYPOINT.
- Use .dockerignore to keep junk out.
- Build with --build-arg.
Next: Volumes and bind mounts →