
05 - Building Images with Dockerfile

What this session is

About an hour. You'll learn to build your own images using a Dockerfile - the recipe text file that tells Docker how to construct an image step by step.

A first Dockerfile

Create a folder myimage/. Inside, create a file named exactly Dockerfile (no extension):

FROM alpine:3.20

RUN apk add --no-cache curl

CMD ["echo", "hello from my image"]

Build it:

cd myimage
docker build -t myimage:1.0 .

The . at the end means "use the current directory as the build context."

Run:

docker run --rm myimage:1.0
# hello from my image

docker run --rm myimage:1.0 curl --version
# (prints curl's version because we override the default CMD)

You just built a custom image.

The instructions you'll use most

Instruction - What it does
FROM image:tag - Base image to start from. Always the first line.
RUN command - Run a shell command at build time (e.g. install packages).
COPY src dest - Copy files from the build context into the image.
ADD src dest - Like COPY but also fetches URLs and unpacks tarballs. Prefer COPY.
WORKDIR path - cd to this dir; affects subsequent RUN/CMD/COPY.
ENV KEY=value - Set an environment variable.
EXPOSE port - Documentation only - declares the container listens on this port. Does NOT publish it.
CMD ["a", "b"] - Default command when the container starts. Overridable at docker run.
ENTRYPOINT ["a", "b"] - Command that always runs. Args from CMD or docker run are appended.
USER name-or-uid - Switch to this user for subsequent layers and runtime.
ARG name=default - Build-time variable. Set with --build-arg.

A realistic Dockerfile (Python app)

Suppose you have a small Python script app.py:

import http.server, socketserver, os
port = int(os.environ.get("PORT", "8000"))
with socketserver.TCPServer(("", port), http.server.SimpleHTTPRequestHandler) as httpd:
    print(f"serving on {port}")
    httpd.serve_forever()

And a requirements.txt (empty for this example, but typically lists pip packages).

Your Dockerfile:

FROM python:3.12-slim

WORKDIR /app

# Install dependencies first (separate from app code for cache reuse)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application
COPY app.py .

EXPOSE 8000
ENV PORT=8000

CMD ["python", "app.py"]

Build and run:

docker build -t pyapp:1.0 .
docker run -d --rm --name pyapp -p 8000:8000 pyapp:1.0
curl http://localhost:8000/
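If you want to sanity-check app.py's behavior outside Docker first, here's a quick sketch using only the standard library. Binding to port 0 asks the OS for a free port, so it won't collide with a container already publishing 8000:

```python
# Run the same handler app.py uses and hit it once over HTTP.
import http.server
import socketserver
import threading
import urllib.request

with socketserver.TCPServer(("", 0), http.server.SimpleHTTPRequestHandler) as httpd:
    port = httpd.server_address[1]            # the port the OS picked
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    with urllib.request.urlopen(f"http://localhost:{port}/") as resp:
        status = resp.status                  # directory listing -> 200
    httpd.shutdown()

print(status)  # 200
```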

Why the line order matters: layer caching

Each Dockerfile instruction creates a layer. Docker caches layers and reuses them on rebuilds if the instruction (and its inputs) haven't changed.

Order things so the most-frequently-changing things come last:

# Bad: every code change invalidates the pip install layer
FROM python:3.12-slim
WORKDIR /app
COPY . .                    # any file change invalidates this
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

# Good: dependencies cached separately
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .     # changes rarely
RUN pip install -r requirements.txt   # cached unless requirements.txt changed
COPY . .                    # changes often
CMD ["python", "app.py"]

The second form rebuilds in seconds when only your code changed, vs minutes when pip re-runs.
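The caching effect can be sketched with a toy model: each layer's cache key chains the previous key with the instruction text and the file contents it pulls in. This is not Docker's real algorithm (which also hashes file metadata, among other things), and the helper names are made up, but it shows why ordering matters:

```python
import hashlib

def cache_keys(layers):
    # Each key depends on everything before it, so a change
    # invalidates that layer AND every layer after it.
    key, keys = "", []
    for instruction, inputs in layers:
        key = hashlib.sha256((key + instruction + inputs).encode()).hexdigest()
        keys.append(key)
    return keys

def build(app_code):
    # The "good" ordering: requirements.txt copied before the app code.
    return cache_keys([
        ("FROM python:3.12-slim", ""),
        ("COPY requirements.txt .", "flask==3.0"),          # file contents
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", app_code),
    ])

first = build("print('v1')")
second = build("print('v2')")    # only the app code changed
# First three keys match -> cache hits; only the last layer rebuilds.
print([a == b for a, b in zip(first, second)])  # [True, True, True, False]
```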

ENTRYPOINT vs CMD

Confusing topic. Quick answers:

  • CMD is the default command. Easily overridden at docker run image arg1 arg2.
  • ENTRYPOINT is what always runs. CMD (and docker run args) are passed as arguments to ENTRYPOINT.

Common patterns:

Pattern 1 - CMD only (most common):

CMD ["python", "app.py"]
docker run image runs python app.py. docker run image bash runs bash (overrides CMD).

Pattern 2 - ENTRYPOINT + CMD (for wrapper apps):

ENTRYPOINT ["python", "app.py"]
CMD ["--default-arg"]
docker run image runs python app.py --default-arg. docker run image --other-arg runs python app.py --other-arg (CMD overridden, ENTRYPOINT kept).

For your own images: start with just CMD. Reach for ENTRYPOINT only when you have a clear use case.
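Both patterns follow one rule: run args replace CMD, and the result is appended to ENTRYPOINT. A small sketch (exec form only; a simplification of Docker's full rules, which also cover shell form):

```python
def final_command(entrypoint, cmd, run_args):
    # Args given to `docker run image ...` replace CMD entirely.
    args = run_args if run_args else cmd
    return (entrypoint or []) + args

# Pattern 1: CMD only
assert final_command([], ["python", "app.py"], []) == ["python", "app.py"]
assert final_command([], ["python", "app.py"], ["bash"]) == ["bash"]

# Pattern 2: ENTRYPOINT + CMD
ep = ["python", "app.py"]
assert final_command(ep, ["--default-arg"], []) == ["python", "app.py", "--default-arg"]
assert final_command(ep, ["--default-arg"], ["--other-arg"]) == ["python", "app.py", "--other-arg"]
print("all patterns match")  # all patterns match
```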

Use a non-root user

By default, the process inside a container runs as root. The container is isolated, but if there's a container-escape bug, the attacker is root on the host (assuming user namespaces aren't configured).

Add a user:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .

RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app

CMD ["python", "app.py"]

Now the container's main process runs as app, not root. Required by many production environments. Page 10 covers more security basics.
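One portability note: useradd ships with Debian-based images like python:3.12-slim. If your base image is Alpine, the BusyBox equivalent is roughly:

```dockerfile
# BusyBox adduser; -D creates the user without a password
RUN adduser -D app && chown -R app:app /app
USER app
```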

.dockerignore: keep junk out of the build context

The . in docker build . sends everything in the current directory to the Docker daemon. If your folder contains .git, node_modules, target/, or other build artifacts, they all get shipped along too, bloating the build context.

Create .dockerignore (same folder as the Dockerfile):

.git
.gitignore
.idea
.vscode
node_modules
target
__pycache__
*.pyc
.env
Dockerfile
.dockerignore

Same syntax as .gitignore. It speeds up builds, keeps junk out of COPY . ., and prevents secrets like .env from being baked into images.
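The matching works on glob-style patterns. A simplified sketch of the idea (fnmatch-based and hypothetical; real Docker uses Go's pattern rules plus ** and ! negations, which this toy version skips):

```python
from fnmatch import fnmatch

patterns = [".git", "node_modules", "__pycache__", "*.pyc", ".env"]

def excluded(path, patterns):
    # A path is excluded if any of its components matches any pattern,
    # so ".git" also excludes everything under .git/.
    return any(fnmatch(part, pat) for part in path.split("/") for pat in patterns)

context = ["app.py", "requirements.txt", ".git/config",
           "node_modules/left-pad/index.js", "lib/util.pyc", ".env"]
kept = [p for p in context if not excluded(p, patterns)]
print(kept)  # ['app.py', 'requirements.txt']
```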

Build arguments and labels

FROM alpine:3.20

ARG VERSION=unknown
LABEL org.opencontainers.image.version=$VERSION
LABEL org.opencontainers.image.source="https://github.com/example/repo"

RUN echo "building version $VERSION"

Build with:

docker build --build-arg VERSION=1.2.3 -t myimage:1.2.3 .

ARG is build-time only (gone at runtime). LABEL stays on the image, queryable with docker inspect. The org.opencontainers.image.* labels are a convention - many tools (Docker Hub, GitHub Container Registry) read them.

Tag the build

docker build -t myimage:1.0 .
docker build -t myimage:1.0 -t myimage:latest .       # two tags at once

Use semantic versions for releases and :latest as a convenience alias for the newest build. (Don't depend on :latest in production - pin specific versions.)

Inspect what you built

docker images
docker history myimage:1.0          # show layers
docker inspect myimage:1.0          # full metadata

docker history shows you each layer's size. Useful for figuring out where the bloat is.

Exercise

  1. Build the Python example above (pyapp:1.0). Run it, curl http://localhost:8000/, and see the directory listing it serves.

  2. Make a small change to app.py (change the printed message) and rebuild. Notice the layers BEFORE the COPY were cached.

  3. Add a non-root user to the Dockerfile, rebuild, and confirm whoami inside the container reports app not root:

    docker run --rm pyapp:1.0 whoami
    

  4. Create a .dockerignore that excludes __pycache__ and .git. Rebuild; note any difference in build context size (Docker reports it at the start of a build).

  5. Build with a version arg:

    docker build --build-arg VERSION=1.2.3 -t pyapp:1.2.3 .
    docker inspect pyapp:1.2.3 | grep -A1 Labels
    

What you might wonder

"Why RUN pip install --no-cache-dir?" pip caches downloads in ~/.cache/pip. That cache is useless inside an image (you've already installed); only bloats the layer. --no-cache-dir skips it.

"Why COPY . . and not ADD . .?" COPY does exactly what it says - copy files. ADD also extracts tarballs and fetches URLs, which is more magic than you usually want. Prefer COPY; use ADD only for those specific features.

"What's a build context?" The directory you pass to docker build (the . at the end). Everything in it is sent to the Docker daemon - that's what the COPY commands draw from. The Dockerfile itself isn't special; it's just one file in the context.

"Can I have multiple Dockerfiles?" Yes - docker build -f Dockerfile.dev -t foo:dev . uses a non-default-named one. Useful for "Dockerfile" + "Dockerfile.prod" + "Dockerfile.test" variants.

Done

  • Write a Dockerfile from scratch.
  • Use FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, CMD, USER.
  • Order instructions for cache friendliness.
  • Distinguish CMD from ENTRYPOINT.
  • Use .dockerignore to keep junk out.
  • Build with --build-arg.

Next: Volumes and bind mounts →
