
09 - Slimming Images

What this session is

About 45 minutes. Image size matters: smaller = faster pulls, faster cold starts, smaller attack surface. You'll learn multi-stage builds, .dockerignore, base-image choices, and the common slimming techniques.

Why size matters

A 1.5GB image and a 50MB image can run the same application. But:

  • The 1.5GB image takes 30 seconds to pull on a slow link; the 50MB one takes one.
  • The 1.5GB has thousands of files (extra attack surface, more CVE matches).
  • Cold-start a serverless container from a 1.5GB image? Painful.
  • CI builds with 1.5GB intermediates eat disk and slow caching.

Aim for the smallest sensible image. Not the absolute smallest (that way lies madness); the smallest one you can build and maintain comfortably.

Picking a base

Start with the smallest base that works:

Base                       Size      Best for
scratch                    0 bytes   Static binaries (Go, Rust) - no OS at all
gcr.io/distroless/static   ~2MB      Static binaries - has CA certs, tzdata, /etc/passwd
alpine:3.20                ~5MB      Anything that works on musl (most things)
debian:bookworm-slim       ~75MB     Things that need glibc but don't need many tools
python:3.12-slim           ~150MB    Python apps (slim variant)
ubuntu:24.04               ~80MB     When you need a familiar full distro

Rule of thumb: start with alpine or *-slim. Reach for full distros only when a wheel/binary doesn't work on the smaller one.

Multi-stage builds

The biggest slimming win. Use one stage to build, another to package the result. Build tools, source code, test artifacts don't ship.

A real example - Go:

# Stage 1: build
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp

# Stage 2: ship just the binary
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/myapp"]

Build:

docker build -t myapp:1.0 .
docker images myapp:1.0           # ~10MB instead of 1GB+

The FROM ... AS name creates a named stage. COPY --from=builder copies files out of that named stage. Only the final stage ships.

Same idea for any compiled language. For Rust:

FROM rust:1.80 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release

FROM gcr.io/distroless/cc-debian12
COPY --from=builder /src/target/release/myapp /myapp
CMD ["/myapp"]

For Node.js (interpreted, but you can still avoid shipping dev-deps):

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
USER node
CMD ["node", "dist/index.js"]

The second stage installs only production dependencies. The image drops from "dev deps + source + build output" to "just dist + runtime deps."

Distroless: nearly-empty base images

Google's "distroless" images (gcr.io/distroless/*) contain:

  • The language runtime (for python, java, etc.) - or nothing at all (static).
  • CA certificates, tzdata, /etc/passwd, a few essentials.
  • No shell, no package manager, no debug tools.

Pros: tiny, minimal attack surface, no shell-injection footholds. Cons: harder to debug (no docker exec ... sh). For that, distroless ships a :debug variant (which includes a busybox shell) for occasional use.

For static-binary languages (Go, Rust) shipping a CLI: distroless/static. For Java: distroless/java. For Python: distroless/python3. (Each has variants.)
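As a concrete illustration, here is what a Java service on distroless might look like (a sketch - the Maven image tag, project layout, and app.jar name are assumptions; the distroless Java images set java -jar as the entrypoint, so CMD is just the jar path):

# Stage 1: Maven build - compilers and the build tool stay here
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /src
COPY pom.xml .
RUN mvn -q dependency:go-offline
COPY src ./src
RUN mvn -q package -DskipTests

# Stage 2: JVM + CA certs, no shell, no package manager
FROM gcr.io/distroless/java17-debian12
COPY --from=builder /src/target/app.jar /app.jar
CMD ["/app.jar"]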

.dockerignore

Already covered in page 05. Critical: anything not excluded by .dockerignore is sent to the daemon as build context. .git, node_modules, target/, build caches all bloat builds.

A reasonable .dockerignore for a polyglot project:

.git
.gitignore
.dockerignore
Dockerfile*
.idea
.vscode
*.md
node_modules
target
__pycache__
*.pyc
.env
.env.*
dist
build
coverage
.cache

Combine RUN instructions

Each RUN creates a layer. If you RUN apt-get install foo and then RUN apt-get remove foo, the removal happens in a new layer - the first layer still contains the package files, so the image doesn't get any smaller.

Combine into one RUN:

# Bad - bloats the image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Good - one layer, ends clean
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

Three patterns in the good version:

  • --no-install-recommends skips optional dependencies.
  • rm -rf /var/lib/apt/lists/* removes the downloaded apt package lists.
  • Everything in one RUN, so the cleanup happens in the same layer.

Specific minor wins

  • Don't store secrets in the image. Pass them at runtime (env vars, mounts, secret managers). COPY them into a layer and they're there forever, even if you delete them in a later layer.
  • Set WORKDIR once at the top instead of cd in RUNs. Cleaner.
  • Pin versions in apt-get install foo=1.2.3. Reproducible builds.
  • Use --mount=type=cache (BuildKit) for things like apt/pip/go mod caches that should persist across builds without being in the image.
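For example, a BuildKit cache mount for pip (a sketch; requires BuildKit, and the # syntax line opts into the newer Dockerfile syntax). The cache persists on the build host between builds but never becomes part of the image:

# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# pip's download cache lives in the mount, not in a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt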

A typical "before/after"

A naive Python Dockerfile, ~1GB:

FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

Slimmed version, ~120MB:

FROM python:3.12-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN useradd --create-home --shell /bin/bash app && chown -R app /app
USER app

CMD ["python", "app.py"]

Changes:

  • python:3.12 → python:3.12-slim (Debian slim base).
  • requirements.txt copied separately (cache reuse on code changes).
  • --no-cache-dir (no pip cache in the image).
  • Non-root user.

A multi-stage build - building wheels in one stage, installing them in the next - can take it to ~80MB if you have compiled wheels.
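That version might look like this (a sketch, assuming every dependency can be built as a wheel):

# Stage 1: build wheels - compilers and headers stay here
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: install from the prebuilt wheels only
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt \
    && rm -rf /wheels
COPY . .
CMD ["python", "app.py"]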

Exercise

  1. Build a Go hello-world with multi-stage:

    // hello.go
    package main
    func main() { println("hello from container") }
    
    // go.mod
    module hello
    
    go 1.23
    
    # Dockerfile
    FROM golang:1.23 AS builder
    WORKDIR /src
    COPY go.mod .
    COPY hello.go .
    RUN CGO_ENABLED=0 go build -o /app/hello .
    
    FROM gcr.io/distroless/static
    COPY --from=builder /app/hello /hello
    ENTRYPOINT ["/hello"]
    
    Build, run, check size:
    docker build -t hello:1.0 .
    docker run --rm hello:1.0
    docker images hello:1.0
    
    Should be ~5MB. Compare to a single-stage build using golang:1.23 directly - ~1GB.

  2. Find what's bloating an image with docker history:

    docker history python:3.12 --human --format "{{.Size}}\t{{.CreatedBy}}" | head
    docker history python:3.12-slim --human --format "{{.Size}}\t{{.CreatedBy}}" | head
    
    Note where the size differences come from.

  3. .dockerignore test: create a folder with a .git directory full of stuff. Build a trivial Dockerfile that just does COPY . /app. Note the build-context size the build reports (the classic builder prints "Sending build context to Docker daemon"; BuildKit shows "transferring context") - large. Add .dockerignore with .git. Rebuild; the context is much smaller.

What you might wonder

"Why does Alpine cause weird pip install issues?" Alpine uses musl libc (most Linux distributions use glibc). Many Python wheels are pre-compiled against glibc - they don't have musl variants, so pip falls back to compiling from source (slow, often fails). For Python on Alpine, expect occasional headaches; *-slim (Debian-based) is friendlier.
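If you do stay on Alpine and a package falls back to a source build, the usual workaround is to install the build dependencies and remove them in the same layer (a sketch - the exact apk packages depend on what the library compiles against):

FROM python:3.12-alpine
WORKDIR /app
COPY requirements.txt .
# Install build deps as a named "virtual" group, remove them after the install -
# all in one RUN so the layer ends clean
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps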

"What's BuildKit?" The modern Docker build engine, the default in recent Docker versions. Faster, and it supports advanced features (cache mounts, secret mounts, multi-platform builds). On older versions, enable it with DOCKER_BUILDKIT=1.
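One BuildKit feature pairs well with the "no secrets in the image" rule above: secret mounts. The secret is available only while a single RUN executes and is never written to a layer (a sketch - the npm_token id, the secret file, and an .npmrc that reads ${NPM_TOKEN} are assumptions):

# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json .npmrc ./
# The token exists only for the duration of this RUN
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

Build with docker build --secret id=npm_token,src=./npm_token . - the token never appears in docker history.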

"Should I shoot for the smallest possible image?" No. Shoot for "small enough to feel light, easy enough to maintain." A 50MB image is often a better trade-off than a 5MB one if the 5MB takes hours of debugging to keep working.

Done

  • Pick base images by size and ecosystem fit.
  • Use multi-stage builds.
  • Use distroless when shipping static binaries or a bare runtime.
  • Use .dockerignore.
  • Combine RUNs to minimize layers.

Next: Security basics →
