09 - Slimming Images
What this session is
About 45 minutes. Image size matters: smaller = faster pulls, faster cold starts, smaller attack surface. You'll learn multi-stage builds, .dockerignore, base-image choices, and the common slimming techniques.
Why size matters
A 1.5GB image and a 50MB image both run the same. But:
- At 50 MB/s, the 1.5GB image takes about 30 seconds to pull; the 50MB one takes about 1. Slower links widen the gap.
- The 1.5GB has thousands of files (extra attack surface, more CVE matches).
- Cold-start a serverless container from a 1.5GB image? Painful.
- CI builds with 1.5GB intermediates eat disk and slow caching.
Aim for the smallest sensible image. Not the absolute smallest (that way lies madness); the smallest one you can build comfortably.
Picking a base
Start with the smallest base that works:
| Base | Size | Best for |
|---|---|---|
| scratch | 0 bytes | Static binaries (Go, Rust) - no OS at all |
| gcr.io/distroless/static | ~2MB | Static binaries - has CA certs, tzdata, /etc/passwd |
| alpine:3.20 | ~5MB | Anything that works on musl (most things) |
| debian:bookworm-slim | ~75MB | Things that need glibc but don't need many tools |
| python:3.12-slim | ~150MB | Python apps (slim variant) |
| ubuntu:24.04 | ~80MB | When you need a familiar full distro |
Rule of thumb: start with alpine or *-slim. Reach for full distros only when a wheel/binary doesn't work on the smaller one.
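One way to try the rule of thumb is to make the base image a build argument, so the same Dockerfile can be built against several bases and the sizes compared. This is a minimal sketch: BASE is a hypothetical argument name and app.py a placeholder entry point, neither taken from a real project.

```dockerfile
# BASE is a hypothetical build argument; its default is the slim variant from the table.
# An ARG declared before FROM can be used in the FROM line.
ARG BASE=python:3.12-slim
FROM ${BASE}
WORKDIR /app
# app.py is a placeholder for your own entry point.
COPY app.py .
CMD ["python", "app.py"]
```

Building once as-is and once with --build-arg BASE=python:3.12-alpine (each with its own -t tag) gives two images to compare in docker images. If the Alpine build fails on a dependency, that is the signal to stay on the slim base.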
Multi-stage builds
The biggest slimming win. Use one stage to build, another to package the result. Build tools, source code, test artifacts don't ship.
A real example - Go:
# Stage 1: build
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp
# Stage 2: ship just the binary
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/myapp"]
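Build it with docker build as usual. FROM ... AS name creates a named stage. COPY --from=builder copies files out of that named stage (here it happens to be the previous one, but any earlier stage can be referenced by name). Only the final stage ships.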
Same idea for any compiled language. For Rust:
FROM rust:1.80 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /src/target/release/myapp /myapp
CMD ["/myapp"]
For Node.js (interpreted, but you can still avoid shipping dev-deps):
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
USER node
CMD ["node", "dist/index.js"]
The second stage installs only production dependencies. The image drops from "dev deps + source + dist" to "just dist + runtime deps."
Distroless: nearly-empty base images
Google's "distroless" images (gcr.io/distroless/*) contain:
- The language runtime (for python, java, etc.) - OR nothing (static).
- CA certificates, tzdata, /etc/passwd, a few essentials.
- No shell, no package manager, no debug tools.
Pros: tiny, minimal attack surface, no shell-injection footholds.
Cons: harder to debug (no docker exec ... sh). For that, distroless ships a :debug variant for occasional use.
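For statically linked binaries such as a CLI (Go with CGO_ENABLED=0, Rust built for a musl target): distroless/static. For binaries that link glibc dynamically, like the default Rust build above: distroless/cc. For Java: distroless/java. For Python: distroless/python3. (Each has variants.)
.dockerignore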
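A sketch of how the :debug variant can sit next to the normal image in one Dockerfile, reusing the Go example from earlier. The stage names debug and release are assumptions for illustration; the debug-nonroot tag is the debug counterpart of the nonroot tag used above.

```dockerfile
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp

# Debug variant: same binary, plus a busybox shell. Only built on request.
FROM gcr.io/distroless/static:debug-nonroot AS debug
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]

# Release variant: the last stage, so a plain docker build produces this one.
FROM gcr.io/distroless/static:nonroot AS release
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
```

Building with --target debug produces the debuggable image, and running it with --entrypoint=sh should drop you into the busybox shell. The release image stays shell-free.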
Already covered in page 05. Critical: anything not in .dockerignore is sent to the daemon as build context. .git, node_modules, target/, build caches all bloat builds.
A reasonable .dockerignore for a polyglot project:
.git
.gitignore
.dockerignore
Dockerfile*
.idea
.vscode
*.md
node_modules
target
__pycache__
*.pyc
.env
.env.*
dist
build
coverage
.cache
Combine RUN instructions
Each RUN creates a layer. If you RUN apt-get install foo then RUN apt-get remove foo, the second layer doesn't actually reclaim the disk - the first layer still has the package files.
Combine into one RUN:
# Bad - bloats the image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean
# Good - one layer, ends clean
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
rm -rf /var/lib/apt/lists/*
Three patterns above:
- --no-install-recommends skips optional dependencies.
- rm -rf /var/lib/apt/lists/* removes the package index files that apt-get update downloaded.
- Everything in one RUN so the cleanup is in the same layer.
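The same reasoning covers the install-then-remove case mentioned earlier: a tool needed only during the build can be installed, used, and purged inside a single RUN, so it never lands in any layer. A minimal sketch, assuming a download from a placeholder URL (https://example.com/tool is not a real artifact):

```dockerfile
FROM debian:bookworm-slim
# The URL below is a placeholder; substitute whatever you actually need to fetch.
# curl is installed, used, and purged in one layer, so its files are never committed.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    curl -fsSL -o /usr/local/bin/tool https://example.com/tool && \
    chmod +x /usr/local/bin/tool && \
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/*
```

Splitting the purge into its own RUN would leave curl's files in the earlier layer, which is exactly the bloat described above.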
Specific minor wins
- Don't store secrets in the image. Pass them at runtime (env vars, mounts, secret managers). COPY them into a layer and they're there forever, even if you delete them in a later layer.
- Set WORKDIR once at the top instead of cd in RUNs. Cleaner.
- Pin versions in apt-get install foo=1.2.3. Reproducible builds.
- Use --mount=type=cache (BuildKit) for things like apt/pip/go mod caches that should persist across builds without being in the image.
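A sketch of the cache mount, with a BuildKit secret mount added for the case where a secret is needed at build time instead of runtime. The secret id pip_conf is a hypothetical name, and the example assumes pip's credentials live in a pip.conf file.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# Cache mount: pip's download cache persists across builds on the build host
# but is never written into an image layer.
# Secret mount: the file exists only while this RUN executes.
# "pip_conf" is a hypothetical secret id chosen for this sketch.
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=secret,id=pip_conf,target=/etc/pip.conf \
    pip install -r requirements.txt
```

The secret would be supplied at build time with something like docker build --secret id=pip_conf,src=pip.conf . Note that --no-cache-dir is left off here on purpose: the cache lives in the mount, not in the image.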
A typical "before/after"
A naive Python Dockerfile, ~1GB:
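A sketch of what such a naive Dockerfile typically looks like, inferred from the list of changes further down. The file names app.py and requirements.txt are taken from the slimmed version; everything else is an assumption.

```dockerfile
# Full Debian-based image, including compilers and tools the app never uses.
FROM python:3.12
WORKDIR /app
# Copying everything first means any code change invalidates the pip layer.
COPY . .
# No --no-cache-dir, so pip's download cache stays in the layer.
RUN pip install -r requirements.txt
# No USER instruction: the app runs as root.
CMD ["python", "app.py"]
```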
Slimmed version, ~120MB:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd --create-home --shell /bin/bash app && chown -R app /app
USER app
CMD ["python", "app.py"]
Changes:
- python:3.12 → python:3.12-slim (Debian slim base).
- requirements.txt separately (cache reuse on code changes).
- --no-cache-dir (no pip cache in image).
- Non-root user.
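If your dependencies include compiled wheels, a multi-stage build (compile in one stage, copy only the installed packages into the final one) takes it to ~80MB.
Exercise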
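One common way to do the multi-stage version is to install into a virtual environment in a builder stage and copy that environment across. This is a sketch under assumptions: /opt/venv is an arbitrary path, and build-essential stands in for whatever compiler packages your wheels actually need.

```dockerfile
# Stage 1: has a compiler, builds any wheels that need one.
FROM python:3.12-slim AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
# /opt/venv is an arbitrary location chosen for this sketch.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: same base, no compiler; only the finished environment is copied in.
FROM python:3.12-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
RUN useradd --create-home --shell /bin/bash app && chown -R app /app
USER app
CMD ["python", "app.py"]
```

Both stages use the same base image, so the interpreter the virtual environment points at exists at the same path in the final image.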
- Build a Go hello-world with multi-stage:

      # Dockerfile
      FROM golang:1.23 AS builder
      WORKDIR /src
      COPY go.mod .
      COPY hello.go .
      RUN CGO_ENABLED=0 go build -o /app/hello .
      FROM gcr.io/distroless/static
      COPY --from=builder /app/hello /hello
      ENTRYPOINT ["/hello"]

  Build, run, check size. Should be ~5MB. Compare to a single-stage build using golang:1.23 directly - ~1GB.
- Find what's bloating an image with docker history. Note where the size differences come from.
- .dockerignore test: create a folder with a .git directory full of stuff. Build a trivial Dockerfile that just does COPY . /app. Note the "Sending build context to Docker daemon" line (with BuildKit, the "transferring context" line) - large. Add .dockerignore with .git. Rebuild; context is much smaller.
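For the single-stage comparison in the first exercise, a sketch along these lines should do. The file name Dockerfile.single and the tag hello:single are assumptions made for illustration.

```dockerfile
# Dockerfile.single (hypothetical file name)
# Build:  docker build -f Dockerfile.single -t hello:single .
# Size:   docker images hello
# Layers: docker history hello:single
FROM golang:1.23
WORKDIR /src
COPY go.mod .
COPY hello.go .
RUN CGO_ENABLED=0 go build -o /hello .
# The whole Go toolchain ships with the binary, which is where the size comes from.
ENTRYPOINT ["/hello"]
```

Running docker history on both images side by side makes the difference visible: nearly all of the single-stage size sits in the base image layers, not in your binary.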
What you might wonder
"Why does Alpine cause weird pip install issues?"
Alpine uses musl libc (most Linux uses glibc). Many Python wheels are pre-compiled against glibc - they don't have musl variants, so pip falls back to compiling from source (slow, often fails). For Python on Alpine, expect occasional headaches; *-slim (Debian-based) is friendlier.
"What's BuildKit?"
The modern Docker build engine, and the default builder since Docker Engine 23.0. Faster, supports advanced features (cache mounts, secret mounts, multi-platform builds). On older versions, enable it with DOCKER_BUILDKIT=1.
"Should I shoot for the smallest possible image?"
No. Shoot for "small enough to feel light, easy enough to maintain." A 50MB image is often a better trade-off than a 5MB one if the 5MB takes hours of debugging to keep working.
Done
- Pick base images by size and ecosystem fit.
- Use multi-stage builds.
- Use distroless for static-binary-only ships.
- Use .dockerignore.
- Combine RUNs to minimize layers.