
09 - Slimming Images

What this session is

About 45 minutes. Image size matters: smaller = faster pulls, faster cold starts, smaller attack surface. You'll learn multi-stage builds, .dockerignore, base-image choices, and the common slimming techniques.

Why size matters

A 1.5GB image and a 50MB image can run the same application. But:

  • The 1.5GB image takes 30 seconds to pull on a slow link; the 50MB one takes one.
  • The 1.5GB has thousands of files (extra attack surface, more CVE matches).
  • Cold-start a serverless container from a 1.5GB image? Painful.
  • CI builds with 1.5GB intermediates eat disk and slow caching.

Aim for the smallest sensible image. Not the absolute smallest (that way lies madness); the smallest one you can build and maintain comfortably.

Picking a base

Start with the smallest base that works:

Base                       Size      Best for
scratch                    0 bytes   Static binaries (Go, Rust) - no OS at all
gcr.io/distroless/static   ~2MB      Static binaries - has CA certs, tzdata, /etc/passwd
alpine:3.20                ~5MB      Anything that works on musl (most things)
debian:bookworm-slim       ~75MB     Things that need glibc but don't need many tools
python:3.12-slim           ~150MB    Python apps (slim variant)
ubuntu:24.04               ~80MB     When you need a familiar full distro

Rule of thumb: start with alpine or *-slim. Reach for full distros only when a wheel/binary doesn't work on the smaller one.

Multi-stage builds

The biggest slimming win. Use one stage to build, another to package the result. Build tools, source code, test artifacts don't ship.

A real example - Go:

# Stage 1: build
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp

# Stage 2: ship just the binary
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/myapp"]

Build:

docker build -t myapp:1.0 .
docker images myapp:1.0           # ~10MB instead of 1GB+

The FROM ... AS name creates a named stage. COPY --from=builder copies files out of that named stage. Only the final stage ships.

Same idea for any compiled language. For Rust:

FROM rust:1.80 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release

FROM gcr.io/distroless/cc-debian12
COPY --from=builder /src/target/release/myapp /myapp
CMD ["/myapp"]

For Node.js (interpreted, but you can still avoid shipping dev-deps):

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
USER node
CMD ["node", "dist/index.js"]

The second stage installs only production dependencies. The image drops from "dev deps + source + build output" to "just dist + runtime deps."

Distroless: nearly-empty base images

Google's "distroless" images (gcr.io/distroless/*) contain:

  • The language runtime (for python, java, etc.) - or nothing at all (static).
  • CA certificates, tzdata, /etc/passwd, a few essentials.
  • No shell, no package manager, no debug tools.

Pros: tiny, minimal attack surface, no shell-injection footholds. Cons: harder to debug (no docker exec ... sh). For that, distroless ships a :debug variant (which includes a busybox shell) for occasional use.

For static-binary languages (Go, Rust) shipping a CLI: distroless/static. For Java: distroless/java. For Python: distroless/python3. (Each has variants.)
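As a concrete illustration, here is what a Java service on distroless might look like (a sketch - the Maven image tag, project layout, and app.jar name are assumptions; the distroless Java images set java -jar as the entrypoint, so CMD is just the jar path):

# Stage 1: Maven build - compilers and the build tool stay here
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /src
COPY pom.xml .
RUN mvn -q dependency:go-offline
COPY src ./src
RUN mvn -q package -DskipTests

# Stage 2: JVM + CA certs, no shell, no package manager
FROM gcr.io/distroless/java17-debian12
COPY --from=builder /src/target/app.jar /app.jar
CMD ["/app.jar"]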

.dockerignore

Already covered in page 05. Critical: anything not excluded by .dockerignore is sent to the daemon as build context. .git, node_modules, target/, build caches all bloat builds.

A reasonable .dockerignore for a polyglot project:

.git
.gitignore
.dockerignore
Dockerfile*
.idea
.vscode
*.md
node_modules
target
__pycache__
*.pyc
.env
.env.*
dist
build
coverage
.cache

Combine RUN instructions

Each RUN creates a layer. If you RUN apt-get install foo and then RUN apt-get remove foo, the removal happens in a new layer - the first layer still contains the package files, so the image doesn't get any smaller.

Combine into one RUN:

# Bad - bloats the image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Good - one layer, ends clean
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

Three patterns in the good version:

  • --no-install-recommends skips optional dependencies.
  • rm -rf /var/lib/apt/lists/* removes the downloaded apt package lists.
  • Everything in one RUN, so the cleanup happens in the same layer.

Specific minor wins

  • Don't store secrets in the image. Pass them at runtime (env vars, mounts, secret managers). COPY them into a layer and they're there forever, even if you delete them in a later layer.
  • Set WORKDIR once at the top instead of cd in RUNs. Cleaner.
  • Pin versions in apt-get install foo=1.2.3. Reproducible builds.
  • Use --mount=type=cache (BuildKit) for things like apt/pip/go mod caches that should persist across builds without being in the image.
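For example, a BuildKit cache mount for pip (a sketch; requires BuildKit, and the # syntax line opts into the newer Dockerfile syntax). The cache persists on the build host between builds but never becomes part of the image:

# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# pip's download cache lives in the mount, not in a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt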

A typical "before/after"

A naive Python Dockerfile, ~1GB:

FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

Slimmed version, ~120MB:

FROM python:3.12-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN useradd --create-home --shell /bin/bash app && chown -R app /app
USER app

CMD ["python", "app.py"]

Changes:

  • python:3.12 → python:3.12-slim (Debian slim base).
  • requirements.txt copied separately (cache reuse on code changes).
  • --no-cache-dir (no pip cache in the image).
  • Non-root user.

A multi-stage build - building wheels in one stage, installing them in the next - can take it to ~80MB if you have compiled wheels.
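That version might look like this (a sketch, assuming every dependency can be built as a wheel):

# Stage 1: build wheels - compilers and headers stay here
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: install from the prebuilt wheels only
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt \
    && rm -rf /wheels
COPY . .
CMD ["python", "app.py"]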

Exercise

  1. Build a Go hello-world with multi-stage:

    // hello.go
    package main
    func main() { println("hello from container") }
    
    // go.mod
    module hello
    
    go 1.23
    
    # Dockerfile
    FROM golang:1.23 AS builder
    WORKDIR /src
    COPY go.mod .
    COPY hello.go .
    RUN CGO_ENABLED=0 go build -o /app/hello .
    
    FROM gcr.io/distroless/static
    COPY --from=builder /app/hello /hello
    ENTRYPOINT ["/hello"]
    
    Build, run, check size:
    docker build -t hello:1.0 .
    docker run --rm hello:1.0
    docker images hello:1.0
    
    Should be ~5MB. Compare to a single-stage build using golang:1.23 directly - ~1GB.

  2. Find what's bloating an image with docker history:

    docker history python:3.12 --human --format "{{.Size}}\t{{.CreatedBy}}" | head
    docker history python:3.12-slim --human --format "{{.Size}}\t{{.CreatedBy}}" | head
    
    Note where the size differences come from.

  3. .dockerignore test: create a folder with a .git directory full of stuff. Build a trivial Dockerfile that just does COPY . /app. Note the build-context size the build reports (the classic builder prints "Sending build context to Docker daemon"; BuildKit shows "transferring context") - large. Add .dockerignore with .git. Rebuild; the context is much smaller.

What you might wonder

"Why does Alpine cause weird pip install issues?" Alpine uses musl libc (most Linux distributions use glibc). Many Python wheels are pre-compiled against glibc - they don't have musl variants, so pip falls back to compiling from source (slow, often fails). For Python on Alpine, expect occasional headaches; *-slim (Debian-based) is friendlier.
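If you do stay on Alpine and a package falls back to a source build, the usual workaround is to install the build dependencies and remove them in the same layer (a sketch - the exact apk packages depend on what the library compiles against):

FROM python:3.12-alpine
WORKDIR /app
COPY requirements.txt .
# Install build deps as a named "virtual" group, remove them after the install -
# all in one RUN so the layer ends clean
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps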

"What's BuildKit?" The modern Docker build engine, the default in recent Docker versions. Faster, and it supports advanced features (cache mounts, secret mounts, multi-platform builds). On older versions, enable it with DOCKER_BUILDKIT=1.
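One BuildKit feature pairs well with the "no secrets in the image" rule above: secret mounts. The secret is available only while a single RUN executes and is never written to a layer (a sketch - the npm_token id, the secret file, and an .npmrc that reads ${NPM_TOKEN} are assumptions):

# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json .npmrc ./
# The token exists only for the duration of this RUN
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

Build with docker build --secret id=npm_token,src=./npm_token . - the token never appears in docker history.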

"Should I shoot for the smallest possible image?" No. Shoot for "small enough to feel light, easy enough to maintain." A 50MB image is often a better trade-off than a 5MB one if the 5MB takes hours of debugging to keep working.

Done

  • Pick base images by size and ecosystem fit.
  • Use multi-stage builds.
  • Use distroless when shipping static binaries or a bare runtime.
  • Use .dockerignore.
  • Combine RUNs to minimize layers.

Next: Security basics →
