Containers From Scratch (Beginner)¶
Beginner path: heard-of-Docker → writing Dockerfiles, debugging containers, contributing to containerized OSS.
Printing this page
Use your browser's Print → Save as PDF. The print stylesheet hides navigation, comments, and other site chrome; pages break cleanly at section boundaries; advanced content stays included regardless of beginner-mode state.
Containers From Scratch - Beginner to OSS Contributor¶
From "I have heard of Docker" to "I can write a Dockerfile, debug a running container, read someone else's compose stack, and submit a fix to a containerized OSS project."
Who this is for¶
- You're comfortable enough in a terminal to follow instructions.
- You've never used Docker, OR you've copy-pasted `docker run` and `docker build` without really understanding what they do.
Soft prerequisite¶
If terminals still feel alien, do Linux From Scratch first - at least pages 01-08. This path assumes you can cd, run commands, read errors, and edit a file.
What you'll need¶
- Docker Desktop (macOS / Windows / Linux), OR Podman, OR Docker Engine on Linux.
- A text editor.
- A terminal.
- About 5 hours/week. Path is sized for 3-4 months.
Why containers¶
- You'll ship software in containers. Every modern cloud, every modern infrastructure team, every modern deployment pipeline.
- They abstract over OS differences. "Works on my machine" mostly stops being a problem.
- OSS adoption is massive. Almost every popular OSS project today ships a Docker image. Knowing the container side lets you contribute to many of them.
How this path works¶
Each page does one thing: explains it, shows it, walks through it, gives an exercise, ends with a Q&A.
The pages¶
| # | Title | What you'll know after |
|---|---|---|
| 00 | Introduction | What we're doing and why |
| 01 | Setup | Docker installed; first container run |
| 02 | Running containers | docker run, common flags |
| 03 | Images and tags | What an image is; pulling from registries |
| 04 | Container lifecycle | ps, stop, rm, exec, logs |
| 05 | Building images with Dockerfile | FROM, COPY, RUN, CMD |
| 06 | Volumes and bind mounts | Persistent data |
| 07 | Networks and ports | Container-to-container, container-to-world |
| 08 | Docker Compose | Multi-container apps in one file |
| 09 | Slimming images | Multi-stage, distroless, .dockerignore |
| 10 | Security basics | Non-root, read-only, capabilities |
| 11 | Image registries | Docker Hub, GHCR, push and pull |
| 12 | Reading other people's Dockerfiles | The bridge |
| 13 | Picking a project | What "manageable" looks like |
| 14 | Anatomy of a containerized OSS project | Case study |
| 15 | Your first contribution | Workflow + PR |
Start with Introduction.
00 - Introduction¶
What this session is¶
A 10-minute read. No code. Sets expectations.
What you're going to be able to do, eventually¶
By the end:
- Run any container from a Docker image.
- Write a Dockerfile that packages your own application.
- Use volumes to keep data; networks to let containers talk.
- Compose multi-container apps with `docker compose`.
- Push and pull images to/from a registry.
- Read a real-world Dockerfile and know what it does and why.
- Clone a containerized open-source project, find and fix a small issue with its `Dockerfile` or `compose.yaml`, and submit a pull request.
That last bullet is the goal.
The deal¶
- It's slow on purpose. One concept per page.
- It assumes nothing about containers. It assumes basic terminal comfort.
- You will run real containers. Most pages have hands-on commands.
- You will see surprising behavior. Containers behave like tiny isolated machines, and the first 2 weeks of using them is "wait, why didn't that work?" That's normal.
What containers actually are (briefly)¶
A container is a running process (or a few) wrapped in:
- Its own view of the filesystem.
- Its own network namespace (own IP, own ports).
- Its own process tree (it can't see the host's other processes).
- Limited CPU and memory, if you configure it.
The container shares the host's kernel - unlike a VM, no separate kernel boots up. Cheap and fast: starts in milliseconds, costs almost nothing when idle.
An image is a recipe for creating a container - a frozen filesystem snapshot plus some metadata ("when you start me, run this command").
You'll get the full picture as we go. Don't try to absorb it all from one paragraph.
What you need¶
- Docker Desktop on macOS / Windows. Free for personal use. Includes the Docker engine, the CLI, and a UI.
- On Linux: Docker Engine (`sudo apt install docker.io` or similar) or Podman (drop-in replacement, doesn't need a root daemon - often nicer).
- A text editor.
- A terminal.
- ~5 hours/week. Path is sized for 3-4 months.
What you do NOT need¶
- Kubernetes. (Different path. Containers come first.)
- A cloud account. We work locally.
- A programming language. (Some examples use simple Node/Python/Go for variety, but you'll just be running pre-built images.)
How long this realistically takes¶
3 to 4 months at 5 hours/week to the "submit a PR" goal. Shorter than the language paths because there's no new syntax to learn - just commands, concepts, and YAML.
What success looks like¶
You'll be able to:
- Look at a Dockerfile and explain every instruction.
- Look at a compose.yaml and explain the service topology.
- Debug a container that won't start.
- Improve someone's Dockerfile to be smaller, faster, or more secure.
- Submit a PR.
You will not be able to:
- Operate a production Kubernetes cluster. (Different path.)
- Write your own container runtime. (Different path: "Container Internals" senior reference.)
A note on Docker vs Podman vs containerd¶
You'll see multiple "container runtimes" mentioned in the wild:
- Docker - original, most popular, has both a CLI and a daemon. Docker Desktop bundles everything.
- Podman - Red Hat's daemon-less alternative. CLI is nearly identical to Docker's; `alias docker=podman` often works. No root daemon by default.
- containerd - the low-level runtime under both Docker and Kubernetes. You usually don't talk to it directly.
This path uses Docker commands in examples (most common, most documented). If you're on Podman, the commands are the same - just substitute podman for docker.
One last thing¶
If a page feels too dense - stop, re-read. Still dense? Skip, come back.
Ready? Next: Setup →
01 - Setup¶
What this session is¶
About 30 minutes. Install Docker (or Podman), confirm it works, run your first container.
Step 1: Install¶
macOS / Windows: Docker Desktop Download from docker.com/products/docker-desktop. Run the installer. Open the Docker Desktop app once - it sets up the engine in the background. Free for personal / small-business use.
Linux: Docker Engine (Debian/Ubuntu example)
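One common route is the distro package (the Docker apt repository is the other; this matches the `docker.io` package mentioned earlier):

```shell
sudo apt update
sudo apt install -y docker.io
sudo usermod -aG docker $USER   # let your user run docker without sudo
```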
Log out and back in for the group change. Now you can run `docker` without sudo.
Linux alternative: Podman
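On Debian/Ubuntu (the package name can differ on other distros):

```shell
sudo apt install -y podman
```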
No root daemon. Optionally `alias docker=podman` to make commands portable.
Step 2: Verify¶
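Run these two commands:

```shell
docker --version
docker info
```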
Both should print something. docker info reports the engine version, OS, storage driver, etc.
If docker: command not found - install didn't complete. If Cannot connect to the Docker daemon - Docker Desktop isn't running (macOS/Windows), or the systemd service is off (Linux: sudo systemctl start docker).
Step 3: Run your first container¶
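Run the canonical test image:

```shell
docker run hello-world
```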
What happens:
1. Docker looks for an image named hello-world locally. Doesn't find it.
2. Pulls it from Docker Hub.
3. Creates a container from it.
4. Runs the container's default command (which prints a friendly message).
5. The container exits when the command finishes.
You should see:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
...
Hello from Docker!
This message shows that your installation appears to be working correctly.
Congratulations - your first container.
Step 4: A more useful container¶
Let's run a real Linux distribution interactively:
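```shell
docker run -it --rm ubuntu bash
```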
You're now inside a fresh Ubuntu container. Try:
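A few suggestions (any shell command works):

```shell
cat /etc/os-release   # which distro and version is this?
ls /                  # a full Linux filesystem tree
whoami                # root - inside the container only
exit                  # leave; the container stops
```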
exit terminates the shell, which terminates the container. Because of --rm, the container is also automatically removed.
What's new:
- `-i` (`--interactive`) - keep stdin open. Lets you type at the container.
- `-t` (`--tty`) - allocate a pseudo-terminal. Makes the shell behave like a real terminal.
- `--rm` - clean up the container after it exits. Without this, stopped containers accumulate.
- `ubuntu` - the image.
- `bash` - the command to run inside the container.
-it together is the standard "I want an interactive shell" combo.
Step 5: A container with a port¶
Let's run a web server:
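```shell
docker run -d --rm --name webtest -p 8080:80 nginx
```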
What's new:
- `-d` (`--detach`) - run in the background (returns to your terminal immediately).
- `-p 8080:80` - map host port 8080 to container port 80.
- `--name webtest` - give the container a name (so you can refer to it).
Open http://localhost:8080 in your browser. You should see the nginx welcome page. The container is serving HTTP on port 80; Docker forwarded it to your host's port 8080.
To stop it:
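```shell
docker stop webtest
```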
Since we used --rm, the container is also removed when it stops.
Common docker run flags (preview - full list in page 02)¶
| Flag | What it does |
|---|---|
| `-it` | Interactive shell |
| `--rm` | Remove container when it exits |
| `-d` | Detached / background |
| `-p HOST:CONTAINER` | Port mapping |
| `-v HOST:CONTAINER` | Volume / bind mount |
| `-e VAR=value` | Environment variable |
| `--name NAME` | Name the container |
| `--network NAME` | Connect to a specific network |
You'll meet each in detail.
Try changing things¶
- Run `docker run --rm alpine echo "hi from alpine"`. Notice Alpine is tiny (~5MB).
- Run `docker run --rm python:3 python -c "print(1+1)"`. The container has Python; you used it once and threw it away.
- Run `docker run -it --rm node:20 node` - get a Node.js REPL.
- Run `docker ps` (lists running containers - probably empty if you used `--rm`).
- Run `docker ps -a` (lists all containers including stopped - also probably empty with `--rm`).
What just happened, conceptually¶
You can run any program from any (compatible) Linux distribution without installing it. Need Python 3? docker run python:3. Need a Postgres database? docker run postgres. Need to test on Debian instead of your usual Ubuntu? docker run -it debian bash.
The container is isolated - its filesystem and processes don't touch yours. Quit it and nothing leaks. The "throwaway environment" use case is huge.
What you might wonder¶
"Where did nginx and ubuntu come from?"
Docker Hub. The default registry. docker pull ubuntu is shorthand for docker pull docker.io/library/ubuntu:latest. We'll cover registries in page 11.
"What's a 'tag'?"
A version identifier on an image. ubuntu:24.04 and ubuntu:22.04 are different versions. ubuntu alone defaults to ubuntu:latest. Page 03.
"Did Docker install Ubuntu on my machine?" No. Your host OS is unchanged. The container has its own copy of Ubuntu's userspace, but uses your host's kernel. When the container exits, that filesystem can be removed.
"Is this safe? Could a container break my host?"
By default, containers are fairly isolated but not fully secure. Don't run untrusted images, don't --privileged, don't mount sensitive host paths. We'll cover security basics in page 10.
"Why docker run --rm?"
Without --rm, stopped containers stick around. Useful for debugging stopped containers but accumulates clutter. For one-shot commands, --rm is the right default.
Done¶
- Docker installed and working.
- Pulled and ran a "hello world" container.
- Ran a Linux distribution interactively (`-it`).
- Ran a web server with port mapping (`-p`).
- Recognized the common `docker run` flags.
02 - Running Containers¶
What this session is¶
About 45 minutes. A deep dive on docker run - the flags that matter, what each does, and how to think about a container's lifetime.
The mental model¶
A container is a wrapped process. docker run does three things:
- Creates a container from an image.
- Starts it (runs the configured command).
- Streams stdout/stderr to your terminal (or detaches if `-d`).
When the main process exits, the container stops. (A container is alive while its main process is alive.)
Anatomy of docker run¶
The full shape:
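```text
docker run [FLAGS] IMAGE [COMMAND] [ARGS...]
```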
- FLAGS modify how the container runs (interactive, ports, volumes, env, etc.).
- IMAGE is which image to use (e.g. `nginx`, `python:3.12`).
- COMMAND (optional) overrides the image's default command.
- ARGS (optional) are passed to that command.
Examples:
docker run nginx # run nginx with its default cmd
docker run -it ubuntu bash # override: run bash instead
docker run python:3 python -c "print(42)" # override + args
The flags that matter¶
-i / -t / -it - interactivity¶
- `-i` keeps stdin open. Required if you want to type into the container.
- `-t` allocates a pseudo-TTY. Makes the prompt look like a real terminal.
- `-it` is the combination - for shells.
For one-off commands that don't need input, omit -it:
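For example:

```shell
docker run --rm alpine date   # runs, prints, exits, cleans up
```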
--rm - auto-cleanup¶
Without --rm, the stopped container hangs around (so you can docker logs it, docker start it again, etc.). For one-shot runs, --rm is right.
-d - detached¶
Runs the container in the background. Returns to your shell immediately. Use for services (web servers, databases) you want to leave running.
-p HOST:CONTAINER - port mapping¶
A container has its own network namespace. Its localhost is not your localhost. To reach a container's port from outside, publish it:
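```shell
docker run -d --rm --name web -p 8080:80 nginx   # host 8080 -> container 80
```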
You can publish multiple ports:
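```shell
docker run -d -p 8080:80 -p 8443:443 nginx
```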
Or bind to a specific host interface:
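```shell
docker run -d -p 127.0.0.1:8080:80 nginx   # reachable only from this machine
```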
If you don't -p a port, it's not reachable from your host. Containers on the same Docker network can still reach each other (page 07).
-e KEY=value - environment variables¶
Many images are configured via env vars. Read the image's documentation on Docker Hub.
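For example, the Postgres image requires a superuser password via an env var:

```shell
docker run -d --name db -e POSTGRES_PASSWORD=secret postgres:16
```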
For long lists, use an env-file:
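With a hypothetical file named `app.env` (one `KEY=value` per line):

```shell
docker run --env-file app.env myimage:1.0
```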
--name NAME - give it a name¶
Without --name, Docker generates a random one (practical_einstein). With --name, you can refer to it by name in subsequent commands:
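```shell
docker run -d --name web nginx
docker logs web
docker stop web
```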
Names are unique per machine; only one container with a given name at a time.
-v HOST:CONTAINER - bind mounts and volumes¶
Mount a host directory or a named volume into the container. Full coverage in page 06; preview:
docker run -v $(pwd):/data alpine ls /data # mount current dir as /data inside
docker run -v mydata:/var/lib/postgresql/data postgres # named volume
--network NAME - network¶
Connects the container to a specific Docker network. Full coverage in page 07.
Resource limits¶
Limits memory and CPU. Containers without limits can starve the host. Always set limits for production-shaped runs.
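For example:

```shell
docker run -d --memory=512m --cpus=1.5 --name api my-image:1.0
```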
--workdir DIR - initial working directory¶
Equivalent to cd /app inside the container before the command runs.
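```shell
docker run --rm --workdir /app python:3.12-slim pwd   # prints /app
```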
--user UID:GID - run as a specific user¶
Useful when working with mounted volumes (file ownership matches the host user).
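```shell
docker run --rm --user 1000:1000 alpine id   # process runs as uid 1000, gid 1000
```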
Override the image's default command¶
Everything after the image name is the command + args.
Use --entrypoint to override the image's entrypoint (a deeper override; we'll see entrypoint vs cmd in page 05):
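```shell
docker run -it --rm --entrypoint sh nginx   # a shell instead of the entrypoint
```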
Running interactively vs running detached¶
Two common shapes:
Interactive (foreground), one-shot:
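```shell
docker run -it --rm ubuntu bash
```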
Detached service:
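```shell
docker run -d --name svc -p 8080:80 nginx
```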
For services you want to keep running, drop --rm (so the container survives a stop and can be restarted with docker start SVC).
A real example: a Redis instance¶
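One reasonable invocation (the name and tag are a choice, not a requirement):

```shell
docker run -d --name redis -p 6379:6379 redis:7
```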
That starts Redis on port 6379. Test from your host:
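If you don't have `redis-cli` installed on the host, exec into the container instead:

```shell
docker exec -it redis redis-cli ping   # replies: PONG
```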
Stop it:
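```shell
docker stop redis
```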
Restart later:
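```shell
docker start redis
```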
Remove (when done):
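```shell
docker rm -f redis
```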
A multi-line docker run (for readability)¶
Long commands are easier to read split across lines with \:
docker run -d \
--name api \
-p 8080:80 \
-e DATABASE_URL=postgres://db:5432/mydb \
-v $(pwd)/data:/data \
--memory=512m \
--restart=unless-stopped \
my-image:1.0
This is what real-world docker run invocations look like. By page 08 (Compose) you'll see how to write this as YAML and avoid retyping the flags.
Exercise¶
1. Hello with a name and port. Open http://localhost:8081. Stop the container.
2. Environment variable. Should print `hi from env`.
3. Volume preview. You should see the contents of your current directory listed.
4. Resource limits. Note the memory limit reported inside.
5. Use a name to re-attach.
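One possible command set for these steps (container names are illustrative, not prescribed):

```shell
# 1. Hello with a name and port
docker run -d --rm --name hello -p 8081:80 nginx
docker stop hello

# 2. Environment variable
docker run --rm -e MSG="hi from env" alpine sh -c 'echo $MSG'

# 3. Volume preview
docker run --rm -v "$(pwd)":/data alpine ls /data

# 4. Resource limits (path is for cgroup v2; cgroup v1 hosts differ)
docker run --rm --memory=256m alpine cat /sys/fs/cgroup/memory.max

# 5. Use a name to re-attach
docker run -d --name keepme nginx
docker exec -it keepme sh   # poke around, then exit
docker rm -f keepme
```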
What you might wonder¶
"Why both -i and -t?"
-i keeps stdin connected (you can type). -t makes Docker allocate a pseudo-terminal so the program thinks it's running in a real terminal (colors work, line editing works). For a shell you want both. For a pipe (e.g. echo "hi" | docker run -i ...) you only need -i.
"What's the difference between -p 8080:80 and -P (capital P)?"
-P publishes all the image's EXPOSEd ports to random host ports. Rarely used in practice. Stick with explicit -p.
"What's --restart?"
A policy for what to do when the container exits or the daemon restarts. --restart=always, --restart=unless-stopped, --restart=on-failure. Useful for services you want to survive reboots.
"Can a container have multiple processes?" Yes, but the convention is "one main process per container." If you need multiple, use multiple containers (page 08) or a process supervisor inside.
Done¶
- Run interactive shells in containers.
- Run detached services with port mappings.
- Pass environment variables.
- Name containers for easy reference.
- Set resource limits.
- Read a real-world `docker run` invocation.
03 - Images and Tags¶
What this session is¶
About 30 minutes. You'll learn what an image actually is, how tags work, how to find and inspect images, and the Docker Hub model.
What an image is¶
An image is a stack of read-only layers, plus some metadata (entrypoint, default command, exposed ports, environment variables).
When you create a container, Docker adds a thin read-write layer on top. Changes the container makes are in that layer; the underlying layers stay shared with other containers.
Two consequences:
1. Containers start fast (no copying - just stack a new writable layer).
2. Containers using the same image share disk space.
Tags: image versions¶
An image reference has the form:
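```text
[REGISTRY/][NAMESPACE/]IMAGE[:TAG]
```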
- REGISTRY - where the image lives (defaults to `docker.io`).
- NAMESPACE - the user/org publishing it (defaults to `library` for official images).
- IMAGE - the image name.
- TAG - a label, typically a version (defaults to `latest`).
Examples:
| Short form | Full form |
|---|---|
| `nginx` | `docker.io/library/nginx:latest` |
| `nginx:1.27` | `docker.io/library/nginx:1.27` |
| `myorg/myapp:v1.2.0` | `docker.io/myorg/myapp:v1.2.0` |
| `ghcr.io/foo/bar:main` | (literal - GHCR registry) |
The trap
:latest is a label, not a guarantee. It points to whichever build the maintainer last tagged as latest - which can change. Pin to a specific version tag in production: nginx:1.27, not nginx:latest. For local experimentation, latest is fine.
Pull and list¶
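```shell
docker pull nginx:1.27
docker images
```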
docker images shows: repository, tag, image ID, size, age.
Inspect¶
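```shell
docker inspect nginx:1.27
```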
Outputs a long JSON with: layers, env vars, exposed ports, entrypoint, default command, the build history. Useful when figuring out why an image behaves a certain way.
docker history nginx:1.27 is a friendlier view of just the layers. Each line is a build step (a layer). Sizes tell you what dominates the image. A 1GB image is mostly something; docker history shows you what.
Search Docker Hub¶
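```shell
docker search nginx
```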
Returns matching repositories with star counts. For more, browse hub.docker.com - better filtering and READMEs.
Reading a Docker Hub page for an image tells you:
- Supported tags (versions).
- Configuration env vars.
- Usage examples.
- Source repo (often on GitHub) - the Dockerfile is public.
Official vs unofficial¶
library/nginx is an official image - curated, maintained by the upstream project or by Docker. They live under the library namespace (often hidden - nginx alone is shorthand for library/nginx).
Third-party images live under user/org namespaces: bitnami/postgresql, linuxserver/jellyfin, etc. Anyone can publish to Docker Hub; verify maintainers before running untrusted code.
Signals of trust:
- "Official Image" badge or Verified Publisher badge on Docker Hub.
- Maintained by the project itself (e.g. nginx, postgres, python).
- Pulls in the millions.
- Active CI, recent updates, signed images.
Image size matters¶
Smaller images = faster pulls, faster deploys, smaller attack surface. Compare:
docker pull ubuntu # ~80MB
docker pull debian # ~120MB
docker pull alpine # ~5MB
docker pull busybox # ~5MB
docker pull gcr.io/distroless/static # ~2MB
For your own images (page 05+): start from a small base unless you genuinely need a full distro.
Remove unused images¶
Local images pile up. Clean up:
docker image rm IMAGE # remove one
docker image prune # remove dangling (no tag)
docker image prune -a # remove ALL not used by any container
docker system prune # broader cleanup (containers, networks, etc.)
docker system df shows how much space Docker is using.
A worked example: which Python image to pick¶
Suppose you want a Python container. Docker Hub python page lists tags:
- `python:3.12` - full Debian-based, ~1GB. Most flexible; has gcc, locales, etc.
- `python:3.12-slim` - Debian-based, ~150MB. Stripped down.
- `python:3.12-alpine` - Alpine-based, ~50MB. Smallest, but glibc-incompatible (some Python wheels won't install).
Rule of thumb: start with python:3.12-slim. If a pip install fails on a wheel, fall back to python:3.12. Try alpine last (often more pain than savings).
Multi-architecture images¶
Modern images are usually built for multiple architectures (linux/amd64, linux/arm64). Docker pulls the one matching your host. The same nginx:1.27 works on an Intel Mac, an Apple Silicon Mac, an x86 server, a Raspberry Pi.
You can force one:
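```shell
docker pull --platform linux/amd64 nginx:1.27
```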
Useful on Apple Silicon when an image hasn't been built for ARM.
Exercise¶
1. Pull two versions of nginx. Note they share lots of disk space - common layers are shared.
2. Inspect one of them.
3. Compare sizes.
4. Pin discipline: find one place where you saw `nginx:latest` in this path's earlier examples. Mentally substitute `nginx:1.27` (or any specific tag). That's what you should write in production.
5. Cleanup.
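One possible command set for these steps (the second tag is just an example of another version):

```shell
docker pull nginx:1.27    # 1. two versions of nginx
docker pull nginx:1.26
docker inspect nginx:1.27 # 2. inspect one
docker images             # 3. compare sizes
docker image prune        # 5. cleanup
```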
What you might wonder¶
"Why are images so big?" A base distro is hundreds of MB. Adding a language runtime adds more. Application code is usually small; system bloat dominates. Page 09 covers slimming.
"What's a 'digest' vs a 'tag'?"
A digest is the cryptographic hash of the exact image content (sha256:abc...). Immutable. A tag is a movable label. For maximum reproducibility, pin by digest: nginx@sha256:abc.... Verbose but unambiguous.
"Where is the data stored?"
On Linux: /var/lib/docker/. On macOS/Windows: inside a VM that Docker Desktop manages. docker system df shows usage; docker volume ls shows your named volumes.
Done¶
- Understand images as stacked layers.
- Read image references (`repo:tag`).
- Pull, list, inspect images.
- Pick a base image by size.
- Recognize official vs third-party.
- Clean up unused images.
04 - Container Lifecycle¶
What this session is¶
About 30 minutes. The container's life: create, start, pause, stop, restart, exec into, view logs, remove. Plus how to debug a container that won't start.
Container states¶
A container is in one of these states at any given time:
- created - Docker has set it up but not started it.
- running - main process is alive.
- paused - all processes frozen (uncommon).
- restarting - Docker is restarting it.
- exited - main process finished (with some exit code).
- dead - broken, can't recover. Remove and recreate.
docker ps shows running. docker ps -a shows all states.
List¶
docker ps # only running
docker ps -a # all states
docker ps -q # only IDs (useful in scripts)
docker ps --filter "status=exited" # filter
docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Status}}" # custom columns
The default columns: ID, image, command, created, status, ports, name.
Start, stop, restart¶
When you docker run, it creates + starts. You can also do them separately:
docker create --name foo nginx # created but not running
docker start foo # starts (returns immediately)
docker start -a foo # starts AND attaches stdout/stderr
For an already-running container:
docker stop foo # SIGTERM, then SIGKILL after 10s
docker stop -t 30 foo # custom grace period (30s)
docker kill foo # immediate SIGKILL
docker restart foo # stop + start
docker stop is the polite option: it sends SIGTERM, gives the process up to 10 seconds to clean up, then SIGKILLs. docker kill is immediate. Use stop unless the process is wedged.
Logs¶
docker logs foo # all stdout/stderr from the main process
docker logs -f foo # follow (like tail -f)
docker logs --tail 100 foo # last 100 lines
docker logs --since 10m foo # last 10 minutes
docker logs -f --tail 20 foo # follow, starting from last 20
-f is the most-used. Open a second terminal, run docker logs -f myservice, watch as you exercise the service.
Exec: get a shell inside a running container¶
docker exec -it foo sh # shell into the container
docker exec foo cat /etc/hostname # one-off command
docker exec -e KEY=val foo env # one-off with extra env
exec is one of the most useful debugging tools. Container's web server returning 500s? docker exec -it web bash and look around. Database container won't accept connections? docker exec -it db psql and check from inside.
Many minimal images don't have bash. Use sh instead.
Stats¶
docker stats # live CPU/memory/network for all running
docker stats --no-stream # one-shot snapshot
docker stats foo # one container
Useful for quick "is this container hot?" checks.
Inspect a container¶
Long JSON: configuration, mounts, network settings, environment, restart policy. Useful when "why is this container behaving like X?"
Specific fields with --format:
docker inspect --format='{{.State.Status}}' foo
docker inspect --format='{{.NetworkSettings.IPAddress}}' foo
docker inspect --format='{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' foo
Cleanup¶
docker rm foo # remove a stopped container
docker rm -f foo # force (stops + removes a running one)
docker container prune # remove all stopped containers
Containers using --rm clean themselves up. Without --rm, they accumulate. Periodic docker container prune is fine.
Debugging "container won't start"¶
A container that exits immediately is the most common debug case. Workflow:
1. Check logs. The last few lines of stdout/stderr almost always tell you why. Missing env var? Bad config file? Permission denied?
2. Check exit code:
   - `Exited (0)` - finished normally.
   - `Exited (1)` - error.
   - `Exited (139)` - segfault.
   - `Exited (137)` - killed (often OOM).
3. Try running interactively - bypass the image's default command. Now you have a shell in a fresh container of the same image. Inspect: are the files there? Is the script executable? Does the binary even run?
4. Recreate the exact env, then exec into it.
5. Read the Dockerfile (page 05) - find the original repo on Docker Hub, look at the `ENTRYPOINT`/`CMD`. Sometimes the image expects environment variables you didn't set.
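The workflow above as concrete commands (`IMAGE` and the container names are placeholders):

```shell
docker logs broken                    # 1. why did it die?
docker ps -a --filter name=broken     # 2. exit code shows in the STATUS column
docker run -it --rm IMAGE sh          # 3. a shell instead of the default command
docker run -d --name dbg IMAGE sleep infinity   # 4. keep a copy alive...
docker exec -it dbg sh                #    ...and inspect it from inside
```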
A real session¶
Imagine the issue: "my Postgres container exits immediately."
docker run -d --name pg postgres:16
docker ps # not there!
docker ps -a # shows Exited (1)
docker logs pg
Output ends with:
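The exact wording depends on the image version, but the error is essentially:

```text
Error: Database is uninitialized and superuser password is not specified.
       You must specify POSTGRES_PASSWORD to a non-empty value for the
       superuser.
```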
Fix:
docker rm pg
docker run -d --name pg -e POSTGRES_PASSWORD=secret postgres:16
docker ps # running now
docker logs pg # confirm clean startup
This loop (run → check logs → fix → run again) is most of container debugging.
Restart policies¶
For containers you want to survive crashes or host reboots:
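For example:

```shell
docker run -d --restart=unless-stopped --name web nginx
```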
Policies:
- no (default) - never restart.
- on-failure - restart if exit code != 0.
- on-failure:3 - at most 3 times.
- always - always restart. If you manually docker stop it, it comes back up when the Docker daemon restarts.
- unless-stopped - always restart, EXCEPT if you manually stopped it.
unless-stopped is the right policy for most services.
Exercise¶
1. Run, inspect, exec, log.
2. Watch stats while running a CPU-bound container.
3. Debug a "container won't start".
4. Auto-restart.
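One possible command set for these steps (names are illustrative):

```shell
# 1. Run, inspect, exec, log
docker run -d --name ex nginx
docker inspect --format='{{.State.Status}}' ex
docker exec ex cat /etc/hostname
docker logs ex
docker rm -f ex

# 2. CPU-bound container; run `docker stats` in a second terminal
docker run --rm alpine sh -c 'yes > /dev/null & sleep 10'

# 3. Reproduce a "won't start": postgres without its required password
docker run -d --name pgdebug postgres:16
docker logs pgdebug
docker rm pgdebug

# 4. Auto-restart
docker run -d --restart=unless-stopped --name auto nginx
docker rm -f auto
```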
What you might wonder¶
"My container exits with code 137 - what is that?"
SIGKILL (9) + 128 = 137. Either out of memory (Docker OOM-killed it) or someone docker killed it. Check docker inspect --format='{{.State.OOMKilled}}' foo.
"My container exits with 139?" SIGSEGV (11) + 128 = 139. Segfault - the program crashed. Look at logs; check if you're missing a library or running the wrong architecture image.
"Can I attach my terminal to a running container instead of exec?"
docker attach foo connects your stdin/stdout to the main process. Different from exec (which starts a new process inside). Less useful in practice; people use exec more.
Done¶
- Understand container states.
- List, start, stop, restart, kill containers.
- Read logs (and follow with `-f`).
- Exec into running containers.
- Inspect for configuration details.
- Debug containers that won't start.
- Set restart policies.
Next: Building images with Dockerfile →
05 - Building Images with Dockerfile¶
What this session is¶
About an hour. You'll learn to build your own images using a Dockerfile - the recipe text file that tells Docker how to construct an image step by step.
A first Dockerfile¶
Create a folder myimage/. Inside, create a file named exactly Dockerfile (no extension):
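A minimal Dockerfile consistent with the run output shown below (echoes a message by default, and has curl installed) - treat this as a sketch, the exact original isn't preserved here:

```dockerfile
FROM alpine:3.20
RUN apk add --no-cache curl
CMD ["echo", "hello from my image"]
```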
Build it:
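```shell
docker build -t myimage:1.0 .
```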
The . at the end means "use the current directory as the build context."
Run:
docker run --rm myimage:1.0
# hello from my image
docker run --rm myimage:1.0 curl --version
# (prints curl's version because we override the default CMD)
You just built a custom image.
The instructions you'll use most¶
| Instruction | What it does |
|---|---|
| `FROM image:tag` | Base image to start from. Always the first line. |
| `RUN command` | Run a shell command at build time (e.g. install packages). |
| `COPY src dest` | Copy files from the build context into the image. |
| `ADD src dest` | Like COPY but also fetches URLs and unpacks tarballs. Prefer COPY. |
| `WORKDIR path` | cd to this dir; affects subsequent RUN/CMD/COPY. |
| `ENV KEY=value` | Set an environment variable. |
| `EXPOSE port` | Documentation only - declares the container listens on this port. Does NOT publish it. |
| `CMD ["a", "b"]` | Default command when the container starts. Overridable at docker run. |
| `ENTRYPOINT ["a", "b"]` | Command that always runs. Args from CMD or docker run are appended. |
| `USER name-or-uid` | Switch to this user for subsequent layers and runtime. |
| `ARG name=default` | Build-time variable. Use with --build-arg. |
A realistic Dockerfile (Python app)¶
Suppose you have a small Python script app.py:
import http.server, socketserver, os
port = int(os.environ.get("PORT", "8000"))
with socketserver.TCPServer(("", port), http.server.SimpleHTTPRequestHandler) as httpd:
    print(f"serving on {port}")
    httpd.serve_forever()
And a requirements.txt (empty for this example, but typically lists pip packages).
Your Dockerfile:
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first (separate from app code for cache reuse)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application
COPY app.py .
EXPOSE 8000
ENV PORT=8000
CMD ["python", "app.py"]
Build and run:
docker build -t pyapp:1.0 .
docker run -d --rm --name pyapp -p 8000:8000 pyapp:1.0
curl http://localhost:8000/
Why the line order matters: layer caching¶
Each Dockerfile instruction creates a layer. Docker caches layers and reuses them on rebuilds if the instruction (and its inputs) haven't changed.
Order things so the most-frequently-changing things come last:
# Bad: every code change invalidates the pip install layer
FROM python:3.12-slim
WORKDIR /app
COPY . . # any file change invalidates this
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
# Good: dependencies cached separately
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt . # changes rarely
RUN pip install -r requirements.txt # cached unless requirements.txt changed
COPY . . # changes often
CMD ["python", "app.py"]
The second form rebuilds in seconds when only your code changed, vs minutes when pip re-runs.
ENTRYPOINT vs CMD¶
Confusing topic. Quick answers:
- `CMD` is the default command. Easily overridden at `docker run image arg1 arg2`.
- `ENTRYPOINT` is what always runs. CMD (and `docker run` args) are passed as arguments to ENTRYPOINT.
Common patterns:
Pattern 1 - CMD only (most common):
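```dockerfile
CMD ["python", "app.py"]
```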
docker run image runs python app.py. docker run image bash runs bash (overrides CMD).
Pattern 2 - ENTRYPOINT + CMD (for wrapper apps):
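```dockerfile
ENTRYPOINT ["python", "app.py"]
CMD ["--default-arg"]
```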
docker run image runs python app.py --default-arg. docker run image --other-arg runs python app.py --other-arg (CMD overridden, ENTRYPOINT kept).
For your own images: start with just CMD. Reach for ENTRYPOINT only when you have a clear use case.
Use a non-root user¶
By default, containers run as root inside the container. Even though the container is isolated, running as root means if there's a container-escape bug, the attacker is root on the host (assuming user namespaces aren't configured).
Add a user:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app
CMD ["python", "app.py"]
Now the container's main process runs as app, not root. Required by many production environments. Page 10 covers more security basics.
.dockerignore: keep junk out of the build context¶
The . in docker build . sends everything in the current directory to the Docker daemon. If your folder has .git, node_modules, target/, build artifacts, those bloat the build context.
Create .dockerignore (same folder as the Dockerfile):
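A starting point for this example project (entries are typical; adjust to your own tree):

```
.git
__pycache__/
*.pyc
.venv/
.env
```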
Same syntax as .gitignore. Speeds builds; reduces image size; prevents accidental secrets in images.
Build arguments and labels¶
FROM alpine:3.20
ARG VERSION=unknown
LABEL org.opencontainers.image.version=$VERSION
LABEL org.opencontainers.image.source="https://github.com/example/repo"
RUN echo "building version $VERSION"
Build with:
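For example (image name and version value are placeholders):

```shell
docker build --build-arg VERSION=1.2.3 -t myimg:1.2.3 .
```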
ARG is build-time only (gone at runtime). LABEL stays on the image, queryable with docker inspect. The org.opencontainers.image.* labels are a convention - many tools (Docker Hub, GitHub Container Registry) read them.
Tag the build¶
Use semantic version tags for releases and `:latest` for the newest build. (Don't depend on `:latest` in production - pin specific versions.)
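For example, for the image built earlier:

```shell
docker build -t pyapp:1.1 .
docker tag pyapp:1.1 pyapp:latest
```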
Inspect what you built¶
docker history shows you each layer's size. Useful for figuring out where the bloat is.
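For the image built above:

```shell
docker history pyapp:1.0
```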
Exercise¶
- Build the Python example above (`pyapp:1.0`). Run it, curl `localhost:8000`, see the directory listing it serves.
- Make a small change to `app.py` (change the printed message) and rebuild. Notice the layers BEFORE the COPY were cached.
- Add a non-root user to the Dockerfile, rebuild, and confirm `whoami` inside the container reports `app`, not `root`.
- Create a `.dockerignore` that excludes `__pycache__` and `.git`. Rebuild; note any difference in build context size (Docker reports it at the start of a build).
- Build with a version arg via `--build-arg`.
What you might wonder¶
"Why RUN pip install --no-cache-dir?"
pip caches downloads in `~/.cache/pip`. That cache is useless inside an image (you've already installed everything); it only bloats the layer. `--no-cache-dir` skips it.
"Why COPY . . and not ADD . .?"
COPY does exactly what it says - copy files. ADD also extracts tarballs and fetches URLs, which is more magic than you usually want. Prefer COPY; use ADD only for those specific features.
"What's a build context?"
The directory you pass to docker build (the . at the end). Everything in it is sent to the Docker daemon - that's what the COPY commands draw from. The Dockerfile itself isn't special; it's just one file in the context.
"Can I have multiple Dockerfiles?"
Yes - docker build -f Dockerfile.dev -t foo:dev . uses a non-default-named one. Useful for "Dockerfile" + "Dockerfile.prod" + "Dockerfile.test" variants.
Done¶
- Write a Dockerfile from scratch.
- Use FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, CMD, USER.
- Order instructions for cache friendliness.
- Distinguish CMD from ENTRYPOINT.
- Use `.dockerignore` to keep junk out.
- Build with `--build-arg`.
Next: Volumes and bind mounts →
06 - Volumes and Bind Mounts¶
What this session is¶
About 45 minutes. You'll learn how containers persist data - through bind mounts (mount a host path into the container) and named volumes (Docker-managed storage).
The problem¶
By default, a container's filesystem disappears when the container is removed. For databases, uploaded files, user data - anything you need to keep - that's a problem.
There are three ways to address it:
| Mechanism | What it is | Use case |
|---|---|---|
| Named volume | Docker-managed storage living in `/var/lib/docker/volumes/` | Production data (databases, persistent app state). |
| Bind mount | Mount a host directory into the container | Development (live-reload source code). |
| tmpfs | In-memory filesystem | Secrets, temporary fast scratch. |
Of these, the first two are the default tools.
Bind mount: mount a host directory¶
docker run --rm -v $(pwd):/app -w /app python:3.12-slim python -c "import os; print(os.listdir('.'))"
What's new:
- `-v HOST_PATH:CONTAINER_PATH` - mount the host directory into the container.
- `-w /app` - set the container's working directory.
- The container sees your current directory as `/app`.
If the container modifies files at /app, they appear on your host. If the host modifies them, the container sees the new state. Two-way mirror.
The modern syntax (more readable):
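The same mount as above, spelled with `--mount`:

```shell
docker run --rm \
  --mount type=bind,source="$(pwd)",target=/app \
  -w /app python:3.12-slim python -c "import os; print(os.listdir('.'))"
```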
Use whichever you find clearer; -v is shorter and very common.
A real use case: live development¶
You have a Python app you're actively editing. Mount the source so the container always sees the latest version:
docker run -d --rm --name dev -p 8000:8000 \
-v $(pwd):/app -w /app \
python:3.12-slim python app.py
Edit app.py on your host. The container sees the new version immediately. Restart the container (or use a hot-reloader like uvicorn/flask --reload) to pick up changes.
This is the standard "containerized dev environment" pattern. No installing Python locally; no version conflicts.
Read-only mounts¶
The :ro suffix makes the mount read-only. The container can't modify /app. Useful for "give the container access but don't let it mess with my files."
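For example, a write attempt fails:

```shell
docker run --rm -v "$(pwd)":/app:ro python:3.12-slim \
  python -c "open('/app/test.txt', 'w')"
# fails: Read-only file system
```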
Named volumes¶
A named volume is Docker-managed storage. You name it; Docker stores it.
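Create one explicitly (Docker also auto-creates a named volume the first time you mount it):

```shell
docker volume create mydata
docker volume ls
```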
Use it:
docker run -d --rm --name pg \
-e POSTGRES_PASSWORD=secret \
-v mydata:/var/lib/postgresql/data \
postgres:16
Postgres writes to /var/lib/postgresql/data; that's actually the volume mydata. Stop and remove the container; the data is still in mydata. Recreate the container with the same volume - your database is intact.
Inspect:
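For the volume above:

```shell
docker volume inspect mydata
```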
You can see where Docker stored it on the host. On macOS/Windows it's inside Docker's VM.
Cleanup:
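When you're done with it:

```shell
docker volume rm mydata      # remove one volume
docker volume prune          # remove all unused volumes (careful)
```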
Use volumes for things you want to keep across container restarts. Bind mounts for development.
Bind vs volume: which when?¶
| Use case | Pick |
|---|---|
| Live-editing source code during development | Bind mount |
| Database data (Postgres, Redis, etc.) | Named volume |
| Sharing files between containers | Named volume |
| Loading a single config file at runtime | Bind mount (read-only) |
| App state that must survive container removal | Named volume |
| Backing up data | Easier with named volume |
tmpfs: in-memory¶
For sensitive data that should never hit disk (one-time secrets, temp files you don't want persisted):
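For example:

```shell
docker run --rm --tmpfs /tmp alpine sh -c "mount | grep /tmp"
```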
/tmp is now an in-memory filesystem. Survives nothing - vanishes when the container exits.
Mounting a single file¶
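For example (assuming a local `nginx.conf` you want nginx to use):

```shell
docker run -d --rm --name web -p 8080:80 \
  -v "$(pwd)/nginx.conf":/etc/nginx/nginx.conf:ro nginx
```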
Mounts just one file (not a directory). Common for injecting config.
Backups¶
Backup a named volume by running a temporary container that tars it up:
docker run --rm \
-v mydata:/source:ro \
-v $(pwd):/backup \
alpine tar -czf /backup/mydata-$(date +%Y%m%d).tar.gz -C /source .
Reads the volume mounted read-only at /source; writes a tarball to your current directory via the bind mount at /backup.
Restore:
docker run --rm \
-v mydata:/target \
-v $(pwd):/backup \
alpine tar -xzf /backup/mydata-20260517.tar.gz -C /target
(Backup strategy: this is the manual-and-simple form. Production uses dedicated backup tools.)
Common gotchas¶
- Permissions / ownership. If your host user is UID 1000 and the container's app user is also UID 1000, file ownership lines up. If they differ, you get "permission denied" inside the container or strange ownership on host files. Match UIDs with `--user`, or `chown` in the Dockerfile.
- Hidden files in the container. If you bind-mount a host directory onto a container path that already has files, the container's files become hidden (the mount overlays them). On the host, you see your files. In the container, you see your files (not the originals). Removing the mount restores the originals.
- `$(pwd)` only works in POSIX shells. On Windows PowerShell, use `${PWD}`. In CMD, use `%cd%`.
Exercise¶
- Bind-mount development: create `app.py` with a small script. Run it in a container with `-v $(pwd):/app`. Edit `app.py` on the host; re-run. The container sees the change.
- Named volume for Postgres:

docker run -d --name pg \
  -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
docker exec -it pg psql -U postgres -c "CREATE TABLE notes (text TEXT);"
docker exec -it pg psql -U postgres -c "INSERT INTO notes VALUES ('hello');"
docker stop pg && docker rm pg
# Recreate with same volume:
docker run -d --name pg -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:16
docker exec -it pg psql -U postgres -c "SELECT * FROM notes;"
# 'hello' is still there.
docker stop pg && docker rm pg
docker volume rm pgdata

- Read-only config: mount a single file with `:ro` into a container and confirm the container can't modify it.
What you might wonder¶
"What's the actual difference between -v and --mount?"
-v is older, terser, less explicit. --mount is newer, key=value syntax, more explicit. Both work. Read both forms in real code.
"Where does my named volume live physically?"
On Linux: /var/lib/docker/volumes/<name>/_data/. On macOS/Windows: inside Docker Desktop's VM (you don't see them directly).
"Can two containers share a volume?" Yes - just mount the same named volume in both. Useful for "writer container produces data, reader container consumes."
"Bind mount or named volume on macOS / Windows?" Bind mounts on macOS/Windows are slower than on Linux (the file system has to translate across the VM boundary). Performance-sensitive workloads (Postgres, Rails dev, etc.) should prefer named volumes when possible.
Done¶
- Mount host directories into containers (bind mounts).
- Use named volumes for persistent data.
- Use tmpfs for in-memory storage.
- Mount single files for config injection.
- Back up and restore named volumes.
- Pick the right mechanism for each use case.
07 - Networks and Ports¶
What this session is¶
About 45 minutes. How containers talk to your host, to each other, and to the internet. The four network types Docker creates, why containers can find each other by name, and how to debug "why can't service A reach service B?"
The mental model¶
Each container gets its own network namespace - its own IP address, its own ports, its own loopback. Two consequences:
- The container's `localhost` is not your host's `localhost`. They're separate worlds.
- To reach a container's port from outside, you have to publish it with `-p` (host port forwards to container port).
Networks tie containers together. Containers on the same Docker network can see each other; containers on different networks can't.
The default networks¶
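List them:

```shell
docker network ls
```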
You'll see at least:
| Network | Driver | What it's for |
|---|---|---|
| `bridge` | bridge | Default for containers without `--network` |
| `host` | host | Container shares the host's network (no isolation) |
| `none` | null | No networking at all |
The default bridge (bridge) is what containers join automatically. Containers on the default bridge get IPs but cannot find each other by name (a quirky default).
Create a user-defined network¶
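One command:

```shell
docker network create mynet
```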
Now run containers on it:
docker run -d --name db --network mynet -e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name web --network mynet -p 8080:80 nginx
Both db and web are on mynet. They can reach each other by container name:
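For example (the Debian-based nginx image ships `getent`, which queries Docker's DNS):

```shell
docker exec -it web getent hosts db
```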
This is the right pattern for multi-container apps. Always create a user-defined network. Don't rely on the default bridge.
Port publishing: -p HOST:CONTAINER¶
A container on a Docker network is reachable from other containers on the same network. To reach it from your host (or from outside your machine), you must publish the port:
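For example (the container name is arbitrary):

```shell
docker run -d --rm --name webpub -p 8080:80 nginx
```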
Now http://localhost:8080 on your host hits the nginx in the container.
Options:
docker run -p 8080:80 nginx # any interface on host port 8080
docker run -p 127.0.0.1:8080:80 nginx # only loopback on host
docker run -p 8443:443 -p 8080:80 nginx # publish multiple ports
docker run -P nginx # publish all EXPOSE'd ports to random host ports
docker port web shows the actual host port mapping.
Two containers talking¶
A canonical pattern: app and database.
docker network create mynet
docker run -d --name db --network mynet \
-e POSTGRES_PASSWORD=secret \
-v pgdata:/var/lib/postgresql/data \
postgres:16
docker run -d --name app --network mynet -p 8080:8080 \
-e DATABASE_URL=postgres://postgres:secret@db:5432/postgres \
my-app:1.0
Inside the app container, db resolves to the database container's IP. Port 5432 is reachable from app (containers on the same network can reach any port without explicit publishing). From outside the network, only port 8080 (published on the host) is reachable.
host networking (Linux only)¶
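For example:

```shell
docker run -d --network host nginx
```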
nginx is on the host's network - port 80 binds directly to host port 80. No port mapping needed. No isolation; the container can see and use any host network interface.
Useful for performance-sensitive networking (lower overhead than the bridge). Not available on macOS/Windows because Docker runs in a VM there.
Inspect a network¶
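For the network created earlier:

```shell
docker network inspect mynet
```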
Shows all containers attached, their IPs, the subnet, etc.
docker network inspect mynet --format='{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'
Useful for "what IP did Docker give my container?"
Disconnect, reconnect¶
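For example, pull `db` off the network and put it back:

```shell
docker network disconnect mynet db
docker network connect mynet db
```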
Useful for testing failure scenarios (disconnect the database, see how the app behaves).
DNS inside containers¶
Docker runs an embedded DNS resolver. Inside a container on a user-defined network:
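Check the resolver config (assuming a container named `app`):

```shell
docker exec -it app cat /etc/resolv.conf
# nameserver 127.0.0.11
```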
The 127.0.0.11 is Docker's embedded DNS. It resolves container names + external hostnames.
Test:
docker exec -it app nslookup db
# Should return the db container's IP
docker exec -it app nslookup google.com
# Should return Google's IP
If DNS resolution fails, networking is broken in some way. Common cause: container is on the default bridge (no name resolution). Recreate on a user-defined network.
Debugging "service A can't reach service B"¶
When connectivity isn't working:
- Are they on the same network? `docker inspect` each container; both should list the same network.
- Can A's container resolve B's name? If "no such host," they're not on the same network, or A is on the default bridge.
- Can A reach B's port? Success: port is open. "Connection refused": B isn't listening yet (race) or B's app crashed.
- Is B's app actually listening? Some images don't include `netstat` or `ss`; install or skip.
- Read B's logs: did it start cleanly? Is it bound to `0.0.0.0` (all interfaces) and not `127.0.0.1` (only its own loopback)?
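The checklist as concrete commands (container names `app` and `db` are illustrative):

```shell
docker inspect app --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}'
docker exec -it app getent hosts db
docker exec -it app sh -c 'nc -zv db 5432'    # only if nc exists in the image
docker logs db --tail 50
```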
The single most-common bug: an app inside a container binding to 127.0.0.1 instead of 0.0.0.0. Only the container's own loopback can reach it. The fix: configure the app to bind to all interfaces.
Exercise¶
- Create a network and ping by name.
- Two-container app: run a database and an app (or nginx) on the same network; verify they reach each other by name.
- Port publishing experiment: publish the same container port to different host ports; try a `127.0.0.1:`-prefixed binding.
- Default bridge gotcha: run two containers without `--network` and confirm they can't resolve each other by name.
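A possible sequence for the first exercise (names are illustrative):

```shell
docker network create labnet
docker run -d --name a --network labnet alpine sleep 1d
docker run -d --name b --network labnet alpine sleep 1d
docker exec a ping -c 1 b
docker rm -f a b && docker network rm labnet
```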
What you might wonder¶
"Can I make a container available outside my machine?"
Yes - -p 80:80 binds to all interfaces by default, including external ones. Anyone on your network can reach http://your-machine-ip:80. Firewall accordingly. To bind only to localhost: -p 127.0.0.1:80:80.
"What about IPv6?" Possible but disabled by default. Enable in Docker daemon config. Beyond beginner scope.
"Why is the default bridge so awkward?" Historical reasons. The default bridge was the original; user-defined networks were added later with better defaults. The original is kept for backwards compatibility but isn't recommended for new use.
"What about overlay networks?" For multi-host setups (containers on different machines, like in Docker Swarm or Kubernetes). Beyond beginner scope; recognize the name.
Done¶
- Understand container network isolation.
- Create user-defined networks for multi-container apps.
- Use container names for service discovery.
- Publish ports with `-p`.
- Debug network issues (DNS, port binding, connection refused).
08 - Docker Compose¶
What this session is¶
About 45 minutes. You'll learn Docker Compose - declarative YAML that describes a multi-container app (services, networks, volumes), so you stop typing 50-line docker run invocations.
The problem Compose solves¶
By page 07 you could run a multi-container app, but the commands were:
docker network create app-net
docker volume create pgdata
docker run -d --name db --network app-net -v pgdata:/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name web --network app-net -p 8080:8080 \
-e DATABASE_URL=postgres://postgres:secret@db:5432/postgres my-app:1.0
Hard to reproduce. Hard to share. Hard to remember.
Compose makes it:
# compose.yaml
services:
db:
image: postgres:16
environment:
POSTGRES_PASSWORD: secret
volumes:
- pgdata:/var/lib/postgresql/data
web:
image: my-app:1.0
ports:
- "8080:8080"
environment:
DATABASE_URL: postgres://postgres:secret@db:5432/postgres
depends_on:
- db
volumes:
pgdata:
Then:
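In the directory containing the compose.yaml:

```shell
docker compose up -d
```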
That parses the YAML, creates the network, the volume, both containers, links them. One command.
Modern vs legacy¶
Two CLIs you'll encounter:
- `docker compose` (Compose V2, no hyphen) - built into Docker Desktop and modern Docker Engine. The current standard.
- `docker-compose` (Compose V1, with hyphen) - legacy Python tool. Deprecated. Still works on older systems.
Use docker compose. The YAML format is the same; only the CLI invocation differs.
File naming¶
Compose looks for these files automatically in the current directory:
- `compose.yaml` (preferred)
- `compose.yml`
- `docker-compose.yaml`
- `docker-compose.yml`
For overrides: compose.override.yaml is merged on top of compose.yaml automatically. Useful for dev-vs-prod variants.
A real-world compose.yaml¶
services:
web:
build: . # build from a Dockerfile in this directory
ports:
- "8080:8080"
environment:
DATABASE_URL: postgres://postgres:secret@db:5432/postgres
LOG_LEVEL: ${LOG_LEVEL:-info} # from env, default "info"
volumes:
- ./src:/app/src:ro # bind mount for dev
depends_on:
db:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 3s
retries: 3
restart: unless-stopped
db:
image: postgres:16
environment:
POSTGRES_PASSWORD: secret
POSTGRES_DB: appdata
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 5
cache:
image: redis:7-alpine
ports:
- "127.0.0.1:6379:6379" # only localhost; not external
volumes:
pgdata:
What's new:
- `services:` - each entry is a container.
- `build: .` - build the image from a Dockerfile in this directory (instead of `image:`).
- `ports`, `environment`, `volumes` - same concepts as `docker run`.
- `depends_on` with `condition: service_healthy` - wait for `db` to be healthy before starting `web`.
- `healthcheck` - Docker periodically runs this command; the container is marked healthy or unhealthy.
- `restart: unless-stopped` - same as `docker run --restart=unless-stopped`.
- `${LOG_LEVEL:-info}` - read `LOG_LEVEL` from your shell env; default to `info`.
Common commands¶
docker compose up # foreground, follow logs (Ctrl-C to stop)
docker compose up -d # detached
docker compose down # stop and remove containers
docker compose down -v # also remove named volumes (lose data!)
docker compose ps # list services
docker compose logs # all logs
docker compose logs -f web # follow web's logs
docker compose restart web # restart one service
docker compose build # build (or rebuild) images
docker compose build --no-cache web # rebuild without cache
docker compose pull # pull latest images
docker compose exec web bash # shell into the web container
docker compose run --rm web bash # run a new throwaway container
up -d and down are the two you'll use most.
Networking in Compose¶
Compose automatically creates a network per project (named after the project directory). All services on that network. They reach each other by service name (the key in services:).
So in the example above, web reaches the database at db:5432. No docker network create needed.
You can also define explicit networks if you want isolation:
services:
web:
networks:
- frontend
- backend
db:
networks:
- backend
cache:
networks:
- backend
networks:
frontend:
backend:
web can talk to both db and cache. The frontend network might also be reached by a reverse proxy. db and cache are isolated from the frontend.
Environment files¶
Hardcoding POSTGRES_PASSWORD: secret in YAML is bad practice. Use a .env file:
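A `.env` might look like (values are placeholders):

```
POSTGRES_PASSWORD=change-me
LOG_LEVEL=debug
```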
Compose loads .env automatically. Reference variables in YAML with ${VAR} or ${VAR:-default}.
Add .env to .gitignore. Commit a .env.example template with placeholder values.
Profiles¶
For services you only want to run sometimes (debug containers, ops tools):
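A sketch (service names match the example in the next sentence; the images are assumptions, not from a specific project):

```yaml
services:
  web:
    image: my-app:1.0
  db:
    image: postgres:16
  pgadmin:
    image: dpage/pgadmin4
    profiles: [debug]
```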
docker compose up starts only web and db. docker compose --profile debug up also starts pgadmin.
A typical dev workflow¶
# First time:
docker compose up -d
docker compose logs -f # watch startup
# Stop following with Ctrl-C; containers keep running
# During development (you change code):
# If using bind mount: containers see changes immediately
# If you changed the Dockerfile or dependencies:
docker compose build web
docker compose up -d web # recreate just web
# Exec into a container:
docker compose exec web bash
# Done for the day:
docker compose down
# Forget everything (including volumes - careful):
docker compose down -v
Real-world examples to read¶
Many OSS projects ship a compose.yaml so you can run them locally with one command. Some good ones to look at:
- Plausible Analytics (analytics -
plausible/community-edition) - Sentry (error tracking -
getsentry/onpremise) - Mastodon (social network)
- Nextcloud
Reading their compose.yaml teaches you patterns for production-shape multi-service setups.
Exercise¶
- Create a compose.yaml for a simple web + database stack (Postgres plus Adminer).
- Run: `docker compose up -d`.
- Open `http://localhost:8081` in your browser. You should see Adminer's UI. Connect: server=db, user=postgres, password=secret. You can browse the (mostly empty) Postgres database.
- Shell into a service: `docker compose exec db psql -U postgres`. Type `\dt` (list tables - empty), `\q` to quit.
- Stop: `docker compose down`. (Volume `pgdata` survives. `docker compose down -v` would delete it too.)
- Edit the compose.yaml - change the Adminer port to 8082. `docker compose up -d` again. Adminer should be reachable at the new port; Postgres unchanged (Compose recreates only what changed).
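A compose.yaml that fits these steps (a sketch; Adminer listens on 8080 inside its container):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - pgdata:/var/lib/postgresql/data
  adminer:
    image: adminer
    ports:
      - "8081:8080"
volumes:
  pgdata:
```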
What you might wonder¶
"What's a Compose 'project'?"
The collection of services defined in a single compose file. Project name defaults to the parent directory's name. docker compose -p myproj up overrides.
"Should I use Compose in production?" For small deployments on a single host, sure. For anything beyond ~5 services or that needs scaling/HA, look at Kubernetes (separate path).
"How do I run my own images instead of pulling?"
`build: .` (Dockerfile in the current dir), or a `build:` block with `context:` and `dockerfile:` keys for other locations. Compose builds and uses the resulting image. Combined with image tagging: `build: .` plus `image: myapp:dev` builds AND tags.
"What's extends? x-... ?"
x-name: defines reusable YAML anchors (custom keys starting with x- are ignored by Compose but available for YAML's anchor/alias feature). extends: lets one service inherit from another. Both for DRYing up large compose files. Recognize when you see them.
Done¶
- Write a `compose.yaml` defining services, networks, volumes.
- Use `up`, `down`, `logs`, `exec`, `ps`, `build`, `pull`.
- Use service names for container-to-container DNS.
- Use environment files for secrets.
- Use health checks and `depends_on` for ordering.
09 - Slimming Images¶
What this session is¶
About 45 minutes. Image size matters: smaller = faster pulls, faster cold starts, smaller attack surface. You'll learn multi-stage builds, .dockerignore, base-image choices, and the common slimming techniques.
Why size matters¶
A 1.5GB image and a 50MB image both run the same. But:
- The 1.5GB image takes 30 seconds to pull on a slow link; the 50MB takes 1.
- The 1.5GB has thousands of files (extra attack surface, more CVE matches).
- Cold-start a serverless container from a 1.5GB image? Painful.
- CI builds with 1.5GB intermediates eat disk and slow caching.
Aim for the smallest sensible image. Not the absolute smallest (down that route lies madness); the smallest one you can build and maintain comfortably.
Picking a base¶
Start with the smallest base that works:
| Base | Size | Best for |
|---|---|---|
| `scratch` | 0 bytes | Static binaries (Go, Rust) - no OS at all |
| `gcr.io/distroless/static` | ~2MB | Static binaries - has CA certs, tzdata, /etc/passwd |
| `alpine:3.20` | ~5MB | Anything that works on musl (most things) |
| `debian:bookworm-slim` | ~75MB | Things that need glibc but don't need many tools |
| `python:3.12-slim` | ~150MB | Python apps (slim variant) |
| `ubuntu:24.04` | ~80MB | When you need a familiar full distro |
Rule of thumb: start with alpine or *-slim. Reach for full distros only when a wheel/binary doesn't work on the smaller one.
Multi-stage builds¶
The biggest slimming win. Use one stage to build, another to package the result. Build tools, source code, test artifacts don't ship.
A real example - Go:
# Stage 1: build
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp
# Stage 2: ship just the binary
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/myapp"]
Build:
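For example (the tag is arbitrary):

```shell
docker build -t myapp:1.0 .
docker images myapp:1.0    # check the size
```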
The FROM ... AS name creates a named stage. The COPY --from=builder copies from the previous stage. Only the final stage ships.
Same idea for any compiled language. For Rust:
FROM rust:1.80 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /src/target/release/myapp /myapp
CMD ["/myapp"]
For Node.js (interpreted, but you can still avoid shipping dev-deps):
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
USER node
CMD ["node", "dist/index.js"]
The second stage installs only production dependencies. The image shrinks from "dev deps + source + dist" to "just dist + runtime deps."
Distroless: nearly-empty base images¶
Google's "distroless" images (gcr.io/distroless/*) contain:
- The language runtime (for python, java, etc.) - OR nothing (static).
- CA certificates, tzdata, /etc/passwd, a few essentials.
- No shell, no package manager, no debug tools.
Pros: tiny, minimal attack surface, no shell-injection footholds.
Cons: harder to debug (no docker exec ... sh). For that, distroless ships a :debug variant for occasional use.
For static-binary languages (Go, Rust) shipping a CLI: distroless/static. For Java: distroless/java. For Python: distroless/python3. (Each has variants.)
.dockerignore¶
Already covered in page 05. Critical: anything not in .dockerignore is sent to the daemon as build context. .git, node_modules, target/, build caches all bloat builds.
A reasonable .dockerignore for a polyglot project:
.git
.gitignore
.dockerignore
Dockerfile*
.idea
.vscode
*.md
node_modules
target
__pycache__
*.pyc
.env
.env.*
dist
build
coverage
.cache
Combine RUN instructions¶
Each RUN creates a layer. If you RUN apt-get install foo then RUN apt-get remove foo, the second layer doesn't actually reclaim the disk - the first layer still has the package files.
Combine into one RUN:
# Bad - bloats the image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean
# Good - one layer, ends clean
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
rm -rf /var/lib/apt/lists/*
Three patterns above:
- `--no-install-recommends` skips optional dependencies.
- `rm -rf /var/lib/apt/lists/*` removes the apt cache.
- Everything in one `RUN` so the cleanup is in the same layer.
Specific minor wins¶
- Don't store secrets in the image. Pass them at runtime (env vars, mounts, secret managers). `COPY` them into a layer and they're there forever, even if you delete them in a later layer.
- Set `WORKDIR` once at the top instead of `cd` in `RUN`s. Cleaner.
- Pin versions: `apt-get install foo=1.2.3`. Reproducible builds.
- Use `--mount=type=cache` (BuildKit) for things like `apt`/`pip`/`go mod` caches that should persist across builds without being in the image.
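A sketch of a BuildKit cache mount for pip - the download cache persists across builds without landing in an image layer:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```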
A typical "before/after"¶
A naive Python Dockerfile, ~1GB:
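The "before" might look like this (a sketch matching the list of changes further down):

```dockerfile
FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
```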
Slimmed version, ~120MB:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd --create-home --shell /bin/bash app && chown -R app /app
USER app
CMD ["python", "app.py"]
Changes:
- python:3.12 → python:3.12-slim (Debian slim base).
- requirements.txt separately (cache reuse on code changes).
- --no-cache-dir (no pip cache in image).
- Non-root user.
A multi-stage build (if you have wheels to compile) can take it to ~80MB.
Exercise¶
- Build a Go hello-world with multi-stage:

# Dockerfile
FROM golang:1.23 AS builder
WORKDIR /src
COPY go.mod .
COPY hello.go .
RUN CGO_ENABLED=0 go build -o /app/hello .

FROM gcr.io/distroless/static
COPY --from=builder /app/hello /hello
ENTRYPOINT ["/hello"]

Build, run, check size. Should be ~5MB. Compare to a single-stage build using `golang:1.23` directly - ~1GB.

- Find what's bloating an image with `docker history`. Note where the size differences come from.
- `.dockerignore` test: create a folder with a `.git` directory full of stuff. Build a trivial Dockerfile that just does `COPY . /app`. Note the "Sending build context to Docker daemon" line - large. Add `.dockerignore` with `.git`. Rebuild; context is much smaller.
What you might wonder¶
"Why does Alpine cause weird pip install issues?"
Alpine uses musl libc (most Linux uses glibc). Many Python wheels are pre-compiled against glibc - they don't have musl variants, so pip falls back to compiling from source (slow, often fails). For Python on Alpine, expect occasional headaches; *-slim (Debian-based) is friendlier.
"What's BuildKit?"
The modern Docker build engine, default in recent Docker. Faster, supports advanced features (cache mounts, secret mounts, multi-platform builds). Enable with DOCKER_BUILDKIT=1 (or it's already on).
"Should I shoot for the smallest possible image?" No. Shoot for "small enough to feel light, easy enough to maintain." A 50MB image is often a better trade-off than a 5MB one if the 5MB takes hours of debugging to keep working.
Done¶
- Pick base images by size and ecosystem fit.
- Use multi-stage builds.
- Use distroless for static-binary-only ships.
- Use `.dockerignore`.
- Combine `RUN`s to minimize layers.
10 - Security Basics¶
What this session is¶
About 45 minutes. Five things that make any container deployment notably safer. Not exhaustive - just the high-leverage moves.
1. Don't run as root¶
By default, the container's process runs as root inside the container. If an attacker escapes the container (rare but happens), they're root on your host (unless user namespaces are configured, which Docker doesn't do by default).
Always switch to a non-root user:
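For example:

```dockerfile
RUN useradd --create-home --shell /bin/bash app
USER app
```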
Or use a numeric UID:
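For example (1001 is arbitrary):

```dockerfile
USER 1001:1001
```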
Some images do this for you (e.g. nginx switches to nginx user, postgres to postgres). Many don't. Verify with:
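For a running container (name is a placeholder):

```shell
docker exec -it mycontainer id
```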
If it reports `uid=0(root)`, you're running as root. Fix it.
2. Read-only root filesystem¶
Most apps don't need to write to their root filesystem. Make it read-only:
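For example (image name is a placeholder):

```shell
docker run --rm --read-only --tmpfs /tmp myapp:1.0
```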
Anything that tries to write outside the explicit tmpfs mounts (or other mounted volumes) fails. This neutralizes a class of attacks where malware drops a binary into /usr/bin or similar.
For specific writable areas (a cache directory, /var/log), add --tmpfs PATH (in-memory) or -v VOL:PATH (named volume).
In compose:
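A fragment (service and image names assumed):

```yaml
services:
  app:
    image: myapp:1.0
    read_only: true
    tmpfs:
      - /tmp
```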
3. Drop unnecessary capabilities¶
Linux capabilities split root's powers into ~40 distinct privileges. By default Docker grants a subset (~14). For most apps, you can drop them all:
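For example (image name is a placeholder):

```shell
docker run --rm --cap-drop=ALL myapp:1.0
```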
If the app needs one specific capability (e.g. binding to port < 1024):
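For example:

```shell
docker run --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp:1.0
```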
Most modern apps need no capabilities - they don't try to do privileged things. Drop ALL by default; add only when needed and justified.
In compose:
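A fragment (names assumed):

```yaml
services:
  app:
    image: myapp:1.0
    cap_drop: [ALL]
    # cap_add: [NET_BIND_SERVICE]   # only if justified
```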
4. --security-opt no-new-privileges¶
Prevents the container from gaining privileges via setuid binaries (e.g. sudo, su):
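For example (image name is a placeholder):

```shell
docker run --rm --security-opt=no-new-privileges myapp:1.0
```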
Almost always safe to add. Pair with non-root user for defense-in-depth.
In compose:
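The compose equivalent (names are placeholders):

```yaml
services:
  app:
    image: myimage
    security_opt:
      - no-new-privileges:true
```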
5. Don't bake secrets into images¶
Never COPY a .env file. Never ENV PASSWORD=hunter2. The secret is in a layer forever - even if you delete it in a later layer, docker history reveals it.
Options:
- Env vars at runtime: -e PASSWORD=... or --env-file secrets.env at docker run. Don't commit the env file.
- Mounted secret files: mount a directory or file with secrets at runtime.
- Secret managers: Docker Swarm secrets, Kubernetes secrets, HashiCorp Vault, AWS Secrets Manager, etc. For non-toy deployments.
For local dev, env files in .gitignore. For production, a real secret manager.
Quick audit: is this image clean?¶
Run an image vulnerability scanner:
# Trivy (Aqua Security - free, open source):
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy image myapp:1.0
# Grype (Anchore):
grype myapp:1.0
Both produce a list of known CVEs in the image's packages. Triage: fix high/critical first. Many are inherited from the base image; updating the base often fixes batches.
docker scout (built into modern Docker) is another option:
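The relevant subcommand is docker scout cves; a typical invocation (image name is an example):

```shell
docker scout cves myapp:1.0
```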
Run scanners as part of your CI. Don't ship images with known critical CVEs without explicit acknowledgement.
Putting it together: a hardened Dockerfile¶
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Non-root user with a known UID
RUN useradd --create-home --shell /bin/bash --uid 1001 app && \
chown -R app:app /app
USER 1001:1001
EXPOSE 8000
# Don't put secrets here - pass at runtime.
CMD ["python", "app.py"]
Run it hardened:
docker run -d --rm \
--name app \
-p 8000:8000 \
--read-only \
--tmpfs /tmp \
--cap-drop=ALL \
--security-opt=no-new-privileges \
-e DATABASE_URL=postgres://... \
myapp:1.0
Compose form:
services:
app:
image: myapp:1.0
ports: ["8000:8000"]
read_only: true
tmpfs: [/tmp]
cap_drop: [ALL]
security_opt: ["no-new-privileges:true"]
environment:
DATABASE_URL: ${DATABASE_URL}
The big footguns¶
Things to never do (or do only with very deliberate awareness):
- --privileged - disables most isolation. Equivalent to "this container IS the host." Used by Docker-in-Docker and a few other special cases; almost never appropriate.
- -v /:/host or -v /var/run/docker.sock:... - mounting host paths into the container. Especially the Docker socket - anyone in the container can issue Docker commands, which means root on the host.
- Running with --user 0 when the image has a non-root default. You're undoing the safety.
- --network=host with untrusted images. The container has full access to your host's network stack.
What you're not doing here¶
This page is the basics. Real container security also includes: pod-level policies (in Kubernetes), seccomp profiles, AppArmor/SELinux, signed images, supply-chain attestations, runtime detection (Falco). Those are advanced topics; the "Container Internals" senior reference path covers them.
For a first deployment, the five basics on this page get you 80% of the value.
Exercise¶
- Identify which images run as root:
- Run an app hardened:
docker run -d --rm \
  --name web \
  -p 8090:80 \
  --read-only \
  --tmpfs /var/cache/nginx \
  --tmpfs /var/run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt=no-new-privileges \
  nginx
curl -s http://localhost:8090 | head
docker stop web
(nginx needs writable /var/cache/nginx and /var/run, hence the tmpfs mounts.)
- Scan an image:
Read the report. Note how many CVEs - and how many are critical vs informational.
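For the root check and the scan, one possible set of commands (the image names are examples - substitute images you actually have):

```shell
# Step 1: which user does each image run as by default?
docker run --rm nginx id
docker run --rm alpine id

# Step 3: scan with Trivy, as shown earlier on this page:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image nginx:latest
```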
What you might wonder¶
"Should I rebase my images regularly?" Yes. Rebuild your images periodically with the same Dockerfile - the base image you depend on receives security updates that flow into your image only when you rebuild. Automate via CI (rebuild weekly, run image scans, alert on new vulns).
"What about supply-chain attacks?"
Pin base images by digest (FROM nginx@sha256:abc...) not tag. Use signed images. Verify via cosign / sigstore. Way beyond beginner; mentioned for awareness.
"What's seccomp?" A Linux kernel feature that filters which syscalls a process can make. Docker has a default seccomp profile that blocks a handful of dangerous syscalls. You can customize. Beyond beginner.
Done¶
- Run containers as non-root.
- Make root filesystems read-only.
- Drop unneeded Linux capabilities.
- Use --security-opt=no-new-privileges.
- Keep secrets out of images.
- Run image scanners.
11 - Image Registries¶
What this session is¶
About 30 minutes. How to share images: log in to a registry, tag for upload, push, pull. The two big ones - Docker Hub and GitHub Container Registry (GHCR).
What a registry is¶
A registry is just a server that stores images. docker pull alpine contacts Docker Hub. docker pull ghcr.io/owner/image contacts GitHub Container Registry. Any compliant server can host a registry; you can even run one yourself.
The biggest public registries:
| Registry | URL prefix | Notes |
|---|---|---|
| Docker Hub | docker.io/ (often omitted) | The default; largest ecosystem |
| GitHub Container Registry | ghcr.io/ | Free for OSS; tightly integrated with GitHub |
| Google Artifact Registry | <region>-docker.pkg.dev/ | Cloud-native |
| AWS ECR | <account>.dkr.ecr.<region>.amazonaws.com/ | AWS-specific |
| Azure Container Registry | <name>.azurecr.io/ | Azure-specific |
For your first contributions, Docker Hub and GHCR are the relevant ones.
Image references, full form¶
The full form is [REGISTRY/][NAMESPACE/]NAME[:TAG][@DIGEST]:
- REGISTRY - defaults to docker.io when omitted.
- NAMESPACE - your username, org, or library (for official images).
- NAME - the image name.
- TAG - version label.
- DIGEST - content hash (immutable).
Examples (same image, four ways):
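A plausible reconstruction, using alpine (the digest is a placeholder, not a real hash):

```
alpine
alpine:latest
docker.io/library/alpine:latest
docker.io/library/alpine@sha256:<digest>
```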
When you docker pull, the short forms work; the daemon fills in defaults.
Log in¶
docker login # Docker Hub
docker login ghcr.io # GitHub Container Registry
docker login <other-registry> # other
For Docker Hub, use your Docker Hub username + password (or a personal access token - recommended).
For GHCR, use your GitHub username + a Personal Access Token (PAT) with write:packages scope. Generate one at github.com/settings/tokens.
Your credentials are stored in ~/.docker/config.json. For security, modern Docker uses your OS keychain on macOS/Windows by default.
Tag an image for upload¶
To push to a registry, the image must be tagged with the registry's prefix:
docker tag myimage:1.0 myname/myimage:1.0 # Docker Hub
docker tag myimage:1.0 ghcr.io/myname/myimage:1.0 # GHCR
docker tag SOURCE TARGET doesn't copy - it adds a new label to the same image. After tagging, both names refer to the same image. Remove either via docker rmi; the underlying image stays as long as one tag points to it.
Push¶
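Using the tags from the previous section:

```shell
docker push myname/myimage:1.0           # Docker Hub
docker push ghcr.io/myname/myimage:1.0   # GHCR
```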
You'll watch each layer upload (only changed layers transfer - Docker compares layer hashes).
After a push, visit Docker Hub or GHCR in your browser. You should see the image. Add a README on Docker Hub's UI; verify visibility (public vs private - check the settings).
Pull on another machine¶
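On the other machine, pull by the full name (same placeholders as above):

```shell
docker pull myname/myimage:1.0           # Docker Hub
docker pull ghcr.io/myname/myimage:1.0   # GHCR
```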
If private, you need to docker login first.
Multi-arch images¶
Modern registries support multi-arch manifests: one tag points to several architecture-specific images. nginx:1.27 resolves to the right one for your host (amd64, arm64, etc.).
Build multi-arch yourself with docker buildx:
docker buildx create --name multiarch --use
docker buildx build --platform linux/amd64,linux/arm64 -t myname/myimage:1.0 --push .
That builds both architectures in one go and pushes a single multi-arch manifest. Useful when you publish for both x86 servers and ARM (Macs, Raspberry Pis).
Pull-through caches¶
Big setups run a local registry as a pull-through cache: Docker pulls from the local registry, which contacts Docker Hub only on cache misses. This eases Docker Hub rate limits (the free tier limits per-IP pull rate) and makes pulls faster inside your network.
registry:2 is the official open-source image. Run it; configure your Docker daemon (daemon.json) to use it. Beyond beginner; recognize the pattern.
Self-hosted registries¶
docker run -d -p 5000:5000 registry:2 runs a private registry on localhost:5000. Push/pull:
docker tag myimage:1.0 localhost:5000/myimage:1.0
docker push localhost:5000/myimage:1.0
docker pull localhost:5000/myimage:1.0
For shared internal use, you'd also want TLS, auth, and storage backed by something durable (S3, GCS). The defaults are insecure-by-design for local-only testing.
Public vs private¶
Both Docker Hub and GHCR support both. By default:
- Docker Hub: new repos are public unless you have a paid plan.
- GHCR: inherits the parent repo's visibility (public if your repo is public, private if not). For org-owned images, configure in package settings.
Public images can be pulled by anyone, no auth. Private require docker login.
CI: building and pushing on every commit¶
A typical GitHub Actions workflow:
name: Build and push image
on: { push: { branches: [main] } }
jobs:
build:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v6
with:
context: .
push: true
tags: ghcr.io/${{ github.repository }}:latest
Every push to main rebuilds and pushes to ghcr.io/owner/repo:latest. You'll see this exact pattern in many OSS Rust/Go/Python projects.
Exercise¶
You need a Docker Hub account (free at hub.docker.com) and/or a GitHub PAT for GHCR.
- Tag and push to Docker Hub:
Open Docker Hub in a browser. Find your image.
- Pull from another tag:
- Push to GHCR:
Visit GitHub → your profile → Packages. Find the image. Optionally make it public from the package settings.
- Pull-through cache (advanced):
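One possible command sequence for the first three steps, interpreting the second as pulling your pushed image back (myname/myimage are placeholders for your own account and image):

```shell
# 1. Docker Hub
docker tag myimage:1.0 myname/myimage:1.0
docker push myname/myimage:1.0

# 2. Remove the local copy, then pull it back from the registry
docker rmi myname/myimage:1.0
docker pull myname/myimage:1.0

# 3. GHCR
docker tag myimage:1.0 ghcr.io/myname/myimage:1.0
docker push ghcr.io/myname/myimage:1.0
```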
What you might wonder¶
"What's the difference between docker push and the image actually appearing?"
After push, Docker Hub processes the upload; usually visible immediately. GHCR shows it under your packages quickly too.
"How do I delete images from a registry?"
Docker Hub: via the web UI. GHCR: via GitHub Packages UI. Programmatically via each registry's API. There's no docker rm for remote images; pushing again with the same tag overwrites.
"What about image signing?"
For real supply-chain integrity, sign images with cosign (sigstore). Verify at deploy. The "Container Internals" senior reference path covers this. Beyond beginner.
"Docker Hub rate limits?" Free anonymous pulls: ~100/6 hours per IP. Logged-in free: 200/6h. Paid plans: higher or unlimited. For CI on free tier, log in or use a registry mirror.
Done¶
- Tag images for a registry.
- Push to Docker Hub and GHCR.
- Pull (private images need docker login first).
- Recognize multi-arch images.
- Know about pull-through caches and self-hosted registries.
Next: Reading other people's Dockerfiles →
12 - Reading Other People's Dockerfiles¶
What this session is¶
About 30 minutes. Strategy for reading a real-world Dockerfile and compose.yaml so you understand what an OSS project is doing.
The five-minute orientation¶
For any containerized project:
- Read the project's README - what does it do, how to run it.
- Find the Dockerfile (or Dockerfile.* variants) - usually at repo root or docker/.
- Read top to bottom. Each instruction has an obvious purpose; you've seen them in pages 05-09.
- Find any compose.yaml - tells you the multi-container topology.
- Find the CI workflow (.github/workflows/) - shows how the image is built and pushed.
After five minutes you should be able to summarize: "This project produces an image based on X, running Y as Z user, exposing port N."
Reading top to bottom¶
FROM golang:1.23 AS builder # build stage - full Go toolchain
WORKDIR /src
COPY go.mod go.sum ./ # dep manifest first (cache)
RUN go mod download
COPY . . # source
RUN CGO_ENABLED=0 go build -o /app/myapp ./cmd/myapp
FROM gcr.io/distroless/static:nonroot # final stage - minimal
COPY --from=builder /app/myapp /myapp # copy just the binary
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/myapp"]
Read each line: "build with Go 1.23, copy deps then download then source, compile, switch to distroless, copy binary, run as non-root user, expose 8080."
You can predict from this Dockerfile:
- Image will be tiny (~10MB) - distroless + static binary.
- Runs as a non-root user - hard to escape.
- Single binary - easy to debug.
Read a compose.yaml¶
services:
web:
build: .
ports: ["8080:8080"]
environment:
DATABASE_URL: postgres://postgres:secret@db:5432/myapp
depends_on:
db:
condition: service_healthy
db:
image: postgres:16
environment:
POSTGRES_PASSWORD: secret
POSTGRES_DB: myapp
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
volumes:
pgdata:
Read: "two services, web builds from the Dockerfile in this dir and talks to a postgres database named db over the auto-created network, with the database's data in a named volume."
You can predict: "to run this locally, docker compose up -d will probably Just Work after I set the right env vars."
Patterns you'll see in real projects¶
Multi-stage with --from=builder - almost universal for compiled languages.
HEALTHCHECK instructions inside Dockerfiles (alternative to compose health checks). The image documents how to determine if it's healthy.
ARG for version pinning at build time:
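A common shape - an ARG before FROM parameterizes the base image (names are illustrative):

```dockerfile
ARG GO_VERSION=1.23
FROM golang:${GO_VERSION} AS builder
```

docker build --build-arg GO_VERSION=1.22 . overrides it at build time.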
ONBUILD - instructions that run when this image is used as a base. Rare; recognize.
Init systems (tini, dumb-init):
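A sketch of the tini pattern on a Debian-based image (the node command is illustrative):

```dockerfile
# Install tini and run it as PID 1: it forwards signals and reaps zombies.
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "server.js"]
```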
tini is a minimal init that handles signals (SIGTERM, zombie reaping) correctly. Useful when the app doesn't handle PID 1 duties itself (typical for Node, Python apps).
SHELL instruction - uses bash instead of /bin/sh -c:
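For example:

```dockerfile
# All subsequent RUN instructions use bash with pipefail instead of /bin/sh -c,
# so a failure anywhere in a pipeline fails the build.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
```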
docker-entrypoint.sh - a wrapper script as the entrypoint that does setup before running the main command. The Postgres official image's entrypoint, for example, sets up the database directory on first run.
What to look for when evaluating a project¶
When considering contributing:
- Does the image build cleanly? Try docker build . from a fresh clone. If it errors, that's a "good first issue" target right there.
- Is the image reasonable size? docker images <name> - anything over 500MB for a typical web service deserves attention.
- Does it run as non-root? Check with docker run --rm <image> id.
- Are secrets baked in? Run docker history <image> --no-trunc and grep for suspicious things.
- Is it pinned? FROM ubuntu instead of FROM ubuntu:24.04 is fragile. PRs that pin base images are usually welcome.
- Multi-arch builds? If they ship only amd64 in 2026, ARM users (Apple Silicon, Raspberry Pi) can't use it without slow QEMU emulation. PRs adding ARM builds are valuable.
These are all PR opportunities for someone with container skills.
A worked example: read a real project's container setup¶
Pick a public project. Suggestion: Plausible Community Edition (plausible/community-edition).
- Clone it: git clone https://github.com/plausible/community-edition.
- Look at Dockerfile (or look up the upstream one).
- Look at docker-compose.yml.
- Read the README about deployment.
After 10 minutes you should know:
- What base image they use.
- Whether they build multi-stage.
- What services compose into the stack (web app, postgres, clickhouse, etc.).
- How they configure secrets.
This is exactly the work you'd do before opening a PR.
Exercise¶
Pick a small OSS project with a Dockerfile. Suggestions:
- peterbourgon/ff (Go) - small CLI library; may or may not have a Dockerfile; you can suggest one if not.
- fatih/color (Go) - terminal colors library.
- caddyserver/caddy (Go) - web server (the old mholt/caddy location redirects here). Has a public Dockerfile.
- grafana/grafana (Go + TypeScript) - observability. Excellent Dockerfile + CI.
Clone one. Find its Dockerfile. Apply the five-minute orientation. Write a paragraph:
- What base image?
- Multi-stage?
- Final size (build it; check)?
- Non-root?
- Anything you'd improve?
That paragraph IS your potential PR plan.
What you might wonder¶
"What if the project's Dockerfile uses things I haven't seen?"
Look them up. Most instructions are covered in pages 05-10. Less common ones (ONBUILD, STOPSIGNAL, HEALTHCHECK) are in the Dockerfile reference docs.
"What's the right time to suggest a Dockerfile improvement?" After understanding why it's structured the way it is. Some quirks are intentional (work around an upstream bug, need a specific tool). Investigate before "improving."
"What about non-Dockerfile container projects? (Podman, Buildah, Nixpkgs OCI builders, etc.)" The Dockerfile format is the lingua franca; almost every project uses it. Podman/Buildah read the same format. Nix is a different world (declarative builds, reproducibility); rare but powerful.
Done¶
- Read a Dockerfile top to bottom.
- Read a compose.yaml's services topology.
- Recognize common patterns (multi-stage, tini, entrypoint scripts).
- Spot common improvement opportunities for PRs.
13 - Picking a Project¶
What this session is¶
About 30 minutes + browsing. What "containerized OSS" projects accept first contributions, with specific candidates.
What kinds of projects fit container skills¶
The OSS work you can do with container skills (without needing deep programming):
- Improve Dockerfiles - slimming, multi-stage builds, non-root user, pinning, multi-arch.
- Improve compose.yaml - health checks, env-file examples, missing services.
- Fix bugs in container-related code - entrypoint scripts, init scripts, install scripts.
- Improve documentation - most container docs lack examples or have inconsistencies.
- Add GitHub Actions - build-and-push workflows that don't yet exist.
- Translate container/deployment docs.
These are everywhere. Almost every OSS project today ships container images.
10-minute evaluation¶
Same criteria as the other beginner paths:
| Signal | Target |
|---|---|
| Stars | 100-50000 |
| Last commit | Within a month |
| Open PRs | Some, not 200+ |
| Recent PR merge time | Under 14 days |
| good first issue count | At least 5 |
| Has a CONTRIBUTING.md | yes |
| docker build works on fresh clone | yes |
Candidates¶
Tier 1 - small, gentle¶
- nginxinc/docker-nginx - official nginx Docker image (the Dockerfile for nginx:latest). PRs improving the Dockerfile here ship to millions.
- docker-library/ - Docker's official image collection. Each language/database has a sub-repo. Excellent labels.
- linuxserver/docker-baseimage-alpine - base images used by many linuxserver.io images. Small, active.
- cookiecutter-docker-science - small templates for containerized science workflows.
Tier 2 - medium, well-organized¶
- plausible/community-edition - analytics platform with a compose-based deployment.
- getsentry/onpremise - Sentry's on-premise containerized stack.
- nextcloud/docker - Nextcloud's container images. Active.
- testcontainers/testcontainers-* - Testcontainers (various languages). Containerization for tests.
- bitnami/charts - Bitnami's Helm charts (technically Kubernetes; container topology nonetheless).
Tier 3 - larger, more visible¶
- docker/docs - Docker's docs site. Improving examples or fixing typos is a great first PR.
- docker-library/official-images - the meta-repo that governs all official images.
- docker/buildx - Buildx itself.
Tier 4 - don't start here¶
- The Docker engine itself. Large, Go, complex.
- moby/moby - Docker's runtime. Same.
A specific recommendation: docker-library/¶
docker-library/* repos are excellent first targets. Each Dockerfile is small, public, and gets heavy use. Issues range from "fix typo in the README" to "add example for X". Maintainers are responsive.
Pick one whose underlying project you use: docker-library/postgres, docker-library/python, docker-library/redis, etc.
Finding issues¶
Project's Issues tab → Labels. Filter:
- good first issue
- documentation
- help wanted
Read 5-10. Pick one with:
- Clear description.
- Contained fix (one file, ideally).
- Unclaimed.
- Not open for a year.
Comment to claim. Wait for maintainer confirmation.
What counts¶
For container work:
- Updating a Dockerfile to a newer base.
- Adding a non-root user.
- Adding a .dockerignore.
- Adding multi-stage.
- Adding multi-arch builds via buildx.
- Fixing a typo in a Docker Hub README.
- Adding an example to documentation.
- Fixing a broken compose example.
All real, all count.
Exercise¶
- Browse three Tier 1 / Tier 2 projects.
- 10-minute evaluation on each.
- Pick the most responsive.
- Read CONTRIBUTING.md.
- Clone and build: run docker build . on a fresh clone. If the build fails, that's already a flag - either you missed setup, or the project's docs are out of date (might be a good first issue right there).
- Browse good first issue tickets. Pick two candidates.
What you might wonder¶
"What if I don't see Docker-specific labels?"
Some projects use generic labels. Filter by docker keyword in the issue search bar, or Dockerfile, or containerization.
"What if no one's published a Dockerfile for a tool I love?" That IS a contribution. Open an issue: "Would a Dockerfile + GHCR build be welcome?" If yes, submit one.
Done¶
- Recognize container-OSS contribution shapes.
- Run the 10-minute eval.
- Have specific candidate projects.
Next: Anatomy of a containerized OSS project →
14 - Anatomy of a Containerized OSS Project¶
What this session is¶
About 30 minutes. Walk through the typical file layout of a containerized OSS project.
Typical layout¶
.
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── Dockerfile (main image)
├── Dockerfile.dev (variants - optional)
├── .dockerignore
├── compose.yaml (or docker-compose.yml - local-dev stack)
├── compose.prod.yaml (override for prod, sometimes)
├── .github/
│ └── workflows/
│ ├── ci.yml (tests)
│ └── docker.yml (build + push image to a registry)
├── src/ (or app/, cmd/, lib/ - application code)
├── docker/ (Docker-related helpers, optional)
│ ├── entrypoint.sh
│ └── nginx.conf
├── deploy/ (deployment manifests, sometimes)
└── docs/
Not every project has every file. Roles:
Root-level container files¶
- Dockerfile - the main image's recipe. Always at the root by convention.
- Dockerfile.* variants - for different roles: Dockerfile.dev (with dev dependencies and live reload), Dockerfile.test (with test tools), Dockerfile.alpine (slimmer variant).
- .dockerignore - paths excluded from the build context.
- compose.yaml - local dev stack: app + dependencies (databases, queues).
- compose.override.yaml - automatic override for local; usually adds dev-only settings.
- compose.prod.yaml - sometimes; for "production-ish" runs.
.github/workflows/¶
Two patterns:
- ci.yml - runs tests on PRs. Builds the image as part of testing.
- docker.yml (or release.yml) - on tag pushes, builds + pushes to a registry (Docker Hub, GHCR).
Read both. They tell you exactly what your PR's CI will measure.
docker/ (sometimes)¶
Container-specific helpers that live outside the main source tree:
- entrypoint.sh - the script that runs first when the container starts. Often does setup (waits for the DB, migrates, sets env from secrets) then exec's the actual app.
- nginx.conf, prometheus.yml - config templates for sidecar services.
- healthcheck.sh - sometimes.
Reading the entrypoint script¶
Many real-world projects use an entrypoint.sh to do dynamic setup at container start:
#!/bin/sh
set -e
# Wait for the database
until pg_isready -h "$DB_HOST" -p "$DB_PORT"; do
echo "Waiting for db..."
sleep 1
done
# Run migrations
./manage.py migrate --noinput
# Collect static files
./manage.py collectstatic --noinput
# Exec the actual command (whatever was passed to the container)
exec "$@"
The Dockerfile invokes it as the entrypoint:
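Typically something like this (the script path and the CMD are illustrative):

```dockerfile
COPY docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# CMD becomes "$@" inside the script - the final exec "$@" runs it.
CMD ["gunicorn", "wsgi:app", "-b", "0.0.0.0:8000"]
```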
Reading order: see ENTRYPOINT → read the script → understand what runs on startup.
A worked walkthrough¶
Imagine you cloned a project named "blog-app." Apply orientation:
- README. Says: "A small blog engine. Run with docker compose up."
- Dockerfile:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["gunicorn", "wsgi:app", "-b", "0.0.0.0:8000"]
Python slim base, basic structure. Could be improved (non-root user, multi-stage perhaps, .dockerignore check).
- .dockerignore: present? Check for things like .git, *.pyc, etc.
- compose.yaml: Two services. App + Postgres. Secret hardcoded (not great for production; fine for local).
- .github/workflows/docker.yml: if present, look at the build + push job. Probably uses docker/build-push-action. Note which tags get pushed.
You can now confidently say: "This is a Python web app, Dockerized with a slim base, with a Postgres dependency, deployed via CI to GHCR. Possible improvements: non-root user, pin base image by digest, add multi-arch builds, add a healthcheck."
That mental map is the platform for your PR.
What "good improvements" look like¶
Easy PRs you can make to most containerized projects:
- Pin the base image: FROM python:3.12-slim → FROM python:3.12.5-slim (or by digest). Smaller surprise surface.
- Add USER: create a non-root user; switch to it.
- Add .dockerignore if missing.
- Split into multi-stage if there's a build step that could be separated from runtime.
- Add multi-arch builds in CI (docker buildx).
- Add a healthcheck.
- Reduce image size by combining RUNs, switching to slim base, or shifting to distroless.
- Improve documentation - explain env vars, port mappings, volume layout.
Each is a contained, reviewable PR.
Exercise¶
Use the project you picked in page 13:
- Clone locally.
- Walk the layout. Map each file to a category.
- Read CONTRIBUTING.md end-to-end.
- Find CI workflow YAMLs. List the commands they run.
- Run those commands locally:
- Open your tentative issue. Identify which file(s) it touches (likely Dockerfile, compose.yaml, or a doc file).
You're ready to make a change.
What you might wonder¶
"What if there's no CI workflow?" Sometimes projects don't have one yet. Adding a basic GitHub Actions workflow that builds the image and pushes to GHCR is a great PR - but check with maintainers first; some prefer to add CI themselves.
"What about projects using Bazel, Nix, or other build systems?"
Different worlds. Bazel-built containers use rules_oci/rules_docker. Nix produces deterministic OCI images via dockerTools. Recognize when you see them; they're less common.
"What if the project uses Podman / Buildah?"
Same Dockerfile format. The CLI invocations change (podman build instead of docker build). Most concepts transfer.
Done¶
- Recognize the typical containerized-project layout.
- Read entrypoint scripts.
- Read CI workflows for build/push steps.
- Identify likely improvement PRs.
Next: Your first contribution →
15 - Your First Contribution¶
What this session is¶
The whole thing. We walk through making a real contribution to a real containerized OSS project, end-to-end.
The workflow¶
Identical to the workflow in the other beginner paths:
- Fork on GitHub.
- Clone your fork.
- Add upstream as remote.
- Branch off main.
- Set up: ensure docker build works on a fresh clone.
- Change the Dockerfile / compose.yaml / docs.
- Test locally: rebuild, run, verify nothing broke.
- Push to your fork; open PR.
Step 1: Fork & clone¶
GitHub → Fork (top right). Then:
git clone git@github.com:<you>/<project>.git
cd <project>
git remote add upstream git@github.com:<owner>/<project>.git
git fetch upstream
Step 2: Branch¶
Branch names should hint at the change.
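For example, for the non-root-user change used later on this page:

```shell
git checkout -b fix/dockerfile-non-root-user
```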
Step 3: Verify the baseline¶
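Before touching anything, build the unmodified project (the tag is just a convenience):

```shell
docker build -t test:before .
```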
Should succeed. If it doesn't on a fresh clone, fix that first (or ask in the issue).
Step 4: Make the change¶
Edit the Dockerfile. Suppose your change is "add a non-root user." Before:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
After:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd --create-home --shell /bin/bash --uid 1001 app && \
chown -R app:app /app
USER 1001:1001
CMD ["python", "app.py"]
Step 5: Test¶
docker build -t test:after .
docker run --rm test:after id # should report uid=1001
docker run --rm test:after python app.py # should still work
If the app needs to write somewhere (e.g. a cache dir), confirm permissions are correct.
If the project has a compose.yaml, test that too:
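For example:

```shell
docker compose up -d
docker compose ps      # are all services up and healthy?
docker compose down
```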
Step 6: Run CI's checks locally¶
Open the CI workflow (.github/workflows/*.yml). Whatever it runs, run those commands too:
- docker buildx build --platform linux/amd64,linux/arm64 . - multi-arch test.
- hadolint Dockerfile - Dockerfile linter (install: brew install hadolint or use the Docker image).
- trivy image test:after - vulnerability scan.
If any fail, fix before pushing.
Step 7: Commit & push¶
git add Dockerfile
git commit -m "Dockerfile: run as non-root user (UID 1001)"
git push origin fix/dockerfile-non-root-user
Commit message conventions vary by project. Some require Conventional Commits (fix:, feat:); most don't.
Step 8: Open the PR¶
On the upstream repo, click "Compare & pull request."
- Title. Mirror the commit message.
- Description. What changed, why, how you tested. If there's a related issue: Closes #123.
- Checklist. Address every item in the PR template.
Submit. CI runs. Fix anything red by pushing more commits to the same branch.
What review looks like¶
A maintainer reads. Outcomes:
1. "LGTM, merging." Done.
2. "Could you change these?" Most common. Address each comment, push commits.
3. "Not what we want." Rare for good first issue work. Ask about related work.
4. Silence. Polite check-in after 1 week; escalate after 3.
Address feedback efficiently. Disagree only on substance.
After the merge¶
- Update your fork's main.
- Delete the branch.
- Take a screenshot.
- Sit with it for a day.
Worked example: contributing to docker-library/python (hypothetical)¶
Suppose you noticed docker-library/python doesn't have an example in its README for using build arguments to pin a Python patch version. You decide to add one.
git clone git@github.com:<you>/docker-python.git
cd docker-python
git remote add upstream git@github.com:docker-library/python.git
git fetch upstream
git checkout -b docs/add-build-arg-example
# Edit README.md, add the example.
# Test that the README renders correctly (markdown preview in your editor).
git add README.md
git commit -m "Add example: pinning Python patch version via build-arg"
git push origin docs/add-build-arg-example
Open PR. Wait for review. Address style nits ("please use fenced code blocks with dockerfile language tag"). Push fixes. Merge.
You're now a docker-library contributor.
After your first PR: what next¶
- Pick another issue in the same project. Familiarity compounds.
- After 3-5 PRs, become a regular. Watch issues, help others, review PRs (you don't need maintainer permissions to leave helpful comments).
- Branch out to Tier 3-4 projects.
- Build your own containerized service. Publish the image. Maintain the Dockerfile.
- Pick the next path: Kubernetes From Scratch is the natural follow-up.
What you might wonder¶
"PR sits for weeks?" Polite check-in after 1 week. After 3, ask in the project's chat/discussions.
"My change broke CI?" Read the failing job's logs. Fix locally, push another commit. The PR updates automatically.
"Maintainer rude?" Disengage. Try another project.
"Can I list this on a CV?" Yes - link to specific merged PRs.
Done with this path¶
You've:
- Installed Docker, run your first containers.
- Built your own images with Dockerfile.
- Used volumes, networks, ports.
- Composed multi-container apps.
- Published images to a registry.
- Read a real containerized OSS project.
- Submitted a PR.
What you should do next: keep using containers daily. Use them for development, for ad-hoc tools, for experiments. Familiarity compounds.
Recommended next paths on this site:
- Kubernetes From Scratch - containers' big sibling. Orchestration, scaling, declarative deploys.
- Container Internals - senior reference path. How containers actually work (namespaces, cgroups, OCI, runtimes). Assumes you've done this path.
- Linux Kernel - the substrate. Containers ARE Linux features.
Congratulations. You are no longer a beginner.