
14 - Anatomy of a Containerized OSS Project

What this session is

About 30 minutes. Walk through the typical file layout of a containerized OSS project.

Typical layout

.
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── Dockerfile                       (main image)
├── Dockerfile.dev                   (variants - optional)
├── .dockerignore
├── compose.yaml                     (or docker-compose.yml - local-dev stack)
├── compose.prod.yaml                (override for prod, sometimes)
├── .github/
│   └── workflows/
│       ├── ci.yml                   (tests)
│       └── docker.yml               (build + push image to a registry)
├── src/                             (or app/, cmd/, lib/ - application code)
├── docker/                          (Docker-related helpers, optional)
│   ├── entrypoint.sh
│   └── nginx.conf
├── deploy/                          (deployment manifests, sometimes)
└── docs/

Not every project has every file. Roles:

Root-level container files

  • Dockerfile - the main image's recipe. At the root by convention; some projects keep it under docker/ instead.
  • Dockerfile.* variants - for different roles: Dockerfile.dev (with dev dependencies and live reload), Dockerfile.test (with test tools), Dockerfile.alpine (slimmer variant).
  • .dockerignore - paths excluded from the build context.
  • compose.yaml - local dev stack: app + dependencies (databases, queues).
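  • compose.override.yaml - merged automatically on top of compose.yaml by docker compose; usually adds dev-only settings.
  • compose.prod.yaml - sometimes; for "production-ish" runs. Not picked up automatically: pass it explicitly, e.g. docker compose -f compose.yaml -f compose.prod.yaml up.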

.github/workflows/

Two patterns:

  • ci.yml - runs tests on PRs. Builds the image as part of testing.
  • docker.yml (or release.yml) - on tag pushes, builds + pushes to a registry (Docker Hub, GHCR).

Read both. They tell you exactly what CI will check on your PR.

docker/ (sometimes)

Container-specific helpers that live outside the main source tree:

  • entrypoint.sh - the script that runs first when the container starts. Often does setup (waits for the DB, migrates, sets env from secrets) then exec's the actual app.
  • nginx.conf, prometheus.yml - config templates for sidecar services.
  • healthcheck.sh - sometimes.
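To see which of the roles above a given checkout actually fills, you can sort its paths into categories. The sketch below is one way to do that; classify_path and its category names are hypothetical, not something a project ships, and the patterns only cover the layout shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: map a repository-relative path to a rough role.
# The category names (image, compose, ci, ...) are made up for this sketch.
classify_path() {
  case "$1" in
    Dockerfile|Dockerfile.*|.dockerignore) echo "image" ;;
    compose*.yaml|compose*.yml|docker-compose*.yaml|docker-compose*.yml) echo "compose" ;;
    .github/workflows/*) echo "ci" ;;
    docker/*) echo "helper" ;;
    deploy/*) echo "deploy" ;;
    README.md|LICENSE|CONTRIBUTING.md|docs/*) echo "docs" ;;
    *) echo "app" ;;
  esac
}

# Example use inside a clone (not run here):
#   git ls-files | while read -r f; do
#     printf '%s\t%s\n' "$(classify_path "$f")" "$f"
#   done | sort
```

Anything that lands in "app" is application code or a file this sketch does not know about; extend the patterns for the project in front of you.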

Reading the entrypoint script

Many real-world projects use an entrypoint.sh to do dynamic setup at container start:

#!/bin/sh
set -e

# Wait for the database
until pg_isready -h "$DB_HOST" -p "$DB_PORT"; do
  echo "Waiting for db..."
  sleep 1
done

# Run migrations
./manage.py migrate --noinput

# Collect static files
./manage.py collectstatic --noinput

# Exec the actual command (whatever was passed to the container)
exec "$@"

The Dockerfile invokes it as the entrypoint:

COPY docker/entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["gunicorn", "wsgi:app"]

Reading order: see ENTRYPOINT → read the script → understand what runs on startup.
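When you read a wait loop like the one above, check whether it can give up. The version shown retries forever, so a misconfigured DB_HOST hangs the container instead of failing it. A bounded variant might look like the sketch below; wait_for is a hypothetical helper name, and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
# (Hypothetical helper for illustration; plain POSIX sh, so variables are global.)
wait_for() {
  tries=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$tries" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example use in an entrypoint (not run here):
#   wait_for 30 1 pg_isready -h "$DB_HOST" -p "$DB_PORT"
```

With set -e in effect, a failed wait_for call stops the entrypoint before migrations run, and the container exits with an error you can see in its logs.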

A worked walkthrough

Imagine you cloned a project named "blog-app". Orient yourself in this order:

  1. README. Says: "A small blog engine. Run with docker compose up."
  2. Dockerfile:

    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["gunicorn", "wsgi:app", "-b", "0.0.0.0:8000"]
    
    Python slim base, basic structure. Could be improved (non-root user, multi-stage perhaps, .dockerignore check).

  3. .dockerignore: present? Check for things like .git, *.pyc, etc.

  4. compose.yaml:

    services:
      app:
        build: .
        ports: ["8000:8000"]
        depends_on: [db]
        environment:
          DATABASE_URL: postgres://blog:secret@db:5432/blog
      db:
        image: postgres:16
        environment:
          POSTGRES_USER: blog
          POSTGRES_PASSWORD: secret
          POSTGRES_DB: blog
        volumes:
          - pgdata:/var/lib/postgresql/data
    volumes:
      pgdata:
    
    Two services. App + Postgres. Secret hardcoded (not great for production; fine for local).

  5. .github/workflows/docker.yml: if present, look at the build + push job. Probably uses docker/build-push-action. Note which tags get pushed.

You can now confidently say: "This is a Python web app, Dockerized on a slim base image, with a Postgres dependency; CI builds the image and pushes it to a registry such as GHCR. Possible improvements: non-root user, pin base image by digest, add multi-arch builds, add a healthcheck."

That mental map is the foundation for your PR.
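The .dockerignore check from step 3 is easy to script. A rough sketch, assuming a hypothetical helper named missing_ignores: it prints each pattern that has no exact-line match in the file, so .git/ would not count as .git. The .env pattern in the usage line is an illustrative guess, not something the blog-app example mentions.

```shell
#!/bin/sh
# missing_ignores FILE PATTERN...
# Print every PATTERN that does not appear as an exact line in FILE.
# If FILE does not exist, every pattern is reported as missing.
# (Hypothetical helper; exact-line matching only, no glob semantics.)
missing_ignores() {
  file=$1
  shift
  for pattern in "$@"; do
    if [ ! -f "$file" ] || ! grep -qxF -- "$pattern" "$file"; then
      echo "$pattern"
    fi
  done
}

# Example use from the repository root (not run here):
#   missing_ignores .dockerignore .git '*.pyc' .env
```

Empty output means every pattern you asked about is already listed.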

What "good improvements" look like

Easy PRs you can make to most containerized projects:

  1. Pin the base image: FROM python:3.12-slim → FROM python:3.12.5-slim (or by digest). Fewer surprises when the upstream tag moves.
  2. Add USER: create a non-root user; switch to it.
  3. Add .dockerignore if missing.
  4. Split into multi-stage if there's a build step that could be separated from runtime.
  5. Add multi-arch builds in CI (docker buildx).
  6. Add a healthcheck.
  7. Reduce image size by combining RUNs, switching to slim base, or shifting to distroless.
  8. Improve documentation - explain env vars, port mappings, volume layout.

Each is a contained, reviewable PR.
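Several items on that list can be spotted mechanically before you read a Dockerfile closely. The sketch below greps for a few of them; suggest_improvements is a hypothetical name, and the checks are deliberately crude (for example, a FROM that references an earlier build stage will be flagged as unpinned). Treat the output as prompts for a closer look, not as findings.

```shell
#!/bin/sh
# suggest_improvements DOCKERFILE
# Print one line per improvement candidate found by simple pattern checks.
# (Hypothetical helper; heuristics only, not a linter.)
suggest_improvements() {
  df=$1
  # No USER instruction: the container most likely runs as root.
  grep -qiE '^[[:space:]]*USER[[:space:]]' "$df" || echo "add a non-root USER"
  # No HEALTHCHECK instruction in the image.
  grep -qiE '^[[:space:]]*HEALTHCHECK[[:space:]]' "$df" || echo "add a HEALTHCHECK"
  # Any FROM line without @sha256: is not pinned by digest.
  if grep -iE '^[[:space:]]*FROM[[:space:]]' "$df" | grep -qv '@sha256:'; then
    echo "pin the base image by digest"
  fi
  # A single FROM line means the build is not multi-stage.
  if [ "$(grep -ciE '^[[:space:]]*FROM[[:space:]]' "$df")" -le 1 ]; then
    echo "consider a multi-stage build"
  fi
  # No .dockerignore next to the Dockerfile.
  [ -f "$(dirname "$df")/.dockerignore" ] || echo "add a .dockerignore"
}

# Example use (not run here):
#   suggest_improvements ./Dockerfile
```

Run against the blog-app Dockerfile from the walkthrough, all five suggestions would be printed, which matches the improvement notes made there by hand.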

Exercise

Use the project you picked on page 13:

  1. Clone locally.
  2. Walk the layout. Map each file to a category.
  3. Read CONTRIBUTING.md end-to-end.
  4. Find CI workflow YAMLs. List the commands they run.
  5. Run those commands locally:
    docker build -t test:dev .
    docker compose up -d                # if compose.yaml present
    
  6. Open your tentative issue. Identify which file(s) it touches (likely Dockerfile, compose.yaml, or a doc file).

You're ready to make a change.
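For step 4, a quick way to get a first list of what CI runs is to grep the workflow files for run: steps. A minimal sketch, assuming the workflows live in .github/workflows; list_ci_commands is a hypothetical name, and multi-line run: | blocks show only their first line, so open the file for the rest.

```shell
#!/bin/sh
# list_ci_commands [DIR]
# Print "file:line:text" for every run: step in the workflow files under DIR
# (default: .github/workflows). Prints nothing if there are no workflows.
# (Hypothetical helper for illustration.)
list_ci_commands() {
  dir=${1:-.github/workflows}
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    # Skip unmatched glob patterns and anything that is not a regular file.
    [ -f "$f" ] || continue
    # The extra /dev/null argument makes grep prefix matches with the file name.
    grep -nE '^[[:space:]]*(-[[:space:]]+)?run:' "$f" /dev/null || true
  done
  return 0
}

# Example use from the repository root (not run here):
#   list_ci_commands
```

Steps that call an action with uses: instead of run: do not appear in this list; read those separately.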

What you might wonder

"What if there's no CI workflow?" Sometimes projects don't have one yet. Adding a basic GitHub Actions workflow that builds the image and pushes to GHCR is a great PR - but check with maintainers first; some prefer to add CI themselves.

"What about projects using Bazel, Nix, or other build systems?" Different worlds. Bazel-built containers use rules_oci/rules_docker. Nix produces deterministic OCI images via dockerTools. Recognize when you see them; they're less common.

"What if the project uses Podman / Buildah?" Same Dockerfile format. The CLI invocations change (podman build instead of docker build). Most concepts transfer.

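If you want your local commands to work on either a Docker or a Podman machine, one option is to detect the CLI once and reuse it. A small sketch; pick_container_cli is a hypothetical helper, and preferring docker over podman when both are installed is an arbitrary choice.

```shell
#!/bin/sh
# pick_container_cli
# Print the first container CLI found on PATH (docker, then podman).
# Returns 1 and prints nothing if neither is installed.
# (Hypothetical helper for illustration.)
pick_container_cli() {
  for cli in docker podman; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
      return 0
    fi
  done
  return 1
}

# Example use (not run here):
#   cli=$(pick_container_cli) && "$cli" build -t test:dev .
```
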
Done

  • Recognize the typical containerized-project layout.
  • Read entrypoint scripts.
  • Read CI workflows for build/push steps.
  • Identify likely improvement PRs.

Next: Your first contribution →
