
14 - Anatomy of a Small OSS Repo

What this session is

About 45 minutes. We're going to walk through the file layout of a real (small) Go open-source project, file by file, so you know what every common piece is for. The next page asks you to make a contribution; this page makes the project feel less like a maze.

We'll use the standard Go project layout as our template, because most projects you'll meet are close variations of it. There's no official spec - but the conventions are stable enough that you can predict where things live.

A typical small Go project, from the top

After you git clone a repo and cd into it, you'll usually see something like:

.
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── go.mod
├── go.sum
├── Makefile           (or justfile, or scripts/)
├── .github/
│   ├── workflows/     (GitHub Actions CI files)
│   └── ISSUE_TEMPLATE/
├── .gitignore
├── cmd/
│   └── mytool/
│       └── main.go
├── internal/
│   ├── foo/
│   │   ├── foo.go
│   │   └── foo_test.go
│   └── bar/
│       ├── bar.go
│       └── bar_test.go
├── pkg/               (sometimes; many projects skip this)
│   └── public/
│       └── public.go
├── docs/
│   ├── architecture.md
│   └── examples/
└── examples/
    └── basic/
        └── main.go

Not every project has all of these. Many have only a subset. The shape varies, but the roles of these files and folders are consistent.

What each piece is for

Root-level files

  • README.md - the project's homepage. Three things you want from it: one-line description, install instructions, smallest possible working example. If the README isn't useful, the project is incomplete.

  • LICENSE - the legal terms (MIT, Apache 2.0, BSD, GPL, etc.). You should know which license the project uses before contributing. For most contributions this is a formality (by submitting a PR, you offer your change under the project's license). Some projects, such as Apache Software Foundation and Google-run projects, also ask you to sign a Contributor License Agreement (CLA).

  • CONTRIBUTING.md - the most important file for you right now. Spells out how to propose changes, conventions to follow, branch naming, commit message style, how tests should look. Read it before doing anything.

  • CODE_OF_CONDUCT.md - community standards. Usually a copy of the Contributor Covenant. You don't have to memorize it; just know that "be respectful, no harassment" is the gist.

  • go.mod and go.sum - module definition and dependency checksums (page 11). Quick check: is this module's import path what you expected? What are the direct dependencies?

  • Makefile - a script of common commands. Run make help if there's a help target, or just open the file and read the targets. Common ones: make build, make test, make lint, make clean. These often run with project-specific flags you'd get wrong from memory.

  • .gitignore - files git should ignore. Mostly compiled binaries, IDE config, OS junk. You don't need to touch this.

.github/

GitHub-specific configuration:

  • workflows/ - CI pipelines (YAML files). One file per workflow. Each defines triggers (push, PR, schedule) and jobs (build, test, lint, deploy). Reading these tells you what the project considers "the green path" - exactly which commands your PR will be measured against.

  • ISSUE_TEMPLATE/ - templates for filing different kinds of issue (bug report, feature request). When you open a new issue, GitHub lets you choose one of these templates and pre-fills the form from it.

  • PULL_REQUEST_TEMPLATE.md - what GitHub pre-fills the PR description with. Usually includes a checklist ("tests pass", "docs updated", "no breaking changes"). Read it and follow it; reviewers expect every checkbox to be addressed.

  • CODEOWNERS - maps file paths to owners. Those owners are automatically requested as reviewers on any PR that touches matching files, so it tells you who will read your PR.

cmd/

By convention, cmd/<name>/main.go is the entry point for a runnable program called <name>. If a project produces multiple binaries (e.g., a server and a CLI client), each gets its own folder under cmd/.

A main.go here is typically short. It parses flags, sets up a config, calls into internal/ or pkg/ to do the real work, handles signals, and exits. Long main.go files are a smell; the work should live elsewhere.

internal/

The magic folder. The go command enforces a rule: a package under an internal/ directory can only be imported by code rooted at that directory's parent. For the common case of a top-level internal/, only packages in this module can import internal/foo. Nobody else can. This is how projects hide their implementation details from users.

In most applications, the bulk of the code lives in internal/, so that's where you'll spend most of your reading time. The subdirectories under internal/ are usually organized by responsibility (internal/server, internal/storage, internal/auth). Read the names to get a mental map.
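The import rule fits in a few lines. The sketch below models it on plain import paths. It is a simplified illustration, not the go command's actual implementation, and the example.com/proj module path is hypothetical. It keys on the final internal element of the imported path, since that one imposes the narrowest restriction.

```go
package main

import (
	"fmt"
	"strings"
)

// internalParent returns the import path of the directory that contains the
// final "internal" element of path, and whether path has such an element.
func internalParent(path string) (parent string, ok bool) {
	switch {
	case strings.HasSuffix(path, "/internal"):
		return strings.TrimSuffix(path, "/internal"), true
	case strings.Contains(path, "/internal/"):
		return path[:strings.LastIndex(path, "/internal/")], true
	case path == "internal" || strings.HasPrefix(path, "internal/"):
		return "", true // a top-level internal/, as in the standard library
	}
	return "", false
}

// canImport reports whether package importer may import package imported
// under the internal-directory rule (simplified model).
func canImport(importer, imported string) bool {
	parent, ok := internalParent(imported)
	if !ok {
		return true // no internal element, so no restriction
	}
	if parent == "" {
		return false // simplification: top-level internal/ is off-limits to user code
	}
	// The importer must be the parent itself or somewhere beneath it.
	return importer == parent || strings.HasPrefix(importer, parent+"/")
}

func main() {
	const mod = "example.com/proj" // hypothetical module path
	for _, c := range [][2]string{
		{mod + "/cmd/mytool", mod + "/internal/foo"},
		{"example.com/other", mod + "/internal/foo"},
		{mod + "/a", mod + "/b/internal/c"},
	} {
		fmt.Printf("%s -> %s: %v\n", c[0], c[1], canImport(c[0], c[1]))
	}
}
```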

pkg/

Public reusable packages - code that other projects can import. Many projects don't use pkg/ and put public code at the top level instead. There's no rule.

For a CLI tool with no public API, you may not see pkg/. For a library, the public API is at the top level or in pkg/<name>/.

docs/

Project documentation beyond the README. Architecture overviews, design decisions, runbooks. If you want to understand why the project is shaped the way it is, this is where to look.

examples/ or _examples/

Runnable example code showing how to use the project. Underscore prefix (_examples/) tells the Go tools to ignore it (so go build ./... doesn't try to compile examples). Read these - they show you the "official" way to use the project.

vendor/

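A copy of the project's dependencies, committed into the repo. Common before Go modules; less common now. If vendor/ exists, the project does vendored builds: you build against vendor/ instead of downloading dependencies fresh. Since Go 1.14, the go command does this automatically when vendor/ is present and go.mod declares go 1.14 or later. Passing go build -mod=vendor makes it explicit. If your change adds or updates a dependency, run go mod vendor to refresh the folder.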

testdata/

By Go convention, any folder named testdata is ignored by the build system. Used to store input files for tests (sample JSON, fixture databases, etc.). You'll see this scattered through projects with significant tests.

A worked walkthrough: peterbourgon/ff

Let's apply the above to a real, small project: peterbourgon/ff, a flag and config library. Clone it:

git clone https://github.com/peterbourgon/ff ~/code/ff
cd ~/code/ff

Look at the top-level structure:

ls

You should see something close to:

README.md  LICENSE  go.mod  go.sum
ff.go      ff_test.go
parse.go   parse_test.go
testdata/
ffcli/     ffyaml/  fftest/  ffjson/  fftoml/  ...
.github/

Apply what you just learned:

  1. README.md - read it. What does ff do? (A flag and configuration parser.)
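  2. go.mod - what's the module path? (github.com/peterbourgon/ff plus a major-version suffix such as /v3; the exact suffix depends on which release the default branch is on.) Any dependencies? (Very few - that's a quality signal.)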
  3. No cmd/ - meaning this is a library, not a runnable program.
  4. No internal/ - meaning every package here can be imported by other modules, so all of it is effectively public API.
  5. ff.go, parse.go - the core. Open them.
  6. *_test.go - tests right next to the code. Standard Go layout.
  7. testdata/ - fixtures for tests. Open one to see what kind of data the tests use.
  8. ffyaml/, ffjson/, fftoml/ - subpackages adding YAML/JSON/TOML config support. Each is independently importable.
  9. .github/workflows/ - what's in the CI? Open the workflow YAML. It probably runs go test ./... on several Go versions.

Five minutes later, you have a map. You haven't read the implementation; you don't need to. You know what's there.

The conventions in CONTRIBUTING.md

Open the CONTRIBUTING.md (if one exists) and look for:

  • Branch naming. Some projects expect fix/issue-123 or similar.
  • Commit message format. Some require Conventional Commits (feat: add X, fix: handle Y). Many don't care.
  • PR description. What sections are expected? (Summary, motivation, testing.)
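  • Sign-off / DCO. Some projects require a Developer Certificate of Origin sign-off on every commit: commit with git commit -s, which appends a Signed-off-by: line with your name and email. This is a plain-text trailer, not a cryptographic signature. The Linux kernel and many CNCF projects require it.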
  • Test expectations. Often: "all new code must have tests, and go test ./... must pass."
  • Linting. Often: golangci-lint run must be clean. Install it if mentioned.

These conventions are not annoying gatekeeping; they're what makes a project run smoothly with hundreds of contributors. Follow them; the maintainers will be relieved.

Exercise

Use the project you picked on page 13.

  1. Clone it locally.
  2. Walk the layout, file by file, applying the categories above. Write down where each piece lives.
  3. Read the CONTRIBUTING.md end to end. Note any unusual requirements.
  4. Open one CI workflow YAML in .github/workflows/. Identify: what commands does CI run? On what platforms? Against what Go versions?
  5. Run those CI commands locally (go test ./..., golangci-lint run, whatever the workflow does). Confirm they pass on a fresh clone.
  6. Open the issue you tentatively picked. Identify the three files most likely to be involved in the fix. (You don't have to be right - just guess based on file names and a quick grep.)

That's everything you need to make a change. The next page walks through actually doing it.

What you might wonder

"What if a project doesn't follow the standard layout?" Some don't. Read the README and any ARCHITECTURE.md; they'll explain the layout. If neither exists, fall back to: "follow main.go and see where it leads."

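"What's the difference between pkg/foo/ and just foo/?" Convention only. pkg/foo/ was popular for a while, partly because the community-run golang-standards/project-layout repo recommends it. Despite its name, that repo is not an official Go project, and the Go team's own guide to organizing a module doesn't use pkg/. Many high-profile projects (like cobra and viper) don't use pkg/.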

"What's a 'go generate' file?" Sometimes Go projects use code generators (for protobuf bindings, mocks, embedded files). A line //go:generate ... near the top of a file declares a generation command. go generate ./... runs them all. Generated files usually have a // Code generated ... DO NOT EDIT. header - don't edit them; regenerate them.
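The generated-file header follows a documented convention. A line matching ^// Code generated .* DO NOT EDIT\.$ must appear before the first non-comment, non-blank text in the file. Here is a minimal sketch of a checker for that rule; for brevity it only understands // comments, not /* */ blocks. For real tooling, Go 1.21 and later provide IsGenerated in go/ast, which applies the same convention to a parsed file.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// generatedMarker is the line pattern from Go's generated-code convention.
var generatedMarker = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether src has the marker line before its first
// non-comment, non-blank line (usually the package clause).
func isGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		switch {
		case generatedMarker.MatchString(line):
			return true
		case trimmed == "", strings.HasPrefix(trimmed, "//"):
			continue // blank lines and other comments may come first
		default:
			return false // reached real code without seeing the marker
		}
	}
	return false
}

func main() {
	// Usage: isgen FILE.go... ; names files to regenerate rather than edit.
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		if isGenerated(string(data)) {
			fmt.Println(name, "is generated; regenerate it instead of editing")
		}
	}
}
```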

"What if CI is breaking on main when I clone?" A red flag about the project's health. Either the project is in transition (a big refactor mid-flight) or maintainers aren't watching closely. Reconsider whether this is the right first project; if the bar to landing a PR includes "first I have to fix CI," that's too much for a first contribution.

Done

You can now:

  • Recognize the typical Go project layout.
  • Locate every common file/folder by role (cmd, internal, pkg, docs, etc.).
  • Read a CONTRIBUTING.md for conventions you'll need to follow.
  • Read CI workflows to know exactly what your PR will be measured against.
  • Make a confident guess at which files a given change will touch.

You're ready to actually do the thing.

Next: Your first contribution (15-your-first-contribution.md)
