
Workshop - Build a Claude Skill from scratch

Difficulty: Deep | Time: 60 min
Needs: Python 3.11+, Claude Code or Claude Desktop with Skills enabled

Before you start:

Launch in Killercoda: a free browser-based environment, no install required to follow along.

Companion to AI Systems -> Month 05 -> Week 17-18: Serving Systems, and the third in the AI implementations workshop series. Workshops 1 and 2 covered MCP - the live-process protocol for connecting an LLM to your code. This workshop covers the other half of how you package capabilities for Claude in 2026: Skills, a static directory of Markdown plus scripts that Claude discovers and reads on demand. By the end you will have a working Skill that any Claude-Code or Claude-Desktop user can drop into their setup, and you will know exactly when to pick a Skill versus an MCP server - a real engineering decision in production AI work.

~60 minutes. Needs: Python 3.11+, a Claude Code installation (or Claude Desktop with Skills enabled), an Anthropic API key, optionally pandoc if you build the markdown-to-PDF Skill example. No GPU.

What you'll build, and the idea it makes concrete

You'll build a Claude Skill called csv-investigator that bundles together everything Claude needs to investigate a CSV file someone hands it - the instructions, the helper scripts, and the reference material. The Skill is a folder of files; you install it by copying the folder to a known location; Claude discovers it on startup; when a user's request matches the Skill's trigger conditions Claude reads the relevant parts on demand and uses the scripts.

The idea this makes concrete:

A Skill is not a runtime or a service. It is a packaged workflow - a folder containing one SKILL.md (the entry point), zero or more supporting markdown files, and zero or more scripts. Claude reads SKILL.md on startup to learn the Skill exists and what it does; the rest is loaded only when needed (progressive disclosure). Scripts are invoked as ordinary subprocesses. There is no server process, no JSON-RPC, no transport - just files on disk and a convention for how Claude reads them.

A second idea, equally important:

MCP and Skills are complementary, not competing. Use MCP when you need state (a database connection, an authenticated session, a cache, a watcher). Use a Skill when you need workflow knowledge (here's how to investigate a CSV, here's how to do code review on a Python project, here's how to write a release note). The MCP server is a process holding context; the Skill is a repository of instructions and helpers. Real production systems use both: one Skill that says "to debug a slow query, run the diagnose tool from the workshop MCP server, then check the slow-log file at..." composes the two layers.

Step 0: the architecture you're about to assemble

~/.config/claude/skills/         (or wherever your client looks)
  csv-investigator/
    SKILL.md                     <-- entry point, ALWAYS loaded by Claude
    instructions/
      profile.md                 <-- detail, loaded on demand
      common-issues.md           <-- detail, loaded on demand
      output-format.md           <-- detail, loaded on demand
    scripts/
      profile.py                 <-- helper, invoked as subprocess
      sample_rows.py             <-- helper, invoked as subprocess
    examples/
      example-output.md          <-- reference, loaded on demand

A few truths this layout encodes that distinguish Skills from MCP:

  • The entry point is SKILL.md. Claude reads it on startup and ingests the name, description, and trigger conditions, but does not read the rest of the folder unless the Skill activates. This is progressive disclosure: a client with ten Skills installed can afford to load ten SKILL.md files, but not every detail file from every Skill.
  • Detail files are loaded on demand. When the Skill activates ("the user uploaded a CSV"), Claude learns about instructions/profile.md from the references in SKILL.md and reads it then. This is a token-budget optimization: a deep Skill with thousands of lines of detail costs nothing beyond its SKILL.md until it is used.
  • Scripts are subprocesses. Claude does not embed a Python interpreter or fetch the script source; it runs the script with whatever interpreter is on PATH and reads stdout/stderr. The script does not need to know it is being invoked by an AI; any script that has clean CLI semantics is reusable from a Skill.
  • There is no daemon. Compare to MCP, where every tools/call goes through a live process - in Skills, scripts are spawned fresh, do their work, and exit. State that needs to persist across invocations has to go somewhere external (a file, a database, the conversation itself).
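
To make the startup step concrete, here is a minimal sketch of how a client could build its skill catalog by reading only the front matter of each SKILL.md. The function names (read_front_matter, build_catalog) are hypothetical, and the parser assumes flat "key: value" front matter like the one used in this workshop; how Claude actually indexes Skills is not shown in this document.

```python
from pathlib import Path


def read_front_matter(skill_md: Path) -> dict[str, str]:
    """Return the key/value pairs between the leading '---' fences.

    Handles only flat `key: value` lines (an assumption that matches this
    workshop's SKILL.md); a real client would likely use a full YAML parser.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence: stop before the body
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat the file as malformed


def build_catalog(skills_root: Path) -> list[dict[str, str]]:
    """Collect name and description for every <skills_root>/*/SKILL.md.

    Nothing else in each Skill folder is opened here.
    """
    catalog = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_md)
        if "name" in meta and "description" in meta:
            catalog.append(
                {
                    "name": meta["name"],
                    "description": meta["description"],
                    "path": str(skill_md.parent),
                }
            )
    return catalog
```

Calling build_catalog(Path.home() / ".config/claude/skills") would list every installed Skill by name and description without touching instructions/, scripts/, or examples/.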

Step 1: the absolute minimum Skill (just SKILL.md)

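Create the directory. The path below follows this workshop's layout; Claude Code itself reads personal Skills from ~/.claude/skills/, so use that location instead if your client does not pick the Skill up: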

$ mkdir -p ~/.config/claude/skills/csv-investigator
$ cd ~/.config/claude/skills/csv-investigator

The minimum is one file. Create SKILL.md:

---
name: csv-investigator
description: Profile and investigate a CSV file - column types, value distributions, anomalies, suggested questions to ask. Use when the user uploads a CSV or asks "what's in this file" about a spreadsheet/tabular dataset.
---

# CSV Investigator

When the user provides a CSV file and asks you to look at it, follow this routine:

1. Run `python scripts/profile.py <path>` to get column types, null counts, and value distributions.
2. Run `python scripts/sample_rows.py <path>` to get 10 representative rows.
3. Summarize in this order: what the file is about (one sentence), what each column means (one bullet each), what stands out (anomalies, skewed distributions, suspicious nulls), and what questions the data would be good for.

Output format: keep the summary under 300 words. Use a table for the column list. End with a "next steps" line suggesting one specific question to investigate.

That's it. Restart Claude Code (or whatever client supports Skills). Hand it a CSV file and ask "what's in this file?" - Claude reads SKILL.md, sees the trigger condition matches, runs the (yet-to-be-written) scripts, and produces the structured summary.

Three things to notice about that SKILL.md:

  • The YAML front-matter is what Claude indexes. The name and description are what Claude sees in the skill catalog on startup. The description has to clearly state when to activate the Skill - this is the trigger condition, and it works the same way tool docstrings work in Workshop 1. "Use when..." is the magic phrase.
  • The body is the playbook. Written for Claude to follow, not for a human to read. Numbered steps, explicit script invocations, exact output format. Treat Claude like a competent intern who just joined the team: tell them exactly what to do, in order, with no ambiguity.
  • There is no code. The Skill itself is documentation. The code (the scripts) lives separately and is invoked by the documentation.

Step 2: add the scripts

Now the helpers. Create scripts/profile.py:

#!/usr/bin/env python3
"""Profile a CSV: column types, null counts, basic distribution stats.

Usage: python profile.py <path.csv>
Output: JSON to stdout.
"""
import csv
import json
import sys
from collections import Counter


def infer_type(value: str) -> str:
    """Classify one cell as null, int, float, bool, or string."""
    v = value.strip()
    if v == "":
        return "null"
    try:
        int(v)
        return "int"
    except ValueError:
        pass
    try:
        float(v)
        return "float"
    except ValueError:
        pass
    if v.lower() in ("true", "false"):
        return "bool"
    return "string"


def main():
    if len(sys.argv) != 2:
        print(json.dumps({"error": "usage: profile.py <path.csv>"}))
        sys.exit(2)
    path = sys.argv[1]
    # utf-8-sig strips the BOM that spreadsheet exports often prepend to the header.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    if not rows:
        print(json.dumps({"error": "empty file"}))
        sys.exit(1)

    columns = reader.fieldnames or []
    profile = {
        "row_count": len(rows),
        "column_count": len(columns),
        "columns": [],
    }
    for col in columns:
        # DictReader fills cells missing from short rows with None; treat them as empty.
        values = [row.get(col) or "" for row in rows]
        types = Counter(infer_type(v) for v in values)
        non_null = [v for v in values if v.strip() != ""]
        # Report the dominant non-null type so a sparse column is not typed "null".
        non_null_types = Counter({t: n for t, n in types.items() if t != "null"})
        dominant = non_null_types.most_common(1)[0][0] if non_null_types else "null"
        col_profile = {
            "name": col,
            "type": dominant,
            "type_breakdown": dict(types),
            "null_count": len(values) - len(non_null),
            "unique_count": len(set(non_null)),
            # dict.fromkeys keeps first-seen order, so samples are stable across runs.
            "samples": list(dict.fromkeys(non_null))[:5],
        }
        profile["columns"].append(col_profile)
    print(json.dumps(profile, indent=2))


if __name__ == "__main__":
    main()

And scripts/sample_rows.py:

#!/usr/bin/env python3
"""Print 10 representative rows from a CSV: first 3, last 3, and 4 random middle rows.

Usage: python sample_rows.py <path.csv>
Output: JSON to stdout.
"""
import csv
import json
import random
import sys


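def main():
    if len(sys.argv) != 2:
        print(json.dumps({"error": "usage: sample_rows.py <path.csv>"}))
        sys.exit(2)
    path = sys.argv[1]
    # utf-8-sig strips the BOM that spreadsheet exports often prepend to the header.
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    if len(rows) <= 10:
        # Small file: return every row rather than sampling.
        sample = rows
    else:
        # First 3, last 3, and 4 random rows from the middle, kept in file order.
        middle_indices = random.sample(range(3, len(rows) - 3), k=4)
        sample = rows[:3] + [rows[i] for i in sorted(middle_indices)] + rows[-3:]
    print(json.dumps(sample, indent=2))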


if __name__ == "__main__":
    main()

Make them executable: chmod +x scripts/*.py. Test from the command line:

$ python scripts/profile.py some_data.csv | head -20
{
  "row_count": 1247,
  "column_count": 8,
  "columns": [
    {
      "name": "user_id",
      "type": "int",
      "type_breakdown": {"int": 1247},
      "null_count": 0,
      "unique_count": 1247,
      "samples": ["8421", "1003", "5567", "2299", "9012"]
    },
    ...

Test from Claude: drop a CSV in the conversation, ask "what's in this file?" Claude should: read SKILL.md, run profile.py, run sample_rows.py, summarize in the structured format you specified.
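
What "run the script and read stdout" amounts to can be approximated in a few lines. This is a sketch only: run_json_script is a hypothetical helper, the stand-in script exists purely so the example runs on its own, and the real mechanism Claude uses to spawn processes is an assumption here.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any


def run_json_script(script: Path, *args: str, timeout: float = 30.0) -> Any:
    """Run a helper script as a fresh subprocess and parse its JSON stdout."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        detail = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"{script.name} exited {result.returncode}: {detail}")
    return json.loads(result.stdout)


# Stand-in for scripts/profile.py so this sketch is self-contained (hypothetical).
STAND_IN = 'import json, sys\nprint(json.dumps({"path": sys.argv[1], "row_count": 0}))\n'

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "stand_in.py"
    script.write_text(STAND_IN, encoding="utf-8")
    print(run_json_script(script, "orders.csv"))
```

The helper keeps no state between calls: each invocation starts a new process, which is why any script with clean CLI semantics and JSON output slots into a Skill unchanged.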

Step 3: add progressive disclosure with detail files

The SKILL.md you wrote is fine for a simple Skill. For a real one, you have more detail you want available but not always loaded. Move the detailed instructions into separate files and reference them from SKILL.md:

---
name: csv-investigator
description: Profile and investigate a CSV file - column types, value distributions, anomalies, suggested questions. Use when the user uploads a CSV or asks about a spreadsheet/tabular dataset.
---

# CSV Investigator

Routine (high-level):

1. Run `python scripts/profile.py <path>` and `python scripts/sample_rows.py <path>`.
2. Read `instructions/output-format.md` to see the exact summary structure.
3. If the profile shows anomalies (high null rate in a single column, mixed types, outliers), read `instructions/common-issues.md` for how to investigate each.
4. Produce the summary.

See also:
- `instructions/profile.md` - what each field of profile.py's output means
- `examples/example-output.md` - a reference of what a good summary looks like

Now create the detail files. instructions/output-format.md:

# Output format

A CSV summary must contain, in this order:

1. **One-sentence summary** - what this file is about. Read the column names and a sample.
2. **Column table** - markdown table with columns: name, type, null %, sample values.
3. **What stands out** - 1-3 bullets calling out anomalies, skewed distributions, suspicious nulls, or non-obvious column meanings.
4. **Suggested questions** - 1 specific question this data could answer well.

Total length: under 300 words. Be specific (cite column names and numbers); avoid vague phrases like "looks interesting."

instructions/common-issues.md:

# Common CSV anomalies and how to investigate them

## A single column has >50% nulls

Likely one of: (a) the column is genuinely sparse (e.g., optional fields), (b) the column was added later and historical rows lack it, or (c) a CSV export bug stripped values. Sample the non-null rows and check the date range to distinguish.

## Mixed types in one column

Indicator: profile.py's `type_breakdown` shows two or more non-null types for one column.
Common causes: header row leaked into data (the type "string" appears alongside int), placeholder strings like "N/A" or "-" in numeric columns, dates in inconsistent formats.

## Highly skewed distribution

Indicator: a small number of values appear in most rows.
Often a categorical column with imbalance; not always a bug. Mention it in the summary so the user sees the skew.

## Two columns with identical null counts

Indicator: two columns have the same non-zero null_count.
Often means those columns are paired (one missing implies the other is missing). Flag the dependency.

examples/example-output.md:

# Example output - a good CSV summary

> **What it is:** Daily customer order data for 2026 Q1, one row per order.
>
> | Column | Type | Null % | Samples |
> |--------|------|--------|---------|
> | order_id | int | 0% | 8421, 1003, 5567 |
> | user_id | int | 0% | 102, 305, 89 |
> | created_at | string | 0% | 2026-01-15, 2026-01-15 |
> | total_cents | int | 0% | 4999, 1500, 9999 |
> | discount_code | string | 78% | NEWUSER, SUMMER |
>
> **What stands out:**
> - `discount_code` is 78% null - this looks correct (most orders have no discount) but check whether your business expects more usage.
> - `created_at` is stored as a string, not a date. Parse before sorting.
>
> **Suggested next question:** What's the average order value among users who used NEWUSER vs. no code?

When the user asks Claude to investigate a CSV, Claude reads SKILL.md (it always knows that one), runs the scripts, and then reads the relevant detail files based on what the scripts returned. If the profile shows a single mixed-type column, Claude opens common-issues.md's "Mixed types in one column" section. If the profile is clean, Claude jumps to output-format.md directly. The detail files cost nothing in context until needed.

This is the progressive-disclosure pattern in action. A Skill with 20 KB of detail files loads SKILL.md (~500 tokens) on startup, and pulls in detail only when justified by what the work has revealed.
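
The selection described above can be modelled as a small function over profile.py's output. Claude makes this decision by reading SKILL.md, not by running code, so treat this as an illustration of the rule only; detail_files_for and the 50% null threshold are assumptions taken from the wording of common-issues.md.

```python
def detail_files_for(profile: dict, null_threshold: float = 0.5) -> list[str]:
    """Pick which detail files are worth reading for a given profile.

    `profile` is assumed to have the shape printed by scripts/profile.py:
    {"row_count": int, "columns": [{"type_breakdown": {...}, "null_count": int}, ...]}
    """
    rows = profile.get("row_count", 0)
    anomalous = False
    for col in profile.get("columns", []):
        non_null_types = [t for t in col.get("type_breakdown", {}) if t != "null"]
        mixed = len(non_null_types) > 1  # e.g. ints alongside "N/A" strings
        sparse = rows > 0 and col.get("null_count", 0) / rows > null_threshold
        if mixed or sparse:
            anomalous = True
            break

    files = []
    if anomalous:
        files.append("instructions/common-issues.md")
    files.append("instructions/output-format.md")  # always needed for the summary
    return files
```

A clean profile yields only output-format.md; a single mixed-type or mostly-null column adds common-issues.md, and nothing else in the Skill is read.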

Step 4: test it end-to-end with Claude

Install the Skill (it's already in place if you used ~/.config/claude/skills/csv-investigator). Restart Claude Code. Run a session:

$ claude
> I have a CSV at ~/Downloads/orders.csv. What's in it?

[Claude reads SKILL.md, sees the trigger condition matches]
[Claude runs: python ~/.config/claude/skills/csv-investigator/scripts/profile.py ~/Downloads/orders.csv]
[Claude runs: python ~/.config/claude/skills/csv-investigator/scripts/sample_rows.py ~/Downloads/orders.csv]
[Claude reads instructions/output-format.md to confirm structure]
[Claude reads instructions/common-issues.md because profile showed an anomaly]

**What it is:** Daily customer order data...
[full structured summary follows]

You can see Claude's tool invocations in the trace (Claude Code displays them). The Skill's logic is entirely transparent - every script invocation is visible, every detail-file read is logged, and the model's choice to read one file or another is traceable to what the previous step returned.

Step 5: when to pick a Skill vs an MCP server (the real engineering question)

This is the bit you read this workshop for. The same capability can often be packaged either way; the right choice depends on the operational shape.

Pick a Skill when:

  • The work is stateless. Each invocation does its job and exits. Profile a CSV, render Markdown to PDF, format a code block, run a linter.
  • The capability is a workflow, not just an action. "Here is the multi-step routine for X" is what SKILL.md is for.
  • You want distribution by file copy. Skills are folders; you can ship one as a git repo, a tarball, or a clipboard paste. No server to deploy.
  • The scripts need no shared state across invocations.
  • You expect the workflow's prose (the "how to investigate," the "what to do when X happens") to be the bulk of the value, and the scripts to be the smaller part.

Pick an MCP server when:

  • The work needs persistent state. A database connection pool, a cache, an authenticated session, a live data feed.
  • You want a single source of truth with many clients pointing at it. A team-wide Postgres MCP server, an organization-wide GitHub MCP server.
  • The tools are fundamentally services rather than scripts: "query this database," "search this Slack workspace," "list these Kubernetes pods."
  • You need authentication and authorization scoped per-user (the MCP spec defines OAuth 2.1 for this).
  • You want structured tool catalogs that the model can introspect (JSON Schema for every parameter).

Real-world rule of thumb: if your capability would naturally be a CLI tool you'd write anyway (my-csv-profile orders.csv), package it as a Skill. If it would naturally be a service you'd deploy (my-team-postgres-tool running on a server), package it as an MCP server. Many production setups use both: the MCP server provides primitives, and a Skill provides the workflow that calls them. The CSV investigator we built could just as easily reach out to an MCP server's "query_database" tool for the data instead of (or in addition to) reading from a file.
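
The criteria above can be condensed into a rough decision helper. This is a deliberately simplified sketch: the Capability fields and the recommend function are hypothetical names, and a real decision would weigh more than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """A few of the criteria from the two lists above, as yes/no answers."""

    needs_persistent_state: bool   # connection pool, cache, live session
    shared_by_many_clients: bool   # one team-wide source of truth
    needs_per_user_auth: bool      # per-user authentication and authorization
    is_multi_step_workflow: bool   # the prose playbook is most of the value


def recommend(cap: Capability) -> str:
    """Return "skill", "mcp", or "both" for a capability."""
    needs_server = (
        cap.needs_persistent_state
        or cap.shared_by_many_clients
        or cap.needs_per_user_auth
    )
    if needs_server and cap.is_multi_step_workflow:
        return "both"  # MCP server provides primitives, Skill provides the playbook
    if needs_server:
        return "mcp"
    return "skill"


csv_investigator = Capability(False, False, False, True)
team_postgres = Capability(True, True, True, False)
print(recommend(csv_investigator), recommend(team_postgres))
```

The "both" branch corresponds to the production pattern described above, where a Skill's playbook calls tools exposed by an MCP server.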

Step 6: break it (the failure modes)

6.1 The over-eager trigger

Write description: "Use this Skill for any data analysis task." and Claude will activate the Skill for every spreadsheet, dataframe, JSON file, and database question - flooding context with SKILL.md reads for irrelevant work. Triggers should be specific. "Use when the user uploads a CSV or asks 'what's in this file' about a spreadsheet/tabular dataset" is the right kind of precision. Re-read your descriptions like you re-read tool docstrings - they directly control selection accuracy.
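
A cheap way to catch over-eager triggers before Claude does is to lint the description text. The sketch below is a heuristic under stated assumptions: lint_description and the VAGUE_PHRASES list are hypothetical, and passing the lint does not guarantee good selection accuracy.

```python
# Phrases that tend to make a trigger match almost everything (assumed list).
VAGUE_PHRASES = ("any data", "any task", "anything", "everything", "all files")


def lint_description(description: str) -> list[str]:
    """Return human-readable problems found in a Skill description."""
    problems = []
    text = description.lower()
    if "use when" not in text:
        problems.append("missing a 'Use when...' trigger clause")
    for phrase in VAGUE_PHRASES:
        if phrase in text:
            problems.append(f"vague trigger wording: '{phrase}'")
    return problems


print(lint_description("Use this Skill for any data analysis task."))
```

Run against the over-eager description, the lint reports both the missing "Use when" clause and the "any data" wording; the csv-investigator description from Step 1 passes with no findings.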

6.2 The unloadable detail file

Reference instructions/missing-file.md from SKILL.md and the Skill silently underperforms: Claude tries to read the file, the read fails, and the rest of the workflow proceeds without that knowledge. Always test the full graph of references; broken references degrade quietly.
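
Broken references are easy to detect mechanically. The following sketch scans SKILL.md for paths under instructions/, scripts/, and examples/ and reports any that do not exist on disk. missing_references is a hypothetical helper, and the regular expression assumes the folder names used in this workshop.

```python
import re
from pathlib import Path

# Matches relative paths such as scripts/profile.py or instructions/profile.md.
REF_PATTERN = re.compile(r"\b((?:instructions|scripts|examples)/[\w./-]*\w)")


def missing_references(skill_dir: Path) -> list[str]:
    """Return every path mentioned in SKILL.md that is not a file in skill_dir."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    refs = sorted(set(REF_PATTERN.findall(text)))
    return [ref for ref in refs if not (skill_dir / ref).is_file()]
```

An empty list means every referenced file resolves. The check only covers references made from SKILL.md itself; detail files that point at other detail files would need the same scan.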

6.3 The destructive script

Skill scripts execute on the user's machine with the user's permissions. An rm -rf in a Skill script will delete files; an ssh will reach networks; a curl | bash will install software. Skill scripts are a security boundary, and unlike MCP tools they do not currently have a per-tool permission prompt in most clients. Defenses:

  • Keep scripts in their own directory; never chmod +x something you have not read.
  • Skills installed from a third party should be treated like an unsigned shell script.
  • Where possible, write scripts that are read-only (profile, format, lint) and require explicit invocation for anything destructive.
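
As a first pass before reading a third-party Skill's scripts by hand, a pattern scan can point at the lines that deserve attention. This is a heuristic sketch, not a security control: the RISKY_PATTERNS table and scan_scripts are hypothetical, the patterns are far from exhaustive, and a clean scan does not mean a script is safe.

```python
import re
from pathlib import Path

# A few patterns matching the examples named above (assumed, not exhaustive).
RISKY_PATTERNS = {
    "recursive delete": re.compile(r"\brm\s+-\w*[rR]|shutil\.rmtree"),
    "pipe to shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(ba)?sh\b"),
    "remote shell": re.compile(r"\bssh\s"),
}


def scan_scripts(scripts_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file name, line number, label) for every risky-looking line."""
    findings = []
    for path in sorted(scripts_dir.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in RISKY_PATTERNS.items():
                if pattern.search(line):
                    findings.append((path.name, lineno, label))
    return findings
```

Each finding is a prompt to read that line in context; the scan does not replace reading the script before making it executable.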

6.4 The token bomb

A Skill that bundles a 100,000-line reference document in instructions/ is a token bomb waiting to fire. If Claude decides to read the file (the description was too eager), you have just spent a sizable chunk of the context window. Keep individual detail files small (~500-1500 tokens each) and structure the Skill so multiple small files are read selectively, rather than one giant one.
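
A size check makes the guideline enforceable. The sketch below flags Markdown files whose estimated token count exceeds a limit. The 4-characters-per-token ratio is a rough rule of thumb for English text, not a real tokenizer, and oversized_detail_files is a hypothetical helper.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough approximation for English prose (assumption)


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def oversized_detail_files(skill_dir: Path, max_tokens: int = 1500) -> list[tuple[str, int]]:
    """Return (relative path, estimated tokens) for each .md file over the limit."""
    flagged = []
    for path in sorted(skill_dir.rglob("*.md")):
        tokens = estimate_tokens(path.read_text(encoding="utf-8", errors="replace"))
        if tokens > max_tokens:
            flagged.append((path.relative_to(skill_dir).as_posix(), tokens))
    return flagged
```

Anything flagged is a candidate for splitting into several smaller files that SKILL.md can reference selectively.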

Step 7: distribute and version the Skill

A Skill folder is a perfectly good git repo. Put yours on GitHub as csv-investigator-skill. Users install with:

$ git clone https://github.com/you/csv-investigator-skill ~/.config/claude/skills/csv-investigator

Add a VERSION file. Add a README.md that explains to humans what the Skill does and how to install it (distinct from SKILL.md, which is written for Claude). Tag releases. If the Skill includes scripts that need dependencies, ship a requirements.txt in the folder and have the user install before first use.

Anthropic maintains a public Skills repository alongside the MCP registry. Publishing yours there makes it discoverable to anyone with Claude Code or compatible clients.

Now extend it

  1. Add a real-data Skill. Build one for your team's actual workflow: "investigate a flame graph," "review a Terraform plan," "audit a Dockerfile for size and security." The combination of script + playbook is your moat over a base-model prompt.
  2. Combine with the Workshop 1 MCP server. Have the Skill's playbook say "to look up the schema, call the query_database tool from the workshop server." Now the Skill composes the MCP server. This is the production pattern.
  3. Add streaming output. Long-running Skill scripts should write JSONL to stdout (one record per line) so Claude can react to partial results. The CSV-profile case is small enough not to need this; an "audit-this-codebase" Skill would.
  4. Build a Skill that writes Skills. Meta, but useful: a Skill that takes a CLI tool and generates a SKILL.md plus initial scripts to wrap it. The "Skill for skilling Skills."
  5. Audit your installed Skills. Run cat ~/.config/claude/skills/*/SKILL.md and read the descriptions. Are the triggers specific? Are the script paths absolute and correct? Are there any you forgot you installed?
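
For the streaming extension in item 3, the core of a JSONL-emitting script is small. This is a sketch under assumptions: emit_jsonl and audit_files are hypothetical names, and the "audit" is a placeholder that only reports each path.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO


def emit_jsonl(records: Iterable[dict], out: TextIO = sys.stdout) -> int:
    """Write one JSON object per line, flushing after each so readers see partial results."""
    count = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        out.flush()  # without this, output may sit in a buffer until the script exits
        count += 1
    return count


def audit_files(paths: Iterable[str]) -> Iterator[dict]:
    """Placeholder audit: yield one record per path as soon as it is 'checked'."""
    for path in paths:
        yield {"file": path, "status": "checked"}


emit_jsonl(audit_files(["a.py", "b.py"]))
```

Because audit_files is a generator, each record is written as soon as it is produced rather than after the whole run finishes.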

What you might wonder

"Can a Skill use Python packages that aren't installed system-wide?" Yes - point the script at a venv. Either use the venv's interpreter as the shebang (#!/path/to/venv/bin/python, which must be an absolute path), or have the script re-launch itself with the venv's interpreter via subprocess.run. Alternatively, ship a setup.sh the user runs once to create the venv.

"How is this different from just putting instructions in CLAUDE.md?" CLAUDE.md files are always-loaded for every conversation in their directory. Skills are only loaded when their trigger matches. For a workflow used 1% of the time, a Skill keeps it out of the context budget the other 99%. For a workflow used every conversation, CLAUDE.md is right.

"Can a Skill call an MCP server?" Yes - the SKILL.md can instruct Claude to "use the workshop.query_database tool from the workshop server" as one of its steps. This is the composition pattern from Step 5. Skills are the playbook layer; MCP servers are the capability layer; they layer cleanly.

"What about Cursor / Cline / Zed - do they support Skills?" As of 2026 Skills are an Anthropic-specific convention. Other clients have analogues (Cursor has Rules, Cline has Custom Instructions) but the file layout is Anthropic's. The portable layer is MCP, which all of them speak. If you want one capability to work across clients, MCP; if you want a rich Claude-specific workflow, Skill.

"How do I test a Skill without running Claude every time?" Two paths. (1) Read SKILL.md and check the script invocations work standalone (python scripts/profile.py test-data.csv). (2) Use Claude Code with a fixture: a known-good CSV, a known-good question, and grep the output for the expected structure. The second is a real regression test you can run in CI - it costs API tokens but catches Skill drift.

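For path (2) in the testing answer above, the "grep the output for the expected structure" step could look like the sketch below. check_summary is a hypothetical helper; the section labels are taken from examples/example-output.md and the 300-word limit from instructions/output-format.md, so adjust both if your Skill's format differs.

```python
import re

# Section labels as they appear in examples/example-output.md (assumed stable).
REQUIRED_SECTIONS = ("What it is", "What stands out", "Suggested")


def check_summary(summary: str, max_words: int = 300) -> list[str]:
    """Return a list of structural problems; an empty list means the summary passes."""
    problems = []
    if len(summary.split()) > max_words:
        problems.append(f"over {max_words} words")
    # A markdown table row starts and ends with '|', optionally inside a blockquote.
    if not re.search(r"^\s*>?\s*\|.+\|\s*$", summary, re.MULTILINE):
        problems.append("no markdown table found")
    lowered = summary.lower()
    for section in REQUIRED_SECTIONS:
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    return problems
```

In CI, capture Claude's response for the fixture CSV and fail the job if check_summary returns anything. The check validates structure only, not whether the summary is correct.
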
"My Skill needs a secret (API key, etc.) - where does it go?" Environment variables, read by the script at invocation time. Never hardcode secrets in SKILL.md or scripts; never check them into the Skill repo. The user sets export OPENAI_API_KEY=... once in their shell and your scripts pick it up.

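A minimal sketch of the environment-variable pattern from the secrets answer above: the script looks the secret up when it is invoked and fails with a clear message if it is absent. require_env is a hypothetical helper and EXAMPLE_API_KEY is a placeholder variable name.

```python
import os
import sys


def require_env(name: str) -> str:
    """Return the value of an environment variable, or exit with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: environment variable {name} is not set; export it before using this Skill")
    return value


# Placeholder value so the sketch runs on its own; a real user would export the variable.
os.environ.setdefault("EXAMPLE_API_KEY", "placeholder-not-a-real-key")
api_key = require_env("EXAMPLE_API_KEY")
print(f"loaded a key of length {len(api_key)}")  # never print the secret itself
```

The error goes to stderr, so Claude sees why the script failed and can tell the user which variable to set.
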
What this gave you

  • You built a real Claude Skill (entry point, scripts, detail files, example output) and watched Claude use it end-to-end.
  • You understand progressive disclosure - the entry-point file is always loaded, detail files only on demand - and why that matters for the token budget.
  • You can articulate the Skill-vs-MCP decision with concrete criteria: stateless vs stateful, workflow vs service, file-copy distribution vs deployed service.
  • You saw the four failure modes: over-eager triggers, broken references, destructive scripts as a security boundary, and the token-bomb risk.
  • You can read any third-party Skill, evaluate whether to install it, and extend it.

The trilogy is now complete: MCP server (Workshop 1) is how you expose live capabilities, MCP client (Workshop 2) is how you build something that consumes them, and Skill (this workshop) is how you package workflow knowledge. Real production AI systems combine all three.

Next: Workshop 4 - Build a tool-use loop without a framework, where we strip every layer of abstraction down to its bare minimum and prove that an "agent" is a ~50-line while loop with structured output.

Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo of your Skill (SKILL.md + scripts + detail files)
  • Screenshot of Claude using your Skill end-to-end (with the tool-invocation trace visible)
  • One paragraph stating the Skill-vs-MCP choice for a real capability on your team and why
  • Short note (3 to 5 sentences) on what surprised you about progressive disclosure
