Workshop - Build an MCP client + tool-use loop

Difficulty: Deep | Time: 75 min
Needs: Python 3.11+, anthropic + mcp packages, Workshop 1 server (or any MCP server), an Anthropic API key


Companion to AI Systems -> Month 05 -> Week 17-18: Serving Systems, and the second in the AI implementations workshop series. In Workshop 1 you built the server side and watched Claude Desktop drive it. This workshop is the inverse: you build the client side - the program that loads an MCP server, lists its tools, hands them to Claude, and routes the tool-call requests back. By the end you'll have a working Python CLI that connects to any MCP server and lets Claude use its tools, and you'll have proved to yourself that Claude Desktop, Cursor, Cline, and Zed are all just slightly fancier versions of the ~80 lines you wrote.

~75 minutes. Needs: Python 3.11+, anthropic and mcp packages, the MCP server from Workshop 1 (or any other MCP server - the official filesystem server works), an Anthropic API key. No GPU.

What you'll build, and the idea it makes concrete

You'll build a command-line MCP client. It launches one or more MCP servers, fetches their tool catalogs, and starts a chat loop where every message is sent to Claude together with the available tools. When Claude responds with a tool-call, your client dispatches the call to the right server, gets the result, feeds it back, and loops until Claude is done. The whole program is one file and around 80 lines of real code.

The idea this makes concrete:

An "AI agent" is a while loop with structured output. Strip away the frameworks and the metaphors and what is actually happening at runtime is: (1) you call the LLM API with the user's message and a tool catalog, (2) the model responds with either a final answer OR a structured request to call one or more tools, (3) if tools were called you execute them and feed the results back to the model, (4) you loop until the model returns a final answer with no tool calls. That is the entire loop. Everything else - LangChain, LlamaIndex, the OpenAI Agents SDK, the Anthropic Agent SDK - is convenience packaging on top of this loop, often hiding it so completely that newcomers cannot see what it is actually doing.
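The four steps above can be run without any API at all. The sketch below swaps the LLM for a scripted stand-in (scripted_model is hypothetical, as is the simplified message shape; neither is Anthropic's format) so the control flow is the only thing left to look at:

```python
from typing import Callable

# Hypothetical stand-in for an LLM API: given the conversation so far, it
# returns either {"tool_calls": [...]} or {"text": "..."}.
Model = Callable[[list[dict]], dict]


def scripted_model(messages: list[dict]) -> dict:
    """Ask for one tool call, then answer using the tool's result."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_calls": [{"id": "call_1", "name": "add",
                                "args": {"a": 17, "b": 25}}]}
    return {"text": f"The answer is {last['content']}."}


def run_tool(name: str, args: dict) -> str:
    if name == "add":
        return str(args["a"] + args["b"])
    raise ValueError(f"unknown tool: {name}")


def agent_loop(model: Model, user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = model(messages)                  # (1) call the model
        if "tool_calls" not in reply:            # (2) final answer or tools?
            return reply["text"]                 # (4) no tool calls: done
        for call in reply["tool_calls"]:         # (3) run tools, feed results back
            result = run_tool(call["name"], call["args"])
            messages.append({"role": "tool", "id": call["id"], "content": result})
    raise RuntimeError("model never produced a final answer")


print(agent_loop(scripted_model, "What's 17 + 25?"))
```

Replacing scripted_model with a real API call, and run_tool with an MCP dispatch, is what the rest of the workshop does.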

A second idea, equally important:

The MCP protocol is symmetric. The server you built in Workshop 1 and the client you build here are peers speaking the same JSON-RPC vocabulary in opposite directions. The "client" is not subordinate to the "server" in any meaningful sense - they both implement the same protocol; "client" just names the side that initiates the connection. Once you have implemented both halves you have seen the whole protocol, and the implementation choices the official Python and TypeScript SDKs make stop being mysterious.

Step 0: the architecture you're about to assemble

+---------------------+         +----------------------+
|       USER          |         |   Anthropic API      |
| (you, at terminal)  |         | (Claude on a server) |
+---------+-----------+         +----------+-----------+
          | prompt                          ^
          v                                  |
+-------------------------------------------------+
|              YOUR MCP CLIENT (this workshop)    |
|                                                  |
|   1. messages.create(prompt, tools=...)         |
|   2. parse response: text? tool_calls?          |
|   3. if tool_calls: dispatch to MCP servers     |
|   4. messages.create(... results ...)           |
|   5. loop until response has no tool_calls      |
|   6. print final answer to user                 |
+-----------------+----------------+--------------+
                  |                 |
            JSON-RPC stdio    JSON-RPC stdio
                  |                 |
                  v                 v
        +------------------+ +------------------+
        |   MCP Server A   | |   MCP Server B   |
        |   (workshop 1)   | |  (filesystem)    |
        |   tools/call     | |  tools/call      |
        +------------------+ +------------------+

A few things this diagram makes explicit that a sloppy mental model gets wrong:

  • The model never talks to the MCP server. The model talks to your client (through the Anthropic API). Your client talks to the MCP server (through JSON-RPC). The model emits tool-call requests in Anthropic's tool-use format; your client translates each one into an MCP call_tool request and translates the result back into Anthropic's tool_result format. The bridge between the two is your responsibility.
  • One client, many servers. Real production clients connect to several MCP servers at once and merge their tool catalogs. Tool names need to be namespaced to avoid collisions (fs.read_file vs workshop.query_database), and the client routes each call to the right server based on the namespace.
  • The loop runs entirely inside your client. The Anthropic API is stateless - every request is independent. The conversation state (which tool calls have happened, what they returned) lives in your client's messages list, which you re-send in full on every request. This is why "agent memory" is just an in-memory list, and you decide what to keep in it.

Step 1: the absolute minimum client (no MCP yet)

Before we wire in MCP, build the tool-use loop against a single hard-coded tool so the loop is visible without protocol noise. Create client.py:

import json
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

TOOLS = [{
    "name": "add",
    "description": "Add two integers.",
    "input_schema": {
        "type": "object",
        "properties": {
            "a": {"type": "integer"},
            "b": {"type": "integer"},
        },
        "required": ["a", "b"],
    },
}]


def call_tool(name: str, args: dict) -> str:
    if name == "add":
        return str(args["a"] + args["b"])
    raise ValueError(f"unknown tool: {name}")


def chat(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        # Append the assistant turn verbatim (text + tool_use blocks).
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason != "tool_use":
            # No tool call - this is the final answer.
            text_blocks = [b.text for b in resp.content if b.type == "text"]
            return "\n".join(text_blocks)

        # The model wants to call one or more tools. Execute each and feed
        # the results back as a single user turn.
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = call_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    print(chat("What's 17 + 25?"))

Run it:

$ ANTHROPIC_API_KEY=sk-... python client.py
17 + 25 equals 42.

That is the entire agent loop in 35 lines. The model received the prompt, decided to call add(17, 25), your call_tool ran, the result "42" went back as a tool_result, the model wrote the natural-language answer, and the loop exited because stop_reason was no longer "tool_use".

Two things to internalize before we add MCP:

  • The conversation is a list of turns. Each turn is {"role": ..., "content": ...}. The assistant's turn can contain mixed text and tool-use blocks; the user's next turn carries the tool results as a list of tool_result content blocks. Anthropic's API requires this specific shape; OpenAI's is structurally similar with different field names.
  • The tool_use_id matters. When the model makes parallel tool calls (it can emit several in one turn), each call has a unique id and each result must reference the id of the call it answers. Attach a result to the wrong id and the model reasons from the wrong data; omit a result or reference an unknown id and the API rejects the request. Your client must preserve and route ids.
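As an illustration of both points, the sketch below writes out one assistant turn with two parallel tool calls and the user turn that answers them, using plain dicts (the ids are made up). The helper unanswered_tool_use_ids is a hypothetical pre-flight check, not part of any SDK:

```python
def unanswered_tool_use_ids(assistant_content: list[dict],
                            user_content: list[dict]) -> set[str]:
    """Return the ids of tool_use blocks that have no matching tool_result."""
    asked = {b["id"] for b in assistant_content if b["type"] == "tool_use"}
    answered = {b["tool_use_id"] for b in user_content
                if b["type"] == "tool_result"}
    return asked - answered


# One assistant turn: some text plus two tool calls emitted together.
assistant_turn = [
    {"type": "text", "text": "I'll compute both sums."},
    {"type": "tool_use", "id": "toolu_A", "name": "add", "input": {"a": 1, "b": 2}},
    {"type": "tool_use", "id": "toolu_B", "name": "add", "input": {"a": 3, "b": 4}},
]

# The next user turn: one tool_result per call, matched by id rather than
# by position, so the order here does not need to mirror the calls.
user_turn = [
    {"type": "tool_result", "tool_use_id": "toolu_B", "content": "7"},
    {"type": "tool_result", "tool_use_id": "toolu_A", "content": "3"},
]

missing = unanswered_tool_use_ids(assistant_turn, user_turn)
print("missing results for:", missing)
```

Running a check like this before each request turns a rejected API call into a local assertion failure that is easier to debug.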

Step 2: add a real MCP server connection

Now replace the hard-coded TOOLS and call_tool with a real MCP server. The official mcp package provides a stdio-transport client we can use. Install it (pip install mcp) and rewrite:

import asyncio
import os
from contextlib import AsyncExitStack
from anthropic import Anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

claude = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


async def main():
    async with AsyncExitStack() as stack:
        # Launch the workshop server as a subprocess (stdio transport).
        params = StdioServerParameters(
            command=".venv/bin/python",
            args=["server.py"],  # the server from Workshop 1
        )
        read, write = await stack.enter_async_context(stdio_client(params))
        session: ClientSession = await stack.enter_async_context(ClientSession(read, write))
        await session.initialize()

        # Discover what the server offers.
        tools_resp = await session.list_tools()
        tools = [{
            "name": t.name,
            "description": t.description,
            "input_schema": t.inputSchema,
        } for t in tools_resp.tools]
        print(f"loaded {len(tools)} tools from MCP server: {[t['name'] for t in tools]}")

        # Run the agent loop.
        await chat("Who is our top customer by total spend, and what did they buy?",
                   session, tools)


async def chat(user_message: str, session: ClientSession, tools: list[dict]):
    messages = [{"role": "user", "content": user_message}]
    while True:
        resp = claude.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason != "tool_use":
            for block in resp.content:
                if block.type == "text":
                    print("\n>>>", block.text)
            return

        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                print(f"[tool call] {block.name}({block.input})")
                # Dispatch to MCP. This is the bridge between Anthropic's
                # tool-use format and MCP's call_tool RPC.
                result = await session.call_tool(block.name, block.input)
                content_text = "".join(
                    c.text for c in result.content if c.type == "text"
                )
                print(f"[tool result] {content_text[:200]}{'...' if len(content_text) > 200 else ''}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": content_text,
                    "is_error": result.isError,
                })
        messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    asyncio.run(main())

Run it (with the Workshop 1 server in the same directory):

$ python client.py
loaded 2 tools from MCP server: ['query_database', 'search_logs']
[tool call] query_database({'sql': 'SELECT u.email, SUM(o.total_cents) AS total_spent_cents
                                      FROM users u JOIN orders o ON o.user_id = u.id
                                      GROUP BY u.id ORDER BY total_spent_cents DESC LIMIT 1'})
[tool result] [{"email": "bob@example.com", "total_spent_cents": 9999}]
[tool call] query_database({'sql': "SELECT * FROM orders WHERE user_id = 2"})
[tool result] [{"id": 3, "user_id": 2, "total_cents": 9999, "created_at": "2026-05-26 14:45"}]

>>> Bob (bob@example.com) is your top customer by total spend, with $99.99 in
total. His single order (order #3 on 2026-05-26) was a $99.99 purchase.

You just wrote Claude Desktop in 80 lines. The bridge code - translating Anthropic's tool_use block into MCP's call_tool RPC and back - is six lines (session.call_tool(block.name, block.input) plus the response unwrapping). Everything else is the loop and the connection plumbing.
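The bridge is easier to test when it is pulled out into two pure functions. This is a sketch, not the SDK's own API: to_anthropic_tool and to_tool_result are hypothetical names, the objects are duck-typed on the attributes used in the listing above, and the fallback for a missing description and the placeholder for non-text content are design choices of this example:

```python
from types import SimpleNamespace


def to_anthropic_tool(tool) -> dict:
    """Translate an MCP tool listing entry into an Anthropic tool definition."""
    return {
        "name": tool.name,
        # MCP allows a tool to omit its description; send an empty string
        # instead of None.
        "description": tool.description or "",
        "input_schema": tool.inputSchema,
    }


def to_tool_result(tool_use_id: str, result) -> dict:
    """Translate an MCP call_tool result into an Anthropic tool_result block."""
    parts = []
    for c in result.content:
        if c.type == "text":
            parts.append(c.text)
        else:
            # Images, embedded resources, etc. are skipped in this sketch,
            # but the model is told that something was left out.
            parts.append(f"[{c.type} content omitted]")
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": "\n".join(parts),
        "is_error": bool(result.isError),
    }


# Stand-ins for the objects the MCP session returns.
fake_tool = SimpleNamespace(name="add", description=None,
                            inputSchema={"type": "object"})
fake_result = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="42"),
             SimpleNamespace(type="image")],
    isError=False,
)

print(to_anthropic_tool(fake_tool))
print(to_tool_result("toolu_1", fake_result))
```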

Step 3: connect to multiple MCP servers at once

The single-server case is rare in production. A real agent connects to several MCP servers (filesystem + database + your domain server + GitHub + ...) and merges their tools. Two design choices matter:

Namespacing. If two servers both expose a read tool, the model cannot tell them apart. The convention is to prefix tool names with the server identifier: fs.read, workshop.read. Your client adds the prefix when fetching the catalog and strips it when dispatching.

Routing. With namespaced names, dispatching is a dictionary lookup: server_name → ClientSession. Keep the prefix-to-session mapping alongside the merged tool list.

Refactor main() to load N servers:

SERVER_CONFIG = {
    "workshop": StdioServerParameters(command=".venv/bin/python", args=["server.py"]),
    "fs":       StdioServerParameters(command="npx",
                                       args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp/safe-zone"]),
}


async def main():
    async with AsyncExitStack() as stack:
        sessions: dict[str, ClientSession] = {}
        merged_tools: list[dict] = []
        for ns, params in SERVER_CONFIG.items():
            read, write = await stack.enter_async_context(stdio_client(params))
            session = await stack.enter_async_context(ClientSession(read, write))
            await session.initialize()
            sessions[ns] = session
            tools_resp = await session.list_tools()
            for t in tools_resp.tools:
                merged_tools.append({
                    "name": f"{ns}.{t.name}",
                    "description": t.description,
                    "input_schema": t.inputSchema,
                })

        print(f"loaded {len(merged_tools)} tools across {len(sessions)} servers")
        await chat(input("> "), sessions, merged_tools)

And update the dispatch in chat():

for block in resp.content:
    if block.type == "tool_use":
        ns, name = block.name.split(".", 1)
        result = await sessions[ns].call_tool(name, block.input)
        # ... same as before

That's it. Your client now bridges Claude to any combination of MCP servers, the same way Claude Desktop does. The filesystem server is the official @modelcontextprotocol/server-filesystem (you'll need node and npx installed); restrict it to a safe directory you don't mind Claude reading and writing.
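One caveat worth checking before you run this against the real API: Anthropic's API reference has specified that tool names may contain only letters, digits, underscores, and hyphens (up to 64 characters), in which case a dotted name such as fs.read would be rejected when the merged catalog is sent. Treat the exact pattern below as an assumption and confirm it against the current reference. A sketch of the same namespacing with a double-underscore separator (SEP, qualify, and split_qualified are hypothetical names):

```python
import re

SEP = "__"  # separator made only of characters the API is assumed to allow

# Assumption: the tool-name pattern from Anthropic's API reference.
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def qualify(ns: str, tool_name: str) -> str:
    """Build the name advertised to the model, e.g. fs__read_file."""
    if SEP in ns:
        raise ValueError(f"namespace must not contain {SEP!r}: {ns!r}")
    name = f"{ns}{SEP}{tool_name}"
    if not TOOL_NAME_RE.fullmatch(name):
        raise ValueError(f"tool name not accepted by the API: {name!r}")
    return name


def split_qualified(name: str) -> tuple[str, str]:
    """Recover (namespace, original tool name) from an advertised name."""
    ns, sep, tool_name = name.partition(SEP)
    if not sep or not ns or not tool_name:
        raise ValueError(f"not a namespaced tool name: {name!r}")
    return ns, tool_name


print(qualify("fs", "read_file"))
print(split_qualified("workshop__query_database"))
```

In main() this would replace the f-string that builds the advertised name, and in chat() it would replace the split on the dot. Because partition splits at the first separator, tool names that themselves contain double underscores still round-trip, as long as the namespace does not.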

Step 4: handle the things real clients handle

The 80-line client works for happy paths. Production clients handle a long tail of operational concerns. The minimum production-grade set:

  • Tool-call permission prompts. Before dispatching a tool with potentially destructive effect (any write, any network call, any file deletion), surface the proposed call to the user and require confirmation. The MCP spec recommends keeping a human in the loop for tool invocations, and servers can attach advisory annotations such as readOnlyHint and destructiveHint to their tools, but these are hints from the server, not guarantees. The simplest implementation is to maintain an allowlist per session and prompt the user for anything not yet approved.

  • Concurrent tool calls. The model can return multiple tool_use blocks in a single turn. Run them in parallel with asyncio.gather rather than serially - this is a major latency win on agent traces that fan out.

  • Streaming. Replace messages.create(...) with messages.stream(...) so partial responses appear as the model types. The tool_use blocks come through complete (you can't act on a partial tool call), but text content streams. Workshop 8 is dedicated to streaming + mid-stream tool use; for now, blocking is fine.

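  • Token budgets and conversation pruning. The messages list grows without bound, and because every request re-sends the whole list, input-token cost grows with every turn. After ~20 turns you need a strategy: summarize older turns, drop them entirely (keeping each tool_use together with its tool_result), or use Anthropic's prompt caching to cut the cost of the repeated prefix. A naive client can quietly accumulate $10/conversation in token costs.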

  • Error surfacing. A tool call that fails with isError: true should propagate back to the model (so it can recover) AND should be visible to the user (so they understand what's happening). Hiding errors makes failures look like model confusion.

  • Cancellation. If the user types Ctrl-C mid-loop, send MCP's notifications/cancelled notification for the in-flight request to the server, abort the API request, and exit cleanly. Without this, the server keeps doing whatever expensive thing it was doing, the API request keeps burning tokens, and the user has to kill the process.

These are not optional in production but each is straightforward; the 80-line kernel doesn't change, you just layer them on.
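As one example of that layering, here is a rough sketch that combines the permission gate, concurrent dispatch, and error surfacing. The names run_tool_calls, dispatch, and allow are hypothetical; dispatch stands in for the MCP call and allow for whatever allowlist-plus-prompt you choose. It assumes the underlying sessions tolerate overlapping requests:

```python
import asyncio
from typing import Awaitable, Callable

Call = tuple[str, str, dict]  # (tool_use_id, tool name, arguments)


async def run_tool_calls(
    calls: list[Call],
    dispatch: Callable[[str, dict], Awaitable[str]],
    allow: Callable[[str, dict], bool],
) -> list[dict]:
    """Gate each call, then run the approved ones concurrently."""

    def result(tool_use_id: str, text: str, is_error: bool) -> dict:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": text, "is_error": is_error}

    async def run_one(tool_use_id: str, name: str, args: dict, ok: bool) -> dict:
        if not ok:
            return result(tool_use_id, f"Error: user denied {name}", True)
        try:
            return result(tool_use_id, await dispatch(name, args), False)
        except Exception as e:  # surface failures to the model, don't crash
            return result(tool_use_id, f"Error: {e}", True)

    # Ask for permission serially so prompts never interleave...
    decisions = [allow(name, args) for _, name, args in calls]
    # ...then fan out. gather() returns results in the order of its inputs.
    return await asyncio.gather(
        *(run_one(i, n, a, ok) for (i, n, a), ok in zip(calls, decisions))
    )


# Demo with stand-ins: a fake dispatcher and a gate that only allows "add".
async def fake_dispatch(name: str, args: dict) -> str:
    await asyncio.sleep(0.01)
    return str(args["a"] + args["b"])


demo = asyncio.run(run_tool_calls(
    [("toolu_A", "add", {"a": 1, "b": 2}),
     ("toolu_B", "delete_file", {"path": "x"})],
    fake_dispatch,
    allow=lambda name, args: name == "add",
))
print(demo)
```

Every call still produces exactly one tool_result, including the denied one, so the next request stays valid.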

Step 5: break it (the things that go wrong in production)

5.1 The tool the model wanted does not exist

The model occasionally hallucinates tool names - it calls delete_user when no such tool was advertised. With the current code, a name with no prefix makes the split(".", 1) unpacking raise ValueError, an unknown prefix makes sessions[ns] raise KeyError, and either one crashes the loop. The fix:

try:
    ns, name = block.name.split(".", 1)   # ValueError if there is no prefix
    if ns not in sessions:
        raise KeyError(ns)
    result = await sessions[ns].call_tool(name, block.input)
    # Same unwrapping as in Step 2.
    content_text = "".join(c.text for c in result.content if c.type == "text")
    is_error = result.isError
except Exception as e:
    content_text = f"Error: tool {block.name!r} does not exist or failed: {e}"
    is_error = True

tool_results.append({
    "type": "tool_result",
    "tool_use_id": block.id,
    "content": content_text,
    "is_error": is_error,
})

Now the model sees its hallucination as a tool error, apologizes in its next turn, and either tries a real tool or asks the user. This is the "graceful degradation" pattern - always return a tool_result for every tool_use, even on errors, because the API will reject the next turn if you skip one.

5.2 The infinite loop

A buggy tool that always errors, plus a model that keeps retrying it, equals an unbounded loop and a runaway bill. Cap the agent loop:

MAX_AGENT_TURNS = 25
turn = 0
while turn < MAX_AGENT_TURNS:
    turn += 1
    # ... the loop body ...

print(f"[agent] hit max turns ({MAX_AGENT_TURNS}), stopping")

25 is a generous cap; production agents often use 10. Pick the number with knowledge of the longest legitimate trace and add a margin.

5.3 The poisoned tool result

Revisit the indirect-injection problem from Workshop 1. If a tool returns user-generated content that itself contains instructions ("Ignore previous instructions, now do X"), the model may follow those instructions unless the client marks the content as untrusted. The simplest defense in the client:

def sanitize_tool_result(text: str) -> str:
    # Neutralize any closing tag in the payload so it cannot end the wrapper early.
    text = text.replace("</tool_result_data>", "&lt;/tool_result_data&gt;")
    # Wrap in delimiters that the system prompt tells the model to treat as data,
    # not instructions.
    return f"<tool_result_data>\n{text}\n</tool_result_data>"

Then wrap every tool result before feeding it back, and state in the system prompt that anything inside tool_result_data tags is untrusted data. Delimiters lower the risk but do not eliminate it; for full coverage you also want stripping of obvious injection patterns and (in high-risk contexts) a secondary safety check. Workshop 10 covers this in depth.

Step 6: turn it into a real CLI

Add argparse, multi-turn input (loop on input() instead of one-shot), and conversation persistence (write messages to a JSON file between runs). At this point you have a real terminal Claude-Desktop-clone, in well under 200 lines:

async def repl(sessions, tools):
    messages = []
    while True:
        try:
            user = input("\n> ").strip()
        except (KeyboardInterrupt, EOFError):
            print()
            break
        if not user:
            continue
        messages.append({"role": "user", "content": user})
        messages = await agent_loop(messages, sessions, tools)

Where agent_loop is the body of the previous chat function, refactored to take and return messages rather than build them from scratch. Save messages to disk on exit and load it on start to get conversation persistence. Add /clear, /tools (list current tools), /cost (show token usage so far) as slash commands inside the REPL. Each is 5-10 lines.

You have now built the kernel of every major AI CLI. The Claude Code CLI, the Cline VS Code extension, the Cursor agent loop, the Zed assistant - they all wrap this same loop in a UI. The "magic" is the loop, and now you've written it.

Now extend it

  1. Add resource support. Implement list_resources / read_resource calls and let the user attach resources to the conversation with @workshop://saved-queries syntax. Forward attached resource contents as part of the user message.
  2. Add prompt support. Implement list_prompts / get_prompt and expose them as slash commands. /summarize_table users fetches the prompt template from the server, fills in users, sends it to Claude.
  3. Add the OpenAI variant. OpenAI's tool-calling shape is similar but with different field names (tool_calls array, function.name, arguments as JSON string). Implement a second path so your CLI can use GPT-4o or o3-mini as the brain. Now you have a model-agnostic MCP client.
  4. Add sampling. MCP defines a "sampling" capability where the server can ask the client to invoke the LLM (useful for servers that want to summarize their own data before returning it). Implement the client side: when a server requests sampling, your client routes the request through Claude and returns the result.
  5. Publish it. This 200-line CLI is genuinely useful and few exist in the wild that are this minimal. Push to PyPI as your-name-mcp-cli.
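For extension 3, the field mapping between the two formats can be sketched as three small functions. This targets the shapes used by OpenAI's Chat Completions tool calling as commonly documented; the function names are hypothetical, and the sketch works on plain dicts rather than either SDK's objects:

```python
import json


def anthropic_tool_to_openai(tool: dict) -> dict:
    """Anthropic tool definition -> OpenAI function-tool definition."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }


def openai_call_to_anthropic(tool_call: dict) -> dict:
    """OpenAI tool_calls entry -> Anthropic-style tool_use block."""
    return {
        "type": "tool_use",
        "id": tool_call["id"],
        "name": tool_call["function"]["name"],
        # OpenAI delivers arguments as a JSON string; Anthropic as an object.
        "input": json.loads(tool_call["function"]["arguments"]),
    }


def anthropic_result_to_openai(tool_result: dict) -> dict:
    """Anthropic tool_result block -> OpenAI tool-role message."""
    return {
        "role": "tool",
        "tool_call_id": tool_result["tool_use_id"],
        "content": tool_result["content"],
    }


example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "add", "arguments": '{"a": 17, "b": 25}'},
}
print(openai_call_to_anthropic(example_call))
```

With these in place the MCP dispatch code stays unchanged; only the request and response handling around the model call differs per provider.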

What you might wonder

"Why isn't there a higher-level library doing all this?" There are several - Anthropic ships an Agent SDK, OpenAI ships an Agents SDK, plus LangChain, LlamaIndex, Pydantic AI, smolagents, and more. They all wrap exactly this loop, with varying amounts of opinion baked in (memory strategies, retry policies, observability hooks, multi-agent routing). Use them when the convenience is worth the abstraction tax. But build the kernel by hand once so you can read their source when they break, debug them, and choose between them on something more substantial than vibes.

"What's the relationship between the Anthropic tool_use format and OpenAI's tool_calls format?" Structurally identical, syntactically different. Both: model emits a list of structured tool-call requests with names and arguments; client executes them; client sends results back in a tool_result-equivalent. Both APIs are stateless, so both need the tool definitions on every request. The differences are field naming and a few edge cases (OpenAI passes arguments as a JSON string and takes each result as its own tool-role message; Anthropic passes parsed objects, takes results as tool_result blocks inside a user turn, and supports prompt caching of the tool definitions). Adapters between the two formats are 30 lines apiece.

"Where does multi-step planning come from in this model?" From the model's chain of thought, in the tool-use sequence it emits. The agent loop is reactive - it does whatever the model says to do next. The model's reasoning ("first I'll check the schema, then I'll query for the right table") is the planning. Some agent frameworks add explicit planning passes (the model first produces a plan, then executes step-by-step), which can improve traces on long tasks. Workshop 7 explores this with the supervisor pattern.

"Why does the tool result need to be returned with tool_use_id?" Because the model can call multiple tools in one turn and the API matches each result to its call by id, not by position. If you omit a result or reference an id the model never issued, Anthropic's API rejects the request with a 400 error that names the offending tool_use id. The id is a contract.

"Can I use this client to test my MCP server without Claude in the loop?" Absolutely. Skip the Anthropic part and just call session.list_tools() and session.call_tool(...) directly. This is the simplest unit-test harness for MCP servers and is what the official mcp-cli does.

"What happens if I let Claude call destructive tools without confirmation?" Eventually, something destructive. Treat every destructive tool as a security boundary: either gate it behind a user confirmation in the client, or scope it tightly at the server (e.g., the server only allows writes within a specific directory or to a specific table). "Trust the docstring" is not a security model.

What this gave you

  • You built a complete MCP client (~80 lines for the kernel, ~200 for the production CLI version).
  • You wrote the agent loop yourself and now understand that "agents" are a while loop with structured output.
  • You bridged Anthropic's tool_use format to MCP's call_tool RPC and back - the same bridge every AI CLI implements.
  • You saw multi-server routing, the production handling matrix (errors, infinite loops, injection, cancellation), and the path to a publishable tool.
  • You can now read the Anthropic Agent SDK source, the OpenAI Agents SDK source, and any third-party agent framework, because they are all variations on the kernel you wrote.

Together with Workshop 1, you now have both halves of the MCP protocol: a server that exposes capabilities and a client that consumes them. From here the rest of the AI implementation workshops are variations on these two pieces.

Next: Workshop 3 - Build a Claude Skill from scratch, where you'll see a very different model for packaging AI capabilities and learn when to pick a Skill versus an MCP server.

Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo with your MCP client code (single-server and multi-server variants)
  • Terminal log showing the agent loop firing - tool_use blocks, tool_result blocks, final answer
  • Demonstration of an error-recovery path (model hallucinated a tool, client returned an error, model recovered)
  • Short note (3 to 5 sentences) on what surprised you about the loop being only 35 lines
