Workshop - Build a tool-use loop without a framework¶
Companion to AI Systems -> Month 05 -> Week 17-18: Serving Systems, and the fourth in the AI implementations workshop series. Workshops 1-3 covered the packaging side: MCP server, MCP client, Claude Skill. This workshop strips every layer of abstraction off the agent until what's left is just an HTTP loop. No framework. No SDK. No protocol. Just raw JSON over HTTPS in a while loop. By the end you'll have proved to yourself, line by line, that the entire "agentic AI" stack reduces to one well-shaped Python function, and you'll know which abstractions on top of that function actually earn their complexity.
~75 minutes. Needs: Python 3.11+, the httpx package (no Anthropic SDK), an Anthropic API key, optionally an OpenAI API key for the side-by-side comparison. No GPU.
What you'll build, and the idea it makes concrete¶
You'll build a complete agent in about 60 lines of pure Python with no dependencies beyond httpx. It will use real Python functions as tools, run the canonical tool-use loop against the raw Anthropic Messages API (no SDK), handle parallel tool calls, support multiple agent patterns (ReAct-style, plan-and-execute), enforce budget controls, and exhibit the classic failure modes you'll then defend against. The same code structure, with three field-name changes, runs against the OpenAI API.
The idea this makes concrete:
An LLM agent is a five-line algorithm. Send a prompt and a tool catalog to the model. Read the response. If it contains tool-call requests, execute them and append the results. Loop. Stop when the response has no tool calls. That's it. Everything you've heard called "agentic" - planning, reasoning, multi-step workflows, autonomous research - is some shape of that loop, sometimes with prompt-engineering on top to encourage particular response patterns. There is no other machinery. The frameworks (LangChain, the Agent SDKs, smolagents, AutoGen, CrewAI) are convenience wrappers that hide the loop and add opinions on top. Build the kernel by hand once and the rest of the ecosystem resolves into "OK, these libraries add X, Y, Z on top of the same five-line algorithm."
A second idea, more practical:
The right size of framework depends on what your agent needs to outlast. A one-shot script that calls three tools: write the loop yourself, 30 lines. A production system that handles 10k requests/day with observability, retries, fallbacks, and multi-step planning: reach for the Anthropic Agent SDK or OpenAI Agents SDK so you don't reinvent retries and traces. A research multi-agent system with shared memory, role-playing personas, and arbitrary workflows: a heavier framework like AutoGen or CrewAI starts to earn its keep. Picking the right size is judgment, and the judgment is grounded in knowing what the kernel is.
This workshop is about the kernel. Workshops 7, 8, and 9 will explore the patterns that actually justify adding more on top.
Step 0: the architecture you're about to assemble¶
+------------------+ +-------------------------+
| USER | | Anthropic API |
| (one prompt) | | (Claude on a server) |
+--------+---------+ +-------------+-----------+
| ^ | (HTTPS, JSON)
v | v
+------------------------------------------------+--+
| YOUR AGENT (this workshop) |
| |
| loop: |
| 1. POST /v1/messages with tools + messages |
| 2. parse response |
| 3. if response is "end_turn": done |
| 4. for each tool_use block in response: |
| look up the Python function |
| call it with the parsed arguments |
| collect results |
| 5. append assistant turn + user(results) |
| to messages |
| 6. goto 1 |
+---------------------------------------------------+
|
| function calls into your local code
v
+---------+
| TOOLS | (ordinary Python functions)
+---------+
Compare this to Workshop 2's architecture. The difference: no MCP server, no ClientSession, no asyncio. The "tools" are just Python functions in the same process. Every byte of communication is HTTPS to one URL. The complexity surface is smaller by an order of magnitude.
Step 1: the absolute minimum agent (60 lines, no SDK, no framework)¶
Create a fresh directory:
$ mkdir agent-from-scratch && cd agent-from-scratch
$ python -m venv .venv && source .venv/bin/activate
$ pip install httpx
Then agent.py:
import json
import os
import httpx
API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ["ANTHROPIC_API_KEY"]
MODEL = "claude-sonnet-4-6"
# --- Tools: ordinary Python functions, plus a schema for the model ----------
def get_weather(city: str) -> dict:
"""Pretend to look up weather. In a real tool you'd call an API."""
return {"city": city, "temp_c": 18, "conditions": "partly cloudy"}
def calculate(expression: str) -> str:
"""Evaluate a math expression safely (no names, no attribute access)."""
allowed = set("0123456789+-*/(). ")
if not set(expression) <= allowed:
raise ValueError(f"unsafe expression: {expression!r}")
return str(eval(expression, {"__builtins__": {}}, {}))
TOOLS = {
"get_weather": {
"fn": get_weather,
"schema": {
"name": "get_weather",
"description": "Get current weather for a named city.",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
},
},
"calculate": {
"fn": calculate,
"schema": {
"name": "calculate",
"description": "Evaluate a basic arithmetic expression (digits, + - * / parens).",
"input_schema": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"],
},
},
},
}
# --- The agent loop ---------------------------------------------------------
def call_anthropic(messages: list[dict]) -> dict:
resp = httpx.post(
API_URL,
headers={
"x-api-key": API_KEY,
"anthropic-version": "2023-06-01",
"content-type": "application/json",
},
json={
"model": MODEL,
"max_tokens": 2048,
"tools": [t["schema"] for t in TOOLS.values()],
"messages": messages,
},
timeout=60.0,
)
resp.raise_for_status()
return resp.json()
def run_tool(name: str, args: dict) -> str:
fn = TOOLS[name]["fn"]
try:
result = fn(**args)
return json.dumps(result) if not isinstance(result, str) else result
except Exception as e:
return f"Tool {name} raised {type(e).__name__}: {e}"
def agent(user_message: str, max_turns: int = 10) -> str:
messages = [{"role": "user", "content": user_message}]
for turn in range(max_turns):
resp = call_anthropic(messages)
messages.append({"role": "assistant", "content": resp["content"]})
if resp["stop_reason"] != "tool_use":
return "".join(b["text"] for b in resp["content"] if b["type"] == "text")
tool_results = []
for block in resp["content"]:
if block["type"] == "tool_use":
output = run_tool(block["name"], block["input"])
tool_results.append({
"type": "tool_result",
"tool_use_id": block["id"],
"content": output,
})
messages.append({"role": "user", "content": tool_results})
return "(agent hit max turns)"
if __name__ == "__main__":
print(agent("What's the weather in Lagos, and what's 15% of 240?"))
Run it:
$ ANTHROPIC_API_KEY=sk-... python agent.py
The weather in Lagos is currently partly cloudy with a temperature of 18°C.
15% of 240 is 36.
Sixty lines. Two helpers, one tool registry, one loop, one entry point. The model decided to call both get_weather("Lagos") and calculate("0.15 * 240") in a single turn (parallel tool calls), got both results back, and wrote the natural-language summary. There is no SDK, no framework, no async, no MCP - just httpx.post, parse JSON, dispatch, loop.
Three things to internalize from this code:
- The HTTP request body is plain JSON. No magic. You could write the same agent in
bashwithcurlandjqand it would work. The Anthropic SDK adds type hints, retry handling, and streaming convenience, but the wire is JSON-over-HTTPS and you can speak it directly. - Tools are Python functions with schemas. The model never sees your code; it sees the schema (name, description, parameters as JSON Schema). You execute the function locally based on what the model asked for. The schema-to-function mapping is your responsibility.
- The loop terminates when
stop_reason != "tool_use". That's the only termination condition. Anthropic's API will setstop_reasonto"end_turn"(model finished naturally),"max_tokens"(hit the token limit), or"tool_use"(wants to call tools). Treat anything except"tool_use"as "we're done."
Step 2: parallel tool calls and why they matter¶
The previous example already exhibited parallel tool calls - the model returned two tool_use blocks in a single response. This is the canonical pattern for any agent task that requires independent information:
- "What's the weather in Lagos and the time in Tokyo?" → parallel calls to weather and timezone tools.
- "Look up these 5 product IDs" → ideally five parallel
lookup_productcalls. - "Summarize the issues in this repo and the PRs from last week" → parallel calls to two GitHub tools.
The model decides on parallelism; the agent's job is to execute the parallel calls in parallel, not serially. The naive loop above runs them one at a time. To actually parallelize:
import concurrent.futures
def run_tools_parallel(blocks: list[dict]) -> list[dict]:
"""Run all tool_use blocks in parallel, return tool_result blocks in order."""
tool_uses = [b for b in blocks if b["type"] == "tool_use"]
with concurrent.futures.ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
futures = [pool.submit(run_tool, b["name"], b["input"]) for b in tool_uses]
outputs = [f.result() for f in futures]
return [
{"type": "tool_result", "tool_use_id": b["id"], "content": o}
for b, o in zip(tool_uses, outputs)
]
Drop that into the loop and a five-API-call agent task that previously took 5 × 200ms = 1 second of tool time now takes ~200ms. On real agent workloads (search + database + web fetch) this is a 3-5× wall-clock speedup for the same model output. Always parallelize tool execution unless the tools have ordering dependencies the model didn't notice.
Step 3: prove there's nothing magical about the SDK¶
Workshop 2 used the official anthropic package. This workshop uses raw httpx. They are doing the same thing. To verify, look at the request you're sending in step 1 - it's a JSON body, and the response is a JSON document with a content array, a stop_reason field, a usage object, and an id. The SDK wraps that with Python classes, validates the schema, adds retry-with-backoff, and provides a streaming iterator. None of those are required to use the API; they are convenience.
The same is true of every framework:
- The Anthropic Agent SDK wraps the same loop you wrote, adds tool registration helpers, retry and timeout policies, and structured tracing. Its source is mostly the same six lines you have above.
- LangChain's
AgentExecutorwraps the same loop with hooks for memory, callbacks, and tool routing across multiple LLM providers. The kernel is the loop. - smolagents is intentionally minimal but still wraps the loop; it just argues the right level of opinion is "very little."
There is no "secret sauce" in any framework's runtime. The sauce is in the additions - patterns, retries, observability, multi-agent routing. Pick frameworks by which additions you actually need.
Step 4: ReAct and plan-and-execute patterns are just prompt engineering on this loop¶
You will hear "ReAct" (Reason-Act-Observe), "plan-and-execute," "reflection," "chain-of-thought agents," etc. They are not different runtimes. They are different prompts fed into the same loop.
ReAct asks the model to verbalize its reasoning before each tool call:
SYSTEM_PROMPT_REACT = """\
You are a careful agent. For each step, follow this format:
Thought: (one or two sentences on what you'll do next and why)
Action: (call a tool, or write the final answer if you have enough information)
Always think before acting. Don't call a tool unless you need its result to proceed.
"""
Pass that as the system parameter alongside messages in the API call. The model now narrates its reasoning between tool calls, which makes traces dramatically more debuggable. No code changes - just a different prompt.
Plan-and-execute asks the model for a plan first, then executes it:
def plan_then_execute(user_message: str) -> str:
# Phase 1: ask the model to produce a plan, no tool calls allowed.
plan_resp = call_anthropic([{
"role": "user",
"content": f"Produce a short numbered plan to answer this question, "
f"with one step per tool call you'll need. Do not call tools yet.\n\n"
f"Question: {user_message}"
}])
plan = "".join(b["text"] for b in plan_resp["content"] if b["type"] == "text")
# Phase 2: execute against the plan.
return agent(f"Plan:\n{plan}\n\nNow execute the plan to answer: {user_message}")
That's it. Same loop, two passes through it, with the first pass constrained to "don't call tools, just plan." Useful for long tasks where the model benefits from thinking holistically before diving into a tool sequence.
Reflection runs the loop, then asks the model to critique its own answer, then optionally re-runs with the critique:
def reflect_and_retry(user_message: str) -> str:
answer = agent(user_message)
critique_resp = call_anthropic([{
"role": "user",
"content": f"You answered:\n{answer}\n\nCritique this answer. "
f"What's missing or wrong? Answer 'OK' if it's fine."
}])
critique = "".join(b["text"] for b in critique_resp["content"] if b["type"] == "text")
if critique.strip().upper().startswith("OK"):
return answer
return agent(f"{user_message}\n\nA previous attempt was critiqued:\n{critique}\n"
f"Produce a better answer.")
All three patterns are 10-20 lines of code on top of the kernel. The kernel doesn't change.
Step 5: budget controls (the part the kernel doesn't give you)¶
A bare loop will happily call 50 tools and burn $5 on one question. Production agents need budgets at three levels:
class Budget:
def __init__(self, max_turns=10, max_input_tokens=100_000, max_cost_usd=1.0):
self.max_turns = max_turns
self.max_input_tokens = max_input_tokens
self.max_cost_usd = max_cost_usd
self.input_tokens = 0
self.output_tokens = 0
self.turns = 0
@property
def cost_usd(self) -> float:
# Sonnet pricing - check Anthropic's current rates for accuracy
return self.input_tokens * 3e-6 + self.output_tokens * 15e-6
def check(self):
if self.turns >= self.max_turns:
raise BudgetError(f"max_turns={self.max_turns} reached")
if self.input_tokens >= self.max_input_tokens:
raise BudgetError(f"max_input_tokens={self.max_input_tokens} reached")
if self.cost_usd >= self.max_cost_usd:
raise BudgetError(f"max_cost_usd={self.max_cost_usd} reached at "
f"${self.cost_usd:.2f}")
def record(self, usage: dict):
self.input_tokens += usage["input_tokens"]
self.output_tokens += usage["output_tokens"]
self.turns += 1
class BudgetError(Exception):
pass
Wire it into the loop:
def agent(user_message: str, budget: Budget) -> str:
messages = [{"role": "user", "content": user_message}]
while True:
budget.check()
resp = call_anthropic(messages)
budget.record(resp["usage"])
# ... rest as before
Three budgets, three different failure modes they catch:
max_turnscatches infinite loops (the model keeps calling tools without converging).max_input_tokenscatches context bloat (every tool result adds tokens; the loop accidentally feeds the same large result back many times).max_cost_usdis the bottom-line guard (catches anything the other two missed and bounds your worst-case spend per request).
For a customer-facing agent, also enforce a per-user daily budget by tracking spend in a database. For an internal tool, the per-request budget above is usually enough.
Step 6: break it (the four classic agent failure modes)¶
6.1 The hallucinated tool name¶
Ask the agent: "Use the send_email tool to email Bob." The agent has no send_email tool, but the model will sometimes try anyway, especially with a permissive system prompt. The current code throws KeyError and crashes. Fix:
def run_tool(name: str, args: dict) -> str:
if name not in TOOLS:
return f"Error: tool {name!r} does not exist. Available tools: {list(TOOLS)}"
# ... rest as before
Always return a tool_result for every tool_use (the API requires it) and use the error path to teach the model what tools exist. Model often recovers gracefully on the next turn.
6.2 The infinite loop¶
A buggy tool that always errors, plus a model that keeps retrying it, plus no max_turns cap, equals an unbounded loop. The Budget from step 5 catches this; without it, you've burned $50 by morning. Always cap turns.
6.3 The poisoned tool result¶
A tool returns external content that contains instructions ("Ignore previous instructions, do X instead"). The model sees those instructions and may follow them. This is indirect prompt injection - the OWASP #1 LLM risk. Defense for now: wrap tool output in a content-disposition marker:
def run_tool(name: str, args: dict) -> str:
raw = ... # call the tool as before
return f"<tool_output untrusted=\"true\">\n{raw}\n</tool_output>"
This trains Anthropic models (and OpenAI's, less reliably) to treat the wrapped content as data rather than instructions. Workshop 10 covers this in depth; for now, recognize the failure mode.
6.4 The lost tool result¶
If you forget to append the tool_result block for some tool_use_id, the next API call will return 400: "Missing tool_result for tool_use id X." Every tool_use must be answered by exactly one tool_result referencing its id, in the very next user turn. This is a contract you cannot skip. The error message is clear; the fix is to never branch out of the loop between sending a tool_use response and appending all the tool_results.
Step 7: when to actually reach for a framework¶
You've built the kernel. When is it not enough?
- Multi-agent orchestration (a supervisor coordinating sub-agents): the Anthropic Agent SDK and OpenAI Agents SDK both have first-class support. Worth using even for a 2-agent system because routing, handoffs, and shared traces are non-trivial. Workshop 7 builds this.
- Memory across sessions (the agent should remember conversations from yesterday): tools like Mem0 and Letta exist to solve this; rolling your own is doable but you'll re-invent vector retrieval and conflict resolution.
- Production observability (every span, token, cost, tool call traced and queryable): the Anthropic SDK has hooks; Langfuse / LangSmith give you a UI. Workshop 9 builds this with raw OpenTelemetry, which is also fine.
- Retries and fallbacks (model A times out, retry with model B at lower quality): the SDKs ship with these; rolling them yourself is 30 lines of code but tedious to get right.
- Streaming UI (tokens appear as they're generated): the SDKs handle SSE parsing. Doable by hand but tedious. Workshop 8 covers it.
If you need none of those: write the loop. You'll spend less time fighting abstraction than you would learning a framework. If you need one of those: write the loop, add that one thing. If you need several: pick a framework, but pick the smallest one that covers your needs. The Anthropic and OpenAI SDKs are usually the right answer.
Now extend it¶
- Add the OpenAI variant. OpenAI's API has
tool_calls(array of{id, type: "function", function: {name, arguments: <json-string>}}) instead oftool_useblocks. Tool results are returned as messages withrole: "tool"and atool_call_id. The shape is structurally identical with three field-name changes. Add aprovider="openai"flag to your agent and route accordingly. - Add streaming. Use the
/v1/messagesSSE endpoint (stream: truein the request body). Parse the event stream, accumulate text content, and emit token events to a callback. Tool-use blocks arrive complete (no partial dispatch). - Add observability. Wrap
call_anthropicandrun_toolin functions that log to a JSONL file: timestamp, request id, model, tools called, tokens used, cost, duration. You now have a primitive trace store that you can grep. - Add a tool-call cache. If the agent calls
get_weather("Lagos")twice in one session, return the cached result without re-running. Useful for development iteration; risky in production (the world changes). - Run the same agent against three models (Sonnet, GPT-4o, Gemini Pro). Compare tool-selection accuracy, parallel-call frequency, and total tokens used on the same prompt. The differences are real and worth knowing.
What you might wonder¶
"How is this different from Workshop 2?" Workshop 2 has the same loop but bridges to MCP - the tools live in a separate process speaking JSON-RPC. This workshop's tools are Python functions in the same process. The pedagogical contrast: Workshop 2 teaches the protocol bridge; this one teaches the kernel without any protocol at all. In real production code, you typically have both - some tools as in-process Python functions (fast, no auth needed) and others as MCP servers (shared, authenticated, multi-client).
"Why use the raw HTTP API instead of the Anthropic SDK?" For learning. The SDK is a fine production choice; using httpx here proves the API is just JSON over HTTPS, no magic. Once you've seen that, you can use the SDK without it feeling like a black box. For production code, use the SDK - it handles retries, streaming, and edge cases the simple httpx.post doesn't.
"What's the right max_turns?" Empirically, most agent tasks converge in 1-5 turns. A max_turns of 10 catches accidents without limiting legitimate work. Long-horizon research agents may legitimately need 20-50 turns; in that case explicitly raise the cap and add a per-turn checkpoint so you can see what the agent is doing. Anything over 50 is usually a sign that the agent is stuck and the prompt or tools need work.
"How do I make the model less chatty between tool calls?" A system prompt: "Call tools when needed. Do not explain what you're about to do before doing it. Do not announce 'I'll now call X' - just call X. Reserve your text for the final answer." Saves 50-100 tokens per turn on long traces. Use sparingly; some "narration" makes traces debuggable.
"What if a tool needs to ask the user a clarifying question?" Either: (a) the tool returns a text result asking the question, and the model relays it; (b) the agent breaks out of the loop, asks the user, and resumes with the answer. Pattern (a) is simpler; pattern (b) is more natural for conversational UX but requires a longer-lived agent state.
"Should I use threads or asyncio for parallel tool calls?" Either works. Threads (the example above) are simpler and fine for I/O-bound tools (HTTP calls, DB queries) because the GIL doesn't matter for I/O wait. Asyncio is more efficient for high concurrency (1000+ parallel tools) but adds complexity (every tool must be async, every wrapping function must be async). For typical agents calling 2-10 tools per turn, threads are the right answer.
What this gave you¶
- You built a complete agent in 60 lines of pure Python with no SDK or framework.
- You proved the agent loop is a 5-line algorithm by writing those 5 lines.
- You implemented parallel tool calls and saw the 3-5× wall-clock speedup.
- You implemented three "agent patterns" (ReAct, plan-and-execute, reflection) as small prompt-engineering layers on the same kernel.
- You added the three production-required budget controls and exhibited the four classic failure modes (hallucinated tool, infinite loop, poisoned result, lost result).
- You have a concrete decision framework for "when do I reach for a framework?" - and the answer is honest: most of the time, write the loop and add only what you need.
The kernel is now in your hands. Every subsequent workshop in this series adds something on top: production-grade RAG (Workshop 5), structured output (Workshop 6), multi-agent supervisor (Workshop 7), streaming (Workshop 8), observability (Workshop 9), and prompt-injection defenses (Workshop 10). Each one is one more piece you'll know you need - or don't - because you've seen what's underneath.
Next: Workshop 5 - Production-grade RAG with hybrid retrieval + reranking + eval, where you'll measure the gap between naive RAG and what actually ships.
Submit your build¶
When you finish this workshop, share what you built so others can see and learn from your work. Include:
- Public repo with your 60-line agent (raw httpx, no SDK)
- Terminal log showing parallel tool calls and the wall-clock speedup
- One ReAct or plan-and-execute or reflection variant of your agent
- Demonstration that your Budget catches an infinite-loop scenario
- Short note (3 to 5 sentences) on which abstractions you'd add for your next agent and why
Submit your build Request feedback on your output Discuss this workshop