Month 8-Week 4: Self-host economics blog post + GRPO preview¶
Week summary¶
- Goal: Publish "What it actually costs to self-host a 7B model in 2026" with real numbers from your inference work. Read DeepSeek-R1 + GRPO methodology to stay current.
- Time: ~9 h over 3 sessions.
- Output: Eighth public blog post; GRPO paper notes; month-8 retrospective.
Why this week matters¶
The cost-of-self-hosting analysis is one of the highest-engagement post types of 2025–2026, exactly the kind of content AI-infra hiring managers screen for. GRPO is the post-training technique behind DeepSeek-R1 and the most notable RL fine-tuning advance of the period; staying current with the frontier means knowing it.
Prerequisites¶
- M08-W01–W03 complete.
Recommended cadence¶
- Session A-Tue/Wed evening (~3 h): cost modeling + post outline
- Session B-Sat morning (~3.5 h): post draft + edit
- Session C-Sun afternoon (~2.5 h): publish + GRPO read + month retro
Session A-Cost modeling¶
Goal: Combine M08-W01 benchmarks with API pricing for an apples-to-apples cost comparison.
Part 1-Build the workload model (60 min)¶
Define a hypothetical workload that's realistic for your domain:
- Volume: 10M tokens/day (mix of input + output). Choose based on what your project's traffic might look like.
- Latency target: p95 TTFT < 1.5 s.
- Quality target: quality equivalent to the API baseline for this workload.
Part 2-Cost components (60 min)¶
Self-hosted:
- GPU: A10 24GB ~$0.79/hr at RunPod; A100 80GB ~$1.89/hr.
- Storage + egress: ~$0.05/hr.
- Effort tax: 10% of an engineer's time (~$200/week internalized).
API:
- Anthropic Claude Haiku 4.5: $1/M input, $5/M output.
- OpenAI GPT-4o: $2.50/M input, $10/M output.
- For 10M tokens/day at, say, a 70/30 input/output split: API cost = (7M × $1/M) + (3M × $5/M) = $22/day ≈ $660/month.
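The API arithmetic above can be sketched as a short script; the prices and the 70/30 split are the assumed inputs from this cost model, not live quotes:

```python
# API cost model for a daily token volume with an input/output split.
# Prices ($/1M tokens) are the assumed values from the text above.
def api_cost_per_day(tokens_per_day, input_frac, price_in, price_out):
    """Daily cost in dollars for a given token volume and I/O split."""
    input_tokens = tokens_per_day * input_frac
    output_tokens = tokens_per_day * (1 - input_frac)
    return (input_tokens * price_in + output_tokens * price_out) / 1e6

haiku_daily = api_cost_per_day(10_000_000, 0.7, price_in=1.0, price_out=5.0)
print(f"Haiku: ${haiku_daily:.0f}/day, ${haiku_daily * 30:.0f}/month")
# prints "Haiku: $22/day, $660/month"
```

Swapping in GPT-4o's assumed prices ($2.50/M in, $10/M out) gives ~$47.50/day for the same workload.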
Self-hosted at 30 tokens/sec per concurrent request with 4 concurrent requests: ~120 tokens/sec sustained throughput. Handling 10M tokens/day requires ~115 tokens/sec on average (peaks run higher), so one saturated A10 covers it: - $0.79/hr × 24 hr × 30 days ≈ $569/month.
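The self-host side of the estimate, as a sketch (the per-request throughput and the rental rate are the assumptions stated above):

```python
# Self-hosted sizing check: does one A10 cover 10M tokens/day?
SECONDS_PER_DAY = 24 * 3600

def required_tps(tokens_per_day):
    """Average tokens/sec needed to serve the daily volume."""
    return tokens_per_day / SECONDS_PER_DAY

def gpu_monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Always-on GPU rental cost per month."""
    return hourly_rate * hours_per_day * days

avg = required_tps(10_000_000)   # ~115.7 tokens/sec average
sustained = 30 * 4               # 30 tok/s per request x 4 concurrent = 120
print(f"need {avg:.1f} tok/s avg, have {sustained} tok/s sustained")
print(f"A10: ${gpu_monthly_cost(0.79):.0f}/month")
```

The margin between 115.7 average and 120 sustained is thin; real traffic peaks will force either queueing latency or a second GPU.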
Crossover: depends on quality relative to the API. A 7B model's quality may not match Claude Haiku's, so this comparison is apples-to-apples only in throughput, not in quality.
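One way to see why the crossover is murkier than the GPU bill suggests: fold the ~$200/week effort tax from the cost components into the self-host total. A rough sketch using the numbers above (`monthly_total_selfhost` is an illustrative helper, not a standard formula):

```python
# All-in self-host cost: GPU rental plus the internalized effort tax.
def monthly_total_selfhost(gpu_monthly, effort_per_week=200, weeks=4.33):
    """Monthly self-host cost including engineer time (~4.33 weeks/month)."""
    return gpu_monthly + effort_per_week * weeks

api_monthly = 660                       # Haiku figure from the API section
selfhost = monthly_total_selfhost(569)  # A10 figure from above
print(f"self-host all-in: ${selfhost:.0f}/month vs API ${api_monthly}/month")
```

At this volume the effort tax alone (~$866/month) exceeds the GPU bill, which is exactly the kind of hidden cost the post's section 7 should make explicit.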
Part 3-Outline the post (60 min)¶
1. Hook (200 w)
"I rented a GPU and ran a real workload. Here are the numbers-and the costs that don't show up on the GPU pricing page."
2. The workload (300 w)
3. The numbers-self-hosted (500 w)
- Setup time (real numbers).
- Throughput at the latency budget.
- Cost per million tokens (self-hosted, including effort tax).
4. The numbers-API (300 w)
- Same workload through Claude / OpenAI.
- Cost per million tokens.
5. Quality comparison (400 w)
- Self-hosted 7B vs Haiku on a small eval. Be honest.
6. The break-even (200 w)
- Where self-hosting wins. Where it doesn't.
7. The hidden costs (300 w)
- Updates, reliability, multi-tenant scheduling, on-call.
8. What I'd do (200 w)
- Hybrid: API for most; self-host for X (e.g., latency-sensitive stream that's PII-redacted).
Output of Session A¶
- Cost model with numbers.
- Outline.
Session B-Draft + edit¶
Goal: Write the full ~2500 words. Edit twice.
Part 1-Draft (180 min)¶
Write. Use real numbers. Embed code, screenshots, charts.
Part 2-Edit (60 min)¶
Read aloud. Tighten. Verify all numbers are accurate.
Output of Session B¶
- Drafted + edited blog post.
Session C-Publish + GRPO + month retro¶
Goal: Publish broadly. Read GRPO paper. Run month retro.
Part 1-Publish (60 min)¶
- Personal blog.
- Cross-post: HN (Show HN), r/LocalLLaMA (this audience will love it; could go viral), r/MachineLearning, X, LinkedIn.
- Tag the vLLM team, Modal, RunPod, Lambda Labs politely.
Part 2-GRPO + DeepSeek-R1 (60 min)¶
Read: DeepSeek-R1 technical report (search "DeepSeek-R1 technical report arxiv"). Sections on the post-training pipeline.
GRPO (Group Relative Policy Optimization), key idea:
- Generate K completions per prompt.
- Score each with a reward signal (can be programmatic, e.g. passing unit tests).
- Compute each completion's advantage as its score minus the group mean.
- Optimize the policy with a PPO-style clipped objective, but without a separate value model (the group mean is the implicit baseline).
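The advantage step above fits in a few lines of pure Python. Normalizing by the group standard deviation follows the DeepSeek-R1 formulation; this is an illustrative sketch, not TRL's implementation:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, normalize=True):
    """GRPO advantage: each completion's reward minus the group mean,
    optionally divided by the group std (the DeepSeek-R1 formulation).
    The group mean plays the baseline role of PPO's separate value model."""
    baseline = mean(rewards)
    advs = [r - baseline for r in rewards]
    if normalize:
        std = pstdev(rewards)
        if std > 0:  # all-equal rewards give zero advantage everywhere
            advs = [a / std for a in advs]
    return advs

# K = 4 completions for one prompt, scored 0/1 by e.g. unit tests:
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# prints "[1.0, -1.0, 1.0, -1.0]"
```

Note the degenerate case: if every completion in the group gets the same reward, advantages are all zero and the prompt contributes no gradient, which is why reward diversity within a group matters in practice.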
GRPO strips another component out of the PPO recipe (the value model), much as DPO stripped out the explicit reward model and RL loop. The simplification arc: PPO → DPO → GRPO.
Read the HF TRL GRPOTrainer docs to see how it's used. Even if you don't run it this week (compute-intensive), the awareness matters.
Part 3-Month-8 retro (45 min)¶
MONTH_8_RETRO.md:
# Month 8 retro
## Artifacts shipped
- vLLM benchmarks across concurrency
- LoRA + QLoRA adapters
- DPO adapter + 3-way eval
- Self-host economics post: <link>
- GRPO paper notes
## KPIs vs Q3 targets
| Metric | Target Q3 | End of M08 |
|---|---|---|
| Public repos | 1 | 1 (specialty) + 1 (fine-tuning experiments) |
| Blog posts | 2 | 2 ✓ |
| Papers read deeply | 12 | 8 (need 4 more in M09) |
| OSS PRs | 1+ | 0 (M09 target) |
## Lessons
1. Quantization is a real lever for self-hosting economics.
2. DPO's reward gap doesn't always translate to eval wins; look at multiple metrics.
3. Self-host vs API is workload-specific; quality comparison is the missing factor in most "cost" debates.
## M09 plan
- Distributed-training literacy (FSDP).
- OSS PR upstream (in track project).
- Track final push.
- Specialty post (Q3 closing).
Output of Session C¶
- Eighth public blog post live, ≥3 channels.
- GRPO paper notes.
- Month-8 retrospective.
End-of-week artifact¶
- Eighth public blog post published, ≥3 channels
- GRPO paper notes
- Month-8 retrospective
End-of-week self-assessment¶
- I can defend a self-host vs API decision with numbers.
- I have at least one post that could plausibly trend on r/LocalLLaMA.
- I'm aware of GRPO's place in the post-training landscape.
Common failure modes for this week¶
- Numbers without quality. A cheaper self-host bill means nothing if quality doesn't match the API.
- Skipping GRPO because "it's research." Awareness is cheap; ignorance is expensive.
- Vague hidden-cost section. Be specific about effort tax.
What's next (preview of M09-W01)¶
Distributed training fundamentals: DDP, FSDP, ZeRO. A multi-GPU run on rented hardware.