06 - The Hugging Face ecosystem map¶
What this session is¶
The Hugging Face universe is huge and overwhelming on first contact. This page is the map: what each piece does, what it competes with, which to actually use.
Why HF matters¶
HF is the GitHub of AI. Hundreds of thousands of models, tens of thousands of datasets, the leaderboard everyone watches, and the libraries that wire them together. If you're doing applied AI today, you're using HF for something, even if you don't know it.
The pieces¶
transformers¶
The main library. Loads thousands of pretrained models. Same API for GPT, Llama, BERT, T5, vision models, audio models. Use it to:
- Run inference (`pipeline`, `AutoModelForX`).
- Fine-tune (`Trainer`).
- Mix and match tokenizers, models, and configurations.
If you only learn one HF library, learn this one.
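A minimal inference example (`distilgpt2` here is just a small illustrative model id; any text-generation model on the Hub works the same way):

```python
from transformers import pipeline

# High-level inference in three lines. The pipeline downloads the
# model and tokenizer from the Hub on first use, then caches them.
generator = pipeline("text-generation", model="distilgpt2")
out = generator("Hugging Face is", max_new_tokens=20)
print(out[0]["generated_text"])
```

Swapping the model id is usually the only change needed to try a different model.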
datasets¶
Library to load and stream datasets. ~100K datasets available. Standardized format. Memory-mapped - works with bigger-than-RAM data. Use for:
- Loading common datasets (`load_dataset("squad")`).
- Streaming huge datasets (`streaming=True`).
- Tokenization pipelines via `.map()`.
tokenizers¶
Fast tokenizer implementations (Rust core, Python bindings). Most users access tokenizers through transformers, but you need tokenizers directly if you're training a new tokenizer or doing high-volume text processing.
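Training a tokenizer from scratch is exactly the case where you reach for the library directly; a minimal sketch with in-memory text and an illustratively tiny vocabulary:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small BPE tokenizer entirely in memory -- no downloads.
tok = Tokenizer(models.BPE(unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=60, special_tokens=["[UNK]"])
tok.train_from_iterator(["fast tokenizers", "fast rust core"], trainer)
print(tok.encode("fast core").tokens)
```

A real tokenizer would train on gigabytes of text with a vocab in the tens of thousands; the API calls are the same.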
accelerate¶
Wraps PyTorch training to run on CPU / single GPU / multi-GPU / TPU with the same code. Handles distributed training boilerplate. Use for any training that won't fit on one GPU.
peft¶
Parameter-efficient fine-tuning. LoRA, QLoRA, prefix tuning, IA3. Lets you fine-tune big models on small GPUs by training only tiny adapters. This is almost always the right way to fine-tune.
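The adapter math behind LoRA is simple enough to sketch in plain NumPy (this illustrates the idea, not peft's actual API; the sizes are made up):

```python
import numpy as np

# LoRA: instead of updating a d x d weight W, train two small matrices
# B (d x r) and A (r x d) with rank r << d. The effective weight is
# W + B @ A, so the base model's weights never change.
d, r = 512, 8
W = np.random.randn(d, d)      # frozen pretrained weight
B = np.zeros((d, r))           # zero-init so training starts exactly at W
A = np.random.randn(r, d)
W_eff = W + B @ A

full = d * d                   # params to train without LoRA
lora = d * r + r * d           # params to train with LoRA
print(f"trainable: {lora} vs {full} ({lora / full:.1%})")
```

That ratio (here about 3%) is why a model that won't fit for full fine-tuning can still be fine-tuned with adapters.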
trl¶
Transformer reinforcement learning. RLHF, DPO, ORPO, KTO - all the modern preference alignment algorithms. Higher-level than transformers. Use for fine-tuning chat models with human feedback or synthetic preferences.
bitsandbytes¶
Quantization library. 8-bit and 4-bit weight quantization. Lets you load a 70B model on 24GB of VRAM. Used heavily by peft for QLoRA.
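The core idea behind 8-bit loading can be sketched with absmax quantization in plain NumPy (a toy illustration of the concept, not bitsandbytes' actual kernels):

```python
import numpy as np

# Absmax int8 quantization: scale weights into [-127, 127], store them
# as int8 (1 byte instead of 4), and dequantize on the fly at compute.
w = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
scale = np.abs(w).max() / 127
q = np.round(w / scale).astype(np.int8)       # what gets stored
w_hat = q.astype(np.float32) * scale          # what compute sees
print(q, "max error:", np.max(np.abs(w - w_hat)))
```

The rounding error is bounded by half a quantization step, which is why quantized models lose little quality while using a quarter of the memory.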
diffusers¶
For image, video, and audio generation models. Stable Diffusion, FLUX, AnimateDiff, music gen. Same pipeline UX as transformers but for diffusion models.
sentence-transformers¶
For embeddings. Vector representations of text. Used everywhere in RAG. Different from transformers because it's optimized for producing embeddings, not for generation.
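What RAG does with these vectors is nearest-neighbor search; a toy sketch with made-up 3-d vectors (real embeddings would come from a model like all-MiniLM-L6-v2 and have hundreds of dimensions):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made stand-ins for document embeddings.
docs = {
    "pricing page":  np.array([0.9, 0.1, 0.1]),
    "api reference": np.array([0.1, 0.9, 0.2]),
}
query = np.array([0.8, 0.2, 0.1])  # stand-in for "how much does it cost?"
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → pricing page
```

A vector DB does this same comparison, approximately and at scale, over millions of documents.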
evaluate¶
Library for computing metrics (BLEU, ROUGE, accuracy, perplexity, plus custom). Used in training loops and benchmarks.
Spaces¶
Hosted demos. Build a Gradio or Streamlit app, push to Spaces, free hosting (with limits). Great for portfolio.
Hub (the website)¶
Where models, datasets, and Spaces live. Free for public, paid for private and Pro features. You'll spend hours here.
Inference Endpoints / Inference API / TGI¶
Hosted inference services. Useful for prototyping, expensive for production.
What competes with what¶
You'll see overlapping options. Honest comparison:
| For... | HF tool | Competition | Pick |
|---|---|---|---|
| Model serving | text-generation-inference | vLLM, Ollama, llama.cpp | vLLM for prod, Ollama for local |
| Fine-tuning | Trainer | axolotl, unsloth, lit-gpt | Trainer + trl for flexibility; axolotl for less code |
| Embeddings | sentence-transformers | OpenAI embeddings API, Cohere | sentence-transformers if self-hosting; APIs if not |
| Datasets | datasets | pandas, polars, raw files | datasets for ML workflows |
| Vector DB | - | Chroma, Qdrant, Weaviate, pgvector | Qdrant or pgvector for prod |
| Eval | evaluate, lm-eval-harness | Promptfoo, Ragas, custom | lm-eval-harness for benchmarks; Promptfoo for app eval |
The pattern: HF's libraries are usually a strong default for model-side work. For infrastructure (serving, vector DBs), specialized tools tend to win.
Minimum-viable HF skill¶
You should be able to:
- Load any model from the Hub: `AutoModel.from_pretrained("org/model-name")`.
- Use `pipeline("text-generation", model=...)` for quick inference.
- Fine-tune with `Trainer` on a dataset from `datasets`.
- LoRA fine-tune with `peft` + `trl`.
- Push a model or Space to your account.
If you can do all five, you're past the on-ramp.
The HF version pain¶
HF moves fast and breaks things. A tutorial from 6 months ago might not work. Solutions:
- Pin versions in `requirements.txt` when you find a working combo.
- Check release notes for breaking changes before upgrading.
- Use `transformers[testing]` extras when you need the full test toolchain.
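A pinned setup might look like this (version numbers are illustrative of the pattern, not a recommendation; pin whatever combination you've verified works together):

```text
# requirements.txt -- pin the known-good combo, upgrade deliberately
transformers==4.44.0
datasets==2.20.0
peft==0.12.0
accelerate==0.33.0
```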
This isn't HF being careless; it's that the field is moving and they're tracking it. Plan for churn.
How to keep up¶
- HF blog (huggingface.co/blog) - solid technical posts.
- @huggingface on Twitter/X - release announcements.
- The Hub leaderboards - the Open LLM Leaderboard, the embedding leaderboard.
- Daily papers (huggingface.co/papers) - curated arXiv. Use this instead of trying to read all of arXiv.
What you might wonder¶
"Does everyone in production use HF?" For training and prototyping: very often. For serving: usually not their inference services (cost). The libraries are widely used; the hosted services less so.
"Are HF models commercially usable?" Depends on the license. Some yes (Apache 2.0, MIT, Llama license with caveats), some no (research-only). Always check the model card.
"What about Anthropic / OpenAI / Google models?" Closed-source. Different mental model: you call an API, you don't own the weights. Often the right answer for product features; the wrong answer when you need control, customization, or low cost at scale.
Done¶
- Mapped the HF ecosystem.
- Know which HF tools to use and which to swap.
- Have a "minimum-viable HF skill" target.
Next: Picking a specialization →