06 - The Hugging Face ecosystem map

What this session is

The Hugging Face universe is huge and overwhelming on first contact. This page is the map: what each piece does, what it competes with, which to actually use.

Why HF matters

HF is the GitHub of AI. Hundreds of thousands of models, tens of thousands of datasets, the leaderboard everyone watches, and the libraries that wire them together. If you're doing applied AI today, you're using HF for something, even if you don't know it.

The pieces

transformers

The main library. Loads thousands of pretrained models. Same API for GPT, Llama, BERT, T5, vision models, audio models. Use it to:

  • Run inference (pipeline, AutoModelForX).
  • Fine-tune (Trainer).
  • Mix and match tokenizers, models, configurations.

If you only learn one HF library, learn this one.
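To make both entry points concrete, here's a minimal sketch - distilgpt2 is just a small example checkpoint, swap in any causal LM:

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# High-level: one line to a working model
generator = pipeline("text-generation", model="distilgpt2")
print(generator("Hugging Face is", max_new_tokens=20)[0]["generated_text"])

# Lower-level: load tokenizer and model separately for full control
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
inputs = tokenizer("Hugging Face is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```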

datasets

Library to load and stream datasets. ~100K datasets available. Standardized format. Memory-mapped - works with bigger-than-RAM data. Use for:

  • Loading common datasets (load_dataset("squad")).
  • Streaming huge datasets (streaming=True).
  • Tokenizing pipelines via .map().
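A minimal sketch of all three uses; imdb and distilbert-base-uncased are just example names:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# 1. Load a common dataset (cached and memory-mapped after first download)
ds = load_dataset("imdb", split="train")

# 2. Stream instead of downloading - works for bigger-than-disk datasets
streamed = load_dataset("imdb", split="train", streaming=True)
print(next(iter(streamed))["text"][:80])

# 3. Tokenize the whole dataset with .map() (batched is much faster)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
```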

tokenizers

Fast tokenizer implementations (Rust core, Python bindings). Most users access tokenizers through transformers, but you need tokenizers directly if you're training a new tokenizer or doing high-volume text processing.
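Here's what training a new tokenizer looks like, as a minimal sketch - the file path, vocab size, and special tokens are placeholders:

```python
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Start from a raw BPE model and split on whitespace before merging
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=30_000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # your raw text files
tokenizer.save("my-tokenizer.json")
```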

accelerate

Wraps PyTorch training to run on CPU / single GPU / multi-GPU / TPU with the same code. Handles distributed training boilerplate. Use for any training that won't fit on one GPU.
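The core pattern is prepare() plus accelerator.backward(). A minimal sketch with a toy model so it's self-contained:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data, just so the sketch runs as-is
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8
)

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward(); handles devices/scaling
    optimizer.step()
```

Launch the same script with `accelerate launch train.py` and it scales to multiple GPUs without code changes.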

peft

Parameter-efficient fine-tuning. LoRA, QLoRA, prefix tuning, IA3. Lets you fine-tune big models on small GPUs by training only tiny adapters. This is almost always the right way to fine-tune.
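A minimal LoRA sketch. The target_modules names are architecture-specific - "c_attn" matches GPT-2-style models and is an assumption, not a universal default:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
config = LoraConfig(
    r=8,                        # adapter rank: the "tiny" in tiny adapters
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2-style attention projection (assumption)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```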

trl

Transformer reinforcement learning. RLHF, DPO, ORPO, KTO - all the modern preference alignment algorithms. Higher-level than transformers. Use for fine-tuning chat models with human feedback or synthetic preferences.
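A minimal supervised fine-tuning sketch with SFTTrainer, the usual first step before DPO and friends. trl's API shifts between versions, so treat the exact kwargs as assumptions and check the docs for your install:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train[:1%]")  # toy data with a "text" column

trainer = SFTTrainer(
    model="distilgpt2",   # recent trl versions accept a model id string
    train_dataset=dataset,
)
trainer.train()
```

DPOTrainer and the other preference trainers follow the same shape, but expect prompt/chosen/rejected columns.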

bitsandbytes

Quantization library. 8-bit and 4-bit weight quantization - roughly half or a quarter of the fp16 memory footprint, so models that wouldn't otherwise fit on your GPU do. Used heavily by peft for QLoRA.
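In practice you rarely call bitsandbytes directly - you pass a quantization config through transformers. A minimal 4-bit sketch (needs a CUDA GPU; the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # the QLoRA paper's quant format
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # example; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
```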

diffusers

For image, video, and audio generation models. Stable Diffusion, FLUX, AnimateDiff, music generation. Same pipeline UX as transformers, but for diffusion models.
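A minimal text-to-image sketch - the checkpoint is just an example, and you'll want a GPU:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
image = pipe("a map of the Hugging Face ecosystem, watercolor").images[0]
image.save("map.png")
```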

sentence-transformers

For embeddings - vector representations of text, used everywhere in RAG. Different from transformers: it's optimized for producing embeddings, not generating text.
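A minimal embedding sketch; all-MiniLM-L6-v2 is a common small starter model:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["What is RAG?", "Retrieval-augmented generation explained"])
print(embeddings.shape)                             # (2, 384): one vector per input
print(util.cos_sim(embeddings[0], embeddings[1]))   # cosine similarity of the pair
```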

evaluate

Library for computing metrics (BLEU, ROUGE, accuracy, perplexity, plus custom). Used in training loops and benchmarks.
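A minimal metric sketch (ROUGE here also needs the rouge_score package installed):

```python
import evaluate

rouge = evaluate.load("rouge")  # fetches the metric script on first use
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["a cat sat on the mat"],
)
print(scores["rouge1"])
```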

Spaces

Hosted demos. Build a Gradio or Streamlit app, push it to Spaces, and get free hosting (with limits). Great for a portfolio.
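A Space can be as small as one file. A minimal Gradio app.py sketch:

```python
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

# Spaces runs this file and serves the interface automatically
gr.Interface(fn=greet, inputs="text", outputs="text").launch()
```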

Hub (the website)

Where models, datasets, and Spaces live. Free for public, paid for private and Pro features. You'll spend hours here.

Inference Endpoints / Inference API / TGI

Hosted inference services, plus TGI (text-generation-inference), the open-source serving engine behind them that you can also self-host. The hosted options are useful for prototyping, expensive for production.

What competes with what

You'll see overlapping options. Honest comparison:

| For... | HF tool | Competition | Pick |
| --- | --- | --- | --- |
| Model serving | text-generation-inference | vLLM, Ollama, llama.cpp | vLLM for prod, Ollama for local |
| Fine-tuning | Trainer | axolotl, unsloth, lit-gpt | Trainer + trl for flexibility; axolotl for less code |
| Embeddings | sentence-transformers | OpenAI embeddings API, Cohere | sentence-transformers if self-hosting; APIs if not |
| Datasets | datasets | pandas, polars, raw files | datasets for ML workflows |
| Vector DB | - | Chroma, Qdrant, Weaviate, pgvector | Qdrant or pgvector for prod |
| Eval | evaluate, lm-eval-harness | Promptfoo, Ragas, custom | lm-eval-harness for benchmarks; Promptfoo for app eval |

The pattern: HF's libraries are usually a strong default for model-side work. For infrastructure (serving, vector DBs), specialized tools tend to win.

Minimum-viable HF skill

You should be able to:

  1. Load any model from the Hub: AutoModel.from_pretrained("org/model-name").
  2. Use pipeline("text-generation", model=...) for quick inference.
  3. Fine-tune with Trainer on a dataset from datasets.
  4. LoRA fine-tune with peft + trl.
  5. Push a model or Space to your account (the model half is sketched below).

If you can do all five, you're past the on-ramp.
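For item 5, pushing a model is two calls - assuming you've logged in with huggingface-cli login first; the repo name is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

# Creates the repo under your account if it doesn't exist yet
model.push_to_hub("your-username/my-first-model")
tokenizer.push_to_hub("your-username/my-first-model")
```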

The HF version pain

HF moves fast and breaks things. A tutorial from 6 months ago might not work. Solutions:

  • Pin versions in requirements.txt when you find a working combo (example after this list).
  • Check release notes for breaking changes before upgrading.
  • Use transformers[testing] extras when you need the full test toolchain.
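What pinning might look like - these exact version numbers are illustrative, not a recommendation; pin whatever combo works for you:

```
# requirements.txt - illustrative pins, not a recommendation
transformers==4.44.2
datasets==2.21.0
accelerate==0.33.0
peft==0.12.0
trl==0.9.6
```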

This isn't HF being careless; it's that the field is moving and they're tracking it. Plan for churn.

How to keep up

  • HF blog (huggingface.co/blog) - solid technical posts.
  • @huggingface on Twitter/X - release announcements.
  • The Hub leaderboards - open LLM leaderboard, embedding leaderboard.
  • Daily papers (huggingface.co/papers) - curated arXiv. Use this instead of trying to read all of arXiv.

What you might wonder

"Does everyone in production use HF?" For training and prototyping: very often. For serving: usually not their inference services (cost). The libraries are widely used; the hosted services less so.

"Are HF models commercially usable?" Depends on the license. Some yes (Apache 2.0, MIT, Llama license with caveats), some no (research-only). Always check the model card.

"What about Anthropic / OpenAI / Google models?" Closed-source. Different mental model: you call an API, you don't own the weights. Often the right answer for product features; the wrong answer when you need control, customization, or low cost at scale.

Done

  • Mapped the HF ecosystem.
  • Know which HF tools to use and which to swap.
  • Have a "minimum-viable HF skill" target.

Next: Picking a specialization →
