Month 8-Week 2: LoRA + QLoRA-first fine-tune¶
Week summary¶
- Goal: Read LoRA and QLoRA papers. SFT a small model with LoRA. Eval before / after. Internalize when fine-tuning is the right tool.
- Time: ~10 h over 3 sessions.
- Output: Fine-tuned LoRA adapter; before/after eval; notebook documenting the process.
- Sequences relied on: 15-fine-tuning rungs 01–05.
Why this week matters¶
Fine-tuning is the bridge from "user of frontier models" to "shaper of model behavior." LoRA (and its quantized variant, QLoRA) made it accessible on a single GPU. Knowing when fine-tuning is the right tool, and especially when it isn't, is core literacy.
Prerequisites¶
- M08-W01 complete.
- GPU access continued.
Recommended cadence¶
- Session A-Tue/Wed evening (~3.5 h): papers + when not to fine-tune
- Session B-Sat morning (~4 h): first SFT with LoRA
- Session C-Sun afternoon (~2.5 h): QLoRA on bigger model + eval
Session A-LoRA + QLoRA papers + when not to fine-tune¶
Goal: Read both papers. Internalize when fine-tuning is the right tool.
Part 1-When NOT to fine-tune (45 min)¶
Common mistakes:
- "Add knowledge": use RAG instead. Fine-tuning bakes facts into weights, which are hard to update later.
- "Improve at long-context tasks": usually a context-length / prompting issue, not a weights issue.
- "Make the model good at my niche domain": try few-shot prompting first; fine-tune only if few-shot is insufficient.
When fine-tuning IS right:
- Change behavior, format, tone-not knowledge.
- Specialize on a narrow output structure.
- Compress a working long prompt into a smaller, faster model.
- Distill a strong model's behavior into a cheaper deployment.
Read: OpenAI's fine-tuning guide, plus Sebastian Raschka's practical advice on fine-tuning (sebastianraschka.com).
Part 2-LoRA paper (60 min)¶
Read: LoRA (arxiv.org/abs/2106.09685). Sections 1, 4, 5.
Key idea: instead of fine-tuning all weights, freeze them and add small low-rank update matrices. The trainable parameter count drops by 100–1000× (the paper reports up to 10,000× for GPT-3).
Math: a pretrained weight matrix W (d × k) is kept frozen and augmented additively as W + B·A, where B is d × r and A is r × k, with r << d, k. Typical r is 8–32.
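A minimal sketch of the arithmetic in PyTorch (illustrative shapes only; initialization follows the paper: A small random, B zero so the update starts at zero):
import torch

d, k, r = 1024, 1024, 16           # W is d x k; rank r << d, k
W = torch.randn(d, k)              # frozen pretrained weight
A = torch.randn(r, k) * 0.01       # trainable, small random init
B = torch.zeros(d, r)              # trainable, zero init, so B @ A = 0 at the start

x = torch.randn(k)
h = W @ x + B @ (A @ x)            # LoRA forward pass: original output + low-rank update

full_params = d * k                # 1,048,576 if we fine-tuned W directly
lora_params = r * (d + k)          # 32,768 trainable with r = 16 (a 32x reduction here)
print(full_params, lora_params)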
Part 3-QLoRA paper (75 min)¶
Read: QLoRA (arxiv.org/abs/2305.14314). Sections 1, 3, 4.
Key contributions:
- Quantize the base model to 4-bit using the NF4 format (information-theoretically optimal for normally distributed weights).
- Adapter weights stay in fp16/bf16.
- Double quantization for further memory savings.
- Paged optimizers to handle memory spikes.
Result: the paper fine-tunes a 65B model on a single 48GB GPU; a 7B model fits on a 16GB GPU.
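A back-of-envelope memory estimate for the 7B case (rough numbers; ignores activations, KV cache, and framework overhead):
params = 7e9
print(params * 2 / 1e9)    # ~14 GB of weights in bf16
print(params * 0.5 / 1e9)  # ~3.5 GB of weights at 4 bits each (NF4)
# Adam optimizer states (~8 bytes/param) are kept only for the LoRA adapter
# (tens of millions of params, well under 1 GB) instead of all 7B (~56 GB).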
Output of Session A¶
- Notes on when (not) to fine-tune.
- LoRA + QLoRA paper notes.
Session B-First fine-tune with TRL + PEFT¶
Goal: SFT Qwen2.5-0.5B (or similar small model) on a domain dataset using LoRA.
Part 1-Setup (30 min)¶
uv pip install transformers trl peft datasets accelerate bitsandbytes wandb
huggingface-cli login # for any gated models
wandb login
Part 2-Pick a dataset and a small model (45 min)¶
Model: Qwen/Qwen2.5-0.5B-Instruct (small, fits even on Colab T4).
Dataset: one of:
- databricks/databricks-dolly-15k (general).
- A synthetic dataset for your domain (e.g., generate 500 incident-report-to-triage pairs with Claude; see the sketch after this list).
- HuggingFaceH4/no_robots.
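If you go the synthetic route, a minimal generation loop could look like the following (a sketch, assuming the anthropic Python SDK; the model name, prompt, and output file are placeholders to adapt to your domain):
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

rows = []
for i in range(500):
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; use whichever model you have access to
        max_tokens=512,
        messages=[{"role": "user", "content": f"Write one realistic incident report and its triage decision as a JSON object with keys 'report' and 'triage'. Vary the scenario (example {i})."}],
    )
    rows.append(json.loads(resp.content[0].text))  # sketch assumes clean JSON; validate and retry in practice

with open("synthetic_triage.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")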
Format the data conversation-style, one list of chat messages per example.
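A sketch of what one row can look like (the content is illustrative; the "messages" structure is the conversational format TRL's SFTTrainer accepts):
example = {
    "messages": [
        {"role": "user", "content": "Triage this incident report: payments API returning 500s since 09:12 UTC."},
        {"role": "assistant", "content": "Severity: P1. Owner: payments on-call. Next step: roll back the 09:05 deploy."},
    ]
}
# Recent TRL versions apply the model's chat template to rows shaped like this,
# so no manual prompt formatting is needed.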
Part 3-Training script (165 min)¶
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig  # get_peft_model isn't needed here; SFTTrainer applies the config itself
from trl import SFTConfig, SFTTrainer
model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
peft_config = LoraConfig(
    r=16, lora_alpha=32,                      # rank-16 adapters; scaling factor alpha/r = 2
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
ds = load_dataset("HuggingFaceH4/no_robots", split="train_sft").select(range(500))
cfg = SFTConfig(
    output_dir="ft-out",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,            # effective batch size 16
    learning_rate=2e-4,
    warmup_ratio=0.03,
    logging_steps=10,
    bf16=True,                                # requires an Ampere+ GPU; on a Colab T4 use fp16=True instead
    report_to="wandb",
)
trainer = SFTTrainer(
    model=model, args=cfg, train_dataset=ds,
    peft_config=peft_config,                  # SFTTrainer wraps the model with the LoRA adapter for you
    tokenizer=tokenizer,                      # recent TRL versions take processing_class=tokenizer instead
)
trainer.train()
trainer.save_model("ft-out/final")            # saves only the adapter weights, not the full model
Watch the loss curve in W&B; it should decrease steadily over the run.
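A quick smoke test after training, continuing from the script above (so trainer and tokenizer are in scope; the prompt is illustrative):
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize this bug report in one line: checkout intermittently fails with a 502."}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(trainer.model.device)
out = trainer.model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))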
Output of Session B¶
- Trained adapter at ft-out/final/.
- W&B run with loss curve.
Session C-QLoRA on a bigger model + eval¶
Goal: QLoRA-fine-tune a 7B model. Compare base vs fine-tuned on your eval set.
Part 1-QLoRA config (30 min)¶
from transformers import BitsAndBytesConfig
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                # the NF4 format from the QLoRA paper
    bnb_4bit_compute_dtype="bfloat16",        # matmuls run in bf16
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb,
    device_map="auto",                        # place the quantized weights on the GPU
)
Adjust the LoRA config for the larger model (same r, more target_modules, now including the FFN projections).
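For example (module names assume a Qwen2/LLaMA-style architecture; check model.named_modules() if you pick a different model):
peft_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # FFN / MLP projections
    ],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)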
Part 2-Train + save (90 min)¶
Use the same SFTTrainer setup; typical config adjustments are sketched below. Train for one epoch (small datasets overfit quickly). Expect ~30–60 min on a single A10.
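A sketch of the config changes for the 7B run (exact values depend on your GPU):
cfg = SFTConfig(
    output_dir="ft-7b-out",
    num_train_epochs=1,               # one pass; small datasets overfit fast
    per_device_train_batch_size=1,    # a 4-bit 7B still needs small per-device batches
    gradient_accumulation_steps=16,   # keep the effective batch size at 16
    gradient_checkpointing=True,      # trade compute for memory
    learning_rate=2e-4,
    logging_steps=10,
    bf16=True,
    report_to="wandb",
)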
Part 3-Eval before / after (60 min)¶
Use your M04-W03 / M06-W03 eval setup. Run on 30 examples:
- Base model.
- Fine-tuned model (with adapter loaded).
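Loading the two variants for the comparison (a sketch; the adapter path assumes the Session C run saved to ft-7b-out/final):
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Base model first (reuse quantization_config=bnb from above if GPU memory is tight).
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
# ... run the 30 eval examples against `base` and record scores ...

# Then attach the adapter; PeftModel injects the LoRA weights into `base` in place.
finetuned = PeftModel.from_pretrained(base, "ft-7b-out/final")
# ... run the same 30 examples against `finetuned` ...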
Compare:

| Metric | Base | Fine-tuned | Δ |
|---|---|---|---|
| Format-conformance | 0.66 | 0.92 | +0.26 |
| Severity match | 0.71 | 0.78 | +0.07 |
| Faithfulness (judge) | 4.0 | 3.9 | -0.1 |
Common pattern: fine-tuning helps format/structure dramatically; helps factual quality less; can hurt if dataset is too narrow ("catastrophic forgetting").
Honest write-up in repo.
Output of Session C¶
- 7B QLoRA adapter trained.
- Before/after eval committed.
End-of-week artifact¶
- LoRA + QLoRA paper notes
- Small-model LoRA fine-tune
- 7B-model QLoRA fine-tune
- Before/after eval with delta documented
End-of-week self-assessment¶
- I can articulate when to fine-tune vs RAG vs few-shot prompting.
- I can write a TRL SFTTrainer config from a blank file.
- I have measured my fine-tune's effect on a real eval set.
Common failure modes for this week¶
- Fine-tuning to "improve quality" without specifying what you're improving. Always specify.
- No before/after eval. Without it, you don't know if fine-tuning helped.
- Too small a dataset (< 100 examples): generally insufficient unless the target behavior is very narrow.
What's next (preview of M08-W03)¶
DPO (direct preference optimization): the simpler, more elegant alternative to the PPO stage of RLHF.