Material for the talk of the same name.
📝 Abstract
In 2025, AI is still evolving rapidly. While closed LLMs are continuously improving, open Small Language Models are emerging as powerful alternatives for specific use cases, consuming only a fraction of the resources. Working in AI engineering, I often find it refreshing to step away from orchestration and get hands-on with fine-tuning, customizing, and optimizing Small Models. In this talk, I'll share my journey of post-training Small Language Models, full of joys, frustrations, and many lessons learned.
Together, we'll look at:
- How generative Language Models are trained and how we can further customize them
- Tips for collecting and generating data for fine-tuning
- Instruction Fine-Tuning and Preference Tuning (DPO)
- Key training libraries, with a focus on Hugging Face TRL
- Low-resource fine-tuning methods (QLoRA, Spectrum)
- A look at quantization and model merging
By the end, you'll learn how to customize Small Language Models for your needs and potentially run them on your smartphone.
I'll also share practical examples from my experience improving open models for the Italian language.
UPDATE: The world has changed a bit since I proposed this talk, so I added a section about Reasoning models and GRPO!
- 🌱 Intro
- 👣 Common Post Training steps
- ⚙️💰 Memory-efficient training
- 🧩 Model merging
- 🧠💭 Reasoning models and GRPO
- 📱 Small Language Models on a phone
- lm-evaluation-harness: common framework for evaluating Language Models.
- 🤗 YourBench: open-source framework for generating domain-specific benchmarks.
- Fine-Tune Your Own Llama 2 Model in a Colab Notebook by Maxime Labonne
- Fine-Tune Phi 3.5 mini on Italian
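Before any supervised fine-tuning, each (instruction, response) pair has to be rendered into a single training string. A minimal sketch of this step, using a generic ChatML-style template purely for illustration (the exact tags vary by model; in practice you should always use the tokenizer's own chat template):

```python
# Minimal sketch: turning an (instruction, response) pair into one
# training string for instruction fine-tuning.
# NOTE: the <|im_start|>/<|im_end|> tags below are a ChatML-style
# illustration, not the format every model expects.

def format_example(instruction: str, response: str) -> str:
    return (
        "<|im_start|>user\n" + instruction + "<|im_end|>\n"
        "<|im_start|>assistant\n" + response + "<|im_end|>\n"
    )

sample = format_example("Translate 'ciao' to English.", "'Ciao' means 'hello'.")
```

Libraries like TRL can apply the correct template for you when the dataset is in a conversational format, which avoids template mismatches between training and inference.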
- DPO only: Fine-tune Mistral-7b with Direct Preference Optimization by Maxime Labonne
- DPO + SFT: Post-training Gemma for Italian and beyond
- Distilabel: an unmaintained project, but a good source of inspiration for several synthetic data generation techniques.
- Setting `max_seq_length`, `max_prompt_length`, and `max_length`: good explanation in this article by Philipp Schmid.
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- QLoRA with TRL: code snippet
- QLoRA tutorial by Philipp Schmid
- QLoRA on Gemma from Google documentation
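The idea behind LoRA is compact enough to sketch with plain Python lists: the frozen weight matrix W is adapted by a low-rank update (alpha / r) * B @ A, and only the small matrices A and B are trained. The shapes below are tiny and the values arbitrary, just to make the arithmetic visible:

```python
# LoRA sketch: W_adapted = W + (alpha / r) * B @ A.
# Only A (r x d_in) and B (d_out x r) would be trained; W stays frozen.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_out, d_in, r, alpha = 3, 4, 2, 4
W = [[0.0] * d_in for _ in range(d_out)]   # frozen base weight (zeros for clarity)
A = [[0.1] * d_in for _ in range(r)]       # low-rank factor, trained
B = [[0.5] * r for _ in range(d_out)]      # low-rank factor, trained

scale = alpha / r
delta = matmul(B, A)                       # rank-r update, d_out x d_in
W_adapted = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
```

The payoff is parameter count: A and B together hold r * (d_in + d_out) values instead of d_in * d_out, which is why LoRA (and its quantized variant QLoRA) fits on modest GPUs.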
- Selective fine-tuning of Language Models with Spectrum - tutorial
- Model merging with Mergekit: code snippet
- Merge Large Language Models with MergeKit: blogpost by Maxime Labonne
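The simplest merging method (linear interpolation, a.k.a. "model soup") is just a weighted average of corresponding parameters. A sketch with scalar stand-ins for the weight tensors (mergekit implements this method along with fancier ones like SLERP, TIES, and DARE):

```python
# Linear model merging sketch: merged = w_a * A + (1 - w_a) * B,
# applied parameter-by-parameter. Scalars stand in for weight tensors.

def linear_merge(params_a, params_b, weight_a=0.5):
    weight_b = 1.0 - weight_a
    return {name: weight_a * params_a[name] + weight_b * params_b[name]
            for name in params_a}

model_a = {"layer0.weight": 1.0, "layer0.bias": 0.0}
model_b = {"layer0.weight": 3.0, "layer0.bias": 2.0}
merged = linear_merge(model_a, model_b, weight_a=0.25)
# merged["layer0.weight"] == 0.25 * 1.0 + 0.75 * 3.0 == 2.5
```

This only makes sense when the two checkpoints share the same architecture and, usually, a common ancestor; otherwise the averaged parameters are not comparable.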
- Series of articles on reasoning models by Sebastian Raschka: 1, 2, 3.
- Build Reasoning models: chapter from Hugging Face LLM course
- GRPO with TRL: docs
- GRPO Llama-1B (GSM8K): gist by William Brown
- Qwen Scheduler GRPO: detailed walkthrough on training a reasoning model to solve a scheduling problem
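GRPO's central trick is replacing a learned value function with group-relative advantages: sample several completions for the same prompt, score them with a reward function, and normalize the rewards within the group. A stdlib-only sketch of that normalization (one common variant; implementations differ in details such as sample vs. population standard deviation):

```python
import statistics

# GRPO sketch: advantages come from normalizing rewards within a group
# of completions sampled for the same prompt -- no critic model needed.

def group_relative_advantages(rewards):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)   # population std over the group
    if std == 0:
        return [0.0] * len(rewards)    # all completions tied: no signal
    return [(r - mean) / std for r in rewards]

# e.g. reward 1.0 if the verifier accepted the answer, 0.0 otherwise
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)  # [1.0, -1.0, -1.0, 1.0]
```

These advantages then weight the policy-gradient update: completions that beat their group's average are reinforced, the rest are suppressed. This is why verifiable rewards (math answers, schedules, unit tests) pair so well with GRPO.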
- GGUF My Repo: Hugging Face space to convert your model to GGUF format.
- llama.cpp script for conversion.
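The basic idea behind the reduced-precision formats in GGUF files can be sketched with symmetric absmax quantization: scale each weight into an integer range, store the scale, and dequantize at inference time. (Real GGUF quantization types work block-wise with more elaborate schemes; this is only the core mechanism.)

```python
# Symmetric absmax quantization sketch: the core idea behind storing
# weights in low precision. Real GGUF formats quantize per-block.

def quantize_absmax(weights, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1           # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)            # close to the original weights
```

The memory saving is what makes phone-sized inference possible: an int8 tensor plus one float scale takes roughly a quarter of the space of the same tensor in float32, and the 4-bit formats used in practice halve that again.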