Material for the talk of the same name.
📝 Abstract
In 2025, AI is still evolving rapidly. While closed LLMs are continuously improving, open Small Language Models are emerging as powerful alternatives for specific use cases, consuming only a fraction of the resources. Working in AI engineering, I often find it refreshing to step away from orchestration and get hands-on with fine-tuning, customizing, and optimizing Small Models. In this talk, I'll share my journey of post-training Small Language Models, full of joys, frustrations, and many lessons learned.
Together, we'll look at:
- How generative Language Models are trained and how we can further customize them
- Tips for collecting and generating data for fine-tuning
- Instruction Fine-Tuning and Preference Tuning (DPO)
- Key training libraries, with a focus on Hugging Face TRL
- Low-resource fine-tuning methods (QLoRA, Spectrum)
- A look at quantization and model merging
By the end, you'll learn how to customize Small Language Models for your needs and potentially run them on your smartphone.
I'll also share practical examples from my experience improving open models for the Italian language.
UPDATE: The world has changed a bit since I proposed this talk, so I added a section about Reasoning models and GRPO!
- 🌱 Intro
- 👣 Common Post Training steps
- ⚙️💰 Memory-efficient training
- 🧩 Model merging
- 🧠💭 Reasoning models and GRPO
- 📱 Small Language Models on a phone
- lm-evaluation-harness: common framework for evaluating Language Models.
- 🤗 YourBench: open-source framework for generating domain-specific benchmarks.
- Fine-Tune Your Own Llama 2 Model in a Colab Notebook by Maxime Labonne
- Fine-Tune Phi 3.5 mini on Italian
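Before any supervised fine-tuning, each (instruction, response) pair has to be rendered into a single training string. A minimal sketch of this step, using a generic ChatML-style template purely for illustration (the exact tags vary by model; in practice you should always use the tokenizer's own chat template):

```python
# Minimal sketch: turning an (instruction, response) pair into one
# training string for instruction fine-tuning.
# NOTE: the <|im_start|>/<|im_end|> tags below are a ChatML-style
# illustration, not the format every model expects.

def format_example(instruction: str, response: str) -> str:
    return (
        "<|im_start|>user\n" + instruction + "<|im_end|>\n"
        "<|im_start|>assistant\n" + response + "<|im_end|>\n"
    )

sample = format_example("Translate 'ciao' to English.", "'Ciao' means 'hello'.")
```

Libraries like TRL can apply the correct template for you when the dataset is in a conversational format, which avoids template mismatches between training and inference.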
- DPO only: Fine-tune Mistral-7b with Direct Preference Optimization by Maxime Labonne
- DPO + SFT: Post-training Gemma for Italian and beyond
- Distilabel: an unmaintained project, but a good source of inspiration for several synthetic data generation techniques.
- Setting `max_seq_length`, `max_prompt_length`, and `max_length`: good explanation in this article by Philipp Schmid.
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- QLoRA with TRL: code snippet
- QLoRA tutorial by Philipp Schmid
- QLoRA on Gemma from Google documentation
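The idea behind LoRA is compact enough to sketch with plain Python lists: the frozen weight matrix W is adapted by a low-rank update (alpha / r) * B @ A, and only the small matrices A and B are trained. The shapes below are tiny and the values arbitrary, just to make the arithmetic visible:

```python
# LoRA sketch: W_adapted = W + (alpha / r) * B @ A.
# Only A (r x d_in) and B (d_out x r) would be trained; W stays frozen.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_out, d_in, r, alpha = 3, 4, 2, 4
W = [[0.0] * d_in for _ in range(d_out)]   # frozen base weight (zeros for clarity)
A = [[0.1] * d_in for _ in range(r)]       # low-rank factor, trained
B = [[0.5] * r for _ in range(d_out)]      # low-rank factor, trained

scale = alpha / r
delta = matmul(B, A)                       # rank-r update, d_out x d_in
W_adapted = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
```

The payoff is parameter count: A and B together hold r * (d_in + d_out) values instead of d_in * d_out, which is why LoRA (and its quantized variant QLoRA) fits on modest GPUs.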
- Selective fine-tuning of Language Models with Spectrum - tutorial
- Model merging with Mergekit: code snippet
- Merge Large Language Models with MergeKit: blogpost by Maxime Labonne
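The simplest merging method (linear interpolation, a.k.a. "model soup") is just a weighted average of corresponding parameters. A sketch with scalar stand-ins for the weight tensors (mergekit implements this method along with fancier ones like SLERP, TIES, and DARE):

```python
# Linear model merging sketch: merged = w_a * A + (1 - w_a) * B,
# applied parameter-by-parameter. Scalars stand in for weight tensors.

def linear_merge(params_a, params_b, weight_a=0.5):
    weight_b = 1.0 - weight_a
    return {name: weight_a * params_a[name] + weight_b * params_b[name]
            for name in params_a}

model_a = {"layer0.weight": 1.0, "layer0.bias": 0.0}
model_b = {"layer0.weight": 3.0, "layer0.bias": 2.0}
merged = linear_merge(model_a, model_b, weight_a=0.25)
# merged["layer0.weight"] == 0.25 * 1.0 + 0.75 * 3.0 == 2.5
```

This only makes sense when the two checkpoints share the same architecture and, usually, a common ancestor; otherwise the averaged parameters are not comparable.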
- Series of articles on reasoning models by Sebastian Raschka: 1, 2, 3.
- Build Reasoning models: chapter from Hugging Face LLM course
- GRPO with TRL: docs
- GRPO Llama-1B (GSM8K): gist by William Brown
- Qwen Scheduler GRPO: detailed walkthrough on training a reasoning model to solve a scheduling problem
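GRPO's central trick is replacing a learned value function with group-relative advantages: sample several completions for the same prompt, score them with a reward function, and normalize the rewards within the group. A stdlib-only sketch of that normalization (one common variant; implementations differ in details such as sample vs. population standard deviation):

```python
import statistics

# GRPO sketch: advantages come from normalizing rewards within a group
# of completions sampled for the same prompt -- no critic model needed.

def group_relative_advantages(rewards):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)   # population std over the group
    if std == 0:
        return [0.0] * len(rewards)    # all completions tied: no signal
    return [(r - mean) / std for r in rewards]

# e.g. reward 1.0 if the verifier accepted the answer, 0.0 otherwise
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)  # [1.0, -1.0, -1.0, 1.0]
```

These advantages then weight the policy-gradient update: completions that beat their group's average are reinforced, the rest are suppressed. This is why verifiable rewards (math answers, schedules, unit tests) pair so well with GRPO.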
- GGUF My Repo: Hugging Face space to convert your model to GGUF format.
- llama.cpp script for conversion.
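The basic idea behind the reduced-precision formats in GGUF files can be sketched with symmetric absmax quantization: scale each weight into an integer range, store the scale, and dequantize at inference time. (Real GGUF quantization types work block-wise with more elaborate schemes; this is only the core mechanism.)

```python
# Symmetric absmax quantization sketch: the core idea behind storing
# weights in low precision. Real GGUF formats quantize per-block.

def quantize_absmax(weights, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1           # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)            # close to the original weights
```

The memory saving is what makes phone-sized inference possible: an int8 tensor plus one float scale takes roughly a quarter of the space of the same tensor in float32, and the 4-bit formats used in practice halve that again.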