
Post-Training Small Language Models: the adventures of a practitioner

Material for the talk of the same name.

🍿 Talk - PyCon Italy 2025

🧑‍🏫 Slide deck

📝 Abstract

In 2025, AI is still evolving rapidly. While closed LLMs are continuously improving, open Small Language Models are emerging as powerful alternatives for specific use cases, consuming only a fraction of the resources.

Working in AI engineering, I often find it refreshing to step away from orchestration and get hands-on with fine-tuning, customizing, and optimizing Small Models. In this talk, I'll share my journey working with Post-Training Small Language Models, full of joys, frustrations, and many lessons learned.

Together, we'll cover:

  • How generative Language Models are trained and how we can further customize them
  • Tips for collecting and generating data for fine-tuning
  • Instruction Fine-Tuning and Preference Tuning (DPO)
  • Key training libraries, with a focus on Hugging Face TRL
  • Low-resource fine-tuning methods (QLoRA, Spectrum)
  • A look at quantization and model merging

By the end, you'll know how to customize Small Language Models for your needs and potentially run them on your smartphone.

I'll also share practical examples from my experience improving open models for the Italian language.

UPDATE The world has changed a bit since I proposed this talk, so I added a section about Reasoning models and GRPO!

📚 Resources and code 💻

🌱 Intro

Evaluation ⚖️

Choosing the model to train

  • How to approach post-training for AI applications by Nathan Lambert: slides; video.

👣 Common Post-Training steps

Supervised Fine-Tuning (SFT)

SFT with TRL
SFT projects
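
To make this concrete, here is a minimal SFT sketch using TRL's SFTTrainer. The model and dataset names are placeholders, and exact argument names may differ slightly across TRL versions:

```python
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Illustrative instruction dataset with conversational examples
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```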

Preference Alignment

Direct Preference Optimization (DPO) with TRL
DPO projects
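
A minimal DPO sketch with TRL, assuming a preference dataset with "chosen" and "rejected" columns; model and dataset names are placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer, DPOConfig

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: start from an instruct model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs: each row has a "chosen" and a "rejected" response
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-output", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL versions
)
trainer.train()
```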

Supervised Fine-Tuning vs Preference Alignment

Tips from practice

  • Distilabel: no longer maintained, but a good source of inspiration for several synthetic data generation techniques.
  • Setting max_seq_length, max_prompt_length, and max_length: good explanation in this article by Philipp Schmid (see the sketch after this list).
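
A small sketch of those length settings in a DPO configuration; the values below are illustrative, not recommendations:

```python
from trl import DPOConfig

# max_prompt_length caps the prompt tokens, while max_length caps
# prompt + completion together; tune both to your data distribution.
args = DPOConfig(
    output_dir="dpo-output",
    max_prompt_length=512,
    max_length=1024,
)
```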

⚙️💰 Memory-efficient training

LoRA and QLoRA
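
A minimal QLoRA sketch with transformers, bitsandbytes, and PEFT: load the base model in 4-bit (the "Q") and attach small trainable LoRA adapters. Model name and hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters on all linear layers; r and lora_alpha are typical starting values
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```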

Spectrum
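
Spectrum ranks the model's layers by signal-to-noise ratio and trains only the most informative ones, full-precision but with far fewer trainable parameters. A minimal sketch of the resulting freezing step, assuming the module list comes from Spectrum's offline analysis; the patterns below are made up for illustration:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # placeholder model

# In practice this list comes from Spectrum's generated YAML of selected modules
unfrozen_patterns = ["model.layers.0.mlp", "model.layers.12.self_attn", "lm_head"]

for name, param in model.named_parameters():
    # train a parameter only if it belongs to one of the selected modules
    param.requires_grad = any(pattern in name for pattern in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```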

Projects on memory-efficient training

🧩 Model merging
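
As a taste of the idea, here is a minimal sketch of linear merging (a "model soup"-style weight average) of two fine-tunes that share the same base architecture; model names are placeholders. Dedicated tools like mergekit implement this plus more advanced methods (SLERP, TIES, DARE):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder models: two fine-tunes of the same base model
model_a = AutoModelForCausalLM.from_pretrained("org/finetune-a", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("org/finetune-b", torch_dtype=torch.bfloat16)

state_b = model_b.state_dict()
merged_state = {
    name: 0.5 * tensor + 0.5 * state_b[name]  # equal-weight average per tensor
    for name, tensor in model_a.state_dict().items()
}
model_a.load_state_dict(merged_state)
model_a.save_pretrained("merged-model")
```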

🧠💭 Reasoning models and GRPO

GRPO projects
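
A minimal GRPO sketch with TRL, following the pattern of the library's quickstart: reward functions receive the generated completions and return one score per completion. The toy length-based reward is for illustration only; real setups use verifiable rewards (correct answers, valid format, passing tests):

```python
from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig

dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions of about 200 characters
def reward_len(completions, **kwargs):
    return [-abs(200 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-output"),
    train_dataset=dataset,
)
trainer.train()
```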

📱 Small Language Models on a phone

GGUF
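
Once a model is converted and quantized to GGUF (e.g., with llama.cpp's conversion and quantization tools), it can run locally on modest hardware. A minimal sketch with llama-cpp-python, where the file name is a placeholder:

```python
from llama_cpp import Llama

# Placeholder file: a GGUF model quantized to 4-bit (Q4_K_M)
llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=2048)

output = llm("Question: What is a Small Language Model?\nAnswer:", max_tokens=128)
print(output["choices"][0]["text"])
```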
