
Add QAT (Quantization-Aware Training) Support Callback #8042

Open
y2logic wants to merge 4 commits into modelscope:main from y2logic:qat_support

Conversation

@y2logic (Contributor) commented Feb 12, 2026

PR type

  • New Feature

PR information

This PR introduces a new QatCallback implementation to support Quantization-Aware Training (QAT) using TorchAO.

TorchAO is a PyTorch-native library with support for custom high-performance data types, quantization, and sparsity.
By simulating quantization effects during training, QAT typically yields a significantly more accurate quantized model than post-training quantization (PTQ).
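For intuition, the "simulated quantization" at the heart of QAT is a quantize-dequantize round-trip in the forward pass, so the network learns to tolerate rounding error. Below is a minimal illustrative sketch (not this PR's code; the function and parameter names are invented for illustration, and real QAT also uses a straight-through estimator so gradients flow through the rounding op):

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor quantize-dequantize: values snap to an int grid
    but stay in floating point, so training can proceed normally."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    scale = w.abs().max().clamp(min=1e-8) / qmax    # avoid division by zero
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
```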

This callback integrates seamlessly with the existing Trainer framework and enables the following (a usage sketch follows the list):

  • Automatic fake-quant insertion at training start
  • Post-training quantized model export
  • Support for several quantization configurations
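As a hypothetical usage sketch: the changelog below shows the callback is registered under the 'qat' key in callbacks_map, so wiring it into a transformers-style Trainer might look like this (the exact constructor arguments in swift may differ):

```python
# Hypothetical sketch; actual swift wiring may differ.
from swift.callbacks.mapping import callbacks_map

QatCallback = callbacks_map['qat']   # entry added by this PR
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[QatCallback()],       # fake-quant on train begin,
)                                    # quantized export on train end
trainer.train()
```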

Experiment results

@gemini-code-assist (bot) commented
Summary of Changes

Hello @y2logic, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the training framework by introducing a new callback for Quantization-Aware Training (QAT). This feature leverages the TorchAO library to seamlessly integrate quantization effects during the training process, leading to more efficient and performant models. It automates the setup for QAT and handles the export of the final quantized model.

Highlights

  • New QAT Callback: Introduced a new QatCallback class to enable Quantization-Aware Training (QAT) using the TorchAO library.
  • Callback Integration: Integrated the QatCallback into the system's callbacks_map for seamless activation within the training framework.
  • Automatic QAT Setup: Implemented automatic fake-quantizer insertion at the beginning of training to simulate quantization effects.
  • Quantized Model Export: Added functionality to export the post-training quantized model at the end of the training process.


Changelog
  • swift/callbacks/mapping.py
    • Added 'qat' entry to the callbacks_map dictionary, linking to the new QatCallback.
  • swift/callbacks/qat.py
    • Created a new file implementing the QatCallback class, which manages QAT initialization and quantized model saving.
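For readers who want the shape of the implementation without opening the diff, here is a hedged sketch of what such a callback could look like, assuming transformers' TrainerCallback hooks and torchao's QATConfig prepare/convert flow (available in recent torchao versions); the actual qat.py in this PR may differ:

```python
from transformers import TrainerCallback
from torchao.quantization import Int4WeightOnlyConfig, quantize_
from torchao.quantization.qat import QATConfig


class QatCallback(TrainerCallback):
    """Sketch only: insert fake-quant at train start, convert at train end."""

    def __init__(self):
        # The PR hardcodes Int4WeightOnlyConfig; see the review below for a
        # suggestion to make this configurable.
        self.base_config = Int4WeightOnlyConfig()

    def on_train_begin(self, args, state, control, model=None, **kwargs):
        # Swap supported layers for fake-quantized versions in place.
        quantize_(model, QATConfig(self.base_config, step='prepare'))

    def on_train_end(self, args, state, control, model=None, **kwargs):
        # Replace fake-quant modules with real quantized weights for export.
        quantize_(model, QATConfig(self.base_config, step='convert'))
```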

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a QatCallback for Quantization-Aware Training, which is a valuable addition. The implementation provides a solid foundation. My review focuses on improving flexibility and correctness. Specifically, the quantization configuration is currently hardcoded, which limits the callback's utility. I've suggested making this configurable. Additionally, there's a minor issue with a log message. Addressing these points will make the new callback more robust and user-friendly.

Comment on lines +20 to +21
```python
from torchao.quantization import Int4WeightOnlyConfig
self.quant_config = Int4WeightOnlyConfig()
```
Severity: high

This section can be improved in two ways:

  1. Flexibility: The quantization configuration is hardcoded to Int4WeightOnlyConfig. This limits the callback's utility and contradicts the PR description's goal of supporting 'several quantization configurations'. It should be made configurable via training arguments to allow users to select different torchao quantization schemes.
  2. Code Style: The import statement is inside a method. For better readability and consistency with PEP 8, all imports (including those in on_train_begin and on_train_end) should be at the top of the file.

Here's a suggestion for the logic inside __init__, assuming import torchao.quantization as ao_quant is moved to the top:

Suggested change

```diff
-from torchao.quantization import Int4WeightOnlyConfig
-self.quant_config = Int4WeightOnlyConfig()
+import torchao.quantization as ao_quant
+# This assumes a new training argument `qat_config` is added,
+# e.g. with a value like 'Int4WeightOnlyConfig'.
+qat_config_name = getattr(self.args, 'qat_config', 'Int4WeightOnlyConfig')
+quant_config_cls = getattr(ao_quant, qat_config_name, None)
+if not quant_config_cls:
+    raise ValueError(f"Unknown QAT config: '{qat_config_name}'")
+self.quant_config = quant_config_cls()
```
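If this suggestion were adopted, a user could (hypothetically) select another scheme by setting the new argument, e.g. qat_config='Int8WeightOnlyConfig', provided a config class with that name exists in the installed torchao version; the getattr lookup resolves the name against torchao.quantization at runtime and fails loudly for unknown names.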

@y2logic marked this pull request as ready for review on February 12, 2026, 08:40
@Jintao-Huang (Collaborator) commented
Please merge the main branch, then run the following code:

```shell
pip install pre-commit
pre-commit run --all-files
```

@y2logic (Contributor, Author) commented Feb 13, 2026

Code style improved.
