Add QAT (Quantization-Aware Training) Support Callback #8042
y2logic wants to merge 4 commits into modelscope:main
Conversation
Summary of Changes

Hello @y2logic, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the training framework by introducing a new callback for Quantization-Aware Training (QAT). This feature leverages the TorchAO library to seamlessly integrate quantization effects during the training process, leading to more efficient and performant models. It automates the setup for QAT and handles the export of the final quantized model.

Highlights
Code Review
This pull request introduces a QatCallback for Quantization-Aware Training, which is a valuable addition. The implementation provides a solid foundation. My review focuses on improving flexibility and correctness. Specifically, the quantization configuration is currently hardcoded, which limits the callback's utility. I've suggested making this configurable. Additionally, there's a minor issue with a log message. Addressing these points will make the new callback more robust and user-friendly.
```python
from torchao.quantization import Int4WeightOnlyConfig
self.quant_config = Int4WeightOnlyConfig()
```
This section can be improved in two ways:

- Flexibility: The quantization configuration is hardcoded to Int4WeightOnlyConfig. This limits the callback's utility and contradicts the PR description's goal of supporting 'several quantization configurations'. It should be made configurable via training arguments so users can select different torchao quantization schemes.
- Code Style: The import statement is inside a method. For better readability and consistency with PEP 8, all imports (including those in on_train_begin and on_train_end) should be at the top of the file.
Here's a suggestion for the logic inside __init__, assuming import torchao.quantization as ao_quant is moved to the top:
```diff
- from torchao.quantization import Int4WeightOnlyConfig
- self.quant_config = Int4WeightOnlyConfig()
+ import torchao.quantization as ao_quant
+ # This assumes a new training argument `qat_config` is added, e.g. with a value like 'Int4WeightOnlyConfig'
+ qat_config_name = getattr(self.args, 'qat_config', 'Int4WeightOnlyConfig')
+ quant_config_cls = getattr(ao_quant, qat_config_name, None)
+ if not quant_config_cls:
+     raise ValueError(f"Unknown QAT config: '{qat_config_name}'")
+ self.quant_config = quant_config_cls()
```
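To make the suggested lookup concrete, here is a hypothetical standalone version; the `qat_config` training argument is the reviewer's proposal and does not exist yet, and `Int8WeightOnlyConfig` merely stands in for a user-selected torchao scheme:

```python
# Hypothetical resolution of a torchao config class by name. The value of
# qat_config_name would come from the proposed training argument.
import torchao.quantization as ao_quant

qat_config_name = 'Int8WeightOnlyConfig'  # user-selected scheme (illustrative)
quant_config_cls = getattr(ao_quant, qat_config_name, None)
if not quant_config_cls:
    raise ValueError(f"Unknown QAT config: '{qat_config_name}'")
quant_config = quant_config_cls()  # e.g. Int8WeightOnlyConfig()
```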
Please merge the main branch, then run the following commands:

```shell
pip install pre-commit
pre-commit run --all-files
```
Code style improved.
PR type
PR information
This PR introduces a new QatCallback implementation to support Quantization-Aware Training (QAT) using TorchAO.
TorchAO is a PyTorch-native library with support for custom high-performance data types, quantization, and sparsity.
Quantization-Aware Training significantly improves the accuracy of the final quantized model compared to post-training quantization (PTQ) by simulating quantization effects during training.
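To make "simulating quantization effects" concrete, fake quantization passes weights through a quantize-dequantize round trip in the forward pass, so the network learns to tolerate rounding error. This is a generic sketch, not TorchAO's implementation:

```python
# Generic fake-quantization sketch: quantize then dequantize so training
# sees the rounding error while computation stays in floating point.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for symmetric int4
    scale = w.abs().max() / qmax          # per-tensor scale (simplest scheme)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # back to float, with quantization error baked in

w = torch.randn(4, 4)
print((w - fake_quantize(w)).abs().max())  # the error QAT trains against
```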
This callback integrates seamlessly with the existing Trainer framework and enables the following (see the sketch after this list):

- Automatic fake-quant insertion at training start
- Post-training quantized model export
- Support for several quantization configurations
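As an illustration of the prepare/train/convert flow the callback wraps, here is a minimal sketch using torchao's QAT quantizer API; the toy model, the quantizer choice, and the save path are assumptions for the example, not the PR's actual code:

```python
# Minimal sketch of the TorchAO QAT flow automated by the callback.
import torch
import torch.nn as nn
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))
quantizer = Int8DynActInt4WeightQATQuantizer()

# on_train_begin: swap eligible linear layers for fake-quantized versions
model = quantizer.prepare(model)

# ... the normal training loop runs here, seeing simulated quantization error ...

# on_train_end: replace fake-quant modules with truly quantized weights and export
model = quantizer.convert(model)
torch.save(model.state_dict(), 'qat_model.pt')
```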
Experiment results