QDP Studio

QDP Studio is a model compression framework for deep learning models. It supports four techniques: Quantization, Decomposition, Pruning, and Knowledge Distillation, and it can chain them in a hybrid pipeline. The goal is to reduce model size and speed up inference while keeping accuracy high, which simplifies deployment across various devices.

Features

Quantization
Leverage different quantization strategies to convert high-precision models into lower-bit representations for faster, more efficient inference. Available modes include:
- default: Standard quantization pipeline.
- dynamic: Dynamic quantization for runtime optimizations.
- static: Static quantization using calibration data.
- qat: Quantization-aware training for higher accuracy.
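
As a rough illustration of what "lower-bit representation" means, the sketch below applies 8-bit affine quantization to a short list of weights in plain Python. It is not QDP Studio's implementation; the helper names `quantize_affine` and `dequantize_affine` are hypothetical, and a real pipeline would rely on PyTorch's quantization tooling.

```python
def quantize_affine(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point (illustrative helper)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero is stored exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize_affine(quantized, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_affine(weights)
restored = dequantize_affine(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_error, 5))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of one quantization step (`scale`).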
Pruning
Reduce model complexity by removing redundant weights using various pruning techniques. Available modes include:
- default: Standard pruning procedure.
- unstructured: Prune individual weights without structure.
- structured: Remove entire neurons or filters for hardware efficiency.
- iterative: Apply pruning in iterative steps with fine-tuning after each step.
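
The idea behind unstructured pruning can be shown with a small magnitude-based sketch in plain Python. This is an assumption about the general technique, not the framework's code; `magnitude_prune` is a hypothetical helper, and the `prune_ratio` argument simply mirrors the config key of the same name.

```python
def magnitude_prune(weights, prune_ratio):
    """Zero out the fraction `prune_ratio` of weights with the smallest magnitude."""
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    num_to_prune = int(len(weights) * prune_ratio)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3, -0.02, 0.6, 0.15, -0.8]
pruned = magnitude_prune(weights, prune_ratio=0.2)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned, sparsity)
```

Structured pruning follows the same ranking idea but scores and removes whole rows, channels, or filters rather than individual entries.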
Decomposition
Simplify model layers by decomposing weight matrices or tensors. Available modes include:
- default: Standard decomposition approach.
- truncatedSVD: Use truncated Singular Value Decomposition to approximate layers.
- tensorDecomposition: Apply tensor-based decomposition techniques to compress multi-dimensional weights.
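
To see why a truncated SVD can shrink a layer, it helps to count parameters. A dense weight matrix of shape rows x cols stores rows * cols values, while a rank-r factorization into two smaller matrices stores r * (rows + cols). The sketch below works through that arithmetic; the function names and the 512 x 512 layer are illustrative assumptions, not values taken from the framework.

```python
def dense_param_count(rows, cols):
    """Parameters in an unfactorized weight matrix."""
    return rows * cols


def low_rank_param_count(rows, cols, rank):
    """Parameters in a rank-`rank` factorization: (rows x rank) plus (rank x cols)."""
    return rank * (rows + cols)


def compression_ratio(rows, cols, rank):
    """How many times smaller the factorized layer is than the dense one."""
    return dense_param_count(rows, cols) / low_rank_param_count(rows, cols, rank)


rows, cols, rank = 512, 512, 64  # hypothetical layer shape and target rank
print(dense_param_count(rows, cols))           # 262144
print(low_rank_param_count(rows, cols, rank))  # 65536
print(compression_ratio(rows, cols, rank))     # 4.0
```

The factorization only saves parameters when rank < rows * cols / (rows + cols), which is 256 for this example.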
Knowledge Distillation
Transfer knowledge from a large pre-trained network (teacher) to a smaller network (student). Available modes include:
- default: Standard distillation procedure.
- teacher_assisted: Enhanced teacher assistance through additional supervision.
- temperature_scaling: Use temperature scaling to soften outputs and improve transfer.
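
Temperature scaling divides the logits by a temperature T before the softmax, so that T > 1 produces a softer distribution for the student to learn from. The sketch below shows the effect in plain Python; it is a generic illustration, and `softmax_with_temperature` and the example logits are hypothetical rather than part of QDP Studio.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 1.0, 0.5]  # made-up teacher outputs for three classes
hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

With the higher temperature, the top class keeps the largest probability but the other classes receive noticeably more mass, which exposes the teacher's relative preferences among them.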
Hybrid Compression Pipeline
Apply all supported compression techniques sequentially in one unified pipeline. Combining the methods lets you balance efficiency against accuracy in a single run.
Comprehensive Evaluation
Evaluate models on key metrics, including accuracy, inference time, and model size, to directly compare the original and compressed versions.
Custom Model & Dataset Support
Use your own models and datasets. Provide a custom model file path or a custom Python dataset module, which must implement a get_custom_dataset() function returning (train_dataset, val_dataset).

Getting Started

Prerequisites

Python 3.7+
PyTorch & Torchvision
TIMM for additional model support
Transformers for Hugging Face models
scikit-learn
tensorly
Additional libraries: pyyaml, wandb, etc. (argparse and logging are part of the Python standard library and need no installation.)

Installation

Clone the Repository:

```
git clone https://github.com/jaicdev/QDPStudio.git
cd QDPStudio
```

Create a Virtual Environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
```

Install Dependencies:
```
pip install -r requirements.txt
```
Configuration:

Edit the config.yaml file to set model parameters, device preference, batch size, learning rate, number of epochs, and compression settings (e.g., prune ratio). Example:
```
device: "cuda"      # Options: "cuda", "cpu", "mps"
model_name: "resnet18"
pretrained: true
hf_model_name: null
timm_model_name: null
prune_ratio: 0.2
```
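
Because invalid configurations may surface only as runtime errors, a quick sanity check before launching a run can help. The sketch below validates a dictionary shaped like the example above (for instance, the result of loading config.yaml with pyyaml's `yaml.safe_load`). The `validate_config` helper and its rules, such as requiring prune_ratio in [0, 1), are assumptions for illustration and are not part of QDP Studio.

```python
ALLOWED_DEVICES = {"cuda", "cpu", "mps"}  # options listed in the example config
MODEL_KEYS = ("model_name", "hf_model_name", "timm_model_name")


def validate_config(config):
    """Return a list of problems; an empty list means the config passed these checks."""
    problems = []
    if config.get("device") not in ALLOWED_DEVICES:
        problems.append(f"device must be one of {sorted(ALLOWED_DEVICES)}")
    if not isinstance(config.get("pretrained"), bool):
        problems.append("pretrained must be true or false")
    ratio = config.get("prune_ratio")
    # Assumed rule: a ratio of 1.0 would remove every weight, so require [0, 1).
    is_number = isinstance(ratio, (int, float)) and not isinstance(ratio, bool)
    if not is_number or not 0.0 <= ratio < 1.0:
        problems.append("prune_ratio must be a number in [0, 1)")
    if not any(config.get(key) for key in MODEL_KEYS):
        problems.append("at least one model name must be set")
    return problems


example = {
    "device": "cuda",
    "model_name": "resnet18",
    "pretrained": True,
    "hf_model_name": None,
    "timm_model_name": None,
    "prune_ratio": 0.2,
}
print(validate_config(example))                              # []
print(validate_config({**example, "prune_ratio": 1.5}))
```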

Usage

Command-Line Interface

QDP Studio is controlled via main.py, which provides a command-line interface to select the dataset and compression techniques.

Example Command:

```
python main.py --dataset CIFAR10 --prune --quantize --decompose
```

This command will:

- Train a model (default: ResNet18) on the CIFAR10 dataset.
- Apply pruning, quantization, and decomposition using the selected modes.
- Evaluate and compare the performance of the original and compressed model variants.

Key Arguments:

--dataset: Specify the standard dataset (e.g., CIFAR10, MNIST, ImageNet).
--custom_dataset: Python module name for a custom dataset (must implement a get_custom_dataset() function).
--batch_size: Define the batch size for training and evaluation.
--custom_model: Path to a custom model file (overrides standard model loading via --model_name).
--model_name: Name of a torchvision model (default: "resnet18").
--prune: Apply pruning.
--quantize: Apply quantization.
--decompose: Apply decomposition.
--all: Run all compression techniques sequentially (hybrid approach).
--num_epochs: Number of training epochs.
--quantization_mode: Set quantization mode (default | dynamic | static | qat).
--pruning_mode: Set pruning mode (default | unstructured | structured | iterative).
--decomposition_mode: Set decomposition mode (default | truncatedSVD | tensorDecomposition).
--kd_mode: Set knowledge distillation mode (default | teacher_assisted | temperature_scaling).
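
For readers who want to see how such an interface fits together, the sketch below declares the arguments listed above with argparse. It is not the project's actual main.py: the default values (apart from "resnet18") and the choice of "default" as the fallback mode are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a QDP Studio-style CLI")
    parser.add_argument("--dataset", default=None)
    parser.add_argument("--custom_dataset", default=None)
    parser.add_argument("--custom_model", default=None)
    parser.add_argument("--model_name", default="resnet18")
    parser.add_argument("--batch_size", type=int, default=None)  # assumed default
    parser.add_argument("--num_epochs", type=int, default=None)  # assumed default
    for flag in ("--prune", "--quantize", "--decompose", "--all"):
        parser.add_argument(flag, action="store_true")
    # Mode names are taken from the argument list above.
    modes = {
        "--quantization_mode": ["default", "dynamic", "static", "qat"],
        "--pruning_mode": ["default", "unstructured", "structured", "iterative"],
        "--decomposition_mode": ["default", "truncatedSVD", "tensorDecomposition"],
        "--kd_mode": ["default", "teacher_assisted", "temperature_scaling"],
    }
    for flag, choices in modes.items():
        parser.add_argument(flag, choices=choices, default="default")
    return parser


args = build_parser().parse_args(
    ["--dataset", "CIFAR10", "--prune", "--quantization_mode", "static"]
)
print(args.dataset, args.prune, args.quantize, args.quantization_mode)
```

Restricting each mode to a fixed set of `choices` means a typo such as `--pruning_mode structed` is rejected at parse time rather than partway through training.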

Using a Custom Model

If you have a custom model file, use the --custom_model argument:

```
python main.py --custom_model path/to/your/custom_model.pth --dataset CIFAR10 --prune --quantize --decompose --num_epochs 5
```

Using a Custom Dataset

Create a Python module (e.g., my_dataset.py) that implements a get_custom_dataset() function. For example:

```
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor


def get_custom_dataset():
    """Return (train_dataset, val_dataset); FakeData stands in for real data."""
    train_dataset = FakeData(transform=ToTensor())
    val_dataset = FakeData(transform=ToTensor())
    return train_dataset, val_dataset
```

Then run:

```
python main.py --custom_dataset my_dataset --custom_model path/to/your/custom_model.pth --prune --quantize --decompose --num_epochs 5
```
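
Before starting a long run, it can be worth checking that the module is importable and exposes the expected function. The sketch below does this with importlib. The `load_custom_datasets` helper is hypothetical and only shows one plausible way to resolve a module name; it is not how QDP Studio is known to load the module.

```python
import importlib


def load_custom_datasets(module_name):
    """Import `module_name` and return the (train, val) pair from get_custom_dataset()."""
    module = importlib.import_module(module_name)  # raises ImportError if not on the path
    factory = getattr(module, "get_custom_dataset", None)
    if not callable(factory):
        raise AttributeError(f"{module_name} must define get_custom_dataset()")
    result = factory()
    if not isinstance(result, (tuple, list)) or len(result) != 2:
        raise TypeError("get_custom_dataset() must return (train_dataset, val_dataset)")
    train_dataset, val_dataset = result
    return train_dataset, val_dataset
```

Calling `load_custom_datasets("my_dataset")` from the directory that contains my_dataset.py should return the two datasets, or fail early with a message that names the missing piece.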

Hybrid Compression Pipeline

Hybrid compression applies all supported techniques sequentially:

Model Training:
Train the base model on your chosen dataset to ensure a strong initial performance.
Sequential Compression:
- Pruning: Remove redundant weights using the selected pruning mode.
- Quantization: Convert model weights to lower precision with the chosen quantization strategy.
- Decomposition: Simplify model layers using the preferred decomposition method.
- Knowledge Distillation: Optionally, further compress the model by transferring knowledge using the selected distillation approach.
Post-Compression Fine-Tuning:
Fine-tune after each compression step to mitigate any loss in accuracy.
Evaluation:
Compare key metrics, including accuracy, inference time, and model size, between the original and compressed models.

Run the Hybrid Pipeline using the --all flag:

```
python main.py --dataset CIFAR10 --all
```
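
The control flow of the hybrid pipeline (compress, fine-tune, evaluate, repeat) can be sketched with plain functions. In the toy example below the "model" is just a list of weights and every stage is a stand-in. `run_hybrid_pipeline` and the stage functions are hypothetical names, not the framework's API.

```python
def run_hybrid_pipeline(model, stages, fine_tune, evaluate):
    """Apply each (name, stage) in order, fine-tune after it, and record a metric."""
    history = [("original", evaluate(model))]
    for name, stage in stages:
        model = fine_tune(stage(model))
        history.append((name, evaluate(model)))
    return model, history


# Toy stand-ins: a real run would operate on a trained network instead.
def prune(weights):
    return [0.0 if abs(w) < 0.1 else w for w in weights]


def quantize(weights):
    return [round(w, 1) for w in weights]


def fine_tune(weights):
    return weights  # placeholder: no training happens in this sketch


def count_nonzero(weights):
    return sum(1 for w in weights if w != 0.0)


final_model, history = run_hybrid_pipeline(
    [0.52, -0.03, 0.88, 0.07],
    stages=[("pruning", prune), ("quantization", quantize)],
    fine_tune=fine_tune,
    evaluate=count_nonzero,
)
print(final_model)
print(history)
```

Recording a metric after every stage makes it easy to see which step is responsible for a given change in size or accuracy.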

Logging & Evaluation

Logging is implemented via Python’s logging module, with optional Weights & Biases (wandb) integration for experiment tracking.
The framework evaluates models on metrics including accuracy, precision, recall, F1-score, and inference latency.
Detailed logging enables monitoring the impact of each compression technique and mode.
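
For reference, the classification metrics named above can be computed as in the sketch below, which handles the binary case in plain Python and times a callable to estimate latency. The framework lists scikit-learn as a dependency and may well use it instead; the function names and sample labels here are illustrative assumptions.

```python
import time


def binary_classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_latency_seconds(fn, repeats=10):
    """Average wall-clock time of calling `fn`, measured over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # made-up labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions
metrics = binary_classification_metrics(y_true, y_pred)
latency = mean_latency_seconds(lambda: sum(range(1000)))
print(metrics)
```

Running the same two functions on the original and the compressed model gives directly comparable numbers for each compression mode.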

Troubleshooting & Tips

Configuration Issues:
Ensure your config.yaml is properly formatted. Invalid configurations may result in runtime errors.
Custom Module Integration:
Verify that any custom dataset module is on your Python path and implements the required get_custom_dataset() function.
Fusion Configurations:
For optimal quantization, consider defining a custom fusion configuration mapping if the default does not meet your model's needs.
Testing:
Run end-to-end tests to confirm that all components work together within the compression pipeline.

Acknowledgements

Supported by the Science, Technology, and Innovation (STI) Policy of Gujarat Council of Science and Technology, Department of Science and Technology, Government of Gujarat, India (Grant Number: GUJCOST/STI/2021-22/3858).
Special thanks to the communities behind PyTorch, Torchvision, TIMM, and Transformers.
If you use this repository in your work, please cite:

```
@misc{jaicdev2025qdpstudio,
  author       = {jaicdev},
  title        = {QDPStudio: A Comprehensive Model Compression Framework},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/jaicdev/QDPStudio}},
  note         = {Released: February 18, 2025}
}

@article{chaudhari2025onboard,
  author  = {Chaudhari, Jay N. and Galiyawala, Hiren and Sharma, Paawan and Shukla, Pancham and Raval, Mehul S.},
  journal = {IEEE Access},
  title   = {Onboard Person Retrieval System With Model Compression: A Case Study on Nvidia Jetson Orin AGX},
  year    = {2025},
  volume  = {13},
  pages   = {8257-8269},
  doi     = {10.1109/ACCESS.2025.3527134},
  issn    = {2169-3536}
}
```

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a new branch:
```
git checkout -b feature/my-new-feature
```
Commit your changes:
```
git commit -am 'Add new feature'
```
Push the branch:
```
git push origin feature/my-new-feature
```
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
