HawAda: Hardware-aware Adaptation

HawAda is an open-source universal framework for hardware-aware optimization of machine learning models, enabling models to be automatically adapted to a specific target GPU.

Inspired by hardware-aware pruning and efficiency methods [1, 2, 3, 4 ], HawAda provides a simple, user-friendly, and model-agnostic interface for optimizing neural networks with respect to real hardware latency, while preserving model quality.

High-level overview

HawAda follows a teacher–student, gate-based optimization pipeline:

Base model: Start from an existing pretrained model (the teacher).
HawAda adapter: Trainable gates are attached to selected layers or components of the base model.Each gate modulates its corresponding output via a learnable sigmoid, enabling fine-grained structural control.
Training objectives: The gates are optimized using two complementary objectives:
- Hardware-aware latency objectiveA differentiable proxy of model latency, periodically calibrated with real measured latency on the target GPU and incorporated via a Lagrange multiplier.
- Teacher agreement objectiveKL divergence (and optionally MSE) between the student and the teacher model outputs, ensuring minimal accuracy degradation.
Pruning / export: After training (or by loading a pretrained HawAda checkpoint, e.g. from Hugging Face), the learned gate values are used to prune the base model.Only components that are critical for maintaining agreement with the teacher are retained.
GPU-aware slimming: During export, HawAda searches for the fastest valid architecture by iterating over GPU-friendly structural constraints (e.g. head multiples), ensuring optimal performance on the target hardware.
Final fine-tuning: The resulting slim model can be fine-tuned on any GPU (not necessarily the one used for hardware-aware optimization) to recover or improve final accuracy.Alternatively, pretrained slim models can be downloaded from the HawAda model zoo.

Speedup

The speedup depends on the target latency and on the level of accuracy you want to reserve. The typical results for ResNet18 and ViT-base are:

Current support

HawAda currently includes reference implementations for:

ResNet-18 optimized for RTX 4090
- Gated model
- Slim model
ViT-Base optimized for RTX 4090
- Gated model
- Slim model
LLaMA-3.2-1B optimized for RTX 4090 and H100
- RTX4090
  - Gated model
  - Slim model
- H100
  - Gated model
  - Slim model

Notebooks

The /notebooks directory contains end-to-end, executable examples demonstrating the full HawAda workflow — from loading pretrained models to exporting and fine-tuning hardware-optimized slim models.

Each notebook covers:

Loading pretrained models from Hugging Face
Measuring real GPU latency
Training HawAda gates with hardware-aware objectives
Exporting and pruning slim models
Optional fine-tuning of the exported models

Available notebooks

ResNet-18: /notebooks/resnet.ipynb
ViT-Base/16 /notebooks/vit.ipynb
LLaMA-3.2-1B /notebooks/llama.ipynb

Repository overview

The HawAda repository is organized around a modular, model-agnostic hardware-aware optimization pipeline, separating model definitions, adapters, training logic, latency modeling, and export utilities.

hawada/
├── .gitignore
├── README.md
├── __init__.py
├── requirements.txt
│
├── adapters/
│   ├── __init__.py
│   ├── huggingface/
│   │   ├── __init__.py
│   │   ├── llama.py
│   │   └── vit.py
│   └── torchvision/
│       └── resnet.py
│
├── core/
│   ├── __init__.py
│   ├── distill.py
│   ├── export.py
│   ├── finetune.py
│   ├── gates.py
│   ├── profiler.py
│   ├── proxy_cost.py
│   ├── search_export.py
│   ├── train.py
│   └── utils.py
│
├── data/
│   ├── __init__.py
│   ├── llms.py
│   └── vision.py
│
├── examples/
│   ├── __init__.py
│   ├── run_llama_optimize.py
│   ├── run_resnet_optimize.py
│   └── run_vit_optimize.py
│
├── notebooks/
│   ├── llama.ipynb
│   ├── resnet.ipynb
│   ├── vit.ipynb
│   └── utils/
│       ├── ViT_pretraining.ipynb
│       └── imagenet224.ipynb
│
├── recipes/
│   ├── H100/
│   │   ├── llama_3_2_1b.yaml
│   │   ├── resnet18_imagenet224.yaml
│   │   └── vit_base_patch16_224.yaml
│   └── RTX4090/
│       ├── llama_3_2_1b.yaml
│       ├── resnet18_imagenet224.yaml
│       └── vit_base_patch16_224.yaml
│
└── tools/
    ├── eval_agreement.py
    ├── eval_llama_slim.py
    ├── export_to_hf.py
    └── print_shapes.py

Design principles

Model-agnostic: Adapters and gates can be attached to arbitrary architectures.
Hardware-aware: Real GPU latency is periodically measured and integrated into training.
Separation of concerns: Training, latency modeling, pruning, and export are cleanly decoupled.
Reproducible: GPU- and model-specific configs fully define an experiment.
HF-compatible: Exported slim models can be pushed directly to Hugging Face.

Typical workflow

Choose a base model and target GPU
Attach HawAda adapters
Train gates using the Lagrangian optimizer
Export and prune the model using learned gates
Fine-tune or deploy the slim model

References

[1] NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

[2] Once-for-All: Train One Network and Specialize it for Efficient Deployment

[3] MnasNet: Platform-Aware Neural Architecture Search for Mobile

[4] HALP: Hardware-Aware Latency Pruning

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HawAda: Hardware-aware Adaptation

High-level overview

Speedup

Current support

Notebooks

Available notebooks

Repository overview

Design principles

Typical workflow

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
adapters		adapters
assets		assets
core		core
data		data
examples		examples
notebooks		notebooks
recipes		recipes
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

HawAda: Hardware-aware Adaptation

High-level overview

Speedup

Current support

Notebooks

Available notebooks

Repository overview

Design principles

Typical workflow

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages