Hi @abdulfatir and the Chronos team,
First, I sincerely apologize for the disruption caused by my previous PRs (#454, #456). I understand that opening significant architectural changes without prior discussion creates unnecessary noise, especially when they deviate from the project's core roadmap.
I am currently deploying Chronos in a high-throughput production environment and have identified two specific bottlenecks. I wanted to share my findings and ask if architectural support for these use cases aligns with your long-term goals.
1. High-Throughput Inference (Removing the CPU-GPU Sync)
I profiled the `predict()` loop and found that moving tensors between CPU and GPU at every generation step is a significant bottleneck for low-latency applications.
- Experiment: I implemented a generation loop that keeps the context and predictions entirely on VRAM until completion.
- Result: On local benchmarks (MPS/CUDA), this yielded a ~5x improvement in throughput for batch inference.
- Proposal: Instead of modifying the core `ChronosModel`, would you be open to an optional `ChronosFastPipeline` (or similar utility) designed specifically for production inference where latency is critical?
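To make the idea concrete, here is a minimal, framework-agnostic sketch of the device-resident loop. `step_fn` is a hypothetical stand-in for the model's single-step forward pass (it is not a Chronos API); the point is simply that the context grows on the device and only one device-to-host transfer happens, after the loop:

```python
import torch

def generate_on_device(step_fn, context: torch.Tensor, horizon: int) -> torch.Tensor:
    """Autoregressive generation that keeps all tensors on one device.

    step_fn: maps a (batch, length) tensor to a (batch, 1) next-step
             prediction; a stand-in for the model's forward pass.
    """
    preds = []
    for _ in range(horizon):
        next_step = step_fn(context)                  # stays on context.device
        preds.append(next_step)
        context = torch.cat([context, next_step], dim=-1)
    # One device-to-host transfer at the end, instead of one per step.
    return torch.cat(preds, dim=-1).cpu()

# Toy step function: predict the running mean of the context.
def mean_step(ctx: torch.Tensor) -> torch.Tensor:
    return ctx.mean(dim=-1, keepdim=True)

out = generate_on_device(mean_step, torch.ones(2, 8), horizon=4)
```

In the real pipeline, `step_fn` would wrap the tokenizer and model call, but the transfer pattern is the same.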
2. Static Covariates for Fine-Tuning
I reviewed the discussion in #352 and understand that pretrained checkpoints do not support static covariates. However, for users fine-tuning on retail datasets (where item metadata is constant), repeating static features across the temporal dimension significantly increases memory usage.
- Proposal: Would you consider accepting a `static_embedding` module in the architecture that is disabled by default?
- Benefit: This would allow advanced users to fine-tune custom models with metadata efficiently, without breaking compatibility for users of the pretrained checkpoints.
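A rough sketch of what I have in mind, assuming the module sits after the input projection (the class name, constructor signature, and insertion point are all illustrative, not existing Chronos code). When disabled, it is an identity, so pretrained checkpoints load and behave unchanged:

```python
from typing import Optional

import torch
import torch.nn as nn

class StaticEmbedding(nn.Module):
    """Optional static-covariate embedding, disabled by default.

    With enabled=False the module passes hidden states through untouched,
    preserving checkpoint compatibility. With enabled=True, static features
    are embedded once per series and broadcast over time, avoiding the
    memory cost of repeating them along the temporal dimension.
    """

    def __init__(self, d_model: int, n_static: int = 0, enabled: bool = False):
        super().__init__()
        self.enabled = enabled and n_static > 0
        if self.enabled:
            self.proj = nn.Linear(n_static, d_model)

    def forward(
        self,
        hidden: torch.Tensor,                 # (batch, time, d_model)
        static: Optional[torch.Tensor] = None,  # (batch, n_static)
    ) -> torch.Tensor:
        if not self.enabled or static is None:
            return hidden
        # Embed once, then broadcast across the time axis.
        return hidden + self.proj(static).unsqueeze(1)
```

Fine-tuning code would pass the metadata tensor explicitly; default construction leaves the forward pass byte-identical to today's behavior.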
I am happy to keep these optimizations in my own fork if they are out of scope, but I wanted to offer them properly in case they benefit the community.
Thanks for your work on this state-of-the-art model.