A production-quality framework for profiling LLM inference latency, throughput, and memory usage.
```
llm-inference-profiler/
├── benchmark/
│   ├── run_inference.py         # Main entry point for benchmarking
│   ├── benchmark_latency.py     # Latency measurement logic
│   ├── benchmark_throughput.py  # Throughput measurement logic
│   ├── configs.yaml             # Default configuration
│   └── utils.py                 # Utilities and model management
├── profiling/
│   └── torch_profiler.py        # GPU profiling implementation
├── scripts/                     # Convenience scripts
│   ├── run_fp32.sh
│   ├── run_fp16.sh
│   └── run_nsight.sh
├── results/                     # Output directory for CSVs
└── plots/                       # Plotting scripts
    └── plot_results.py
```
- Install dependencies:

```bash
pip install -r requirements.txt
```
Run with default configuration (GPT-2, FP16 & FP32, various batch sizes):
```bash
python benchmark/run_inference.py
```

Override parameters:

```bash
python benchmark/run_inference.py --model gpt2 --precision fp16 --batch-sizes 1,8 --seq-lens 128
```

Or use the convenience scripts:

```bash
./scripts/run_fp16.sh
./scripts/run_fp32.sh
```

After running benchmarks, generate plots in plots/:

```bash
python plots/plot_results.py
```

The profiler reports the following metrics:

- Latency: End-to-end inference time (ms) for processing inputs.
- Throughput: Tokens processed per second.
- Memory: Peak GPU memory allocated and reserved.
- Utilization: Average GPU compute utilization.
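The sketch below shows how these numbers can be collected with CUDA events and PyTorch's memory statistics. It is illustrative only; the function and argument names are placeholders rather than the exact code in benchmark_latency.py.

```python
# Minimal measurement sketch (illustrative; not the exact code in
# benchmark_latency.py). Assumes `model` and `input_ids` are on the GPU.
import torch

def benchmark(model, input_ids, n_warmup=5, n_iters=20):
    with torch.no_grad():
        for _ in range(n_warmup):        # warm-up: one-time setup, autotuning
            model(input_ids)
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(n_iters):
            model(input_ids)
        end.record()
        torch.cuda.synchronize()         # wait for all queued kernels to finish
    latency_ms = start.elapsed_time(end) / n_iters
    batch, seq_len = input_ids.shape
    tokens_per_s = batch * seq_len / (latency_ms / 1000)   # throughput
    peak_mem_mb = torch.cuda.max_memory_allocated() / 2**20
    return latency_ms, tokens_per_s, peak_mem_mb
```

The warm-up iterations matter: the first passes include one-time costs (CUDA context setup, kernel autotuning) that would otherwise skew the mean latency.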
Two precision modes are compared:

- FP32: Higher precision, increased memory usage, standard performance.
- FP16: Mixed precision (via `autocast`), reduced memory usage, higher throughput on Tensor Cores.
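As a concrete illustration of the FP16 path (a minimal sketch assuming a CUDA-resident model; the repository's implementation may differ in detail):

```python
# FP16 inference sketch using PyTorch autocast. Assumes `model` and
# `input_ids` already live on a CUDA device.
import torch

with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model(input_ids)   # eligible ops (matmuls) run in FP16 on Tensor Cores
```

Autocast keeps numerically sensitive ops such as reductions and softmax in FP32, so the memory and throughput gains come mainly from FP16 matrix multiplications.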
You can also run the dashboard using the Hugging Face Inference API:
- Select "Hugging Face API" in the dashboard sidebar.
- Enter your Hugging Face API Token (get one at hf.co/settings/tokens).
- Specify the Repository ID of the model you want to test (e.g., `mistralai/Mistral-7B-Instruct-v0.2`).
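Under the hood, an API-backed generation call looks roughly like the sketch below, using `InferenceClient` from huggingface_hub. The token and prompt are placeholders, and the dashboard's actual call may differ.

```python
# Sketch of a Hugging Face Inference API call via huggingface_hub.
# Token and prompt are placeholders; the dashboard's exact call may differ.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token="hf_...",   # your token from hf.co/settings/tokens
)
text = client.text_generation("Explain KV caching in one sentence.", max_new_tokens=64)
print(text)
```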
