Skip to content

kkumarM/gparx

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPARX — GPU Performance Analysis & Reasoning eXplorer

End-to-end simulator with Go backend and React/Tailwind web UI. Models request pipelines (preprocess → h2d → compute → d2h → postprocess), queueing, GPU bandwidth/compute constraints, and emits Chrome trace JSON for visualization. Supports overlaying real Nsight Systems traces (sqlite) for comparison.

gPARX is an AI-ready GPU inference performance advisor for LLM serving, benchmark analysis, and accelerated workload planning.

It helps users estimate, diagnose, and explain GPU workload behavior before and after deployment by combining:

  • LLM cost and capacity planning
  • scenario-based workload simulation
  • timeline visualization
  • benchmark and trace interpretation
  • rule-based advisor diagnosis
  • sweep and comparison workflows
  • future-ready LLM assistant interfaces

Requirements

  • Go (1.18 here; code targets 1.22 features lightly)
  • Node 18+ / npm for the web UI
  • Optional: Nsight Systems CLI (nsys) if you want to export .nsys-rep to sqlite

Build & Test

make test          # go test ./...
make build         # build backend binary bin/sim-api
(cd web && npm run test)  # vitest for trace parser
(cd web && npm run build) # web production build

Run backend

make build
./bin/sim-api       # serves on :8080

Dev mode (backend + web dev proxy):

make dev            # web on :5173, backend on :8080

Web UI

Located in web/ (Vite + React + Tailwind). Run:

cd web
npm install
npm run dev    # http://localhost:5173 (proxied to :8080)
# or build
npm run build

UI features:

  • Scenario builder (pipeline stages, jitter, presets) with inline validation
  • Embedded timeline/Gantt with lanes (QUEUE, CPU, H2D, GPU, D2H), zoom/scrub/playback, tooltips, request details
  • Live counters during playback (queued, GPU, transfer, CPU); bottleneck cues (GPU saturated / queue forming)
  • Run history and basic run comparison
  • Trace download link
  • gPARX Advisor for rule-driven bottleneck diagnosis and recommendations

API (sim-api)

  • POST /v1/scenarios{scenario_id}
  • GET /v1/scenarios/{id} → scenario JSON
  • POST /v1/runs with { "scenario_id": "..." } or { "scenario": { ... } }{ run_id, summary, breakdown, artifacts.trace }
  • GET /v1/runs/{id} → run summary
  • GET /v1/runs/{id}/breakdown → per-stage/per-request breakdown
  • GET /v1/runs/{id}/trace → Chrome trace JSON
  • POST /v1/agent/diagnose → advisor diagnosis based on run or benchmark payload
  • Real traces (Nsight Systems):
    • POST /v1/realtraces (multipart upload sqlite or .nsys-rep) → { real_trace_id }
    • GET /v1/realtraces/{id}/trace → real Chrome trace JSON
    • GET /v1/realtraces/{id}/metrics → parsed metrics

Artifacts layout:

artifacts/
  runs/<run_id>/...            # (reserved for future persistence)
  realtraces/<id>/trace.json
  realtraces/<id>/metrics.json
  realtraces/<id>/uploaded.*

Example scenario JSON

{
  "name": "demo",
  "workload": { "name": "wl", "rps": 2, "duration_s": 10, "batch_size": 1, "jitter_pct": 5 },
  "target": {
    "name": "L40",
    "tflops": 180,
    "mem_gbps": 3000,
    "ms_per_token": 0.2,
    "h2d_gbps": 32,
    "d2h_gbps": 32,
    "concurrency": 4
  },
  "pipeline": [
    { "name": "preprocess", "kind": "fixed_ms", "value": 2 },
    { "name": "h2d", "kind": "bytes", "value": 8388608 },
    { "name": "compute", "kind": "tokens", "value": 128 },
    { "name": "d2h", "kind": "bytes", "value": 2097152 },
    { "name": "postprocess", "kind": "fixed_ms", "value": 1 }
  ]
}

Quick run via curl:

curl -X POST http://localhost:8080/v1/runs \
  -H 'Content-Type: application/json' \
  -d @scenario.json
curl http://localhost:8080/v1/runs/<run_id>/trace -o trace.json

Open trace.json in Chrome/Perfetto (chrome://tracing) if desired; UI already embeds a timeline.

Nsight Systems ingestion

  1. Generate sqlite from .nsys-rep (if nsys available):
nsys profile -o report <your_command>
nsys export --type sqlite report.nsys-rep
  1. CLI ingest:
go run ./cmd/profiler-agent ingest-nsys --sqlite report.sqlite --out trace.json --out-metrics metrics.json
  1. Upload to backend (UI supports upload):
curl -F "file=@report.sqlite" http://localhost:8080/v1/realtraces

Then fetch: /v1/realtraces/<id>/trace and /v1/realtraces/<id>/metrics.

Timeline overlay (UI)

  • TimelineViewer renders sim trace; upload a real trace to view real spans too (stacked/overlay modes planned).
  • Playback controls: play/pause, scrub, speed; live counters show queued/GPU/transfer/CPU active counts; bottleneck tags when GPU saturated or queue forming.
  • Legend/help panel explains lanes and overlap.

Capacity planning (roadmap)

Coming modules will estimate GPU count for target RPS/SLA using simulated service times and simple scaling assumptions.

Make targets

  • make test → go test ./...
  • make build → build sim-api
  • make dev → run backend + web dev server
  • make web-build → npm install + npm run build

Notes

  • Offline friendly; no external services required at runtime.
  • If nsys is not installed, export-nsys reports a clear error; sqlite ingestion still works if you have the export.

About

GPU inference performance advisor for LLM cost estimation, capacity planning, benchmark diagnosis, and workload optimization.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors