Closed
196 commits
c12bc12
initial plumming
cquil11 Apr 20, 2026
faf3821
feat: add agentic benchmark scripts and support utilities
cquil11 Apr 20, 2026
01b1509
feat: add scenario-type routing to benchmark workflow and runners
cquil11 Apr 20, 2026
6f628fa
feat: port agentic trace configs to master YAML format
cquil11 Apr 20, 2026
7aa3cc2
fix: rename agentic scripts to follow model-prefix convention
cquil11 Apr 20, 2026
d7fdcbd
refactor: clean up agentic benchmark scripts and shared helpers
cquil11 Apr 20, 2026
dc1d50e
feat: add agentic-coding matrix entry generation
cquil11 Apr 20, 2026
b0f671c
feat: wire agentic-coding through run-sweep.yml and benchmark template
cquil11 Apr 20, 2026
f314160
feat: add agentic results upload and status check to benchmark template
cquil11 Apr 20, 2026
a8f16e1
feat: add process_agentic_result.py for aggregated benchmark JSON
cquil11 Apr 20, 2026
b5b7f6f
fix: always include cache hit rate fields in agentic results (null wh…
cquil11 Apr 20, 2026
ef5592a
feat: add agentic support to e2e-tests.yml
cquil11 Apr 20, 2026
1b3e92e
fix: hardcode HF trace source to semianalysisai/cc-traces-weka-042026
cquil11 Apr 20, 2026
07c5ba9
fix: wait for agentic jobs before collect-results in e2e-tests
cquil11 Apr 20, 2026
d5b023d
fix: use ROCm vLLM image for MI355X agentic config
cquil11 Apr 20, 2026
1fc8373
fix: bump MI355X agentic image to vllm-openai-rocm:v0.19.1
cquil11 Apr 20, 2026
c9058c0
fix: handle unset vars in check_env_vars with set -u
cquil11 Apr 20, 2026
ac6473d
fix: set RESULT_DIR env var in benchmark template
cquil11 Apr 20, 2026
3502e64
fix: enable submodule checkout in benchmark template
cquil11 Apr 20, 2026
ffc0dcd
fix: validate agentic success and handle missing isl in summarize
cquil11 Apr 21, 2026
cb5774a
feat: add KV offload stats to agentic result JSON
cquil11 Apr 21, 2026
c451306
fix: remove MI355X offloadoff config and add always() to artifact upload
cquil11 Apr 21, 2026
9e2b236
fix: remove --max-consecutive-errors (not a CLI arg)
cquil11 Apr 21, 2026
88fe663
fix: skip hf download when MODEL is a local path
cquil11 Apr 21, 2026
e183826
fix: don't rewrite MODEL to local path for agentic scenarios on DGXC-…
cquil11 Apr 21, 2026
c02b8f1
feat: enable --no-max-tokens by default for agentic benchmarks
cquil11 Apr 21, 2026
c5641b5
fix: upload server.log from results/ dir for agentic jobs
cquil11 Apr 21, 2026
de1cbe7
Revert "fix: upload server.log from results/ dir for agentic jobs"
cquil11 Apr 21, 2026
60c973c
fix: include results/server.log in agentic raw results artifact
cquil11 Apr 21, 2026
aeed72f
fix: upload correct server.log path for agentic scenarios
cquil11 Apr 21, 2026
bb0e591
fix: increase CSV field size limit in check_agentic_success and proce…
cquil11 Apr 21, 2026
ae902d9
feat: add agentic configs and scripts for H200, H100, MI300X, MI325X
cquil11 Apr 21, 2026
eb8f9ff
fix: use existing gptoss-fp4-h200-vllm for H200 agentic benchmarks
cquil11 Apr 21, 2026
5b3fb72
fix: add trust-remote-code for Kimi K2.5 and fix HMA for H100/H200
cquil11 Apr 21, 2026
7586399
fix: update all agentic configs to vLLM v0.19.1
cquil11 Apr 21, 2026
031d6ee
fix: clean stale results/ dir before agentic benchmark launch
cquil11 Apr 21, 2026
3e1c6c0
Revert "fix: clean stale results/ dir before agentic benchmark launch"
cquil11 Apr 21, 2026
f2bb328
fix: remove stale status.txt before runner launch
cquil11 Apr 21, 2026
2404297
feat: add metrics plots to agentic artifacts and collect-agentic-resu…
cquil11 Apr 21, 2026
7821267
fix: add --max-model-len 131072 to all agentic scripts
cquil11 Apr 21, 2026
707fd10
Revert "fix: add --max-model-len 131072 to all agentic scripts"
cquil11 Apr 21, 2026
7181314
feat: fix collect_sweep_results.py parser and save B200 v6 results
cquil11 Apr 21, 2026
88362a9
fix: fix plot generation in collect_sweep_results and plot_pareto
cquil11 Apr 21, 2026
f1c910e
chore: remove results from tracking and add to .gitignore
cquil11 Apr 21, 2026
e872f86
feat: add duration field to agentic configs and fix YAML formatting
cquil11 Apr 21, 2026
d625608
fix: properly add results/ to .gitignore (fix newline)
cquil11 Apr 21, 2026
50c5377
fix: restore master YAML formatting from main and re-apply migration
cquil11 Apr 21, 2026
442c656
feat: add SGLang agentic benchmark support for B200 DSR1 FP4
cquil11 Apr 21, 2026
bb4eb54
fix: remove duplicate agentic-coding block from B300 area
cquil11 Apr 21, 2026
f2b5472
adding b200 sglang
cquil11 Apr 21, 2026
e787e59
chore: size sglang batch/running-requests to $USERS, drop command echo
cquil11 Apr 21, 2026
47f6ae6
adding b200 sglang pt 2
cquil11 Apr 21, 2026
7c25f7b
adding mi355x sglang dsr
cquil11 Apr 21, 2026
1b5d50a
fix: force-upgrade datasets in install_agentic_deps
cquil11 Apr 21, 2026
495db32
remove no prefix
cquil11 Apr 21, 2026
55bb441
fix: use simple KV offload on H100/H200 (native stays on MI300/MI325)
cquil11 Apr 21, 2026
cbd195c
fix: wipe results/ between jobs and bump datasets pin to >=4.7
cquil11 Apr 21, 2026
fc61268
fix: rewrite MODEL to mount path for agentic too on b200-dgxc-slurm
cquil11 Apr 21, 2026
5bf3a47
fix: unbuffered stdout for agentic metrics collector
cquil11 Apr 21, 2026
61f77d0
fix b200 model path
cquil11 Apr 21, 2026
91dbf86
fix b200 model path
cquil11 Apr 21, 2026
3a01747
fix b200 model path
cquil11 Apr 21, 2026
5d4fa4d
bump kv-cache-tester: remove budget concept
cquil11 Apr 21, 2026
01afd15
fix: add --enable-metrics to sglang agentic scripts
cquil11 Apr 21, 2026
7f05c5a
set duration: 300 for gptoss agentic configs on h100/h200/mi300/mi325
cquil11 Apr 21, 2026
7a7c246
fix: guard None in process_agentic_result print statements
cquil11 Apr 21, 2026
2d2d7ec
fix: trim idle leading snapshots before plotting and CSV export
cquil11 Apr 21, 2026
530c4ba
bump b200/mi355x sglang agentic duration to 1800s
cquil11 Apr 21, 2026
271132a
fix: drop kv-cache-dtype fp8 + async-scheduling from gptoss h100/h200…
cquil11 Apr 21, 2026
23021ff
keep async-scheduling: true on h100/h200 agentic (only kv-cache-dtype…
cquil11 Apr 21, 2026
3ace2e2
fix: explicitly set --max-num-batched-tokens 8192 on mi300x/mi325x ag…
cquil11 Apr 21, 2026
04712af
bump mi300x/mi325x gptoss vllm to v0.19.1, drop explicit max-num-batc…
cquil11 Apr 21, 2026
943d5eb
drop max-num-batched-tokens from h100/h200 gptoss agentic config
cquil11 Apr 21, 2026
6b6592f
bump kv-cache-tester: infinite-cache hit rate, drop prev-only metric
cquil11 Apr 21, 2026
dbaa10f
fix: treat MAX_MODEL_LEN=0 as unset in gptoss agentic scripts
cquil11 Apr 21, 2026
987aabe
fix: compute SGLang prefix cache hit rate from cumulative token counters
cquil11 Apr 21, 2026
21868cd
diag: log filesystem state after srun to track down mi325x missing re…
cquil11 Apr 21, 2026
42d58e7
fix: lower --gpu-memory-utilization to 0.85 on mi300x/mi325x agentic
cquil11 Apr 21, 2026
ab25cc7
bump kv-cache-tester: remove ramp/cooldown user-scaling logic
cquil11 Apr 21, 2026
8f1800d
set all agentic durations to 300s for 5-min sweep runs
cquil11 Apr 21, 2026
783dfa5
fix: scancel_sync on mi325x so NFS flushes results/ before status.txt…
cquil11 Apr 21, 2026
657c7f2
unify agentic + fixed-seq-len success check on workspace-root sentinel
cquil11 Apr 21, 2026
9013f53
write real agentic result json at workspace root, drop sentinel
cquil11 Apr 21, 2026
0df0644
revert scancel_sync on mi325x — retry-based result check handles NFS lag
cquil11 Apr 21, 2026
cf57ed9
bump b200 and mi355x agentic durations to 1800s
cquil11 Apr 21, 2026
48fc75e
bump h100/h200/mi300x/mi325x gptoss agentic durations to 1800s
cquil11 Apr 21, 2026
07f559f
pre-cache HF traces dataset via 'hf download --repo-type dataset'
cquil11 Apr 21, 2026
0b8ed61
Add multinode agentic trace replay
cquil11 Apr 21, 2026
e0901e6
Use sa-bench recipe for multinode agentic runs
cquil11 Apr 21, 2026
92a0835
Restore multinode agentic trace outputs
cquil11 Apr 21, 2026
5ec410a
collect_sweep_results: derive SUCCESS from detailed_results.csv, drop…
cquil11 Apr 21, 2026
494e28b
Use shallow checkout for multinode benchmark
cquil11 Apr 21, 2026
956618d
Clean multinode runner workspace on checkout
cquil11 Apr 22, 2026
5b7b4ab
Checkout submodules for multinode benchmarks
cquil11 Apr 22, 2026
4f7dee3
bump kv-cache-tester: drop ttft_headroom_pct + threshold log noise
cquil11 Apr 22, 2026
8a85aee
add multinode
cquil11 Apr 22, 2026
2d5f6f1
remove no max tokens
cquil11 Apr 22, 2026
cf7e03a
increase duration
cquil11 Apr 22, 2026
766b28e
increase duration
cquil11 Apr 22, 2026
d0a90ee
increase duration
cquil11 Apr 22, 2026
f7c582e
plot_pareto: derive SUCCESS from detailed_results.csv, drop status.tx…
cquil11 Apr 22, 2026
e7cd65a
align agentic JSON field names with fixed-seq-len
cquil11 Apr 22, 2026
b2e056b
agentic agg: add p99.9 + std to workload and qps stats
cquil11 Apr 22, 2026
50ec0be
add offload mode to mi30
cquil11 Apr 22, 2026
0ba282a
set clean: true on benchmark-tmpl checkout
cquil11 Apr 22, 2026
96a4813
more h100 h200 conc
cquil11 Apr 22, 2026
6284a63
plumb per-config no-max-tokens through agentic pipeline
cquil11 Apr 22, 2026
f368090
set no-max-tokens: true on all agentic entries except gb200
cquil11 Apr 22, 2026
3dd16f9
point kv-cache-tester submodule at agentx-minimized branch
cquil11 Apr 22, 2026
dd903cb
bump kv-cache-tester: restore vocabulary.py (lazy-imported by replayer)
cquil11 Apr 22, 2026
ab85b83
bump kv-cache-tester: self-contained requirements + doc cleanup
cquil11 Apr 22, 2026
0ec7d68
chore: update trace replay submodule
cquil11 Apr 23, 2026
90d96b2
chore: update trace replay submodule
cquil11 Apr 23, 2026
64fa8c9
drop tp=1 from h100 gptoss agentic — doesn't fit 80GB reliably
cquil11 Apr 23, 2026
6487b80
Add GB300 agentic multinode sweep
cquil11 Apr 23, 2026
e752286
Add GB200 SGLang agentic sweep
cquil11 Apr 23, 2026
64802db
refactor: integrate metrics collector into trace replayer
cquil11 Apr 23, 2026
cbd63a9
Resolve GB300 model paths dynamically
cquil11 Apr 23, 2026
c6fd249
bump trace-replay: replace print() with logger in server_metrics
cquil11 Apr 23, 2026
c4ebe3e
bump trace-replay: single-message metrics summary
cquil11 Apr 23, 2026
874e0f9
Fix GB300 model path resolution
cquil11 Apr 23, 2026
5883111
Add B200 SGLang agentic sweep
cquil11 Apr 23, 2026
42f0fc0
Clean up broken GB300 squash symlinks
cquil11 Apr 23, 2026
656c3f4
Add H200 SGLang agentic sweep
cquil11 Apr 23, 2026
a377402
feat: enable warmup for agentic benchmarks
cquil11 Apr 23, 2026
97d22bc
Create GB300 log directory after submit
cquil11 Apr 23, 2026
e7a84cc
Add B200 FP8 agentic sweep
cquil11 Apr 23, 2026
1a3d367
Add GB200 FP8 agentic sweep
cquil11 Apr 23, 2026
70e6422
Add H100 FP8 agentic sweep
cquil11 Apr 23, 2026
f1d7ec3
Validate GB300 squash image imports
cquil11 Apr 23, 2026
532e553
Use GB300 batch_1 partition
cquil11 Apr 23, 2026
a7985ba
Retry GB300 squash visibility after import
cquil11 Apr 23, 2026
e12c3e5
fix: update trace replay warmup handling
cquil11 Apr 23, 2026
81b3862
test: temp 10-min duration for h200 agentic warmup test
cquil11 Apr 23, 2026
d32fc69
revert: restore h200 agentic duration to 1800s
cquil11 Apr 23, 2026
27cce8c
Add GB300 FP8 agentic config
cquil11 Apr 23, 2026
3df4950
bump trace-replay: stop metrics collector before tail drain
cquil11 Apr 23, 2026
792a6eb
Add B300 TRT agentic config
cquil11 Apr 23, 2026
356bd1d
Add B300 FP4 TRT agentic config
cquil11 Apr 23, 2026
7dee309
Use forked srt-slurm for B300 multinode
cquil11 Apr 23, 2026
bacb95d
Allow agentic deps install in managed Python
cquil11 Apr 23, 2026
f6b50f3
Set explicit max tokens for B300 TRT agentic
cquil11 Apr 23, 2026
42204e5
test: temp 10-min duration for h200 agentic warmup re-test
cquil11 Apr 23, 2026
4ada1b8
revert: restore h200 agentic duration to 1800s
cquil11 Apr 23, 2026
b7e3711
Use explicit SRT integration branch clones
cquil11 Apr 23, 2026
bd3826f
Add GB300 TRT agentic configs
cquil11 Apr 23, 2026
c0246b7
Add GB200 FP8 TRT agentic config
cquil11 Apr 23, 2026
86e2314
Add B200 TRT agentic configs
cquil11 Apr 23, 2026
818245e
Add H100 and H200 TRT agentic configs
cquil11 Apr 23, 2026
41151a3
feat: extend mi300x agentic offload conc range to 256
cquil11 Apr 23, 2026
cfe90c0
Expand B300 TRT agentic concurrencies
cquil11 Apr 23, 2026
d9d3ee6
Ensure HF CLI for agentic traces
cquil11 Apr 23, 2026
f7063bd
Align GB300 TRT agentic decode DP metadata
cquil11 Apr 23, 2026
b5ba876
Exclude bad GB300 node from agentic launches
cquil11 Apr 23, 2026
397d7a4
Apply GB300 node exclusion in recipes
cquil11 Apr 23, 2026
89b9cb5
refactor: rename cpu-offloading flag to offloading enum
cquil11 Apr 23, 2026
a174186
Restore GB200 agentic benchmark duration
cquil11 Apr 23, 2026
1820fa9
Use TP8 GB200 FP4 agentic topology
cquil11 Apr 23, 2026
9a06d22
Match GB200 FP4 agentic decode topology
cquil11 Apr 23, 2026
8456a8e
fix: pin workflow checkouts to github.sha instead of github.ref
cquil11 Apr 23, 2026
c52fdee
Expand FP8 SGLang agentic topology sweeps
cquil11 Apr 24, 2026
c1f0fcb
Enable no-max-tokens for FP8 SGLang agentic sweeps
cquil11 Apr 24, 2026
d4e9747
refactor: remove no-max-tokens plumbing, force exact trace outputs
cquil11 Apr 24, 2026
6d53b27
test: temp 10-min duration for b200 agentic sanity test
cquil11 Apr 24, 2026
5668dab
revert: restore b200 agentic duration to 1800s
cquil11 Apr 24, 2026
c3ec900
test: temp 10-min duration for b200 agentic rerun
cquil11 Apr 24, 2026
28d2473
revert: restore b200 agentic duration to 1800s
cquil11 Apr 24, 2026
6d0cb4f
Skip blocked GB300 agentic 18-node points
cquil11 Apr 24, 2026
b61e065
bump trace-replay: cumulative-only assessment period output
cquil11 Apr 24, 2026
89f7c2a
test: temp 10-min duration for mi355x dsr1 agentic sanity test
cquil11 Apr 24, 2026
9d9737e
bump trace-replay: warmup timeout to 900s
cquil11 Apr 24, 2026
d8e7a80
bump trace-replay: guard over-context prompts
cquil11 Apr 24, 2026
fd90279
bump trace-replay: aligned-table assessment period format
cquil11 Apr 24, 2026
fc08c4e
debug-trace: plumb workflow flag + bump trace-replay for token-id cap…
cquil11 Apr 27, 2026
070c09f
e2e-tests: add duration-override input for quick smoke runs
cquil11 Apr 27, 2026
5a9bf52
drop --no-color flag from replayer invocation
cquil11 Apr 27, 2026
567bf6d
bump trace-replay: drop dead Colors + MODEL_DEFAULTS
cquil11 Apr 27, 2026
bf5e8fa
bump trace-replay: drop dead CSV columns
cquil11 Apr 27, 2026
6015cbe
bump trace-replay: full-chunk capture in debug_trace.jsonl
cquil11 Apr 27, 2026
0db5cfb
bump trace-replay: drop rate-limiting, fix period-header prefix
cquil11 Apr 27, 2026
c20248d
bump trace-replay: per-model delta-field abstraction
cquil11 Apr 27, 2026
483e70c
bump trace-replay: ISL metric uses server's usage.prompt_tokens
cquil11 Apr 27, 2026
51498e9
bump trace-replay: period header counts up elapsed time
cquil11 Apr 27, 2026
153fadd
bump trace-replay: append reasoning to conversation history
cquil11 Apr 27, 2026
437ee70
bump trace-replay: per-user salt to break cross-user KV-cache match
cquil11 Apr 27, 2026
9cadc71
bump trace-replay: "Wait time" → "Inter-turn time" in period summary
cquil11 Apr 27, 2026
874b1fe
bump trace-replay: 5s quiesce between warmup and metrics start
cquil11 Apr 27, 2026
9541ae8
add kimik2.5-fp4-b200-vllm agentic-coding scenario
cquil11 Apr 27, 2026
539beb5
runners: rename launch_b200-dgxc-slurm.sh to launch_b200-dgxc.sh
cquil11 Apr 27, 2026
57bf46d
remove test-file changes from PR and pin submodule
cquil11 Apr 27, 2026
0fdc559
restore branch tracking on trace-replay submodule
cquil11 Apr 27, 2026
9813fe8
cleanup coreweave runners
cquil11 Apr 27, 2026
45d3c5f
kimik2.5-fp4-b200-vllm: bump image to vllm/vllm-openai:v0.19.1
cquil11 Apr 27, 2026
c5566f9
agentic-coding: prune redundant low-conc offload-cpu points
cquil11 Apr 27, 2026
3aa813f
bump trace-replay: register kimi in per-model delta-field map
cquil11 Apr 27, 2026
337fda0
bump trace-replay: silence kimi tokenization warning flood
cquil11 Apr 27, 2026
46 changes: 28 additions & 18 deletions .github/configs/CONFIGS.md
@@ -12,15 +12,21 @@ entry-name:
   runner: string
   precision: string
   framework: string
-  seq-len-configs:
-  - isl: int
-    osl: int
-    search-space:
-    - { tp: int, conc-start: int, conc-end: int }
-    # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
-    - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
+  scenarios:
+    fixed-seq-len:
+    - isl: int
+      osl: int
+      search-space:
+      - { tp: int, conc-start: int, conc-end: int }
+      # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
+      - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
+      - ...
     - ...
-  - ...
+    agentic-coding: # optional
+    - trace-source: string
+      search-space:
+      - { tp: int, conc-start: int, conc-end: int }
+      - ...
```
Note: while not required, `entry-name` typically takes the format `<INFMAX_MODEL_PREFIX>-<PRECISION>-<GPU>-<FRAMEWORK>`.
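The schema change in this hunk moves the legacy `seq-len-configs` list under `scenarios.fixed-seq-len`. A minimal Python sketch of that move follows, assuming an entry has already been loaded into a plain dict. The helper name `migrate_entry` and the sample values (`h200`, `fp8`, `vllm`) are hypothetical placeholders, not taken from the PR's own migration code.

```python
from copy import deepcopy


def migrate_entry(entry: dict) -> dict:
    """Return a copy of `entry` with `seq-len-configs` moved to `scenarios.fixed-seq-len`.

    Hypothetical helper: the PR's actual migration logic is not shown in this diff.
    """
    migrated = deepcopy(entry)
    if "seq-len-configs" not in migrated:
        # Nothing to move; the entry is assumed to use the new layout already.
        return migrated
    legacy = migrated.pop("seq-len-configs")
    scenarios = migrated.setdefault("scenarios", {})
    if "fixed-seq-len" in scenarios:
        raise ValueError("entry defines both seq-len-configs and scenarios.fixed-seq-len")
    scenarios["fixed-seq-len"] = legacy
    return migrated


# Placeholder entry in the old format (values are illustrative only).
old_entry = {
    "runner": "h200",
    "precision": "fp8",
    "framework": "vllm",
    "seq-len-configs": [
        {
            "isl": 1024,
            "osl": 8192,
            "search-space": [{"tp": 8, "conc-start": 4, "conc-end": 128}],
        },
    ],
}
new_entry = migrate_entry(old_entry)
```

The sketch copies the entry rather than mutating it, so the original dict can still be compared against the migrated one.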

@@ -32,16 +38,20 @@ The below list describes what each field is:
 - `runner`: This is the runner on which to run the benchmark. This must be a valid runner (key or value) from `runners.yaml`.
 - `precision`: The precision to run the benchmark. Again, this is used to find which script to run in `benchmarks/`.
 - `framework`: The framework (serving runtime) to serve the benchmark, e.g., `vllm`, `sglang`, `trt`.
-- `seq-len-configs`: A list of possible sequence lengths to benchmark. Each entry must have the following fields:
-  - `isl`: An integer representing the input sequence length, e.g., `1024`
-  - `osl`: An integer representing the output sequence length, e.g., `8192`
-  - `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
-    - `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
-    - `conc-start`: An integer representing the starting level of concurrency e.g., `4`
-    - `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
-      - Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
-    - (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
-    - (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
+- `scenarios`: A dictionary of benchmark scenario types. At least one must be specified. Currently supported:
+  - `fixed-seq-len`: Fixed input/output sequence length benchmarks. Each entry must have:
+    - `isl`: An integer representing the input sequence length, e.g., `1024`
+    - `osl`: An integer representing the output sequence length, e.g., `8192`
+    - `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
+      - `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
+      - `conc-start`: An integer representing the starting level of concurrency e.g., `4`
+      - `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
+        - Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
+      - (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
+      - (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
+  - `agentic-coding`: Agentic trace replay benchmarks using real conversation traces. Each entry must have:
+    - `trace-source`: Identifier for the trace dataset to use.
+    - `search-space`: Same structure as `fixed-seq-len` search-space entries.
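The concurrency rule in the field list (start at `conc-start`, multiply by 2, stop at `conc-end` inclusive) and the documented defaults for `ep` and `dp-attn` can be sketched in Python as follows. The function names are hypothetical and are not the repository's matrix-generation code. The behaviour when `conc-end` is not `conc-start` times a power of two is an assumption: this sketch simply stops at the last value that does not exceed `conc-end`.

```python
def expand_concurrencies(conc_start: int, conc_end: int, step: int = 2) -> list[int]:
    """List concurrencies from conc_start up to conc_end (inclusive), multiplying by `step`."""
    if conc_start < 1 or conc_end < conc_start:
        raise ValueError("expected 1 <= conc-start <= conc-end")
    values = []
    conc = conc_start
    while conc <= conc_end:
        values.append(conc)
        conc *= step
    return values


def expand_search_space(entry: dict) -> list[dict]:
    """Turn one search-space dict into one point per concurrency level.

    `ep` defaults to 1 and `dp-attn` to False, as stated in the field list.
    """
    return [
        {
            "tp": entry["tp"],
            "ep": entry.get("ep", 1),
            "dp-attn": entry.get("dp-attn", False),
            "conc": conc,
        }
        for conc in expand_concurrencies(entry["conc-start"], entry["conc-end"])
    ]


points = expand_search_space({"tp": 8, "conc-start": 4, "conc-end": 128})
```

With `conc-start` 4 and `conc-end` 128, `expand_concurrencies` yields the six values 4, 8, 16, 32, 64, 128, matching the example in the note.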

Notes:
- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.
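The "no extra fields" rule can be illustrated with a small Python check over the `scenarios` block. This is a sketch under assumptions: the allowed field sets are copied from the field list in this diff, top-level entry fields outside this hunk are deliberately not checked, and `validate_scenarios` is a hypothetical name rather than the repository's validator.

```python
# Field sets taken from the documented schema in this diff.
SCENARIO_FIELDS = {
    "fixed-seq-len": {"isl", "osl", "search-space"},
    "agentic-coding": {"trace-source", "search-space"},
}
SEARCH_REQUIRED = {"tp", "conc-start", "conc-end"}
SEARCH_OPTIONAL = {"ep", "dp-attn"}


def validate_scenarios(scenarios: dict) -> list[str]:
    """Return human-readable errors for a `scenarios` mapping (empty list if valid)."""
    errors = []
    if not scenarios:
        errors.append("at least one scenario must be specified")
    for name, entries in scenarios.items():
        allowed = SCENARIO_FIELDS.get(name)
        if allowed is None:
            errors.append(f"unknown scenario type: {name}")
            continue
        for i, entry in enumerate(entries):
            where = f"{name}[{i}]"
            extra = set(entry) - allowed
            missing = allowed - set(entry)
            if extra:
                errors.append(f"{where}: unexpected fields {sorted(extra)}")
            if missing:
                errors.append(f"{where}: missing fields {sorted(missing)}")
            for j, point in enumerate(entry.get("search-space", [])):
                keys = set(point)
                extra = keys - SEARCH_REQUIRED - SEARCH_OPTIONAL
                missing = SEARCH_REQUIRED - keys
                if extra:
                    errors.append(f"{where}.search-space[{j}]: unexpected fields {sorted(extra)}")
                if missing:
                    errors.append(f"{where}.search-space[{j}]: missing fields {sorted(missing)}")
    return errors


# Placeholder agentic entry; the trace-source value is illustrative only.
ok = validate_scenarios({
    "agentic-coding": [
        {"trace-source": "example-trace-source",
         "search-space": [{"tp": 8, "conc-start": 4, "conc-end": 64}]},
    ],
})
```

Collecting all errors in a list, instead of raising on the first one, lets a single run report every offending field in a config file.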