Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 28 additions & 18 deletions .github/configs/CONFIGS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,15 +12,21 @@ entry-name:
runner: string
precision: string
framework: string
seq-len-configs:
- isl: int
osl: int
search-space:
- { tp: int, conc-start: int, conc-end: int }
# Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
- { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
scenarios:
fixed-seq-len:
- isl: int
osl: int
search-space:
- { tp: int, conc-start: int, conc-end: int }
# Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
- { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
- ...
- ...
- ...
agentic-coding: # optional
- trace-source: string
search-space:
- { tp: int, conc-start: int, conc-end: int }
- ...
```
Note: while not required, `entry-name` typically takes the format `<INFMAX_MODEL_PREFIX>-<PRECISION>-<GPU>-<FRAMEWORK>`.

Expand All @@ -32,16 +38,20 @@ The below list describes what each field is:
- `runner`: This is the runner on which to run the benchmark. This must be a valid runner (key or value) from `runners.yaml`.
- `precision`: The precision to run the benchmark. Again, this is used to find which script to run in `benchmarks/`.
- `framework`: The framework (serving runtime) to serve the benchmark, e.g., `vllm`, `sglang`, `trt`.
- `seq-len-configs`: A list of possible sequence lengths to benchmark. Each entry must have the following fields:
- `isl`: An integer representing the input sequence length, e.g., `1024`
- `osl`: An integer representing the output sequence length, e.g., `8192`
- `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
- `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
- `conc-start`: An integer representing the starting level of concurrency e.g., `4`
- `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
- Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
- (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
- (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
- `scenarios`: A dictionary of benchmark scenario types. At least one must be specified. Currently supported:
- `fixed-seq-len`: Fixed input/output sequence length benchmarks. Each entry must have:
- `isl`: An integer representing the input sequence length, e.g., `1024`
- `osl`: An integer representing the output sequence length, e.g., `8192`
- `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
- `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
- `conc-start`: An integer representing the starting level of concurrency e.g., `4`
- `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
- Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
- (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
- (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
- `agentic-coding`: Agentic trace replay benchmarks using real conversation traces. Each entry must have:
- `trace-source`: Identifier for the trace dataset to use.
- `search-space`: Same structure as `fixed-seq-len` search-space entries.

Notes:
- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.
Expand Down
Loading
Loading