SemiAnalysisAI · cquil11 · Apr 27, 2026
diff --git a/.github/configs/CONFIGS.md b/.github/configs/CONFIGS.md
@@ -12,15 +12,21 @@ entry-name:
   runner: string
   precision: string
   framework: string
-  seq-len-configs:
-  - isl: int
-    osl: int
-    search-space:
-    - { tp: int, conc-start: int, conc-end: int }
-    # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
-    - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
+  scenarios:
+    fixed-seq-len:
+    - isl: int
+      osl: int
+      search-space:
+      - { tp: int, conc-start: int, conc-end: int }
+      # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
+      - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
+      - ...
     - ...
-  - ...
+    agentic-coding:  # optional
+    - trace-source: string
+      search-space:
+      - { tp: int, conc-start: int, conc-end: int }
+      - ...
 ```
 Note: while not required, `entry-name` typically takes the format `<INFMAX_MODEL_PREFIX>-<PRECISION>-<GPU>-<FRAMEWORK>`.
 
@@ -32,16 +38,20 @@ The below list describes what each field is:
 - `runner`: This is the runner on which to run the benchmark. This must be a valid runner (key or value) from `runners.yaml`.
 - `precision`: The precision to run the benchmark. Again, this is used to find which script to run in `benchmarks/`.
 - `framework`: The framework (serving runtime) to serve the benchmark, e.g., `vllm`, `sglang`, `trt`.
-- `seq-len-configs`: A list of possible sequence lengths to benchmark. Each entry must have the following fields:
-  - `isl`: An integer representing the input sequence length, e.g., `1024`
-  - `osl`: An integer representing the output sequence length, e.g., `8192`
-  - `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
-    - `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
-    - `conc-start`: An integer representing the starting level of concurrency e.g., `4`
-    - `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
-    - Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
-    - (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
-    - (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
+- `scenarios`: A dictionary of benchmark scenario types. At least one must be specified. Currently supported:
+  - `fixed-seq-len`: Fixed input/output sequence length benchmarks. Each entry must have:
+    - `isl`: An integer representing the input sequence length, e.g., `1024`
+    - `osl`: An integer representing the output sequence length, e.g., `8192`
+    - `search-space`: A list of configurations to run with respective `isl` and `osl`, each entry must be a dict with the following fields:
+      - `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
+      - `conc-start`: An integer representing the starting level of concurrency e.g., `4`
+      - `conc-end`: An integer representing the ending level of concurrency (inclusive) e.g., `128`
+      - Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
+      - (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
+      - (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
+  - `agentic-coding`: Agentic trace replay benchmarks using real conversation traces. Each entry must have:
+    - `trace-source`: Identifier for the trace dataset to use.
+    - `search-space`: Same structure as `fixed-seq-len` search-space entries.
 
 Notes:
 - No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.