llm_bench: add --targets for multi-deployment load testing #116
aidando73 wants to merge 5 commits
Adds a repeatable --targets flag (format 'url[|model][|api_key]') so a single load test can drive multiple deployments at once. Users are assigned round-robin across targets, and each request is tagged with the target URL in the locust stats name so per-target latency and throughput are visible without running separate locust processes. When --targets is unset, behavior is unchanged.
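As a rough illustration of the spec format and the round-robin assignment described above, here is a minimal sketch. The names `Target`, `parse_target`, and `RoundRobin` are hypothetical and are not taken from the PR; the fallback arguments stand in for the global --host / --model / --api-key values, and treating a blank field as "use the fallback" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """One deployment to load (hypothetical container, not the PR's type)."""
    url: str
    model: Optional[str]
    api_key: Optional[str]


def parse_target(spec, default_host=None, default_model=None, default_api_key=None):
    """Parse 'url[|model][|api_key]'; blank or missing fields use the defaults."""
    parts = [p.strip() for p in spec.split("|")]
    if len(parts) > 3:
        raise ValueError(f"expected at most 3 '|'-separated fields in {spec!r}")
    parts += [""] * (3 - len(parts))  # pad so all three fields exist
    url, model, api_key = parts
    url = url or default_host
    if not url:
        raise ValueError(f"target {spec!r} has no URL and no default host")
    return Target(url, model or default_model, api_key or default_api_key)


class RoundRobin:
    """Hands out targets in order, wrapping around."""

    _counter = 0  # class attribute: shared within one process only

    @classmethod
    def next_target(cls, targets):
        target = targets[cls._counter % len(targets)]
        cls._counter += 1
        return target


# Example specs; the URLs, model names, and key are placeholders.
targets = [
    parse_target("https://a.example|m1", default_model="m0"),
    parse_target("https://b.example||key-b", default_model="m0"),
]
```

Because the counter lives on the class, each process keeps its own count, so in a distributed run the rotation is per worker rather than global.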
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2dc7362.
    stream=True,
    catch_response=True,
    timeout=60,
    name=f"[{self._target['label']}] {self.provider_formatter.get_url()}",
Request stat names change even without --targets flag
Medium Severity
The name parameter is unconditionally applied to the request, changing locust stats row names from the bare path (e.g. /v1/chat/completions) to [https://host] /v1/chat/completions even when --targets is not used. The PR promises backward-compatibility when --targets is unset, but this always-on label prefix breaks existing dashboards, scripts, or --max-fail-ratio monitoring that matches stats by request name.
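One possible way to address this is to add the label prefix only when a target label exists. The sketch below is hypothetical (the helper `stats_name` is not part of the PR) and assumes the single-target path passes no label, so the stats row keeps the bare path.

```python
def stats_name(path, label=None):
    """Return the stats row name: the bare path unless a target label is set."""
    if label is None:
        # No --targets: keep the name existing dashboards and scripts expect.
        return path
    return f"[{label}] {path}"
```

With this shape, `stats_name("/v1/chat/completions")` stays `/v1/chat/completions`, and only multi-target runs get the `[label]` prefix.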
Multiply num_users and spawn_rate by the number of --targets in an events.init hook so '-u 100 --targets a --targets b' spawns 200 total users (100 per target). This matches what 'load per deployment' actually means in side-by-side comparisons and removes the manual multiplication step from the call site.
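The arithmetic in this commit can be sketched as a small pure function. `scale_load` is a hypothetical name, and leaving the values untouched when there are no targets is an assumption based on the stated backward-compatibility goal; the actual change applies the scaling inside the events.init hook.

```python
def scale_load(num_users, spawn_rate, n_targets):
    """Turn per-target values into run totals; zero targets means no scaling."""
    factor = max(n_targets, 1)
    return num_users * factor, spawn_rate * factor


# '-u 100 -r 10' with two targets
total_users, total_spawn_rate = scale_load(100, 10, 2)
```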
Multi-target runs intentionally assign a different model per user, but notify_init asserts every user shares identical logging_params. Skip the 'model' key in that comparison when --targets is non-empty; the single-target assertion is unchanged.
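A minimal sketch of that comparison, assuming logging_params is a flat dict; the function name `logging_params_match` is hypothetical and not taken from notify_init itself.

```python
def logging_params_match(first, other, multi_target=False):
    """Compare two logging_params dicts, ignoring 'model' in multi-target runs."""
    ignored = {"model"} if multi_target else set()

    def strip(params):
        return {k: v for k, v in params.items() if k not in ignored}

    return strip(first) == strip(other)
```

In single-target mode the comparison is a plain equality check, so a differing model is still reported as inconsistent.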
Dranoxgithub left a comment
I wonder why we don't wrap the script and call the entry point twice instead?
locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-1 &
locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-2 &
wait
This would avoid layering on more complexity.
You are probably right
The previous label was the target URL, which collapsed stats into one row when two targets share the same host but differ by model (common with Fireworks deployment-pinned model strings like 'accounts/x/models/y#accounts/x/deployments/z'). Extract the deployment id from the '#' suffix when present, otherwise fall back to 'url model'.
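The labeling rule could look roughly like the sketch below. `target_label` is a hypothetical helper; treating the last path segment after '#' as the deployment id, and falling back to the bare URL when no model is given, are both assumptions.

```python
def target_label(url, model=None):
    """Prefer the deployment id from a '#'-pinned model string, else 'url model'."""
    if model and "#" in model:
        suffix = model.split("#", 1)[1].rstrip("/")
        deployment_id = suffix.rsplit("/", 1)[-1]  # assumed: id is the last segment
        if deployment_id:
            return deployment_id
    return f"{url} {model}" if model else url
```

Two targets on the same host then get distinct stats rows as long as their deployment ids or model strings differ.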


Summary
- Adds a repeatable --targets flag (format url[|model][|api_key]) so one locust run can drive multiple deployments. --host / --model / --api-key act as fallbacks for fields left blank in a per-target spec.
- -u / --users is interpreted per-target: with N targets, total locust users = users × N (so the value passed is "load per deployment"). --spawn-rate is scaled the same way.
- Each request is tagged as [<target_url>] /<path> in the locust stats name, so per-target latency / throughput / failure rate show up as separate rows in the summary.
- When --targets is unset, behavior is unchanged.

Motivation
Use case: comparing two pyroworks GB200 deployments (different prefill counts) under matched synthetic load without spinning up two locust processes. The native traffic-forker was blocked by an arch mismatch (forker image is amd64-only, deployments run on arm64 GB200 nodes), so doing it from locust is the next-best option for now.
Usage
Test plan
- python -c "import ast; ast.parse(open('llm_bench/load_test.py').read())" passes (verified locally)
- Run without --targets: confirm old behavior is unchanged
- Run with two targets and -u 100: confirm total spawned users = 200, stats table shows one row per target, request counts split roughly evenly
- Confirm --api-key is honored when provided in the target spec

Caveats
- _target_counter is a class attr: in distributed locust each worker has its own counter (round-robin within worker, not globally). Fine for matched-load comparisons; not a perfectly even split across workers.

Note
Low Risk
Changes are confined to the llm_bench Locust script and remain backward-compatible when --targets is omitted; no production auth or data paths are touched.

Overview
Adds repeatable --targets (url[|model][|api_key]) so one Locust run can load several deployments. On init, -u and --spawn-rate are multiplied by the number of targets so the CLI value means load per target; users are assigned round-robin to a target with that target's host, model, and optional API key (global --host / --model / --api-key fill blanks). InitTracker.notify_init no longer treats differing model across workers as inconsistent when --targets is set. HTTP stats use [<label>] <path> so latency and failures split by target; labels prefer deployment id when the model string contains #.

Without --targets, behavior stays the same.

Reviewed by Cursor Bugbot for commit 5ff4c81.