llm_bench: add --targets for multi-deployment load testing by aidando73 · Pull Request #116 · fw-ai/benchmark

aidando73 · 2026-05-26T17:56:53Z

Summary

Adds a repeatable --targets flag (format url[|model][|api_key]) so a single locust run can drive multiple deployments.
Users are assigned round-robin across targets; --host / --model / --api-key act as fallbacks for fields left blank in a per-target spec.
-u/--users is interpreted per-target: with N targets, total locust users = users × N (so the value passed is "load per deployment"). --spawn-rate is scaled the same way.
Each request is tagged [<target_url>] /<path> in the locust stats name, so per-target latency / throughput / failure rate show up as separate rows in the summary.
Backward-compatible: when --targets is unset, behavior is unchanged.
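A minimal sketch of the target-spec parsing and round-robin assignment described above. `Target`, `parse_target`, and `TargetAssigner` are hypothetical names, not the PR's code. The actual script keeps a class-level `_target_counter` on the locust user class; this sketch stands in for it with an `itertools.count` instance.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Optional


@dataclass(frozen=True)
class Target:
    url: str
    model: Optional[str]
    api_key: Optional[str]


def parse_target(
    spec: str,
    default_host: Optional[str] = None,
    default_model: Optional[str] = None,
    default_api_key: Optional[str] = None,
) -> Target:
    """Parse 'url[|model][|api_key]'; blank or missing fields fall back to the globals."""
    parts = spec.split("|")
    if len(parts) > 3:
        raise ValueError(f"expected url[|model][|api_key], got {spec!r}")
    parts += [""] * (3 - len(parts))  # pad omitted trailing fields with blanks
    url, model, api_key = (p.strip() for p in parts)
    url = url or default_host  # --host fills a blank URL
    if not url:
        raise ValueError(f"target {spec!r} has no URL and no --host fallback")
    return Target(url=url, model=model or default_model, api_key=api_key or default_api_key)


class TargetAssigner:
    """Hands out targets round-robin, one per newly started user."""

    def __init__(self, targets: Iterable[Target]) -> None:
        self._targets = list(targets)
        if not self._targets:
            raise ValueError("need at least one target")
        self._counter = count()

    def next_target(self) -> Target:
        return self._targets[next(self._counter) % len(self._targets)]


if __name__ == "__main__":
    host = "https://api.fireworks.ai/inference"
    specs = [f"{host}|model-a", f"{host}|model-b|per-target-key"]
    assigner = TargetAssigner(parse_target(s, default_api_key="global-key") for s in specs)
    for _ in range(4):
        t = assigner.next_target()
        print(t.model, t.api_key)
```

Running it prints the two models alternately; the second target uses its own key, while the first falls back to the global one.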

Motivation

Use case: comparing two pyroworks GB200 deployments (different prefill counts) under matched synthetic load without spinning up two locust processes. The native traffic-forker was blocked by an architecture mismatch (the forker image is amd64-only, while the deployments run on arm64 GB200 nodes), so driving both from locust is the next-best option for now.

Usage

# Single target — unchanged behavior
locust -f load_test.py --host=https://api.fireworks.ai/inference --model=accounts/foo/models/bar

# Two targets, 100 users PER TARGET (script spawns 200 total)
locust -f load_test.py --headless -u 100 -r 10 \
  --host=https://api.fireworks.ai/inference \
  --api-key=$FIREWORKS_API_KEY \
  --targets="https://api.fireworks.ai/inference|accounts/pyroworks/deployedModels/cursor-clone-1-prefill-2" \
  --targets="https://api.fireworks.ai/inference|accounts/pyroworks/deployedModels/cursor-clone-2-prefill"

Test plan

python -c "import ast; ast.parse(open('llm_bench/load_test.py').read())" passes (verified locally)
Single-target run (no --targets) — confirm old behavior unchanged
Two-target run with -u 100 — confirm total spawned users = 200, stats table shows one row per target, request counts split roughly evenly
Per-target --api-key honored when provided in the target spec

Caveats

_target_counter is a class attribute, so in distributed locust each worker has its own counter (round-robin within a worker, not globally). This is fine for matched-load comparisons, but the split across workers is not perfectly even.
Target label uses the URL. If two targets share the same URL but differ by model, labels collide; happy to switch the label to include model if that's a real case.
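To see how per-worker counters can skew the split, the sketch below counts users per target when each worker process restarts round-robin at zero, and compares that with a single shared counter. It assumes every worker spawns its users in order and none are stopped mid-run; `per_worker_split` and `global_split` are hypothetical helpers.

```python
from collections import Counter
from typing import Sequence


def per_worker_split(users_per_worker: Sequence[int], targets: Sequence[str]) -> Counter:
    """Users per target when every worker keeps its own round-robin counter."""
    totals: Counter = Counter()
    for n_users in users_per_worker:
        for i in range(n_users):  # i restarts at 0 in every worker process
            totals[targets[i % len(targets)]] += 1
    return totals


def global_split(users_per_worker: Sequence[int], targets: Sequence[str]) -> Counter:
    """Users per target if one counter were shared by all workers."""
    total = sum(users_per_worker)
    return Counter(targets[i % len(targets)] for i in range(total))


if __name__ == "__main__":
    targets = ["clone-1", "clone-2"]
    workers = [3, 3]  # 6 users spread over 2 workers
    print(per_worker_split(workers, targets))
    print(global_split(workers, targets))
```

With three users on each of two workers, each worker assigns clone-1, clone-2, clone-1, so clone-1 ends up with 4 users and clone-2 with 2; a shared counter would give 3 and 3. The gap between the busiest and quietest target is at most one user per worker, and it disappears when each worker's user count is a multiple of the number of targets.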

Note

Low Risk
Changes are confined to the llm_bench Locust script and remain backward-compatible when --targets is omitted; no production auth or data paths are touched.

Overview
Adds repeatable --targets (url[|model][|api_key]) so one Locust run can load several deployments. On init, -u and --spawn-rate are multiplied by the number of targets so the CLI value means load per target; users are assigned round-robin to a target with that target’s host, model, and optional API key (global --host / --model / --api-key fill blanks).

InitTracker.notify_init no longer treats differing model across workers as inconsistent when --targets is set. HTTP stats use [<label>] <path> so latency and failures split by target; labels prefer deployment id when the model string contains #.

Without --targets, behavior stays the same.

Reviewed by Cursor Bugbot for commit 5ff4c81. Bugbot is set up for automated code reviews on this repo.

Adds a repeatable --targets flag (format 'url[|model][|api_key]') so a single load test can drive multiple deployments at once. Users are assigned round-robin across targets, and each request is tagged with the target URL in the locust stats name so per-target latency and throughput are visible without running separate locust processes. When --targets is unset, behavior is unchanged.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 2dc7362.

cursor · 2026-05-26T18:02:43Z

            stream=True,
            catch_response=True,
            timeout=60,
+            name=f"[{self._target['label']}] {self.provider_formatter.get_url()}",


Request stat names change even without --targets flag

Medium Severity

The name parameter is unconditionally applied to the request, changing locust stats row names from the bare path (e.g. /v1/chat/completions) to [https://host] /v1/chat/completions even when --targets is not used. The PR promises backward-compatibility when --targets is unset, but this always-on label prefix breaks existing dashboards, scripts, or --max-fail-ratio monitoring that matches stats by request name.
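One way to address this, sketched under the assumption that the user class stores `None` as its target label when --targets is unset: pass `name` to the request only when a label exists, so single-target runs keep Locust's default stats name (the request path). `request_kwargs` is a hypothetical helper, not code from the PR.

```python
from typing import Any, Dict, Optional


def request_kwargs(path: str, target_label: Optional[str], **base: Any) -> Dict[str, Any]:
    """Build keyword arguments for the locust client call.

    The 'name' override is added only in multi-target mode; without it,
    locust reports the request under its default name, as before the PR.
    """
    kwargs: Dict[str, Any] = dict(base)
    if target_label is not None:
        kwargs["name"] = f"[{target_label}] {path}"
    return kwargs


if __name__ == "__main__":
    path = "/v1/chat/completions"
    # Hypothetical call site inside the user class:
    #   self.client.post(url, json=data,
    #                    **request_kwargs(path, label, stream=True, catch_response=True, timeout=60))
    print(request_kwargs(path, None, stream=True, timeout=60))
    print(request_kwargs(path, "clone-1", stream=True, timeout=60))
```

Omitting the key entirely (rather than passing `name=None`) keeps the single-target call identical to the pre-PR one.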

Additional Locations (1)

llm_bench/load_test.py#L1175-L1182


Multiply num_users and spawn_rate by the number of --targets in an events.init hook so '-u 100 --targets a --targets b' spawns 200 total users (100 per target). This matches what 'load per deployment' actually means in side-by-side comparisons and removes the manual multiplication step from the call site.
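The scaling rule from this commit, written as a pure function with the hook wiring shown in comments. `scale_for_targets` is a hypothetical name. `environment.parsed_options.num_users` and `.spawn_rate` are where Locust stores the parsed -u and -r values; `targets` is assumed to be the attribute the repeatable --targets option populates (None when the flag is absent).

```python
from typing import Optional, Tuple


def scale_for_targets(
    num_users: Optional[int], spawn_rate: Optional[float], n_targets: int
) -> Tuple[Optional[int], Optional[float]]:
    """Turn per-target -u / -r values into totals for the whole run.

    With zero or one target the values pass through unchanged, so runs
    without --targets behave exactly as before.
    """
    if n_targets <= 1:
        return num_users, spawn_rate
    users = num_users * n_targets if num_users is not None else None
    rate = spawn_rate * n_targets if spawn_rate is not None else None
    return users, rate


# Wiring in load_test.py might look roughly like this (requires locust):
#
# from locust import events
#
# @events.init.add_listener
# def _scale_load_for_targets(environment, **kwargs):
#     opts = environment.parsed_options
#     n = len(getattr(opts, "targets", None) or [])
#     opts.num_users, opts.spawn_rate = scale_for_targets(opts.num_users, opts.spawn_rate, n)

if __name__ == "__main__":
    print(scale_for_targets(100, 10, 2))
```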

Multi-target runs intentionally assign a different model per user, but notify_init asserts every user shares identical logging_params. Skip the 'model' key in that comparison when --targets is non-empty; the single-target assertion is unchanged.
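The relaxed check amounts to comparing two users' logging-parameter dicts with `model` dropped only in multi-target mode. `params_consistent` is a hypothetical helper for illustration; the PR's actual check lives in `InitTracker.notify_init`.

```python
from typing import Any, Mapping


def params_consistent(a: Mapping[str, Any], b: Mapping[str, Any], multi_target: bool) -> bool:
    """Compare per-user logging params, ignoring 'model' when --targets is set."""
    ignored = {"model"} if multi_target else set()

    def relevant(params: Mapping[str, Any]) -> dict:
        return {k: v for k, v in params.items() if k not in ignored}

    return relevant(a) == relevant(b)


if __name__ == "__main__":
    u1 = {"model": "clone-1", "max_tokens": 256}
    u2 = {"model": "clone-2", "max_tokens": 256}
    print(params_consistent(u1, u2, multi_target=True))   # model differs but is ignored
    print(params_consistent(u1, u2, multi_target=False))  # single-target check still fails
```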

Dranoxgithub

Why not wrap the script to call the entry point twice instead?

locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-1 &
locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-2 &
wait

This would avoid layering on more complexity.

aidando73 · 2026-05-26T18:41:42Z

You are probably right.

The previous label was the target URL, which collapsed stats into one row when two targets share the same host but differ by model (common with Fireworks deployment-pinned model strings like 'accounts/x/models/y#accounts/x/deployments/z'). Extract the deployment id from the '#' suffix when present; otherwise, fall back to 'url model'.
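A minimal sketch of that labeling rule. `target_label` is a hypothetical name, and this version treats everything after the first `#` as the deployment id; the PR may trim it further (for example, to the final path segment).

```python
from typing import Optional


def target_label(url: str, model: Optional[str]) -> str:
    """Stats label for a target: deployment id if pinned, else 'url model'."""
    if model and "#" in model:
        # 'accounts/x/models/y#accounts/x/deployments/z' -> 'accounts/x/deployments/z'
        deployment = model.split("#", 1)[1]
        if deployment:  # ignore a trailing '#' with nothing after it
            return deployment
    return f"{url} {model}" if model else url


if __name__ == "__main__":
    host = "https://api.fireworks.ai/inference"
    print(target_label(host, "accounts/x/models/y#accounts/x/deployments/z"))
    print(target_label(host, "accounts/x/models/y"))
```

Two targets on the same host now get distinct labels as long as their model strings differ.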

cursor Bot reviewed May 26, 2026


aidando73 added 2 commits May 26, 2026 11:06

llm_bench: tighten --targets help text

3d61e81

aidando73 requested a review from Dranoxgithub May 26, 2026 18:11

Dranoxgithub reviewed May 26, 2026


aidando73 wants to merge 5 commits into main from aidand-multi-target-load-test
