llm_bench: add --targets for multi-deployment load testing#116

Open
aidando73 wants to merge 5 commits into main from aidand-multi-target-load-test

aidando73 (Contributor) commented May 26, 2026

Summary

  • Adds repeatable --targets flag (format url[|model][|api_key]) so one locust run can drive multiple deployments.
  • Users are assigned round-robin across targets; --host / --model / --api-key act as fallbacks for fields left blank in a per-target spec.
  • -u/--users is interpreted per-target: with N targets, total locust users = users × N (so the value passed is "load per deployment"). --spawn-rate is scaled the same way.
  • Each request is tagged [<target_url>] /<path> in the locust stats name, so per-target latency / throughput / failure rate show up as separate rows in the summary.
  • Backward-compatible: when --targets is unset, behavior is unchanged.
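The spec format and fallback rule above can be sketched as a small parser. This is an illustrative sketch only, not the code in the PR: the `Target` dataclass, the `parse_target_spec` name, and its keyword arguments are hypothetical, and it assumes a blank field is any empty or missing `|`-separated segment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    url: str
    model: Optional[str]
    api_key: Optional[str]


def parse_target_spec(spec, default_host=None, default_model=None, default_api_key=None):
    """Parse 'url[|model][|api_key]', filling blank fields from the global defaults."""
    parts = [p.strip() for p in spec.split("|")]
    if len(parts) > 3:
        raise ValueError(f"expected at most 3 '|'-separated fields, got {len(parts)}: {spec!r}")
    # Pad so that missing trailing fields behave like blank ones.
    parts += [""] * (3 - len(parts))
    url = parts[0] or default_host
    if not url:
        raise ValueError(f"target {spec!r} has no URL and no --host fallback was given")
    return Target(url=url, model=parts[1] or default_model, api_key=parts[2] or default_api_key)
```

For example, `parse_target_spec("https://a.example|m1", default_api_key="k")` would yield a target that uses its own URL and model but the global API key.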

Motivation

Use case: comparing two pyroworks GB200 deployments (different prefill counts) under matched synthetic load without spinning up two locust processes. The native traffic-forker was blocked by an arch mismatch (forker image is amd64-only, deployments run on arm64 GB200 nodes), so doing it from locust is the next-best option for now.

Usage

# Single target — unchanged behavior
locust -f load_test.py --host=https://api.fireworks.ai/inference --model=accounts/foo/models/bar

# Two targets, 100 users PER TARGET (script spawns 200 total)
locust -f load_test.py --headless -u 100 -r 10 \
  --host=https://api.fireworks.ai/inference \
  --api-key=$FIREWORKS_API_KEY \
  --targets="https://api.fireworks.ai/inference|accounts/pyroworks/deployedModels/cursor-clone-1-prefill-2" \
  --targets="https://api.fireworks.ai/inference|accounts/pyroworks/deployedModels/cursor-clone-2-prefill"

Test plan

  • python -c "import ast; ast.parse(open('llm_bench/load_test.py').read())" passes (verified locally)
  • Single-target run (no --targets) — confirm old behavior unchanged
  • Two-target run with -u 100 — confirm total spawned users = 200, stats table shows one row per target, request counts split roughly evenly
  • Per-target --api-key honored when provided in the target spec

Caveats

  • _target_counter is a class attribute, so in distributed locust each worker has its own counter (round-robin within a worker, not globally). That is fine for matched-load comparisons, but the split across workers is not perfectly even.
  • Target label uses the URL. If two targets share the same URL but differ by model, labels collide; happy to switch the label to include model if that's a real case.
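To make the first caveat concrete, here is a minimal sketch of round-robin assignment driven by a class-level counter. `RoundRobinUser` and its `targets` list are hypothetical stand-ins for the user class in load_test.py, and `itertools.count` is an assumption about how the counter could be kept; only the `_target_counter` name comes from the PR.

```python
import itertools


class RoundRobinUser:
    # Hypothetical stand-in for the locust user class in load_test.py.
    targets = ["target-a", "target-b"]
    # Class attribute: shared by every instance in this process, and only this process.
    _target_counter = itertools.count()

    def __init__(self):
        index = next(RoundRobinUser._target_counter)
        self.target = self.targets[index % len(self.targets)]


users = [RoundRobinUser() for _ in range(5)]
assignments = [u.target for u in users]
```

Because the counter lives on the class, every worker process would start again from the first target, which is why an odd number of users per worker skews the global split slightly toward earlier targets.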

Note

Low Risk
Changes are confined to the llm_bench Locust script and remain backward-compatible when --targets is omitted; no production auth or data paths are touched.

Overview
Adds repeatable --targets (url[|model][|api_key]) so one Locust run can load several deployments. On init, -u and --spawn-rate are multiplied by the number of targets so the CLI value means load per target; users are assigned round-robin to a target with that target’s host, model, and optional API key (global --host / --model / --api-key fill blanks).
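The per-target scaling arithmetic is simple enough to state as a function. This is a sketch under the assumption that both values are multiplied by the number of targets and left untouched when no targets are given; `scale_for_targets` is a hypothetical name, and the PR applies the equivalent logic inside a locust init hook rather than a standalone helper.

```python
def scale_for_targets(users, spawn_rate, num_targets):
    """Return (total_users, total_spawn_rate) when the CLI values mean 'per target'."""
    if num_targets <= 1:
        # No --targets (or a single target): the CLI values are already the totals.
        return users, spawn_rate
    return users * num_targets, spawn_rate * num_targets
```

With `-u 100 -r 10` and two targets this gives 200 total users spawned at 20 per second, so each target still ramps at the rate the caller asked for.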

InitTracker.notify_init no longer treats differing model across workers as inconsistent when --targets is set. HTTP stats use [<label>] <path> so latency and failures split by target; labels prefer deployment id when the model string contains #.

Without --targets, behavior stays the same.

Reviewed by Cursor Bugbot for commit 5ff4c81.

Adds a repeatable --targets flag (format 'url[|model][|api_key]') so a
single load test can drive multiple deployments at once. Users are
assigned round-robin across targets, and each request is tagged with the
target URL in the locust stats name so per-target latency and throughput
are visible without running separate locust processes.

When --targets is unset, behavior is unchanged.

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2dc7362.

Comment thread llm_bench/load_test.py
stream=True,
catch_response=True,
timeout=60,
name=f"[{self._target['label']}] {self.provider_formatter.get_url()}",

Request stat names change even without --targets flag

Medium Severity

The name parameter is unconditionally applied to the request, changing locust stats row names from the bare path (e.g. /v1/chat/completions) to [https://host] /v1/chat/completions even when --targets is not used. The PR promises backward-compatibility when --targets is unset, but this always-on label prefix breaks existing dashboards, scripts, or --max-fail-ratio monitoring that matches stats by request name.
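One possible way to address this finding is to build the stats name conditionally, so the bare path is kept whenever `--targets` is not in use. The sketch below is an assumption about a fix, not the change the PR actually made; `stats_name` and its parameters are hypothetical.

```python
def stats_name(path, target_label=None, multi_target=False):
    """Return the locust stats row name for a request."""
    if multi_target and target_label:
        # Prefix only in multi-target runs so rows split by target.
        return f"[{target_label}] {path}"
    # Single-target runs keep the original row name.
    return path
```

The call site would then pass `name=stats_name(...)`, leaving existing dashboards and scripts that match on the bare path unaffected.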

aidando73 added 2 commits May 26, 2026 11:06
Multiply num_users and spawn_rate by the number of --targets in an
events.init hook so '-u 100 --targets a --targets b' spawns 200 total
users (100 per target). This matches what 'load per deployment' actually
means in side-by-side comparisons and removes the manual multiplication
step from the call site.

aidando73 requested a review from Dranoxgithub May 26, 2026 18:11

Multi-target runs intentionally assign a different model per user, but
notify_init asserts every user shares identical logging_params. Skip
the 'model' key in that comparison when --targets is non-empty; the
single-target assertion is unchanged.
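
The relaxed consistency check described in this commit could look roughly like the following. It is a sketch only: `params_consistent` is a hypothetical helper, and it assumes `logging_params` is a plain dict per user, which the source does not confirm.

```python
def params_consistent(first, other, multi_target):
    """Compare two logging_params dicts, ignoring 'model' in multi-target runs."""
    ignored = {"model"} if multi_target else set()

    def without_ignored(params):
        return {k: v for k, v in params.items() if k not in ignored}

    return without_ignored(first) == without_ignored(other)
```

In single-target mode nothing is ignored, so a differing model is still reported as an inconsistency.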

Dranoxgithub (Contributor) left a comment

I wonder why we don't wrap the script to call the entry point twice instead?

locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-1 &
locust -f load_test.py --headless -u 100 -r 10 --host=... --model=.../clone-2 &
wait

This would avoid layering on more complexity.

aidando73 (Contributor, Author) commented

You are probably right.

The previous label was the target URL, which collapsed stats into one
row when two targets share the same host but differ by model (common
with Fireworks deployment-pinned model strings like
'accounts/x/models/y#accounts/x/deployments/z'). Extract the deployment
id from the '#' suffix when present, otherwise fall back to 'url model'.
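
The label rule in this commit can be sketched as below. This is an illustration under assumptions: `target_label` is a hypothetical name, and treating the deployment id as the last path segment after the '#' is a guess, since the commit message does not say whether the full 'accounts/x/deployments/z' suffix or only 'z' is used.

```python
def target_label(url, model=None):
    """Prefer the deployment id from a '#'-pinned model string, else 'url model'."""
    if model and "#" in model:
        deployment = model.split("#", 1)[1].rstrip("/")
        if deployment:
            # Assumption: keep only the final segment, e.g. 'accounts/x/deployments/z' -> 'z'.
            return deployment.rsplit("/", 1)[-1]
    # Fallback keeps targets that share a host but differ by model distinguishable.
    return f"{url} {model}" if model else url
```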