Strict prompt construction in load_test.py (minimal) by ishaan-shivhare · Pull Request #120 · fw-ai/benchmark

ishaan-shivhare · 2026-06-06T00:59:58Z

Summary

Makes limericks/code prompt construction exact for completions workloads (PER-75), with a minimal diff scoped to prompt building and request wiring only.

What changed

TranslationDataset: strict mode (default) uses build_pair_ids / build_ids_to_length from prefill_load_test.py to produce exact token-id prompts with a deterministic shared prefix
--prompt preserved: custom instruction is encoded and appended before the chat suffix, same as the old text suffix
Rerank unchanged: strict=False when --rerank is set, so prompts stay text strings and format_payload still paragraph-splits them into documents (fixes Bugbot rerank regression)
Request path: token-id prompts go to /v1/completions via format_payload + inline URL override in _do_generate_text (no get_url(prompt) change that broke other providers)
Forced generation: decodes token ids when --forced-generation-from-dataset is used

What did NOT change

Locust pacing, providers (except token-id payload branch), dataset holder structure, rerank/embeddings flows
No new shared helper module — imports existing functions from prefill_load_test.py
No KV warmup / per-worker priming (follow-up)

Tests

cd llm_bench && python3 -m unittest test_load_test_strict.py -v

7 tests: exact lengths, shared prefix, custom --prompt suffix, rerank text mode, token-id payload wiring.

vs prior PR #119

Previous PR rewired DatasetHolder (dropped --prompt, always token ids including rerank). This version keeps legacy text construction for rerank and only strictifies completions prompt building.

Slack Thread

Use build_pair_ids from prefill_load_test for exact token-id prompts with a deterministic shared prefix. Keep the legacy text path for rerank so documents are still paragraph-split strings. Preserve --prompt as an encoded suffix on strict prompts. Wire token ids through format_payload and /v1/completions routing. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 383af18. Configure here.}

cursor · 2026-06-06T01:01:33Z


+        url = self.provider_formatter.get_url()
+        if isinstance(prompt, list) and prompt and isinstance(prompt[0], int):
+            url = "/v1/completions"


Chat parser mismatches completions stream

High Severity

Token-id prompts force /v1/completions, but OpenAIProvider.parse_output_json still branches on --chat (default true) and reads delta/message content fields. Completions streaming exposes choices[].text, so decoded output stays empty and requests can be marked failed.

Additional Locations (1)

llm_bench/load_test.py#L928-L946

^{Reviewed by Cursor Bugbot for commit 383af18. Configure here.}

cursor · 2026-06-06T01:01:33Z


+        url = self.provider_formatter.get_url()
+        if isinstance(prompt, list) and prompt and isinstance(prompt[0], int):
+            url = "/v1/completions"


Embeddings URL overridden incorrectly

Medium Severity

The token-id URL override sets /v1/completions for any int list prompt, including when --embeddings built an /v1/embeddings body. That sends embedding JSON to the completions route and breaks embeddings runs on strict datasets.

^{Reviewed by Cursor Bugbot for commit 383af18. Configure here.}

cursor · 2026-06-06T01:01:33Z

+        if isinstance(sample_prompt, list):
+            self.prompt_tokenizer_tokens = len(sample_prompt)
+        else:
+            self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))


Image placeholders assume string prompts

Medium Severity

With --prompt-images, _get_input still runs insert_image_placeholders on strict token-id prompts. That helper slices and concatenates strings, so list prompts raise TypeError or corrupt placeholders instead of valid multimodal input.

Additional Locations (1)

llm_bench/load_test.py#L1519-L1536

^{Reviewed by Cursor Bugbot for commit 383af18. Configure here.}

cursor Bot reviewed Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Strict prompt construction in load_test.py (minimal)#120

Strict prompt construction in load_test.py (minimal)#120
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/load-test-strict-prompts-2463

ishaan-shivhare commented Jun 6, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ishaan-shivhare commented Jun 6, 2026

Summary

What changed

What did NOT change

Tests

vs prior PR #119

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Chat parser mismatches completions stream

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Embeddings URL overridden incorrectly

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Image placeholders assume string prompts

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants