Make load_test.py always strict (remove conditional branch) by ishaan-shivhare · Pull Request #122 · fw-ai/benchmark

ishaan-shivhare · 2026-06-06T01:38:28Z

Summary

Removes the strict conditional branch from TranslationDataset — limericks/code prompts are always built as exact token-id sequences via build_pair_ids from prefill_load_test.py.

Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because the legacy text path used approximate token counts.

Changes

Deleted legacy text prompt construction (strict=False path, ~45 lines)
Removed strict parameter and strict=not rerank from DatasetHolder
Rerank: decodes token-id prompts to text for paragraph document splitting (no separate legacy dataset path)
--prompt custom instruction still preserved as trailing instruction token ids

Test plan

cd llm_bench && python3 -m unittest test_load_test_strict.py -v (7/7 pass)
Re-run DSV4 deployment experiment with --prompt-cache-max-len=37500 — measured cache hit should approach 75%

Slack Thread

Build limericks/code prompts as exact token-id sequences using build_pair_ids from prefill_load_test.py. Token ids route to /v1/completions even with --chat (template applied client-side). Legacy text construction is kept only for --rerank. Custom --prompt instructions are preserved as trailing instruction token ids. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

Always build limericks/code prompts as exact token-id sequences via build_pair_ids. Drop the legacy text path and strict=False rerank exception; rerank now decodes token ids to text for document splitting. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

Cap cached_tokens against the body budget, parse completions vs chat responses by shape, route token-id prompts to completions only when appropriate, and add cache warmup blast for multi-gen servers. Remove the benchmark unit test file. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.}

cursor · 2026-06-06T19:07:21Z

+        if isinstance(sample_prompt, list):
+            self.prompt_tokenizer_tokens = len(sample_prompt)
+        else:
+            self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))


Image placeholders break token prompts

Medium Severity

With the default limericks/code datasets, _get_input now yields token-id lists, but when --prompt-images-with-resolutions is set it still runs insert_image_placeholders, which treats the prompt as a string and concatenates text placeholders. That raises a TypeError or corrupts the payload before the later guard in format_payload.

Additional Locations (1)

llm_bench/load_test.py#L1566-L1593

^{Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.}

cursor · 2026-06-06T19:07:21Z


        assert len(data["choices"]) == 1, f"Too many choices {len(data['choices'])}"
        choice = data["choices"][0]
-        if self.parsed_options.chat:


Embeddings mode gets token ids

Medium Severity

For limericks/code, the dataset now always emits integer token-id prompts, but the embeddings branch still sets input to that list unchanged. The /v1/embeddings API expects text (or string batches), unlike rerank, which this PR decodes from token ids.

Additional Locations (1)

llm_bench/load_test.py#L216-L235

^{Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.}

cursoragent and others added 2 commits June 6, 2026 01:05

cursor Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread llm_bench/load_test.py Outdated

Comment thread llm_bench/load_test.py

cursor Bot reviewed Jun 6, 2026

View reviewed changes

ishaan-shivhare marked this pull request as ready for review June 6, 2026 19:08

ishaan-shivhare requested a review from a team June 8, 2026 20:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make load_test.py always strict (remove conditional branch)#122

Make load_test.py always strict (remove conditional branch)#122
ishaan-shivhare wants to merge 3 commits into
mainfrom
cursor/unconditional-strict-load-test-dea8

ishaan-shivhare commented Jun 6, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ishaan-shivhare commented Jun 6, 2026

Summary

Changes

Test plan

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Image placeholders break token prompts

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Embeddings mode gets token ids

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants