Make load_test.py unconditionally strict for limericks/code by ishaan-shivhare · Pull Request #121 · fw-ai/benchmark

ishaan-shivhare · 2026-06-06T01:05:12Z

Summary

Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because load_test.py built approximate text prompts instead of exact token-id sequences.

TranslationDataset now always builds strict prompts for limericks/code (imports build_pair_ids from prefill_load_test.py). Only prompt construction and the minimal request-path wiring changed:

Exact len(prompt) == prompt_tokens
First cached_tokens IDs identical across requests
--prompt custom instruction preserved as trailing instruction token ids
Token ids → /v1/completions even with --chat (template applied client-side)
Legacy text path kept only for --rerank (rerank needs paragraph-split strings)

Test plan

cd llm_bench && python3 -m unittest test_load_test_strict.py -v (7 tests)
Re-run DSV4 deployment experiment with --prompt-cache-max-len=37500 — measured cache hit should approach 75% (warmup follow-up still needed for full hit rate)

Slack Thread

Build limericks/code prompts as exact token-id sequences using build_pair_ids from prefill_load_test.py. Token ids route to /v1/completions even with --chat (template applied client-side). Legacy text construction is kept only for --rerank. Custom --prompt instructions are preserved as trailing instruction token ids. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.}

cursor · 2026-06-06T01:06:44Z

+            self._chunks,
+            body_tokens,
+            self._cached_tokens,
+            self._rng,


Cache length exceeds body tokens

Medium Severity

Strict TranslationDataset passes full prompt_cache_max_len into build_pair_ids while that helper’s prompt_tokens is only the body length (num_tokens minus trailing instruction ids). When cache length exceeds that body size, build_pair_ids raises ValueError and the Locust task fails on the first prompt.

^{Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.}

cursor · 2026-06-06T01:06:44Z

+        if isinstance(sample_prompt, list):
+            self.prompt_tokenizer_tokens = len(sample_prompt)
+        else:
+            self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))


Image placeholders break token prompts

Low Severity

With strict limericks/code, _get_input can return a list of token ids, but _get_input still runs insert_image_placeholders, which concatenates string placeholders onto prompt slices. That path assumes a string prompt and errors before the newer format_payload guard for token-id prompts with images.

^{Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.}

cursor Bot reviewed Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make load_test.py unconditionally strict for limericks/code#121

Make load_test.py unconditionally strict for limericks/code#121
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/unconditional-strict-load-test-7cec

ishaan-shivhare commented Jun 6, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

cursor Bot Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ishaan-shivhare commented Jun 6, 2026

Summary

Test plan

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Cache length exceeds body tokens

Uh oh!

cursor Bot Jun 6, 2026

Choose a reason for hiding this comment

Image placeholders break token prompts

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants