Skip to content

Make load_test.py always strict (remove conditional branch)#122

Open
ishaan-shivhare wants to merge 3 commits into
mainfrom
cursor/unconditional-strict-load-test-dea8
Open

Make load_test.py always strict (remove conditional branch)#122
ishaan-shivhare wants to merge 3 commits into
mainfrom
cursor/unconditional-strict-load-test-dea8

Conversation

@ishaan-shivhare

Copy link
Copy Markdown
Contributor

Summary

Removes the strict conditional branch from TranslationDataset — limericks/code prompts are always built as exact token-id sequences via build_pair_ids from prefill_load_test.py.

Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because the legacy text path used approximate token counts.

Changes

  • Deleted legacy text prompt construction (strict=False path, ~45 lines)
  • Removed strict parameter and strict=not rerank from DatasetHolder
  • Rerank: decodes token-id prompts to text for paragraph document splitting (no separate legacy dataset path)
  • --prompt custom instruction still preserved as trailing instruction token ids

Test plan

  • cd llm_bench && python3 -m unittest test_load_test_strict.py -v (7/7 pass)
  • Re-run DSV4 deployment experiment with --prompt-cache-max-len=37500 — measured cache hit should approach 75%

Slack Thread

Open in Web Open in Cursor 

cursoragent and others added 2 commits June 6, 2026 01:05
Build limericks/code prompts as exact token-id sequences using
build_pair_ids from prefill_load_test.py. Token ids route to
/v1/completions even with --chat (template applied client-side).

Legacy text construction is kept only for --rerank. Custom --prompt
instructions are preserved as trailing instruction token ids.

Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>
Always build limericks/code prompts as exact token-id sequences via
build_pair_ids. Drop the legacy text path and strict=False rerank
exception; rerank now decodes token ids to text for document splitting.

Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>
Comment thread llm_bench/load_test.py Outdated
Comment thread llm_bench/load_test.py
Cap cached_tokens against the body budget, parse completions vs chat responses
by shape, route token-id prompts to completions only when appropriate, and add
cache warmup blast for multi-gen servers. Remove the benchmark unit test file.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.

Comment thread llm_bench/load_test.py
if isinstance(sample_prompt, list):
self.prompt_tokenizer_tokens = len(sample_prompt)
else:
self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image placeholders break token prompts

Medium Severity

With the default limericks/code datasets, _get_input now yields token-id lists, but when --prompt-images-with-resolutions is set it still runs insert_image_placeholders, which treats the prompt as a string and concatenates text placeholders. That raises a TypeError or corrupts the payload before the later guard in format_payload.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.

Comment thread llm_bench/load_test.py

assert len(data["choices"]) == 1, f"Too many choices {len(data['choices'])}"
choice = data["choices"][0]
if self.parsed_options.chat:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embeddings mode gets token ids

Medium Severity

For limericks/code, the dataset now always emits integer token-id prompts, but the embeddings branch still sets input to that list unchanged. The /v1/embeddings API expects text (or string batches), unlike rerank, which this PR decodes from token ids.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.

@ishaan-shivhare ishaan-shivhare marked this pull request as ready for review June 6, 2026 19:08
@ishaan-shivhare ishaan-shivhare requested a review from a team June 8, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants