Skip to content

Make load_test.py unconditionally strict for limericks/code#121

Draft
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/unconditional-strict-load-test-7cec
Draft

Make load_test.py unconditionally strict for limericks/code#121
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/unconditional-strict-load-test-7cec

Conversation

@ishaan-shivhare

Copy link
Copy Markdown
Contributor

Summary

Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because load_test.py built approximate text prompts instead of exact token-id sequences.

TranslationDataset now always builds strict prompts for limericks/code (imports build_pair_ids from prefill_load_test.py). Only prompt construction and the minimal request-path wiring changed:

  • Exact len(prompt) == prompt_tokens
  • First cached_tokens IDs identical across requests
  • --prompt custom instruction preserved as trailing instruction token ids
  • Token ids → /v1/completions even with --chat (template applied client-side)
  • Legacy text path kept only for --rerank (rerank needs paragraph-split strings)

Test plan

  • cd llm_bench && python3 -m unittest test_load_test_strict.py -v (7 tests)
  • Re-run DSV4 deployment experiment with --prompt-cache-max-len=37500 — measured cache hit should approach 75% (warmup follow-up still needed for full hit rate)

Slack Thread

Open in Web Open in Cursor 

Build limericks/code prompts as exact token-id sequences using
build_pair_ids from prefill_load_test.py. Token ids route to
/v1/completions even with --chat (template applied client-side).

Legacy text construction is kept only for --rerank. Custom --prompt
instructions are preserved as trailing instruction token ids.

Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.

Comment thread llm_bench/load_test.py
self._chunks,
body_tokens,
self._cached_tokens,
self._rng,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache length exceeds body tokens

Medium Severity

Strict TranslationDataset passes full prompt_cache_max_len into build_pair_ids while that helper’s prompt_tokens is only the body length (num_tokens minus trailing instruction ids). When cache length exceeds that body size, build_pair_ids raises ValueError and the Locust task fails on the first prompt.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.

Comment thread llm_bench/load_test.py
if isinstance(sample_prompt, list):
self.prompt_tokenizer_tokens = len(sample_prompt)
else:
self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image placeholders break token prompts

Low Severity

With strict limericks/code, _get_input can return a list of token ids, but _get_input still runs insert_image_placeholders, which concatenates string placeholders onto prompt slices. That path assumes a string prompt and errors before the newer format_payload guard for token-id prompts with images.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants