Make load_test.py unconditionally strict for limericks/code#121
Make load_test.py unconditionally strict for limericks/code#121ishaan-shivhare wants to merge 1 commit into
Conversation
Build limericks/code prompts as exact token-id sequences using build_pair_ids from prefill_load_test.py. Token ids route to /v1/completions even with --chat (template applied client-side). Legacy text construction is kept only for --rerank. Custom --prompt instructions are preserved as trailing instruction token ids. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.
| self._chunks, | ||
| body_tokens, | ||
| self._cached_tokens, | ||
| self._rng, |
There was a problem hiding this comment.
Cache length exceeds body tokens
Medium Severity
Strict TranslationDataset passes full prompt_cache_max_len into build_pair_ids while that helper’s prompt_tokens is only the body length (num_tokens minus trailing instruction ids). When cache length exceeds that body size, build_pair_ids raises ValueError and the Locust task fails on the first prompt.
Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.
| if isinstance(sample_prompt, list): | ||
| self.prompt_tokenizer_tokens = len(sample_prompt) | ||
| else: | ||
| self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt)) |
There was a problem hiding this comment.
Image placeholders break token prompts
Low Severity
With strict limericks/code, _get_input can return a list of token ids, but _get_input still runs insert_image_placeholders, which concatenates string placeholders onto prompt slices. That path assumes a string prompt and errors before the newer format_payload guard for token-id prompts with images.
Reviewed by Cursor Bugbot for commit 47ddab4. Configure here.


Summary
Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because
load_test.pybuilt approximate text prompts instead of exact token-id sequences.TranslationDatasetnow always builds strict prompts for limericks/code (importsbuild_pair_idsfromprefill_load_test.py). Only prompt construction and the minimal request-path wiring changed:len(prompt) == prompt_tokenscached_tokensIDs identical across requests--promptcustom instruction preserved as trailing instruction token ids/v1/completionseven with--chat(template applied client-side)--rerank(rerank needs paragraph-split strings)Test plan
cd llm_bench && python3 -m unittest test_load_test_strict.py -v(7 tests)--prompt-cache-max-len=37500— measured cache hit should approach 75% (warmup follow-up still needed for full hit rate)Slack Thread