Make load_test.py always strict (remove conditional branch)#122
Make load_test.py always strict (remove conditional branch)#122ishaan-shivhare wants to merge 3 commits into
Conversation
Build limericks/code prompts as exact token-id sequences using build_pair_ids from prefill_load_test.py. Token ids route to /v1/completions even with --chat (template applied client-side). Legacy text construction is kept only for --rerank. Custom --prompt instructions are preserved as trailing instruction token ids. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>
Always build limericks/code prompts as exact token-id sequences via build_pair_ids. Drop the legacy text path and strict=False rerank exception; rerank now decodes token ids to text for document splitting. Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>
Cap cached_tokens against the body budget, parse completions vs chat responses by shape, route token-id prompts to completions only when appropriate, and add cache warmup blast for multi-gen servers. Remove the benchmark unit test file. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.
| if isinstance(sample_prompt, list): | ||
| self.prompt_tokenizer_tokens = len(sample_prompt) | ||
| else: | ||
| self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt)) |
There was a problem hiding this comment.
Image placeholders break token prompts
Medium Severity
With the default limericks/code datasets, _get_input now yields token-id lists, but when --prompt-images-with-resolutions is set it still runs insert_image_placeholders, which treats the prompt as a string and concatenates text placeholders. That raises a TypeError or corrupts the payload before the later guard in format_payload.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.
|
|
||
| assert len(data["choices"]) == 1, f"Too many choices {len(data['choices'])}" | ||
| choice = data["choices"][0] | ||
| if self.parsed_options.chat: |
There was a problem hiding this comment.
Embeddings mode gets token ids
Medium Severity
For limericks/code, the dataset now always emits integer token-id prompts, but the embeddings branch still sets input to that list unchanged. The /v1/embeddings API expects text (or string batches), unlike rerank, which this PR decodes from token ids.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 3453ccc. Configure here.


Summary
Removes the
strictconditional branch fromTranslationDataset— limericks/code prompts are always built as exact token-id sequences viabuild_pair_idsfromprefill_load_test.py.Fixes PER-75: deployment experiments were measuring ~58% cache hit vs 75% configured because the legacy text path used approximate token counts.
Changes
strict=Falsepath, ~45 lines)strictparameter andstrict=not rerankfromDatasetHolder--promptcustom instruction still preserved as trailing instruction token idsTest plan
cd llm_bench && python3 -m unittest test_load_test_strict.py -v(7/7 pass)--prompt-cache-max-len=37500— measured cache hit should approach 75%Slack Thread