Skip to content

Strict prompt construction in load_test.py (minimal)#120

Draft
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/load-test-strict-prompts-2463
Draft

Strict prompt construction in load_test.py (minimal)#120
ishaan-shivhare wants to merge 1 commit into
mainfrom
cursor/load-test-strict-prompts-2463

Conversation

@ishaan-shivhare

Copy link
Copy Markdown
Contributor

Summary

Makes limericks/code prompt construction exact for completions workloads (PER-75), with a minimal diff scoped to prompt building and request wiring only.

What changed

  • TranslationDataset: strict mode (default) uses build_pair_ids / build_ids_to_length from prefill_load_test.py to produce exact token-id prompts with a deterministic shared prefix
  • --prompt preserved: custom instruction is encoded and appended before the chat suffix, same as the old text suffix
  • Rerank unchanged: strict=False when --rerank is set, so prompts stay text strings and format_payload still paragraph-splits them into documents (fixes Bugbot rerank regression)
  • Request path: token-id prompts go to /v1/completions via format_payload + inline URL override in _do_generate_text (no get_url(prompt) change that broke other providers)
  • Forced generation: decodes token ids when --forced-generation-from-dataset is used

What did NOT change

  • Locust pacing, providers (except token-id payload branch), dataset holder structure, rerank/embeddings flows
  • No new shared helper module — imports existing functions from prefill_load_test.py
  • No KV warmup / per-worker priming (follow-up)

Tests

cd llm_bench && python3 -m unittest test_load_test_strict.py -v

7 tests: exact lengths, shared prefix, custom --prompt suffix, rerank text mode, token-id payload wiring.

vs prior PR #119

Previous PR rewired DatasetHolder (dropped --prompt, always token ids including rerank). This version keeps legacy text construction for rerank and only strictifies completions prompt building.

Slack Thread

Open in Web Open in Cursor 

Use build_pair_ids from prefill_load_test for exact token-id prompts
with a deterministic shared prefix. Keep the legacy text path for rerank
so documents are still paragraph-split strings. Preserve --prompt as an
encoded suffix on strict prompts. Wire token ids through format_payload
and /v1/completions routing.

Co-authored-by: Ishaan Shivhare <ishaan-shivhare@users.noreply.github.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 383af18. Configure here.

Comment thread llm_bench/load_test.py

url = self.provider_formatter.get_url()
if isinstance(prompt, list) and prompt and isinstance(prompt[0], int):
url = "/v1/completions"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chat parser mismatches completions stream

High Severity

Token-id prompts force /v1/completions, but OpenAIProvider.parse_output_json still branches on --chat (default true) and reads delta/message content fields. Completions streaming exposes choices[].text, so decoded output stays empty and requests can be marked failed.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 383af18. Configure here.

Comment thread llm_bench/load_test.py

url = self.provider_formatter.get_url()
if isinstance(prompt, list) and prompt and isinstance(prompt[0], int):
url = "/v1/completions"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embeddings URL overridden incorrectly

Medium Severity

The token-id URL override sets /v1/completions for any int list prompt, including when --embeddings built an /v1/embeddings body. That sends embedding JSON to the completions route and breaks embeddings runs on strict datasets.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 383af18. Configure here.

Comment thread llm_bench/load_test.py
if isinstance(sample_prompt, list):
self.prompt_tokenizer_tokens = len(sample_prompt)
else:
self.prompt_tokenizer_tokens = len(tokenizer.encode(sample_prompt))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image placeholders assume string prompts

Medium Severity

With --prompt-images, _get_input still runs insert_image_placeholders on strict token-id prompts. That helper slices and concatenates strings, so list prompts raise TypeError or corrupt placeholders instead of valid multimodal input.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 383af18. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants