v1.0.1-Fix #3
AronDaron
announced in
Announcements
Hotfix — Category description now authoritative across the full prompt chain (critical)
Problem
Category description (the user-facing field in CategoryCard) was only passed to topic generation — it never reached the outline-generation or example-generation stages. Every detailed instruction a user wrote there — "return only the solution", "no top-level print or usage demo", "contest-style problem", "reply with the function only", follow-up framing, etc. — was silently dropped after the first stage.
Three additional tutorial-framing biases were baked into the prompts:
The outline prompt used ["Explain concept X", "Show a practical example", "Mention common pitfall"] as the format example → models copied this "Explain/Show/Mention" scaffold verbatim, forcing the final user turn into "Could you explain X?" regardless of what the real category wanted.
Result: categories intended for problem-posing output (e.g. contest-style algorithmic problems targeting LiveCodeBench) produced tutorial dialogues opening with "I've been studying palindromic substrings, could you explain Manacher's Algorithm?", and the judge accepted them because judge criteria were per-job and couldn't catch the framing mismatch.
Fix
build_outline_generation_prompt(...): added keyword-only parameter category_description="". When non-empty, the description is injected into the user prompt as a "Category description:" block.
build_example_generation_prompt(...): added keyword-only parameter category_description="" with the same injection. The system prompt was rewritten to state that the Category Description is the authoritative instruction for framing, style, and format of both user and assistant messages, and to explicitly forbid tutorial openings like "Could you explain…", "How does X work?", "I've been studying…" unless the description asks for them.
Replaced the ["Explain X", "Show Y", "Mention Z"] outline format example with a neutral ["First aspect", "Second aspect", "Third aspect"].
job_runner.py: both _generate_outline and _generate_example now pass cat.description through to the prompt builders.
Files: backend/app/services/prompt_builder.py, backend/app/services/job_runner.py
Tests: tests/unit/test_prompt_builder.py, 5 new tests verifying description propagation and anti-tutorial system-prompt rules. Full suite green (335/335).
Impact
The category description field is now the single source of truth for per-category generation behavior. Detailed format constraints written there — "no print demo", "contest framing", "return the function only", "single-language solution", etc. — reach the final generation model and are respected.
Verified on a 10-example pilot for the category Algorithmic Problems (LiveCodeBench target):
Before this fix the same description produced ~20% contest framing and ~40% trade-off follow-ups; tutorial dialogues were the default.
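The keyword-only injection described under Fix can be sketched roughly as below. This is a simplified stand-in, not the project's actual builder: the `topic` parameter, the prompt wording, and the overall layout are assumptions made for illustration. Only the `category_description=""` keyword-only default, the "Category description:" block, and the neutral format example come from the notes above.

```python
def build_outline_generation_prompt(topic: str, *, category_description: str = "") -> str:
    """Hypothetical, simplified outline prompt builder (illustration only)."""
    parts = [f"Topic: {topic}"]  # `topic` is an assumed parameter
    # Inject the description only when it is non-empty, so callers that
    # never pass it get the same prompt they got before the fix.
    if category_description:
        parts.append(f"Category description:\n{category_description}")
    # Neutral format example instead of the old Explain/Show/Mention scaffold.
    parts.append(
        'Return the outline as a JSON list, e.g. '
        '["First aspect", "Second aspect", "Third aspect"].'
    )
    return "\n\n".join(parts)


# An existing caller that omits the new argument keeps working.
legacy_prompt = build_outline_generation_prompt("Palindromic substrings")

# A caller that supplies the description gets it injected into the prompt.
current_prompt = build_outline_generation_prompt(
    "Palindromic substrings",
    category_description="Contest-style problem. Reply with the function only.",
)
```

Making the parameter keyword-only means it cannot collide with arguments that existing callers pass positionally.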
Migration notes
No data migration needed. No API shape change. The new category_description kwarg defaults to "" for backward compatibility, so existing callers (and tests that pass positional args) keep working. Anything going through job_runner, however, now always supplies the real description.
If you ran jobs before this fix with a category description that relied on format constraints (no print, no demo, contest framing, function-only output, etc.), those jobs' outputs did not honor those constraints. Consider re-generating affected categories, especially any targeting LiveCodeBench, HumanEval+, or BigCodeBench where output format purity matters.
Fix
_extract_content errors no longer crash generation. When a provider returns an unexpected response shape (wrong choices[] structure, missing message, etc.), the generator now logs a [gen-fail] warning with model/provider/attempt context, emits a generation_invalid_structure event to the Activity Log, retries on the first attempt, and skips the example on the second. Previously the raised ValueError propagated out and killed the example-generation loop silently.
File: backend/app/services/job_runner.py (_generate_and_validate_example)
This discussion was created from the release v1.0.1-Fix.
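The retry-then-skip handling described for _extract_content errors might look something like the sketch below. Everything here is illustrative: `extract_content`, `generate_example`, `call_model`, and `emit_event` are hypothetical names, the response shape is assumed to be chat-completion style, and emitting the event on every failed attempt is an assumption, since the notes do not say whether it fires once or twice.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("job_runner_sketch")

MAX_ATTEMPTS = 2  # first failure triggers a retry, second failure skips


def extract_content(response: Any) -> str:
    """Pull the message text out of a chat-completion-style response."""
    try:
        content = response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {exc!r}") from exc
    if not isinstance(content, str):
        raise ValueError("message content is not a string")
    return content


def generate_example(
    call_model: Callable[[], Any],
    emit_event: Callable[[str, dict], None],
    *,
    model: str,
    provider: str,
) -> Optional[str]:
    """Return the generated text, or None when the example should be skipped."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        response = call_model()
        try:
            return extract_content(response)
        except ValueError as exc:
            # Log and report instead of letting the error escape the loop.
            logger.warning(
                "[gen-fail] model=%s provider=%s attempt=%d: %s",
                model, provider, attempt, exc,
            )
            emit_event(
                "generation_invalid_structure",
                {"model": model, "provider": provider, "attempt": attempt},
            )
    return None  # both attempts failed: skip this example, keep the job running
```

Returning `None` rather than raising lets the surrounding example-generation loop move on to the next example.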