v1.0.1-Fix #3

AronDaron · 2026-04-23T13:04:14Z

AronDaron
Apr 23, 2026
Maintainer

Hotfix — Category description now authoritative across the full prompt chain (critical)

Problem

Category description (the user-facing field in CategoryCard) was passed only to topic generation; it never reached the outline-generation or example-generation stages. Every detailed instruction a user wrote there ("return only the solution", "no top-level print or usage demo", "contest-style problem", "reply with the function only", follow-up framing, etc.) was silently dropped after the first stage.

Three additional tutorial-framing biases were baked into the prompts:

Topic generation system prompt called the output an "instructional conversation" → pushed all categories toward tutorial style
Outline generation system prompt called the example a "question-and-answer training example" → same push
Outline user prompt shipped a literal ["Explain concept X", "Show a practical example", "Mention common pitfall"] as the format example → models copied this "Explain/Show/Mention" scaffold verbatim, forcing the final user turn into "Could you explain X?" regardless of what the real category wanted

Result: categories intended for problem-posing output (e.g. contest-style algorithmic problems targeting LiveCodeBench) produced tutorial dialogues opening with "I've been studying palindromic substrings, could you explain Manacher's Algorithm?" — and the judge accepted them because judge criteria were per-job and couldn't catch the framing mismatch.

Fix

build_outline_generation_prompt(...) — added keyword-only parameter category_description="". When non-empty, the description is injected into the user prompt as a Category description: block.
build_example_generation_prompt(...) — added keyword-only parameter category_description="" with the same injection. The system prompt was rewritten to state that the Category Description is the authoritative instruction for framing, style, and format of both user and assistant messages, and to explicitly forbid tutorial openings like "Could you explain…", "How does X work?", "I've been studying…" unless the description asks for them.
Removed the tutorial-priming word "instructional" from the topic-generation system prompt.
Rewrote the outline-generation system prompt so it must match the category's implied style (problem-solving stages vs. concept stages).
Replaced the biasing ["Explain X", "Show Y", "Mention Z"] outline format example with a neutral ["First aspect", "Second aspect", "Third aspect"].
job_runner.py — both _generate_outline and _generate_example now pass cat.description through to the prompt builders.
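A minimal sketch of the propagation described above, assuming a simplified builder. The notes elide the builders' real positional parameters, so `topic`, the `Category` dataclass, and the `generate_outline_prompt` wrapper are hypothetical. Only `category_description`, its `""` default, the `Category description:` block label, and the neutral format example come from this release.

```python
from dataclasses import dataclass


@dataclass
class Category:
    # Hypothetical stand-in for the app's category model; only the
    # fields this sketch needs are included.
    name: str
    description: str = ""


def build_outline_generation_prompt(topic: str, *, category_description: str = "") -> str:
    """Hypothetical sketch of the outline user prompt.

    `topic` is an assumed positional parameter; the real signature is elided.
    """
    parts = [f"Topic: {topic}"]
    if category_description:
        # Inject the description as its own block only when it is non-empty.
        parts.append(f"Category description:\n{category_description}")
    # Neutral format example, so the model has no "Explain/Show/Mention"
    # scaffold to copy.
    parts.append(
        "Return the outline as a JSON list, e.g. "
        '["First aspect", "Second aspect", "Third aspect"].'
    )
    return "\n\n".join(parts)


def generate_outline_prompt(cat: Category, topic: str) -> str:
    # Mirrors the job_runner change: always pass cat.description through.
    return build_outline_generation_prompt(topic, category_description=cat.description)
```

With an empty description the block is simply omitted, which keeps prompts from older callers unchanged.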

Files: backend/app/services/prompt_builder.py, backend/app/services/job_runner.py
Tests: tests/unit/test_prompt_builder.py — 5 new tests verifying description propagation and anti-tutorial system-prompt rules. Full suite green (335/335).

Impact

The category description field is now the single source of truth for per-category generation behavior. Detailed format constraints written there — "no print demo", "contest framing", "return the function only", "single-language solution", etc. — reach the final generation model and are respected.

Verified on a 10-example pilot for the category Algorithmic Problems (LiveCodeBench target):

10/10 user turns open in contest style (e.g. "You're given a directed graph…", "Design a data structure…", "Given an integer array…") — zero tutorial openings
10/10 follow-ups probe an actual trade-off (list vs. matrix, hash vs. tree, Dijkstra vs. Bellman-Ford, etc.) rather than asking for further explanation
10/10 include an explicit Time/Space complexity note

Before this fix, the same description produced contest framing in only ~20% of examples and trade-off follow-ups in ~40%; tutorial dialogues were the default.
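The notes do not say how the pilot was scored. As a hypothetical spot-check, the tutorial openings that the new system prompt forbids can be flagged with a few case-insensitive patterns. `TUTORIAL_PATTERNS`, `is_tutorial_opening`, and `tutorial_rate` are illustrative names, and a phrase match like this is a rough heuristic, not the project's judge.

```python
import re

# Hypothetical heuristic: the openings the new system prompt forbids,
# matched case-insensitively anywhere in the first user turn.
TUTORIAL_PATTERNS = [
    re.compile(r"\bcould you explain\b", re.IGNORECASE),
    re.compile(r"\bhow does\b.*\bwork\b", re.IGNORECASE),
    re.compile(r"\bi['’]ve been studying\b", re.IGNORECASE),
]


def is_tutorial_opening(first_user_turn: str) -> bool:
    """True if the turn matches any of the forbidden tutorial phrasings."""
    return any(p.search(first_user_turn) for p in TUTORIAL_PATTERNS)


def tutorial_rate(first_user_turns: list[str]) -> float:
    """Fraction of first user turns flagged as tutorial openings."""
    if not first_user_turns:
        return 0.0
    flagged = sum(is_tutorial_opening(t) for t in first_user_turns)
    return flagged / len(first_user_turns)
```

A regex filter will miss paraphrases, so it works as a cheap first pass rather than a replacement for reading the samples.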

Migration notes

No data migration needed. No API shape change. The new category_description kwarg defaults to "" for backward compatibility, so existing callers (and tests that pass positional args) keep working — but anything going through job_runner now always supplies the real description.
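Why a keyword-only parameter with a default keeps existing callers working, shown on a hypothetical stand-in for `build_example_generation_prompt`. Its real positional parameters are not listed in these notes, so `topic` and `outline` are assumptions, as is the `accepts_positional_description` helper.

```python
def build_example_generation_prompt(
    topic: str, outline: list[str], *, category_description: str = ""
) -> str:
    """Hypothetical stand-in; only category_description comes from the notes."""
    header = (
        f"Category description:\n{category_description}\n\n"
        if category_description
        else ""
    )
    return f"{header}Topic: {topic}\nOutline: {', '.join(outline)}"


# Existing positional callers keep working: the new kwarg falls back to "".
legacy = build_example_generation_prompt("Heaps", ["First aspect", "Second aspect"])

# New callers (like job_runner) must name the argument explicitly.
current = build_example_generation_prompt(
    "Heaps", ["First aspect"], category_description="Reply with the function only."
)


def accepts_positional_description() -> bool:
    # Keyword-only parameters cannot be filled positionally.
    try:
        build_example_generation_prompt("Heaps", ["First aspect"], "Reply with the function only.")
    except TypeError:
        return False
    return True
```

Making the parameter keyword-only also means a stray extra positional argument fails loudly with `TypeError` instead of silently becoming the description.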

If you ran jobs before this fix with a category description that relied on format constraints (no print, no demo, contest framing, function-only output, etc.), those jobs' outputs were not honoring those constraints. Consider re-generating affected categories — especially any targeting LiveCodeBench, HumanEval+, or BigCodeBench where output format purity matters.

Additional fix

_extract_content errors no longer crash generation. When a provider returns an unexpected response shape (wrong choices[] structure, missing message, etc.), the generator now logs a [gen-fail] warning with model/provider/attempt context and emits a generation_invalid_structure event to the Activity Log. If the first attempt fails, it retries once; if the second attempt also fails, it skips the example. Previously the raised ValueError propagated out of the example-generation loop and silently killed it.
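A minimal sketch of the retry-then-skip behavior, assuming an OpenAI-style `choices[0].message.content` response shape. `extract_content`, `generate_example`, `call_provider`, and `emit_event` are hypothetical stand-ins for `_extract_content`, `_generate_and_validate_example`, the provider call, and the Activity Log hook. Only the `[gen-fail]` tag, the `generation_invalid_structure` event name, and the two-attempt policy come from the notes.

```python
import logging
from collections.abc import Callable
from typing import Any, Optional

logger = logging.getLogger("job_runner_sketch")


def extract_content(response: Any) -> str:
    """Hypothetical stand-in for _extract_content.

    Assumes an OpenAI-style {"choices": [{"message": {"content": ...}}]} shape
    and raises ValueError for anything else.
    """
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {exc!r}") from exc


def generate_example(
    call_provider: Callable[[], Any],
    emit_event: Callable[[str, dict], None],
    *,
    model: str,
    provider: str,
    max_attempts: int = 2,
) -> Optional[str]:
    """Return the generated content, or None if the example should be skipped."""
    for attempt in range(1, max_attempts + 1):
        response = call_provider()
        try:
            return extract_content(response)
        except ValueError as exc:
            # Log and report instead of letting the error escape the loop.
            logger.warning(
                "[gen-fail] model=%s provider=%s attempt=%d: %s",
                model, provider, attempt, exc,
            )
            emit_event(
                "generation_invalid_structure",
                {"model": model, "provider": provider, "attempt": attempt},
            )
    # Every attempt returned a malformed response: skip this example.
    return None
```

A caller looping over topics can treat `None` as "skip this example" and continue, so one malformed response no longer ends the loop.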

File: backend/app/services/job_runner.py — _generate_and_validate_example

This discussion was created from the release v1.0.1-Fix.
