Failure decomposition: categorize rejections to accelerate learning #5
Description
Problem
When an experiment is REJECTED, the framework logs which gates failed but doesn't diagnose WHY. This makes it hard to learn from failures at scale. The same root cause (e.g., "execution cost kills thin edges") can repeat 10+ times across branches before the orchestrator converges on a fix.
In a real deployment, 62% of experiments were rejected. Many had the same root cause repeated across different branches, but each rejection was treated as an independent failure.
Proposal
After Step 5 (Collect Results), add automatic failure decomposition for REJECTED experiments.
Failure categories
INSUFFICIENT_DATA - n_entries < threshold
→ broaden filter, add data sources, or relax gate
WRONG_PARAMETER_RANGE - metric improves monotonically toward search boundary
→ extend search space in that direction
WRONG_SIGNAL_TYPE - metric doesn't respond to any parameter variation
→ branch hypothesis is wrong, consider exhausting
REGIME_DEPENDENT - positive in some folds, negative in others
→ needs regime filter or conditional activation
EXECUTION_KILLED - positive pre-cost metric, negative post-cost
→ switch execution mode or find larger edges
CONCENTRATION_RISK - edge exists but concentrated in few samples/families
→ needs diversification or larger universe
GATE_BLOCKED - would have promoted but for one specific gate
→ flag for gate evolution review (see issue #4)
NOISE - metric within 1 sigma of champion, no clear direction
→ inconclusive, may need more data
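The categories above can be read as an ordered rule set. The following is a minimal sketch, not the framework's actual judge: the `RejectedResult` fields, the thresholds, and the order in which rules are checked are all assumptions made for illustration, and it assumes a higher metric is better.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FailureCategory(str, Enum):
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"
    WRONG_PARAMETER_RANGE = "WRONG_PARAMETER_RANGE"
    WRONG_SIGNAL_TYPE = "WRONG_SIGNAL_TYPE"
    REGIME_DEPENDENT = "REGIME_DEPENDENT"
    EXECUTION_KILLED = "EXECUTION_KILLED"
    CONCENTRATION_RISK = "CONCENTRATION_RISK"
    GATE_BLOCKED = "GATE_BLOCKED"
    NOISE = "NOISE"


@dataclass
class RejectedResult:
    # Hypothetical schema: the framework's real result fields may differ.
    n_entries: int
    min_entries: int
    pre_cost_metric: float
    post_cost_metric: float
    champion_metric: float
    champion_sigma: float
    fold_metrics: List[float] = field(default_factory=list)
    # Metric at each tested parameter value, in search order.
    sweep_metrics: List[float] = field(default_factory=list)
    # Fraction of the total edge contributed by the largest few samples.
    top_share: float = 0.0
    failed_gates: List[str] = field(default_factory=list)


def _strictly_monotonic(xs: List[float]) -> bool:
    """True if xs has 3+ points and rises or falls at every step."""
    if len(xs) < 3:
        return False
    pairs = list(zip(xs, xs[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def classify(
    r: RejectedResult, max_top_share: float = 0.5, flat_tol: float = 1e-9
) -> Optional[FailureCategory]:
    """Return the first matching category, or None if no rule applies."""
    if r.n_entries < r.min_entries:
        return FailureCategory.INSUFFICIENT_DATA
    if r.pre_cost_metric > 0 and r.post_cost_metric < 0:
        return FailureCategory.EXECUTION_KILLED
    if any(m > 0 for m in r.fold_metrics) and any(m < 0 for m in r.fold_metrics):
        return FailureCategory.REGIME_DEPENDENT
    if len(r.sweep_metrics) >= 2 and max(r.sweep_metrics) - min(r.sweep_metrics) <= flat_tol:
        return FailureCategory.WRONG_SIGNAL_TYPE
    # A strictly monotonic sweep puts the best value at an edge of the search range.
    if _strictly_monotonic(r.sweep_metrics):
        return FailureCategory.WRONG_PARAMETER_RANGE
    if r.top_share > max_top_share:
        return FailureCategory.CONCENTRATION_RISK
    if len(r.failed_gates) == 1:
        return FailureCategory.GATE_BLOCKED
    if abs(r.post_cost_metric - r.champion_metric) <= r.champion_sigma:
        return FailureCategory.NOISE
    return None
```

Rule order matters because one rejection can match several categories. This sketch checks the cheapest and most specific conditions first, and a real judge may want a different priority or may assign multiple categories.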
Implementation
- The judge (or a post-judge analysis step) assigns a failure category to each REJECT
- Categories are logged in experiment_log.jsonl under failure_category
- Track category distributions per branch in branch_beliefs.json
- When a branch accumulates 3+ failures of the same category, the orchestrator proposes the corresponding fix in the handoff
Orchestrator behavior
In synthesis (Step 5b), after collecting rejections:
"Branch {X} has {N} consecutive {EXECUTION_KILLED} failures. The signal has positive pre-cost edge but execution costs destroy it. Recommended action: switch to maker mode or increase minimum edge threshold."
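One way to produce a message like the one above is to count the trailing run of identical categories in a branch's rejection history and fill in a per-category template. This is a sketch under assumptions: `trailing_streak`, `handoff_note`, the `TEMPLATES` table, and the `min_streak` default are hypothetical names and values, and only the EXECUTION_KILLED wording is taken from this issue.

```python
from typing import Dict, List, Optional, Tuple

# category -> (diagnosis, recommended action)
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "EXECUTION_KILLED": (
        "The signal has positive pre-cost edge but execution costs destroy it.",
        "switch to maker mode or increase minimum edge threshold",
    ),
    "REGIME_DEPENDENT": (
        "The signal is positive in some folds and negative in others.",
        "add a regime filter or conditional activation",
    ),
}


def trailing_streak(categories: List[str]) -> Optional[Tuple[str, int]]:
    """Return (category, length) of the run of identical categories at the end."""
    if not categories:
        return None
    last = categories[-1]
    n = 0
    for c in reversed(categories):
        if c != last:
            break
        n += 1
    return last, n


def handoff_note(branch: str, categories: List[str], min_streak: int = 3) -> Optional[str]:
    """Build the synthesis message, or None if the streak is too short."""
    streak = trailing_streak(categories)
    if streak is None or streak[1] < min_streak:
        return None
    category, n = streak
    diagnosis, action = TEMPLATES.get(
        category, ("No diagnosis template is defined.", "review manually")
    )
    return (
        f"Branch {branch} has {n} consecutive {category} failures. "
        f"{diagnosis} Recommended action: {action}."
    )


history = ["NOISE", "EXECUTION_KILLED", "EXECUTION_KILLED", "EXECUTION_KILLED"]
print(handoff_note("branch_x", history))
```

Counting only the trailing run means a single different outcome resets the streak. If the intent is "3+ failures of the same category" regardless of order, the per-branch distribution is the better trigger.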
Branch-level tracking
{
  "branch_name": {
    "failure_distribution": {
      "EXECUTION_KILLED": 4,
      "REGIME_DEPENDENT": 2,
      "NOISE": 1
    },
    "dominant_failure": "EXECUTION_KILLED",
    "recommended_action": "switch to maker mode"
  }
}
Why this matters
Failures contain as much information as successes. A lab that treats every REJECT as an opaque "didn't work" is throwing away signal. Categorizing failures turns rejections into directed next steps. This is the difference between random search and adaptive search.
Relationship to existing features
- Extends the synthesis step (5b) with structured failure analysis
- Feeds into the research scout: "Branch X is stuck with REGIME_DEPENDENT failures → scout for regime detection techniques"
- Feeds into gate evolution (issue #4, "Gate evolution: detect and flag overly restrictive scoring gates"): "Branch X is stuck with GATE_BLOCKED failures → review the blocking gate"
- Complements diagnostics: persistent failure categories are natural triggers for diagnostic experiments