Turn backtest artifacts into disciplined promote, watch, or reject decisions.
Structured analysis for backtest ledgers, trade logs, daily PnL series, scorecards, and A/B comparison packs.
This repository contains a portable SKILL.md-compatible skill for agents such as
Codex or other skill-aware coding assistants.
It also includes a small input profiler that recognizes common backtest artifact shapes before analysis starts.
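The profiler's shape recognition can be sketched as a column-name fingerprint. The column sets below are illustrative assumptions for the sketch, not the bundled script's exact heuristics:

```python
# Illustrative column fingerprints for common backtest artifact shapes.
# These sets are assumptions for this sketch, not the bundled profiler's
# exact rules.
SHAPES = {
    "trade_ledger": {"entry_time", "exit_time", "pnl"},
    "daily_pnl": {"date", "pnl"},
    "summary_ledger": {"strategy", "total_pnl", "sharpe"},
}

def profile_shape(columns) -> str:
    """Return the first shape whose fingerprint columns are all present."""
    cols = set(columns)
    for shape, required in SHAPES.items():
        if required <= cols:
            return shape
    return "unknown"
```

Fingerprinting by columns rather than filename keeps the check cheap and lets the same logic run over CSV, JSON, or parquet inputs once they are loaded.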
Most backtest reviews fail in one of four ways:
- they over-weight `total_pnl`
- they ignore left-tail risk
- they skip stability checks
- they confuse selection outputs with deployable evidence
This skill exists to force a more disciplined review path.
Instead of asking an agent to "look at this backtest" and hoping it chooses the right framework, you give it a concrete artifact and a fixed analysis contract.
The skill turns concrete backtest artifacts into a disciplined decision memo.
It forces every analysis to cover:
- return
- risk
- stability
- trade quality
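On a daily PnL series, those four dimensions can be approximated with a few standard metrics. This is a minimal illustration, not the skill's exact formulas:

```python
def four_dimension_summary(daily_pnl: list[float]) -> dict:
    """Summarize return, risk, stability, and trade quality for a PnL series."""
    # Return: cumulative PnL over the whole sample.
    equity, total = [], 0.0
    for p in daily_pnl:
        total += p
        equity.append(total)
    # Risk: maximum peak-to-trough drawdown of the cumulative PnL curve.
    peak, max_dd = float("-inf"), 0.0
    for e in equity:
        peak = max(peak, e)
        max_dd = max(max_dd, peak - e)
    # Stability: ratio of the weaker half-sample's PnL to the stronger one's.
    half = len(daily_pnl) // 2
    first, second = sum(daily_pnl[:half]), sum(daily_pnl[half:])
    stability = min(first, second) / max(first, second) if max(first, second) > 0 else 0.0
    # Trade quality: share of profitable days (a proxy when only daily data exists).
    win_rate = sum(1 for p in daily_pnl if p > 0) / len(daily_pnl)
    return {"return": total, "risk": max_dd, "stability": stability, "trade_quality": win_rate}
```

A strategy can look strong on any one of these numbers and weak on the others, which is exactly why the skill requires all four.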
And it prevents common mistakes:
- deciding from one metric alone
- ignoring risk
- letting `total_pnl` dominate the verdict
- treating selection outputs as if they were full portfolio audits
This is not just a prompt that says "analyze the data carefully."
It explicitly:
- fingerprints the input before analysis
- distinguishes summary ledgers from trade ledgers and daily series
- requires return, risk, stability, and trade-quality coverage
- pushes the agent toward a final decision memo
- keeps verdict language separate from root-cause diagnosis
Use this when you want a repeatable decision process, not a one-off opinion.
By default the skill uses:
`promote`, `watch`, and `reject`.
You can map those to your own language, such as:
`上线` (promote), `观察` (watch), and `弃用` (reject).
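One way to map labels is a thin translation layer that keeps the internal verdicts fixed and only changes the display text. This is a hypothetical helper, not part of the skill itself:

```python
# Hypothetical label map: internal verdicts stay fixed, display labels vary.
VERDICT_LABELS = {
    "promote": "上线",
    "watch": "观察",
    "reject": "弃用",
}

def localize_verdict(verdict: str) -> str:
    """Translate an internal verdict into the configured display label."""
    return VERDICT_LABELS.get(verdict, verdict)
```

Keeping the internal vocabulary fixed means downstream tooling can match on `promote`/`watch`/`reject` regardless of what the humans read.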
- Profile the artifact.
- Identify whether it is a summary ledger, trade ledger, daily PnL series, or A/B pack.
- Analyze the artifact across the four required dimensions.
- Produce a verdict: `promote`, `watch`, or `reject`.
```
backtest-analysis/
  SKILL.md
  agents/
    openai.yaml
  references/
    framework.md
    data_shapes.md
  scripts/
    profile_input.py
  examples/
    summary_ledger.csv
    trade_ledger.csv
    daily_pnl.csv
```
This skill is designed for agents that support SKILL.md-style skills.
Tested with:
- Codex-style agents
- Claude Code style skills
Copy the `backtest-analysis/` directory into your repo-local skills directory:

```bash
mkdir -p .agents/skills
cp -R backtest-analysis .agents/skills/
```

Copy the skill directory into your global skills location:

```bash
mkdir -p ~/.codex/skills
cp -R backtest-analysis ~/.codex/skills/
```

Copy the skill into your Claude project skills directory:

```bash
mkdir -p .claude/skills
cp -R backtest-analysis .claude/skills/
```

Copy the skill into your user-level Claude skills directory:

```bash
mkdir -p ~/.claude/skills
cp -R backtest-analysis ~/.claude/skills/
```

The bundled profiler script requires:
- `pandas`
- `pyarrow` (for parquet inputs)
Install with:

```bash
pip install -r requirements.txt
```

Profile a file first:

```bash
python backtest-analysis/scripts/profile_input.py --input path/to/results.parquet
```

Try the bundled examples:

```bash
python backtest-analysis/scripts/profile_input.py --input examples/summary_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/trade_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/daily_pnl.csv
```

Then ask your agent to use the skill on that artifact.
Example:
Use `$backtest-analysis` to analyze this ledger and return a structured promote/watch/reject verdict.
The skill is designed to produce a report with sections like:
- Input fingerprint
- Decision context
- Return
- Risk
- Stability
- Trade quality
- Verdict
- Why not the other verdicts
- Next action
- Confidence and missing evidence
That structure is the point. It prevents the analysis from collapsing into "Sharpe looks fine" or "PnL looks good, ship it."
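Because the structure is the point, a consumer can enforce it mechanically. This is a hypothetical checker; the section names are the ones listed above:

```python
# Required memo sections, taken from the report structure described above.
REQUIRED_SECTIONS = [
    "Input fingerprint",
    "Decision context",
    "Return",
    "Risk",
    "Stability",
    "Trade quality",
    "Verdict",
    "Why not the other verdicts",
    "Next action",
    "Confidence and missing evidence",
]

def missing_sections(report_markdown: str) -> list[str]:
    """Return required section names absent from a decision memo."""
    return [s for s in REQUIRED_SECTIONS if s not in report_markdown]
```

Wiring a check like this into CI or a review script turns "the agent usually covers risk" into "a memo without a Risk section is rejected".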
Supported artifact types:

- strategy candidate ledgers
- trade-level ledgers
- daily PnL or return series
- A/B result packs
- scorecard exports
Intended users:

- quantitative researchers comparing candidate sets
- solo traders reviewing strategy ledgers
- teams using agents to standardize backtest review
- anyone with parquet/csv/json backtest artifacts who wants a stricter decision rubric
The repository ships with a GitHub Actions workflow that:
- profiles the example datasets
- checks the skill frontmatter
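The frontmatter check can be approximated with a few lines of Python. The required keys below are an assumption for the sketch; the workflow's actual rules may differ:

```python
import re

# Assumed minimum keys for SKILL.md frontmatter; the real workflow may check more.
REQUIRED_KEYS = {"name", "description"}

def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems with a SKILL.md file's YAML frontmatter."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line and not line.startswith(" ")
    }
    return [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - keys)]
```

A full YAML parser would be stricter, but a key-presence check like this catches the common failure of shipping a skill with an empty or truncated header.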
You can also run the same checks locally:
```bash
python backtest-analysis/scripts/profile_input.py --input examples/summary_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/trade_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/daily_pnl.csv
```

First public release.
Included in v0.1.0:
- portable `SKILL.md`-compatible `backtest-analysis` skill
- input profiler for ledgers, trade logs, and daily PnL artifacts
- example datasets for quick local testing
- GitHub Actions validation workflow
- MIT-licensed repository ready for Codex and Claude-style skill installs
Out of scope:

- open-ended root-cause diagnosis with no concrete dataset
- code review
- implementation-vs-research consistency audits
- direct broker-state verification
Likely future additions:
- richer comparison helpers for multi-file A/B analysis
- optional markdown report generation
- more example datasets
- additional data-shape recognizers for common backtest frameworks
MIT