Turn backtest artifacts into disciplined promote, watch, or reject decisions.
Structured analysis for backtest ledgers, trade logs, daily PnL series, scorecards, and A/B comparison packs.
This repository contains a portable SKILL.md-compatible skill for agents such as
Codex or other skill-aware coding assistants.
It also includes a small input profiler that recognizes common backtest artifact shapes before analysis starts.
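The profiler's shape recognition can be sketched as a column-name fingerprint. The column sets below are illustrative assumptions for the sketch, not the bundled script's exact heuristics:

```python
# Illustrative column fingerprints for common backtest artifact shapes.
# These sets are assumptions for this sketch, not the bundled profiler's
# exact rules.
SHAPES = {
    "trade_ledger": {"entry_time", "exit_time", "pnl"},
    "daily_pnl": {"date", "pnl"},
    "summary_ledger": {"strategy", "total_pnl", "sharpe"},
}

def profile_shape(columns) -> str:
    """Return the first shape whose fingerprint columns are all present."""
    cols = set(columns)
    for shape, required in SHAPES.items():
        if required <= cols:
            return shape
    return "unknown"
```

Fingerprinting by columns rather than filename keeps the check cheap and lets the same logic run over CSV, JSON, or parquet inputs once they are loaded.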
Most backtest reviews fail in one of four ways:
- they over-weight `total_pnl`
- they ignore left-tail risk
- they skip stability checks
- they confuse selection outputs with deployable evidence
This skill exists to force a more disciplined review path.
Instead of asking an agent to "look at this backtest" and hoping it chooses the right framework, you give it a concrete artifact and a fixed analysis contract.
The skill turns concrete backtest artifacts into a disciplined decision memo.
It forces every analysis to cover:
- return
- risk
- stability
- trade quality
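On a daily PnL series, those four dimensions can be approximated with a few standard metrics. This is a minimal illustration, not the skill's exact formulas:

```python
def four_dimension_summary(daily_pnl: list[float]) -> dict:
    """Summarize return, risk, stability, and trade quality for a PnL series."""
    # Return: cumulative PnL over the whole sample.
    equity, total = [], 0.0
    for p in daily_pnl:
        total += p
        equity.append(total)
    # Risk: maximum peak-to-trough drawdown of the cumulative PnL curve.
    peak, max_dd = float("-inf"), 0.0
    for e in equity:
        peak = max(peak, e)
        max_dd = max(max_dd, peak - e)
    # Stability: ratio of the weaker half-sample's PnL to the stronger one's.
    half = len(daily_pnl) // 2
    first, second = sum(daily_pnl[:half]), sum(daily_pnl[half:])
    stability = min(first, second) / max(first, second) if max(first, second) > 0 else 0.0
    # Trade quality: share of profitable days (a proxy when only daily data exists).
    win_rate = sum(1 for p in daily_pnl if p > 0) / len(daily_pnl)
    return {"return": total, "risk": max_dd, "stability": stability, "trade_quality": win_rate}
```

A strategy can look strong on any one of these numbers and weak on the others, which is exactly why the skill requires all four.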
And it prevents common mistakes:
- deciding from one metric alone
- ignoring risk
- letting `total_pnl` dominate the verdict
- treating selection outputs as if they were full portfolio audits
This is not just a prompt that says "analyze the data carefully."
It explicitly:
- fingerprints the input before analysis
- distinguishes summary ledgers from trade ledgers and daily series
- requires return, risk, stability, and trade-quality coverage
- pushes the agent toward a final decision memo
- keeps verdict language separate from root-cause diagnosis
Use this when you want a repeatable decision process, not a one-off opinion.
By default the skill uses:
`promote`, `watch`, and `reject`.
You can map those to your own language, such as:
`上线` (promote), `观察` (watch), and `弃用` (reject).
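One way to map labels is a thin translation layer that keeps the internal verdicts fixed and only changes the display text. This is a hypothetical helper, not part of the skill itself:

```python
# Hypothetical label map: internal verdicts stay fixed, display labels vary.
VERDICT_LABELS = {
    "promote": "上线",
    "watch": "观察",
    "reject": "弃用",
}

def localize_verdict(verdict: str) -> str:
    """Translate an internal verdict into the configured display label."""
    return VERDICT_LABELS.get(verdict, verdict)
```

Keeping the internal vocabulary fixed means downstream tooling can match on `promote`/`watch`/`reject` regardless of what the humans read.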
- Profile the artifact.
- Identify whether it is a summary ledger, trade ledger, daily PnL series, or A/B pack.
- Analyze the artifact across the four required dimensions.
- Produce a verdict: `promote`, `watch`, or `reject`.
```
backtest-analysis/
  SKILL.md
  agents/
    openai.yaml
  references/
    framework.md
    data_shapes.md
  scripts/
    profile_input.py
  examples/
    summary_ledger.csv
    trade_ledger.csv
    daily_pnl.csv
```
This skill is designed for agents that support SKILL.md-style skills.
Tested with:
- Codex-style agents
- Claude Code style skills
Copy the `backtest-analysis/` directory into your repo-local skills directory:

```bash
mkdir -p .agents/skills
cp -R backtest-analysis .agents/skills/
```

Copy the skill directory into your global skills location:

```bash
mkdir -p ~/.codex/skills
cp -R backtest-analysis ~/.codex/skills/
```

Copy the skill into your Claude project skills directory:

```bash
mkdir -p .claude/skills
cp -R backtest-analysis .claude/skills/
```

Copy the skill into your user-level Claude skills directory:

```bash
mkdir -p ~/.claude/skills
cp -R backtest-analysis ~/.claude/skills/
```

The bundled profiler script requires:
- `pandas`
- `pyarrow` (for parquet inputs)
Install with:

```bash
pip install -r requirements.txt
```

Profile a file first:

```bash
python backtest-analysis/scripts/profile_input.py --input path/to/results.parquet
```

Try the bundled examples:

```bash
python backtest-analysis/scripts/profile_input.py --input examples/summary_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/trade_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/daily_pnl.csv
```

Then ask your agent to use the skill on that artifact.
Example:
Use `$backtest-analysis` to analyze this ledger and return a structured promote/watch/reject verdict.
The skill is designed to produce a report with sections like:
- Input fingerprint
- Decision context
- Return
- Risk
- Stability
- Trade quality
- Verdict
- Why not the other verdicts
- Next action
- Confidence and missing evidence
That structure is the point. It prevents the analysis from collapsing into "Sharpe looks fine" or "PnL looks good, ship it."
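Because the structure is the point, a consumer can enforce it mechanically. This is a hypothetical checker; the section names are the ones listed above:

```python
# Required memo sections, taken from the report structure described above.
REQUIRED_SECTIONS = [
    "Input fingerprint",
    "Decision context",
    "Return",
    "Risk",
    "Stability",
    "Trade quality",
    "Verdict",
    "Why not the other verdicts",
    "Next action",
    "Confidence and missing evidence",
]

def missing_sections(report_markdown: str) -> list[str]:
    """Return required section names absent from a decision memo."""
    return [s for s in REQUIRED_SECTIONS if s not in report_markdown]
```

Wiring a check like this into CI or a review script turns "the agent usually covers risk" into "a memo without a Risk section is rejected".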
Supported artifact types:

- strategy candidate ledgers
- trade-level ledgers
- daily PnL or return series
- A/B result packs
- scorecard exports
Intended users:

- quantitative researchers comparing candidate sets
- solo traders reviewing strategy ledgers
- teams using agents to standardize backtest review
- anyone with parquet/csv/json backtest artifacts who wants a stricter decision rubric
The repository ships with a GitHub Actions workflow that:
- profiles the example datasets
- checks the skill frontmatter
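The frontmatter check can be approximated with a few lines of Python. The required keys below are an assumption for the sketch; the workflow's actual rules may differ:

```python
import re

# Assumed minimum keys for SKILL.md frontmatter; the real workflow may check more.
REQUIRED_KEYS = {"name", "description"}

def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems with a SKILL.md file's YAML frontmatter."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line and not line.startswith(" ")
    }
    return [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - keys)]
```

A full YAML parser would be stricter, but a key-presence check like this catches the common failure of shipping a skill with an empty or truncated header.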
You can also run the same checks locally:
```bash
python backtest-analysis/scripts/profile_input.py --input examples/summary_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/trade_ledger.csv
python backtest-analysis/scripts/profile_input.py --input examples/daily_pnl.csv
```

First public release.
Included in v0.1.0:
- portable `SKILL.md`-compatible `backtest-analysis` skill
- input profiler for ledgers, trade logs, and daily PnL artifacts
- example datasets for quick local testing
- GitHub Actions validation workflow
- MIT-licensed repository ready for Codex and Claude-style skill installs
Out of scope:

- open-ended root-cause diagnosis with no concrete dataset
- code review
- implementation-vs-research consistency audits
- direct broker-state verification
Likely future additions:
- richer comparison helpers for multi-file A/B analysis
- optional markdown report generation
- more example datasets
- additional data-shape recognizers for common backtest frameworks
MIT