
feat: integrate step-level metrics into STOI product #1

Open
liziye627-design wants to merge 6 commits into KevinYoung-Kw:main from liziye627-design:feat/step-level-metrics


@liziye627-design
Collaborator

Summary

  • stoi_analyze: extract output_text from session content blocks; add the compute_step_metrics(), aggregate_step_metrics(), and print_step_metrics_report() functions; support a --metrics flag on the analyze command
  • stoi: add metrics and report CLI commands that cover the F/V/C/U quality dimensions (Factuality, Validity, Coherence, Utility) and the TE/SUS/FR/MG/RR core metrics (Token Efficiency, Step Utility Score, Faithfulness Risk, Monitorability Gain, Redundancy Ratio)
  • stoi_tui: add a metrics-box to the dashboard showing F/V/C/U bars and the core metrics; enhance suggestions with step-level insights
  • tests: add 10 integration tests covering step-metric computation, aggregation, and output_text extraction
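The PR page does not show how aggregate_step_metrics() or the dashboard's F/V/C/U bars are implemented. As a rough illustration only, the sketch below assumes each step carries four quality scores in [0, 1], aggregates them by a plain mean, and renders each dimension as a fixed-width text bar. StepQuality, aggregate_quality, and render_bar are hypothetical names, and mean aggregation is an assumed rule, not necessarily what stoi_analyze or stoi_tui do.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StepQuality:
    """Hypothetical per-step F/V/C/U scores, each assumed to lie in [0, 1]."""
    factuality: float
    validity: float
    coherence: float
    utility: float


def aggregate_quality(steps: list[StepQuality]) -> dict[str, float]:
    """Average each dimension across steps (an assumed aggregation rule)."""
    if not steps:
        return {"F": 0.0, "V": 0.0, "C": 0.0, "U": 0.0}
    return {
        "F": mean(s.factuality for s in steps),
        "V": mean(s.validity for s in steps),
        "C": mean(s.coherence for s in steps),
        "U": mean(s.utility for s in steps),
    }


def render_bar(score: float, width: int = 10) -> str:
    """Render a [0, 1] score as a fixed-width text bar, clamping out-of-range values."""
    clamped = min(max(score, 0.0), 1.0)
    filled = round(clamped * width)
    return "#" * filled + "." * (width - filled)


if __name__ == "__main__":
    # Two made-up steps, purely to exercise the functions above.
    steps = [StepQuality(0.9, 0.8, 0.7, 0.5), StepQuality(0.7, 0.6, 0.9, 0.3)]
    for dim, score in aggregate_quality(steps).items():
        print(f"{dim} {render_bar(score)} {score:.2f}")
```

A mean treats every step equally; weighting steps by their token count would be an equally plausible choice if long steps should dominate the summary.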

Test plan

  • All 83 tests pass (python3 -m pytest tests/ -v)
  • python3 stoi.py metrics --latest: verify that the metrics command works
  • python3 stoi.py report --days 7: verify that the report command works
  • python3 stoi.py analyze --metrics --top 5: verify that the --metrics flag works
  • python3 stoi.py stats: verify that existing commands still work

KevinYoung-Kw and others added 6 commits April 9, 2026 03:13
Based on the research report defining F/V/C/U quality dimensions and
TE/SUS/FR/MG/RR core metrics:

- Add stoi_metrics.py with StepParser, TokenAttributor, QualityScorer, and
  MetricsCalculator, implementing step-level token attribution (tiktoken
  with a heuristic fallback), four-dimensional quality scoring
  (Factuality, Validity, Coherence, Utility), and five quantitative
  metrics (Token Efficiency, Step Utility Score, Faithfulness Risk,
  Monitorability Gain, Redundancy Ratio)

- Fix the calc_stoi algorithm: wasted_tokens now counts only new_tokens
  instead of (total_context - cache_read), and a cache_creation
  investment ratio is applied to reduce the waste penalty

- Fix the L3 weight comment in calc_stoi_score (0.35 -> 0.20 to match the code)

- Update stoi_proxy.py with the corrected waste calculation

- Add token efficiency and useful output ratio to session reports

- Add a test suite (73 tests) covering the engine fixes and the new
  metrics framework

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
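
The commit describes token attribution that uses tiktoken with a heuristic fallback, but the page does not show the TokenAttributor code. The sketch below is one minimal way to get that behavior, under stated assumptions: the cl100k_base encoding, a roughly 4-characters-per-token heuristic, and the helper names count_tokens, heuristic_token_count, and attribute_tokens are all hypothetical choices, not taken from stoi_metrics.py.

```python
import math


def _load_encoder():
    """Try to load a tiktoken encoder; return None if it is unavailable."""
    try:
        import tiktoken  # optional dependency

        # cl100k_base is an assumed choice; the PR does not name an encoding.
        return tiktoken.get_encoding("cl100k_base")
    except Exception:
        # ImportError if tiktoken is not installed, or an I/O or network error
        # if the encoding file cannot be fetched on first use.
        return None


_ENCODER = _load_encoder()


def heuristic_token_count(text: str) -> int:
    """Rough estimate of about 4 characters per token (an assumed rule of thumb)."""
    return math.ceil(len(text) / 4)


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available, else use the heuristic."""
    if not text:
        return 0
    if _ENCODER is not None:
        # disallowed_special=() encodes strings such as "<|endoftext|>" as plain
        # text instead of raising ValueError on session content.
        return len(_ENCODER.encode(text, disallowed_special=()))
    return heuristic_token_count(text)


def attribute_tokens(steps: list[str]) -> list[int]:
    """Hypothetical per-step attribution: the token count of each step's text."""
    return [count_tokens(step) for step in steps]
```

Loading the encoder once at import time avoids repeating the (possibly networked) lookup for every step; the trade-off is that heuristic counts and tiktoken counts are not comparable, so a report should record which path was used.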

- stoi_analyze: extract output_text from session content blocks; add the
  compute_step_metrics/aggregate_step_metrics/print_step_metrics_report
  functions; support a --metrics flag in the analyze command
- stoi: add 'metrics' and 'report' CLI commands with F/V/C/U quality
  dimensions and TE/SUS/FR/MG/RR core metrics
- stoi_tui: add a metrics-box to the dashboard with F/V/C/U bars and a
  core metrics display; enhance suggestions with step-level insights
- tests: add 10 integration tests for compute/aggregate/output_text extraction
