Trend analysis filtering by prompt/schema version #28

@richardkiene

Overview

Add the ability to filter and group trend analysis results by system prompt and MCP tool schema versions. This enables users to correlate performance changes with configuration changes over time.

Background

MCProbe now captures the system prompt and MCP tool schemas with each test run (implemented in #24, #25, #27), along with SHA-256 hashes for quick comparison. Trend analysis can use these hashes to separate runs by configuration version.

Proposed Features

1. Filter by Prompt Version

  • Filter trend data to show only runs with a specific prompt hash
  • Compare performance metrics across different prompt versions
  • Identify which prompt version performed best for a given scenario
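A minimal sketch of the prompt filter, for illustration only. The `RunRecord` shape and the `filter_by_prompt` helper are hypothetical and not taken from MCProbe's codebase. Prefix matching is also an assumption, made so that an abbreviated hash such as `abc123` can select runs stored with a full digest.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical shape of one stored run; not MCProbe's actual model."""
    scenario: str
    prompt_hash: str  # hex digest of the system prompt (shortened here)
    score: float


def filter_by_prompt(
    runs: Iterable[RunRecord], prompt_hash: Optional[str]
) -> List[RunRecord]:
    """Keep runs whose prompt hash starts with the given prefix.

    Passing None disables the filter and returns every run.
    """
    if prompt_hash is None:
        return list(runs)
    prefix = prompt_hash.lower()
    return [r for r in runs if r.prompt_hash.lower().startswith(prefix)]


# Illustrative data only.
RUNS = [
    RunRecord("weather", "abc123ff", 0.82),
    RunRecord("weather", "abc123ff", 0.78),
    RunRecord("weather", "def456aa", 0.91),
]

selected = filter_by_prompt(RUNS, "abc123")
```

The same function would work for schema hashes if the record carried a `schema_hash` field and the attribute name were passed in.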

2. Filter by Schema Version

  • Filter trend data to show only runs with a specific schema hash
  • Track how tool description changes affect tool usage patterns
  • Correlate schema changes with pass/fail rate changes
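One way to look at tool usage patterns per schema version is to count tool calls under each schema hash. This sketch assumes each run can be reduced to a `(schema_hash, tools_called)` pair. Both that shape and the `tool_usage_by_schema` name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tool_usage_by_schema(
    runs: Iterable[Tuple[str, List[str]]]
) -> Dict[str, Counter]:
    """Count how often each tool was called under each schema hash."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for schema_hash, tools_called in runs:
        usage[schema_hash].update(tools_called)
    return dict(usage)


# Illustrative data only: (schema_hash, names of tools called in the run).
RUNS = [
    ("xyz789", ["search", "search", "fetch"]),
    ("xyz789", ["search"]),
    ("uvw000", ["fetch", "fetch"]),
]

usage = tool_usage_by_schema(RUNS)
```

Comparing the counters of two schema hashes then shows whether a description change shifted calls from one tool to another.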

3. Group by Configuration

  • Group trend results by prompt hash to see performance distribution per prompt version
  • Group by schema hash to compare tool effectiveness across schema versions
  • Group by both hashes together to see performance per full configuration (prompt version plus schema version)
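The three grouping modes could share one aggregation routine that differs only in the key it groups on. The sketch below assumes a hypothetical `RunRecord` with a score and a pass flag. The mode names `prompt`, `schema`, and `config` are illustrative, and only `prompt` appears in this proposal.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Hashable, Iterable, List


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical run record; field names are assumptions."""
    prompt_hash: str
    schema_hash: str
    score: float
    passed: bool


# Maps a grouping mode to the function that extracts the group key.
KEY_FUNCS = {
    "prompt": lambda r: r.prompt_hash,
    "schema": lambda r: r.schema_hash,
    "config": lambda r: (r.prompt_hash, r.schema_hash),
}


def summarize(runs: Iterable[RunRecord], group_by: str) -> Dict[Hashable, dict]:
    """Return run count, pass rate, and mean score for each group."""
    key = KEY_FUNCS[group_by]
    groups: Dict[Hashable, List[RunRecord]] = defaultdict(list)
    for run in runs:
        groups[key(run)].append(run)
    return {
        k: {
            "runs": len(v),
            "pass_rate": sum(r.passed for r in v) / len(v),
            "mean_score": mean(r.score for r in v),
        }
        for k, v in groups.items()
    }


# Illustrative data only.
RUNS = [
    RunRecord("p1", "s1", 0.8, True),
    RunRecord("p1", "s1", 0.6, False),
    RunRecord("p1", "s2", 0.9, True),
    RunRecord("p2", "s2", 0.7, True),
]

by_prompt = summarize(RUNS, "prompt")
by_config = summarize(RUNS, "config")
```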

4. CLI Support

# Filter trends by prompt version
mcprobe trends --prompt-hash abc123

# Filter by schema version
mcprobe trends --schema-hash xyz789

# Show trends grouped by prompt version
mcprobe trends --group-by prompt
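
A rough sketch of how these flags might be parsed. It uses `argparse` purely for illustration, and MCProbe's actual CLI framework may differ. The flag names come from the examples above. The `schema` and `config` choices for `--group-by` are assumptions, since only `prompt` is shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for the proposed `trends` options."""
    parser = argparse.ArgumentParser(prog="mcprobe")
    sub = parser.add_subparsers(dest="command", required=True)

    trends = sub.add_parser("trends", help="Show trend analysis")
    trends.add_argument(
        "--prompt-hash", default=None,
        help="Only include runs with this prompt hash",
    )
    trends.add_argument(
        "--schema-hash", default=None,
        help="Only include runs with this schema hash",
    )
    trends.add_argument(
        "--group-by", default=None,
        choices=["prompt", "schema", "config"],  # last two are assumed
        help="Group results by configuration version",
    )
    return parser


args = build_parser().parse_args(
    ["trends", "--prompt-hash", "abc123", "--group-by", "prompt"]
)
```

Leaving all three options at `None` by default would keep the current `mcprobe trends` behavior unchanged when no flag is given.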

5. HTML Report Enhancements

  • Add filter controls in trend visualizations
  • Show prompt/schema version timeline
  • Highlight configuration change points on trend graphs
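
Configuration change points for the timeline and graph markers could be derived by walking the runs in chronological order and noting where either hash differs from the previous run. This sketch assumes each run is a hypothetical `(timestamp, prompt_hash, schema_hash)` tuple with ISO 8601 timestamps in a uniform format, so that string order matches time order.

```python
from typing import Iterable, List, Tuple

# Hypothetical run shape: (ISO 8601 timestamp, prompt_hash, schema_hash).
Run = Tuple[str, str, str]


def change_points(runs: Iterable[Run]) -> List[Tuple[str, List[str]]]:
    """Return (timestamp, changed_parts) for each run whose configuration
    differs from the run immediately before it."""
    ordered = sorted(runs, key=lambda r: r[0])
    points: List[Tuple[str, List[str]]] = []
    for prev, cur in zip(ordered, ordered[1:]):
        changed = []
        if cur[1] != prev[1]:
            changed.append("prompt")
        if cur[2] != prev[2]:
            changed.append("schema")
        if changed:
            points.append((cur[0], changed))
    return points


# Illustrative data only, deliberately out of order.
RUNS = [
    ("2025-01-01T10:00:00", "p1", "s1"),
    ("2025-01-03T10:00:00", "p2", "s2"),
    ("2025-01-02T10:00:00", "p2", "s1"),
    ("2025-01-04T10:00:00", "p2", "s2"),
]

points = change_points(RUNS)
```

Each returned timestamp marks where a vertical line or annotation could be drawn on a trend graph.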

Use Cases

  1. A/B Testing: Compare test results between two different prompt versions
  2. Regression Analysis: Quickly see if a prompt change caused a score drop
  3. Optimization Tracking: Track improvements as prompts are iterated
  4. Configuration Audit: Review which configurations were tested and when

Related
