
@janisz
Contributor

@janisz janisz commented Jan 15, 2026

Description

Enhanced tool descriptions and parameter schemas to better guide LLMs on when to use optional parameters and which tool to select for each query type. Added an mcp-testing-framework configuration with 8 test cases covering CVE queries and cluster operations, achieving an 87.5% pass rate with GPT-5 models.

Validation

./scripts/run-tests.sh
══════════════════════════════════════════════════════════
  StackRox MCP E2E Testing with Gevals
══════════════════════════════════════════════════════════

Loading environment variables from .env...
Configuration:
  Agent Model: gpt-4o
  Judge Model: gpt-4o
  MCP Server: stackrox-mcp (via go run)

Running gevals tests...


=== Starting Evaluation ===

Task: list-clusters
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-affecting-workloads
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-affecting-clusters
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-nonexistent
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-cluster-scooby
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-cluster-maria
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-clusters-general
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

Task: cve-cluster-list
  Difficulty: easy
  → Running agent...
  → Verifying results...
  ✓ Task passed

=== Evaluation Complete ===

📄 Results saved to: gevals-stackrox-mcp-e2e-out.json

=== Results Summary ===

Task: list-clusters
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/list-clusters.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-affecting-workloads
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-affecting-workloads.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-affecting-clusters
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-affecting-clusters.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-nonexistent
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-nonexistent.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-cluster-scooby
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-cluster-scooby.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-cluster-maria
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-cluster-maria.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-clusters-general
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-clusters-general.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

Task: cve-cluster-list
  Path: /home/janisz/go/src/github.com/stackrox/stackrox-mcp/e2e-tests/gevals/tasks/cve-cluster-list.yaml
  Difficulty: easy
  Task Status: PASSED
  Assertions: PASSED (3/3)

=== Overall Statistics ===
Total Tasks: 8
Tasks Passed: 8/8
Assertions Passed: 24/24

=== Statistics by Difficulty ===

easy:
  Tasks: 8/8
  Assertions: 24/24

══════════════════════════════════════════════════════════
  Tests Completed Successfully!
══════════════════════════════════════════════════════════
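The summary figures above come from simple per-task aggregation. Below is a minimal sketch that rebuilds the overall and per-difficulty counts. It assumes a hypothetical `TaskResult` record per task. That record is not the gevals data model or the schema of `gevals-stackrox-mcp-e2e-out.json`.

```go
package main

import "fmt"

// TaskResult is a hypothetical record mirroring the fields printed in the
// "Results Summary" section: difficulty, task status and assertion counts.
type TaskResult struct {
	Name             string
	Difficulty       string
	Passed           bool
	AssertionsPassed int
	AssertionsTotal  int
}

// Stats accumulates task and assertion pass counts.
type Stats struct {
	Tasks, TasksPassed           int
	Assertions, AssertionsPassed int
}

// add folds one task result into the running totals.
func (s *Stats) add(r TaskResult) {
	s.Tasks++
	if r.Passed {
		s.TasksPassed++
	}
	s.Assertions += r.AssertionsTotal
	s.AssertionsPassed += r.AssertionsPassed
}

// PassRate returns the percentage of tasks that passed.
func (s Stats) PassRate() float64 {
	if s.Tasks == 0 {
		return 0
	}
	return 100 * float64(s.TasksPassed) / float64(s.Tasks)
}

// Summarize computes the overall and the per-difficulty statistics.
func Summarize(results []TaskResult) (Stats, map[string]*Stats) {
	var overall Stats
	byDifficulty := make(map[string]*Stats)
	for _, r := range results {
		overall.add(r)
		s, ok := byDifficulty[r.Difficulty]
		if !ok {
			s = &Stats{}
			byDifficulty[r.Difficulty] = s
		}
		s.add(r)
	}
	return overall, byDifficulty
}

func main() {
	// The eight tasks from the run above, each passing 3/3 assertions.
	names := []string{
		"list-clusters", "cve-affecting-workloads", "cve-affecting-clusters",
		"cve-nonexistent", "cve-cluster-scooby", "cve-cluster-maria",
		"cve-clusters-general", "cve-cluster-list",
	}
	results := make([]TaskResult, 0, len(names))
	for _, n := range names {
		results = append(results, TaskResult{
			Name: n, Difficulty: "easy", Passed: true,
			AssertionsPassed: 3, AssertionsTotal: 3,
		})
	}
	overall, byDifficulty := Summarize(results)
	fmt.Printf("Tasks Passed: %d/%d\n", overall.TasksPassed, overall.Tasks)
	fmt.Printf("Assertions Passed: %d/%d\n", overall.AssertionsPassed, overall.Assertions)
	fmt.Printf("Pass rate: %.1f%%\n", overall.PassRate())
	for d, s := range byDifficulty {
		fmt.Printf("%s: tasks %d/%d, assertions %d/%d\n",
			d, s.TasksPassed, s.Tasks, s.AssertionsPassed, s.Assertions)
	}
}
```

Feeding it the eight tasks from this run reproduces 8/8 tasks and 24/24 assertions. For comparison, an 87.5% pass rate over eight tasks corresponds to 7 of 8 tasks passing (7/8 = 0.875).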

@codecov-commenter

codecov-commenter commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.58%. Comparing base (53e4b0b) to head (2868f53).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   77.36%   77.58%   +0.22%     
==========================================
  Files          26       26              
  Lines        1109     1120      +11     
==========================================
+ Hits          858      869      +11     
  Misses        216      216              
  Partials       35       35              
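The coverage figures can be checked by hand. Coverage is hits divided by all tracked lines, where lines = hits + misses + partials: 858 + 216 + 35 = 1109 on main, and 869 + 216 + 35 = 1120 on the PR head. The sketch below makes two assumptions: partials count as uncovered, and the percentage is truncated rather than rounded to two decimals. Both are inferred from the numbers in this report (869/1120 = 77.589...%, shown as 77.58%), not from Codecov documentation.

```go
package main

import (
	"fmt"
	"math"
)

// Coverage returns hits / (hits + misses + partials) as a percentage,
// truncated (not rounded) to two decimals. Treating partials as uncovered
// and truncating are assumptions that reproduce the report above.
func Coverage(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	if lines == 0 {
		return 0
	}
	pct := 100 * float64(hits) / float64(lines)
	return math.Floor(pct*100) / 100
}

func main() {
	base := Coverage(858, 216, 35) // 1109 tracked lines on main
	head := Coverage(869, 216, 35) // 1120 tracked lines on the PR head
	fmt.Printf("base %.2f%%, head %.2f%%, delta %+.2f%%\n", base, head, head-base)
}
```

With rounding instead of truncation the same inputs would give 77.37% and 77.59%, which do not match the report. That mismatch is why truncation is assumed here.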


Enhanced tool descriptions and parameter schemas to better guide LLMs on when to use optional parameters and which tools to select for different query types. Added mcp-testing-framework configuration with 8 test cases covering CVE queries and cluster operations, achieving 87.5% pass rate with GPT-5 models.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>