[prompt-clustering] Copilot Agent Prompt Clustering Analysis - 2026-02-08 #14457
Daily NLP-based clustering analysis of Copilot agent task prompts to identify patterns, success rates, and optimization opportunities.
Analysis Summary
Period: Last 30 days
Total PRs Analyzed: 1000
Valid Prompts: 973
Clusters Identified: 7
Overall Success Rate: 69.2% (673/973 merged)
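The clustering mechanics are collapsed under Methodology below; as a rough, non-authoritative sketch, grouping the 973 valid prompts into 7 clusters could be done with TF-IDF vectorization plus KMeans (scikit-learn). The preprocessing, parameters, and helper name here are assumptions, not the pipeline actually used for this report.

```python
# Minimal sketch, assuming scikit-learn and a plain list of prompt strings
# extracted from the PR data; the report's real preprocessing and parameters
# are not documented here and may differ.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_prompts(prompts: list[str], n_clusters: int = 7):
    """Vectorize prompts with TF-IDF and partition them with KMeans."""
    # Drop English stop words and very rare/common terms before weighting.
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.8, min_df=5)
    X = vectorizer.fit_transform(prompts)
    # k=7 matches the number of clusters reported above.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)
    return vectorizer, kmeans, labels
```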
Key Metrics
📊 Detailed Cluster Analysis
Cluster 1: Bug Fixes & Error Resolution (3.3%, 81.2% success)
Characteristics: job, fix, workflow, failing, implement
Representative Examples:
Insights: Tasks with clear failure signals and specific job URLs show highest success rates.
Cluster 2: New Features & Additions (29.7%, 61.6% success)
Characteristics: workflow, issue, section, code, add
Representative Examples:
Insights: Complex multi-file changes with architectural decisions need clearer requirements.
Cluster 3: Dependency Updates & Version Upgrades (29.4%, 72.0% success)
Characteristics: reference, update, agent, fix
Representative Examples:
Insights: Routine update tasks perform well but touch many files.
Cluster 4: General Maintenance (11.0%, 78.5% success)
Characteristics: agentic, workflows, md, create
Representative Examples:
Insights: Well-scoped workflow maintenance tasks succeed consistently.
Cluster 5: MCP & Gateway Updates (10.5%, 63.7% success)
Characteristics: mcp, server, gateway, tool
Representative Examples:
Insights: High complexity MCP changes need better scoping.
Cluster 6: Safe Outputs & Project (9.9%, 76.0% success)
Characteristics: safe, outputs, project, create
Representative Examples:
Insights: Infrastructure changes with clear requirements succeed well.
Cluster 7: Campaign & Security (6.3%, 67.2% success)
Characteristics: campaign, security, project, fix
Representative Examples:
Insights: Removal/cleanup tasks have clear success criteria.
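The "Characteristics" keyword lists above resemble the top-weighted terms of each cluster. A hypothetical way to derive such lists from the fitted vectorizer and model returned by the clustering sketch earlier (the report's actual keyword extraction is not documented here):

```python
# Hypothetical derivation of the per-cluster "Characteristics" keywords:
# take the highest-weighted TF-IDF terms in each KMeans centroid.
import numpy as np


def top_terms_per_cluster(vectorizer, kmeans, n_terms: int = 5) -> dict[int, list[str]]:
    terms = np.array(vectorizer.get_feature_names_out())
    keywords = {}
    for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
        top = np.argsort(centroid)[::-1][:n_terms]  # indices of the largest weights
        keywords[cluster_id] = terms[top].tolist()
    return keywords
```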
Success Rate Comparison
Key Findings
Success Rate Variation: Significant variation (61.6% to 81.2%) across clusters, indicating task type strongly affects success.
Complexity vs Success: Higher file count doesn't always mean lower success. Bug fixes touch 28 files but succeed at 81%, while new features touch 11 files but succeed at 62%.
Clear Goals Win: Tasks with specific failure signals (bug fixes) or well-defined scope (maintenance) outperform open-ended feature work.
Volume Challenge: 59% of tasks fall into lower-performing clusters (New Features 30%, Dependency Updates 29%), suggesting a need for better prompt engineering.
MCP Complexity: The MCP/Gateway cluster pairs one of the lowest success rates (63.7%) with the highest average file count (38.4), indicating architectural complexity.
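The per-cluster success rates and file counts cited in these findings could presumably be reproduced from the clustered prompt data. A minimal aggregation sketch over /tmp/gh-aw/pr-data/clustered-prompts.json, assuming each record carries cluster, merged, and files_changed fields (field names are guesses, not confirmed by the report):

```python
# Sketch of the per-cluster aggregation; "cluster", "merged", and
# "files_changed" are assumed field names, not confirmed by the report.
import json
from collections import defaultdict

with open("/tmp/gh-aw/pr-data/clustered-prompts.json") as f:
    records = json.load(f)

stats = defaultdict(lambda: {"total": 0, "merged": 0, "files": 0})
for r in records:
    s = stats[r["cluster"]]
    s["total"] += 1
    s["merged"] += 1 if r.get("merged") else 0
    s["files"] += r.get("files_changed", 0)

for cluster_id, s in sorted(stats.items()):
    rate = 100 * s["merged"] / s["total"]
    print(f"cluster {cluster_id}: {rate:.1f}% merged, "
          f"{s['files'] / s['total']:.1f} files avg over {s['total']} PRs")
```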
Recommendations
Break Down Feature Tasks: New Features cluster (30% of volume, 62% success) would benefit from smaller, more focused PRs. Consider splitting complex features into multiple issues.
Template Standardization: Bug fix tasks succeed at 81%; use their prompt pattern (job URL, specific failure, clear analysis steps) as a template for other categories.
Pre-Planning for MCP: MCP/Gateway changes (64% success, 38 files avg) need architectural planning before agent execution. Consider a human design phase first.
Scope Validation: The average task touches 19.6 files. Tasks touching >30 files should be reviewed for scope reduction opportunities.
Success Metrics: Track success rate by cluster over time to measure prompt engineering improvements.
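To act on the Scope Validation recommendation, an illustrative filter that lists analyzed PRs touching more than 30 files, using the same assumed fields as the aggregation sketch above:

```python
# Illustrative scope check: list analyzed PRs that touch more than 30 files.
# Field names ("files_changed", "url") are the same assumptions as above.
import json

FILE_LIMIT = 30

with open("/tmp/gh-aw/pr-data/clustered-prompts.json") as f:
    records = json.load(f)

oversized = [r for r in records if r.get("files_changed", 0) > FILE_LIMIT]
for r in sorted(oversized, key=lambda r: r["files_changed"], reverse=True):
    print(f"{r.get('url', '<unknown PR>')}: {r['files_changed']} files changed")
```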
Methodology
Data Collection:
Clustering Approach:
Validation:
Data Quality Notes
Full Analysis Files:
/tmp/gh-aw/pr-data/clustered-prompts.json - All PRs with cluster assignments
/tmp/gh-aw/pr-data/cluster-analysis.json - Cluster statistics
/tmp/gh-aw/pr-data/clustering-report.md - Extended report with full data table (973 PRs)
References: