[prompt-clustering] Copilot Agent Prompt Clustering Analysis - 2026-02-08 #14457
Daily NLP-based clustering analysis of Copilot agent task prompts to identify patterns, success rates, and optimization opportunities.
Analysis Summary
Period: Last 30 days
Total PRs Analyzed: 1000
Valid Prompts: 973
Clusters Identified: 7
Overall Success Rate: 69.2% (673/973 merged)
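The clustering mechanics are collapsed under Methodology below; as a rough, non-authoritative sketch, grouping the 973 valid prompts into 7 clusters could be done with TF-IDF vectorization plus KMeans (scikit-learn). The preprocessing, parameters, and helper name here are assumptions, not the pipeline actually used for this report.

```python
# Minimal sketch, assuming scikit-learn and a plain list of prompt strings
# extracted from the PR data; the report's real preprocessing and parameters
# are not documented here and may differ.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_prompts(prompts: list[str], n_clusters: int = 7):
    """Vectorize prompts with TF-IDF and partition them with KMeans."""
    # Drop English stop words and very rare/common terms before weighting.
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.8, min_df=5)
    X = vectorizer.fit_transform(prompts)
    # k=7 matches the number of clusters reported above.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)
    return vectorizer, kmeans, labels
```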
Key Metrics
📊 Detailed Cluster Analysis
Cluster 1: Bug Fixes & Error Resolution (3.3%, 81.2% success)
Characteristics: job, fix, workflow, failing, implement
Representative Examples:
Insights: Tasks with clear failure signals and specific job URLs show highest success rates.
Cluster 2: New Features & Additions (29.7%, 61.6% success)
Characteristics: workflow, issue, section, code, add
Representative Examples:
Insights: Complex multi-file changes with architectural decisions need clearer requirements.
Cluster 3: Dependency Updates & Version Upgrades (29.4%, 72.0% success)
Characteristics: reference, update, agent, fix
Representative Examples:
Insights: Routine update tasks perform well but touch many files.
Cluster 4: General Maintenance (11.0%, 78.5% success)
Characteristics: agentic, workflows, md, create
Representative Examples:
Insights: Well-scoped workflow maintenance tasks succeed consistently.
Cluster 5: MCP & Gateway Updates (10.5%, 63.7% success)
Characteristics: mcp, server, gateway, tool
Representative Examples:
Insights: High complexity MCP changes need better scoping.
Cluster 6: Safe Outputs & Project (9.9%, 76.0% success)
Characteristics: safe, outputs, project, create
Representative Examples:
Insights: Infrastructure changes with clear requirements succeed well.
Cluster 7: Campaign & Security (6.3%, 67.2% success)
Characteristics: campaign, security, project, fix
Representative Examples:
Insights: Removal/cleanup tasks have clear success criteria.
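The "Characteristics" keyword lists above resemble the top-weighted terms of each cluster. A hypothetical way to derive such lists from the fitted vectorizer and model returned by the clustering sketch earlier (the report's actual keyword extraction is not documented here):

```python
# Hypothetical derivation of the per-cluster "Characteristics" keywords:
# take the highest-weighted TF-IDF terms in each KMeans centroid.
import numpy as np


def top_terms_per_cluster(vectorizer, kmeans, n_terms: int = 5) -> dict[int, list[str]]:
    terms = np.array(vectorizer.get_feature_names_out())
    keywords = {}
    for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
        top = np.argsort(centroid)[::-1][:n_terms]  # indices of the largest weights
        keywords[cluster_id] = terms[top].tolist()
    return keywords
```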
Success Rate Comparison
Key Findings
Success Rate Variation: Significant variation (61.6% to 81.2%) across clusters, indicating task type strongly affects success.
Complexity vs Success: Higher file count doesn't always mean lower success. Bug fixes touch 28 files but succeed at 81%, while new features touch 11 files but succeed at 62%.
Clear Goals Win: Tasks with specific failure signals (bug fixes) or well-defined scope (maintenance) outperform open-ended feature work.
Volume Challenge: 59% of tasks fall into lower-performing clusters (New Features 30%, Dependency Updates 29%), suggesting a need for better prompt engineering.
MCP Complexity: The MCP/Gateway cluster pairs one of the lowest success rates (63.7%) with the highest average file count (38.4), indicating architectural complexity.
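The per-cluster success rates and file counts cited in these findings could presumably be reproduced from the clustered prompt data. A minimal aggregation sketch over /tmp/gh-aw/pr-data/clustered-prompts.json, assuming each record carries cluster, merged, and files_changed fields (field names are guesses, not confirmed by the report):

```python
# Sketch of the per-cluster aggregation; "cluster", "merged", and
# "files_changed" are assumed field names, not confirmed by the report.
import json
from collections import defaultdict

with open("/tmp/gh-aw/pr-data/clustered-prompts.json") as f:
    records = json.load(f)

stats = defaultdict(lambda: {"total": 0, "merged": 0, "files": 0})
for r in records:
    s = stats[r["cluster"]]
    s["total"] += 1
    s["merged"] += 1 if r.get("merged") else 0
    s["files"] += r.get("files_changed", 0)

for cluster_id, s in sorted(stats.items()):
    rate = 100 * s["merged"] / s["total"]
    print(f"cluster {cluster_id}: {rate:.1f}% merged, "
          f"{s['files'] / s['total']:.1f} files avg over {s['total']} PRs")
```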
Recommendations
Break Down Feature Tasks: New Features cluster (30% of volume, 62% success) would benefit from smaller, more focused PRs. Consider splitting complex features into multiple issues.
Template Standardization: Bug fix tasks succeed at 81%; use their prompt pattern (job URL, specific failure, clear analysis steps) as a template for other categories.
Pre-Planning for MCP: MCP/Gateway changes (64% success, 38 files avg) need architectural planning before agent execution. Consider a human design phase first.
Scope Validation: The average task touches 19.6 files. Tasks touching >30 files should be reviewed for scope reduction opportunities.
Success Metrics: Track success rate by cluster over time to measure prompt engineering improvements.
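To act on the Scope Validation recommendation, an illustrative filter that lists analyzed PRs touching more than 30 files, using the same assumed fields as the aggregation sketch above:

```python
# Illustrative scope check: list analyzed PRs that touch more than 30 files.
# Field names ("files_changed", "url") are the same assumptions as above.
import json

FILE_LIMIT = 30

with open("/tmp/gh-aw/pr-data/clustered-prompts.json") as f:
    records = json.load(f)

oversized = [r for r in records if r.get("files_changed", 0) > FILE_LIMIT]
for r in sorted(oversized, key=lambda r: r["files_changed"], reverse=True):
    print(f"{r.get('url', '<unknown PR>')}: {r['files_changed']} files changed")
```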
Methodology
Data Collection:
Clustering Approach:
Validation:
Data Quality Notes
Full Analysis Files:
/tmp/gh-aw/pr-data/clustered-prompts.json - All PRs with cluster assignments
/tmp/gh-aw/pr-data/cluster-analysis.json - Cluster statistics
/tmp/gh-aw/pr-data/clustering-report.md - Extended report with full data table (973 PRs)
References: