DevStudio-AI/real-world-debugging-agent-use

VybeCoder Real-World Test Suite

A practical test suite that measures VybeCoder's ability to detect and fix common developer errors: the issues that block developers day to day.

Philosophy

This suite tests what matters:

  • ✅ "Does my code compile after Fix Everything?"
  • ✅ "Did it catch the typo I couldn't find?"
  • ✅ "Can it fix my bracket/string issues?"

It does NOT test trick questions such as:

  • ❌ "Can it find a hidden race condition?"
  • ❌ "Does it detect subtle memory leaks?"
  • ❌ "Can it predict runtime behavior?"

Test Structure

vybecoder-realworld-suite/
├── tier1-syntax/           # 80% of real issues - MUST fix
│   ├── javascript/
│   ├── typescript/
│   ├── python/
│   ├── java/
│   ├── go/
│   ├── csharp/
│   ├── swift/
│   └── cpp/
├── tier2-compile/          # 15% of real issues - SHOULD fix
│   └── [same languages]
└── tier3-quality/          # 5% of real issues - suggestions
    └── [same languages]

Tier Definitions

Tier 1: Syntax Errors (BLOCKING - Must Fix)

These prevent code from parsing/compiling at all:

  • Unclosed brackets { ( [
  • Unclosed strings " '
  • Missing semicolons (where required)
  • Unmatched quotes
  • Invalid syntax tokens

Expected: 95-100% fix rate
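
For the Python cases, the "code parses" check for this tier could be sketched with the standard library's `ast.parse`. This is a minimal illustration and not the suite's actual harness; the helper name `parses_ok` and the sample snippets are assumptions made for the example.

```python
import ast


def parses_ok(source: str) -> bool:
    """Return True if the source is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A Tier 1 style case: an unclosed bracket, and the same code after a fix.
broken = "values = [1, 2, 3\nprint(values)\n"
fixed = "values = [1, 2, 3]\nprint(values)\n"

print(parses_ok(broken))  # the unclosed '[' raises SyntaxError inside the helper
print(parses_ok(fixed))
```

The other languages would need their own parser or compiler front end; only the pass/fail shape of the check carries over.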

Tier 2: Compile/Type Errors (BLOCKING - Should Fix)

Code parses but won't compile or has obvious errors:

  • Type mismatches
  • Missing return statements
  • Undefined variables/functions
  • Wrong argument counts
  • Missing imports (for used symbols)

Expected: 80-90% fix rate

Tier 3: Quality Suggestions (NON-BLOCKING)

Code works but could be better:

  • Unused imports
  • Style issues (== vs ===)
  • Potential null safety issues
  • Best practice violations

Expected: Detection only, optional fixes

Success Criteria

| Tier   | Detection | Fix Rate | Verification      |
|--------|-----------|----------|-------------------|
| Tier 1 | 100%      | 95%+     | Code parses       |
| Tier 2 | 90%+      | 80%+     | Code compiles     |
| Tier 3 | 80%+      | N/A      | Suggestions shown |
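
As a rough sketch, the thresholds above could be applied to a run's results as follows. The `TierResult` type, the `meets_criteria` function, and the sample numbers are hypothetical, and the sketch assumes both rates are measured against the total number of seeded errors in a tier, which the suite does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierResult:
    total: int     # errors seeded in the tier
    detected: int  # errors the tool reported
    fixed: int     # errors the tool fixed


# (minimum detection %, minimum fix %) per tier, taken from the criteria above.
# None means the fix rate is not applicable (Tier 3 is suggestions only).
THRESHOLDS: dict[str, tuple[int, Optional[int]]] = {
    "tier1": (100, 95),
    "tier2": (90, 80),
    "tier3": (80, None),
}


def meets_criteria(tier: str, result: TierResult) -> bool:
    """Check one tier's result against its thresholds using integer arithmetic."""
    min_detect, min_fix = THRESHOLDS[tier]
    if result.detected * 100 < min_detect * result.total:
        return False
    if min_fix is not None and result.fixed * 100 < min_fix * result.total:
        return False
    return True


# Illustrative numbers only, not real results.
print(meets_criteria("tier1", TierResult(total=48, detected=48, fixed=46)))
print(meets_criteria("tier2", TierResult(total=48, detected=44, fixed=40)))
```

Comparing scaled integers rather than floating-point ratios avoids rounding surprises at the exact threshold.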

Languages Covered

Language Tier 1 Tier 2 Tier 3
JavaScript
TypeScript
Python
Java
Go
C#
Swift
C++

Error Counts

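| Tier   | Files | Total Errors | Expected Fixes    |
|--------|-------|--------------|-------------------|
| Tier 1 | 24    | 48           | 46+ (95%)         |
| Tier 2 | 24    | 48           | 38+ (80%)         |
| Tier 3 | 23    | ~100         | N/A (suggestions) |
| Total  | 71    | ~196         | 84+               |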
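
The expected-fix figures appear to be each tier's error count multiplied by its target fix rate and rounded to the nearest whole error. A small sketch of that arithmetic follows; the function name `expected_fixes` is hypothetical, and the rounding rule is inferred from the numbers rather than stated by the suite.

```python
def expected_fixes(total_errors: int, target_rate: float) -> int:
    """Round total_errors * target_rate to the nearest whole error (inferred rule)."""
    return round(total_errors * target_rate)


tier1 = expected_fixes(48, 0.95)  # 48 * 0.95 = 45.6
tier2 = expected_fixes(48, 0.80)  # 48 * 0.80 = 38.4

print(tier1, tier2, tier1 + tier2)
```

Under a strict reading of "at least 80%", Tier 2 would need 39 of its 48 errors fixed, since 38 of 48 is about 79.2%.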

Version

  • Suite Version: 1.0
  • Created: December 2025
  • Target: VybeCoder v42.96+
