DevStudio-AI/real-world-debugging-examples-developer-use

Repository files navigation

Real-World Debugging Examples

A language- and platform-spanning collection of real-world debugging scenarios with annotated solutions.

📚 This is the learning/reference version. Code contains BUG: comments and solution hints for teaching. For the sanitized benchmark version (no hints), see autonomous-software-debugging-benchmarks.

Quick Links

| Document | Purpose |
| --- | --- |
| CAPABILITIES.md | What "agentic debugging" means and what this suite covers |
| RUN_MODES.md | Environment requirements per project (headless vs IDE) |
| docs/SANITIZATION_GUIDE.md | How to prepare eval-mode copies without answer leakage |

Purpose

This repository contains real project files with intentional errors designed for learning real-world debugging patterns. Each project:

  • Contains realistic, non-trivial code (not toy examples)
  • Has errors that mirror real-world failure patterns
  • Produces a visible, satisfying result when fixed
  • Documents expected behavior without revealing solutions

What This Suite Tests

The 6 Capability Pillars

| Pillar | What It Proves | Where Tools Fail |
| --- | --- | --- |
| Static + Structural | Parsing, AST analysis, syntax repair | Missing file-to-file awareness |
| Runtime Failures | Execution awareness, environment reasoning | Stop at "suggestion" without re-execution |
| Test Failures | Intent reasoning, not just syntax repair | Can't align fixes to test intent |
| Multi-File / Cross-Layer | Agentic reasoning across boundaries | No coordinated multi-file fixes |
| Configuration & Infra | System-level understanding | Hallucinate config solutions |
| Hypothesis-Driven | Proactive reasoning, not reactive | No evidence gathering or confidence scoring |

Language Coverage

| Category | Language | Why |
| --- | --- | --- |
| Dynamic | Python | Debugging + AI sweet spot |
| Web | JavaScript / TypeScript | Frontend + backend |
| Systems | Go | Type & compile rigor |
| Enterprise | Java | Real-world expectations |
| Mobile | Kotlin (Android), Swift (iOS) | Build system + platform constraints |
| Game | Unity (C#), C++ | Engine-aware reasoning, asset coordination |

Repository Structure

```
vybecoder-capability-suite/
├── python/
│   ├── static_structural/      # Import/export/syntax errors
│   ├── runtime_failure/        # Environment, null refs, type coercion
│   ├── test_failure/           # Failing tests revealing logic flaws
│   ├── multi_file_bug/         # Cross-module contract violations
│   └── hypothesis_debugging/   # Ambiguous symptoms, multiple causes
├── javascript/
│   ├── static_structural/      # Module errors, broken exports
│   ├── runtime_failure/        # Async bugs, undefined access
│   ├── test_failure/           # Jest tests revealing edge cases
│   ├── frontend_backend_mismatch/  # API contract drift
│   └── config_failure/         # Webpack, env, port issues
├── typescript/
│   ├── type_errors/            # Generic constraints, inference failures
│   └── async_failures/         # Promise chains, race conditions
├── java/
│   ├── dependency_issue/       # Maven/Gradle resolution
│   ├── logic_error/            # Off-by-one, state bugs
│   └── test_failure/           # JUnit revealing intent mismatch
├── go/
│   ├── runtime_panic/          # Nil pointer, slice bounds
│   └── concurrency_bug/        # Race conditions, deadlocks
├── kotlin_android/
│   ├── gradle_mismatch/        # Dependency version conflicts
│   ├── lifecycle_crash/        # Fragment/Activity lifecycle misuse
│   └── manifest_error/         # Missing permissions, components
├── swift_ios/
│   ├── optionals_crash/        # Force unwrap failures
│   ├── build_error/            # Missing Info.plist keys
│   └── ui_thread/              # Main thread violations
├── unity_csharp/
│   ├── lifecycle_bug/          # MonoBehaviour order issues
│   ├── serialization_error/    # Missing SerializeField
│   └── scene_mismatch/         # Asset-code desync
└── cpp_game/
    ├── linker_error/           # Undefined references
    ├── memory_issue/           # Safe memory bugs
    └── header_missing/         # Include path problems
```

How to Use This Suite

For Evaluation

  1. Point your debugging system at any project folder
  2. Observe: Does it identify the root cause?
  3. Observe: Does it execute and verify the fix?
  4. Observe: Does it produce the expected result?
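The verification step of this loop can be automated. A minimal sketch, assuming each project's README documents a verification command and treating a zero exit code as success (the function names and the example commands here are illustrative, not part of the suite):

```python
import subprocess
from pathlib import Path

def verify_project(project_dir: str, command: list[str]) -> bool:
    """Run a project's documented verification command and report pass/fail.

    Each project's README specifies how to verify success (e.g. ["pytest", "-q"]
    for a Python test_failure case); a zero exit code counts as success.
    """
    proc = subprocess.run(command, cwd=project_dir, capture_output=True, text=True)
    return proc.returncode == 0

def evaluate(root: str, commands: dict[str, list[str]]) -> dict[str, bool]:
    """Map each project folder under `root` to whether its verification passed."""
    return {
        rel: verify_project(str(Path(root) / rel), cmd)
        for rel, cmd in commands.items()
    }
```

Running this after the debugging system has applied its fix gives a simple pass/fail signal per project, independent of how the fix was produced.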

Success Criteria

A debugging system demonstrates capability when it:

  • Localizes the error to specific file(s) and line(s)
  • Explains why the error occurs (not just what)
  • Fixes with minimal, targeted changes
  • Verifies by running the code/tests
  • Produces the documented expected output

What Success Looks Like

Each project's README describes:

  • What's broken (symptoms only)
  • Expected behavior when fixed
  • How to verify success

No solutions are provided. The debugger must reason independently.

Difficulty Ratings

| Rating | Meaning |
| --- | --- |
| ⭐ | Single file, obvious error |
| ⭐⭐ | Multiple files or subtle bug |
| ⭐⭐⭐ | Cross-layer reasoning required |
| ⭐⭐⭐⭐ | Hypothesis generation needed |
| ⭐⭐⭐⭐⭐ | Platform + toolchain + code coordination |

Contributing

To add a new test case:

  1. Create a realistic, minimal project that does something useful
  2. Introduce a single, realistic failure pattern
  3. Document symptoms and expected success state
  4. Create INSTRUCTOR_NOTES.md with solution details (excluded from eval)
  5. Remove any BUG: comments before committing (or use scripts/sanitize.ps1)
  6. Tag with difficulty rating and capability pillar
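To illustrate steps 2 and 5, a learning-mode file marks its single intentional failure with a BUG: comment that the sanitizer later strips. A hypothetical Python case (the function and the specific bug are invented for illustration):

```python
def moving_average(values, window):
    """Return the mean of the last `window` values."""
    # BUG: the slice below is off by one and drops the most recent value;
    # the fix is values[-window:]. This comment exists only in the
    # learning/reference version and is stripped for evaluation.
    recent = values[-window:-1]
    return sum(recent) / window
```

Calling `moving_average([1, 2, 3, 4], 2)` returns 1.5 instead of the expected 3.5, which is the kind of visible symptom the project README would document without revealing the cause.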

Evaluation Mode

For fair benchmarking, use the sanitization script to create an answer-free copy:

```powershell
.\scripts\sanitize.ps1 -SourceDir . -OutputDir ./eval
```

This strips BUG: comments, removes root-cause sections from READMEs, and excludes instructor notes.
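The suite's sanitizer is the PowerShell script above; as a sketch of the first of those steps, a hypothetical Python analogue of the BUG:-comment stripping might look like this (the regex and function name are assumptions, not the script's actual logic):

```python
import re

# Match whole-line BUG: comments in '#' (Python) and '//' (C-style) syntax.
BUG_COMMENT = re.compile(r"^\s*(#|//)\s*BUG:.*$", re.MULTILINE)

def sanitize_text(text: str) -> str:
    """Blank out whole-line BUG: comments, leaving other lines untouched."""
    return BUG_COMMENT.sub("", text)
```

A real implementation would also need to handle BUG: comments appended to the end of a code line, plus the README and instructor-notes steps.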

License

MIT - Use freely for benchmarking, teaching, or tool evaluation.


This suite is maintained as part of the VybeCoder project but is designed for general use in evaluating any agentic debugging system.
