Skip to content

fix(docs): precision improvements + feat(agent): SWE-bench SOTA optimizations#166

Merged
t-timms merged 2 commits into
mainfrom
fix/readme-precision-and-sota-improvements
May 11, 2026
Merged

fix(docs): precision improvements + feat(agent): SWE-bench SOTA optimizations#166
t-timms merged 2 commits into
mainfrom
fix/readme-precision-and-sota-improvements

Conversation

@t-timms
Copy link
Copy Markdown
Owner

@t-timms t-timms commented May 11, 2026

Summary

  • 8 README/docs precision fixes: version numbering clarity, permission tier disambiguation, benchmark labeling, LiteLLM attribution, tool count accuracy, stock_price context, leaked path removal, GEPA attribution
  • 4 SOTA improvements for SWE-bench: optimized system prompt with minimal-edit rule, swebench prompt profile, budget prompt injection to prevent agent-in-loop over-editing, DeepAnalysisTool integrated as core tool

Changes

Docs (7 files, 330+ / 27-)

File Change
README.md Version mapping table, 3-tiers-vs-4 clarified, benchmarks restructured as deployable/ceiling tiers, LiteLLM credited, tools counted and categorized, stock_price contextualized
experiments/swebench_lite/findings_2026_04_21.md Removed leaked ~/.claude/plans/ path
GODSPEED_ARCHITECTURE.md GEPA attribution expanded with naming convention context

Agent (3 files)

File Change
src/godspeed/agent/system_prompt.py Added SWEBENCH_TASK_PROMPT with problem-solving protocol, minimal-edit rule, verify-feedback handling, budget awareness
src/godspeed/agent/prompt_profiles.py Added swebench profile with targeted preamble and plan-style
src/godspeed/agent/loop.py Budget prompt injection after N writes and N verify failures to prevent over-edit regression

Tools (1 new file)

File Change
src/godspeed/tools/deep_analysis.py 3-step reasoning (generate→critique→refine) using agent LLM client, no hardcoded credentials

Verification

  • pytest tests/test_agent_loop.py tests/test_system_prompt.py tests/test_prompt_profiles.py: 135 passed, 1 skipped
  • ruff check + ruff format --check: 0 issues
  • git diff --staged: no secrets, no API keys

t-timms added 2 commits May 11, 2026 16:19
…ent): add SWE-bench prompt profile, budget injection, deep_analysis tool

- README: add version mapping table, clarify 3-tier vs 4-tier permissions,
  restructure benchmark table with deployable/research-ceiling tiers,
  attribute LiteLLM properly, document tool count precisely, label
  stock_price as infrastructure utility
- Findings doc: remove leaked local file path
- Architecture doc: improve GEPA attribution with naming convention note
- Agent: inject budget prompt after N writes/verify-failures to prevent
  agent-in-loop over-editing regression (30.4% vs 34.8% single-shot)
- System prompt: add SWEBENCH_TASK_PROMPT with minimal-edit rule,
  verify-feedback handling, and budget awareness
- Prompt profiles: add swebench profile for targeted task structure
- Tools: integrate DeepAnalysisTool (3-step reasoning) as core tool
  using existing LLM client, no hardcoded credentials
@t-timms t-timms merged commit 8606341 into main May 11, 2026
@t-timms t-timms deleted the fix/readme-precision-and-sota-improvements branch May 11, 2026 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant