fix(docs): precision improvements + feat(agent): SWE-bench SOTA optimizations by t-timms · Pull Request #166 · t-timms/godspeed-coding-agent

t-timms · 2026-05-11T21:19:41Z

Summary

8 README/docs precision fixes: version numbering clarity, permission tier disambiguation, benchmark labeling, LiteLLM attribution, tool count accuracy, stock_price context, leaked path removal, GEPA attribution
4 SOTA improvements for SWE-bench: optimized system prompt with minimal-edit rule, swebench prompt profile, budget prompt injection to prevent agent-in-loop over-editing, DeepAnalysisTool integrated as core tool

Changes

Docs (7 files, 330+ / 27-)

File	Change
`README.md`	Version mapping table, 3-tiers-vs-4 clarified, benchmarks restructured as deployable/ceiling tiers, LiteLLM credited, tools counted and categorized, stock_price contextualized
`experiments/swebench_lite/findings_2026_04_21.md`	Removed leaked `~/.claude/plans/` path
`GODSPEED_ARCHITECTURE.md`	GEPA attribution expanded with naming convention context

Agent (3 files)

File	Change
`src/godspeed/agent/system_prompt.py`	Added `SWEBENCH_TASK_PROMPT` with problem-solving protocol, minimal-edit rule, verify-feedback handling, budget awareness
`src/godspeed/agent/prompt_profiles.py`	Added `swebench` profile with targeted preamble and plan-style
`src/godspeed/agent/loop.py`	Budget prompt injection after N writes and N verify failures to prevent over-edit regression

Tools (1 new file)

File	Change
`src/godspeed/tools/deep_analysis.py`	3-step reasoning (generate→critique→refine) using agent LLM client, no hardcoded credentials

Verification

pytest tests/test_agent_loop.py tests/test_system_prompt.py tests/test_prompt_profiles.py: 135 passed, 1 skipped
ruff check + ruff format --check: 0 issues
git diff --staged: no secrets, no API keys

…ent): add SWE-bench prompt profile, budget injection, deep_analysis tool - README: add version mapping table, clarify 3-tier vs 4-tier permissions, restructure benchmark table with deployable/research-ceiling tiers, attribute LiteLLM properly, document tool count precisely, label stock_price as infrastructure utility - Findings doc: remove leaked local file path - Architecture doc: improve GEPA attribution with naming convention note - Agent: inject budget prompt after N writes/verify-failures to prevent agent-in-loop over-editing regression (30.4% vs 34.8% single-shot) - System prompt: add SWEBENCH_TASK_PROMPT with minimal-edit rule, verify-feedback handling, and budget awareness - Prompt profiles: add swebench profile for targeted task structure - Tools: integrate DeepAnalysisTool (3-step reasoning) as core tool using existing LLM client, no hardcoded credentials

t-timms added 2 commits May 11, 2026 16:19

fix(tools): correct _run_step return type annotation

8606341

t-timms merged commit 8606341 into main May 11, 2026

t-timms deleted the fix/readme-precision-and-sota-improvements branch May 11, 2026 21:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(docs): precision improvements + feat(agent): SWE-bench SOTA optimizations#166

fix(docs): precision improvements + feat(agent): SWE-bench SOTA optimizations#166
t-timms merged 2 commits into
mainfrom
fix/readme-precision-and-sota-improvements

t-timms commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

t-timms commented May 11, 2026

Summary

Changes

Docs (7 files, 330+ / 27-)

Agent (3 files)

Tools (1 new file)

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant