feat: Optimize MCP and AGENTS.md instructions, add snippet_lines by Pringled · Pull Request #198 · MinishLab/semble

Pringled · 2026-06-16T07:39:36Z

This PR adds several changes and optimizations:

- chunk_size halved (1500 to 750 characters): retrieval quality is essentially flat on our benchmark, with big token savings.
- snippet_lines=10 default on the MCP tools, so agents get the function signature plus a few lines to confirm the location rather than full 30–60 line chunks, which saves even more tokens.
- Explicit self-correction guidance in the MCP tool description: call again with snippet_lines=None if the snippet isn't enough.
- Navigation instructions updated across all agent files, so agents go directly to the returned file and line and don't re-grep.
- Cache invalidation on chunk_size, so a stale index now rebuilds instead of silently returning wrong results.

I tested these changes on SWE-bench Lite across 5 models (gpt-5.4, gpt-5.4-mini, deepseek-v4-pro, claude-sonnet-4.6, claude-haiku-4.5) and 3 harnesses (Claude Code, Codex, OpenCode). The biggest gains are for the MCP integration, which consistently gave a 50 to 70% cost reduction with a flat or improved gold hit rate. For the CLI integrations (AGENTS.md/sub-agent), the savings were more modest (5 to 15%).

Adds a `snippet_lines` parameter to both MCP tools and the CLI:

- Default (10): function/class signature + first lines of body
- 0: file path and line range only
- None: full chunk (~30-50 lines, previous behaviour)

Benchmarking on Django SWE-bench tasks showed 10 lines is the sweet spot: agents make one semble call, get enough context to navigate directly to the fix, and cost less overall than agents using grep.
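The three modes can be pictured as a small truncation step applied to each chunk's text. This is a minimal sketch assuming a chunk is a plain string of source lines; `truncate_snippet` is a hypothetical helper for illustration, not semble's actual function.

```python
from typing import Optional


def truncate_snippet(content: str, snippet_lines: Optional[int] = 10) -> Optional[str]:
    """Hypothetical helper: pick the part of a chunk to show for a snippet_lines value.

    None -> full chunk (previous behaviour)
    0    -> no text at all; the caller reports only file path and line range
    n>0  -> the first n lines of the chunk
    """
    if snippet_lines is None:
        return content
    if snippet_lines < 0:
        raise ValueError("snippet_lines must be None or a non-negative integer")
    if snippet_lines == 0:
        return None
    return "\n".join(content.splitlines()[:snippet_lines])
```

With the default of 10, a chunk that opens on a `def` or `class` line yields the signature plus the first body lines; chunks shorter than the limit come back unchanged.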

Validated by SWE-bench agent experiments (6 Django tasks, gpt-5.4-mini):

- chunk_size 1500→750: improves top-1 retrieval hit rate (4/6 vs 3/6). Smaller chunks separate related files more cleanly; django-13315 switches from wrong (related.py) to correct (forms/models.py) at top-1.
- top_k 5→3, snippet_lines 10→5: reduces tokens per semble call by ~60% with no retrieval loss (gold files are at rank 1 when found).
- Prompt / instructions: replace "use grep for exhaustive literal matches" (too permissive) with "navigate directly to the returned line; do not grep for the same content". Applied to the MCP server instructions, the claude.md subagent, and the AGENTS.md/CLAUDE.md installer snippet.

Combined effect on 5/6 benchmark tasks: WITH semble is now cheaper than WITHOUT semble (-44% on django-13315, -29% on django-11999, -24% on django-14534). Overhead on easy tasks dropped from 2-4x to <5%.
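The top-1 hit rate quoted above is the fraction of tasks whose gold file is the first retrieved result. A minimal sketch of the metric, assuming one gold file per task and a ranked list of file paths per query; `hit_rate_at_k` and the toy data below are illustrative, not the benchmark harness used in this PR.

```python
from typing import Sequence


def hit_rate_at_k(rankings: Sequence[Sequence[str]], gold_files: Sequence[str], k: int = 1) -> float:
    """Fraction of tasks whose gold file appears among the top-k retrieved files.

    Assumes exactly one gold file per task; rankings[i] is the ranked file list for task i.
    """
    if len(rankings) != len(gold_files):
        raise ValueError("need exactly one ranking per task")
    if not gold_files:
        return 0.0
    hits = sum(gold in ranking[:k] for ranking, gold in zip(rankings, gold_files))
    return hits / len(gold_files)
```

For example, with two toy tasks where the gold file is ranked first once and second once, the top-1 hit rate is 0.5 and the top-2 hit rate is 1.0.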

- Apply updated step 3 ("navigate directly, don't re-search or grep") and step 5 ("grep only for exhaustive literal matches across whole repo") to all 10 agent files: claude, cursor, gemini, kiro, opencode, copilot, commandcode, pi, antigravity, reasonix.
- Add a --snippet-lines 5 token-efficiency tip to all 9 non-claude agent files (claude.md already had it from a previous commit).
- Improve the MCP search tool docstring: "use function/class names or behavior descriptions, not error messages" for the query.
- Improve the MCP find_related docstring: clarify its use for discovering all implementations, callers, or tests for a given location.
- Annotate chunk_size=750 with the SWE-bench validation result and a concrete TODO for making it configurable with cache key invalidation.

- Fix "before using Grep, Glob, or Read" → "instead of using Grep or Glob to discover files". The old phrasing wrongly implied you need semble before reading a file whose path you already know.
- Rewrite workflow steps to lead with the action-oriented description: step 1 explains the query format and default parameters (top_k=3, snippet_lines=5 context lines); steps 2-3 focus on the navigate-and-edit flow; steps 4-5 cover advanced usage and the grep exception.

Update the manual AGENTS.md / CLAUDE.md snippet in docs/installation.md to match the improved agent files:

- Add a --snippet-lines 5 usage example for token-efficient searches
- Step 3: "Inspect full files" → "Navigate directly to the returned file and line — do not re-search or grep for the same content"
- Step 5: "quick confirmation" → "every occurrence across the whole repo", with a concrete example (all callers of a renamed function)

Indexes built with the old chunk_size (1500 chars) were silently reused after the change to 750 chars, returning stale retrieval results.

- Save _DESIRED_CHUNK_LENGTH_CHARS in index metadata at build time.
- Validate it in _metadata_matches: a None (old/missing field) or a different value triggers a transparent rebuild on the next search.
- Add two regression tests: chunk_size mismatch → None, and missing chunk_size field (old format) → None.
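The invalidation rule boils down to one comparison against a value stored at build time. A minimal sketch, assuming the metadata is a plain dict; the key name `chunk_length_chars` and the helpers `build_metadata`, `metadata_matches`, and `load_cached_index` are hypothetical stand-ins for semble's `_metadata_matches` and its callers.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional, TypeVar

T = TypeVar("T")

# Hypothetical key; the PR stores _DESIRED_CHUNK_LENGTH_CHARS under some metadata field.
CHUNK_LENGTH_KEY = "chunk_length_chars"


def build_metadata(chunk_length_chars: int) -> dict[str, Any]:
    """Metadata written next to a freshly built index."""
    return {CHUNK_LENGTH_KEY: chunk_length_chars}


def metadata_matches(stored: Mapping[str, Any], current_chunk_length: int) -> bool:
    """True only if the cached index was built with the current chunk length."""
    stored_length = stored.get(CHUNK_LENGTH_KEY)
    if stored_length is None:  # old format: field missing, so treat the index as stale
        return False
    return stored_length == current_chunk_length


def load_cached_index(stored: Mapping[str, Any], cached: T, current_chunk_length: int) -> Optional[T]:
    """Return the cached index if still valid, else None so the caller rebuilds it."""
    return cached if metadata_matches(stored, current_chunk_length) else None
```

Treating a missing field the same as a mismatch is what lets indexes written before this change rebuild transparently instead of being trusted.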

Replace the Django-specific id_for_label/BoundWidget example with a generic validate-email example across all agent instructions and docs. Revert the top_k default from 3 back to 5: snippet_lines=5 already handles token efficiency, and dropping top_k risked missing relevant results at ranks 4-5 for negligible savings. Remove the stale "3 results by default" claim from the workflow step wording.

Collapse format_results + format_results_snippet into a single format_results(query, results, snippet_lines=None). All three modes now return the same flat structure {file_path, start_line, end_line, score, content?}. Previously, snippet_lines=None fell through to the old format_results, which returned a nested {"chunk": {...}} schema incompatible with the flat schema returned by snippet_lines=5. This also renames the per-result field from "snippet" to "content" for consistency with Chunk.content, and updates the test to cover all three modes parametrically.
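One way to get a single flat schema for all three modes is to build the location fields once and attach `content` only when there is text to show. A minimal sketch under stated assumptions: the `Result` dataclass and the outer `{"query", "results"}` wrapper are hypothetical, and only the per-result keys come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical stand-in for a single search hit."""
    file_path: str
    start_line: int
    end_line: int
    score: float
    content: str


def format_results(query: str, results: list[Result], snippet_lines: Optional[int] = None) -> dict[str, Any]:
    formatted = []
    for r in results:
        # Same flat keys in every mode; no nested {"chunk": {...}} object.
        item: dict[str, Any] = {
            "file_path": r.file_path,
            "start_line": r.start_line,
            "end_line": r.end_line,
            "score": r.score,
        }
        if snippet_lines is None:
            item["content"] = r.content  # full chunk
        elif snippet_lines > 0:
            item["content"] = "\n".join(r.content.splitlines()[:snippet_lines])
        # snippet_lines == 0: location and score only, no "content" key
        formatted.append(item)
    return {"query": query, "results": formatted}
```

Omitting the key for 0, rather than emitting an empty string, is an assumption here; the description only marks `content` as optional.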

The initial benchmark (feat: add snippet_lines) explicitly validated 10 as the sweet spot: enough to show the function/class signature and the first body lines. The reduction to 5 landed in the same commit as chunk_size 1500→750 and top_k 5→3, with no isolated validation. Empirical sampling across 5 codebases (pytest, astropy, requests, flask, django; ~75 results) shows that with chunk_size=750, ~56% of chunks start mid-function regardless of snippet length, ~19% would benefit from lines 6-10 being visible, and only ~25% show a function name in the first 5 lines. Raising the default to 10 recovers most of that 19% at negligible token cost (~250 chars per search call).
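A measurement like "shows a function name in the first N lines" can be approximated by scanning each returned chunk for a definition header. This is a rough sketch, not the sampling script behind the numbers above: it assumes Python sources (all five sampled codebases are Python) and treats a `def`, `async def`, or `class` line as "showing a function name".

```python
import re
from typing import Iterable

# Assumption: a definition header is a line starting with def / async def / class.
_SIGNATURE = re.compile(r"^\s*(?:async\s+def|def|class)\s+\w+")


def signature_within(chunk: str, n: int) -> bool:
    """True if any of the chunk's first n lines is a def/class header."""
    return any(_SIGNATURE.match(line) for line in chunk.splitlines()[:n])


def share_with_signature(chunks: Iterable[str], n: int) -> float:
    """Fraction of chunks that show a def/class header within their first n lines."""
    chunks = list(chunks)
    if not chunks:
        return 0.0
    return sum(signature_within(c, n) for c in chunks) / len(chunks)
```

Comparing `share_with_signature(chunks, 5)` with `share_with_signature(chunks, 10)` over the same sample gives the share of chunks that only become identifiable once lines 6-10 are visible.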

codecov · 2026-06-16T07:40:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| src/semble/cache.py | `100.00% <100.00%> (ø)` |
| src/semble/chunking/chunking.py | `100.00% <100.00%> (ø)` |
| src/semble/cli.py | `100.00% <100.00%> (ø)` |
| src/semble/index/index.py | `100.00% <100.00%> (ø)` |
| src/semble/installer/agents.py | `100.00% <ø> (ø)` |
| src/semble/mcp.py | `100.00% <100.00%> (ø)` |
| src/semble/utils.py | `100.00% <100.00%> (ø)` |

... and 1 file with indirect coverage changes


greptile-apps · 2026-06-16T16:36:16Z

Confidence Score: 4/5

Safe to merge — cache invalidation is correctly wired end-to-end, and the token-saving defaults are easily overridden by agents that need more context.

The chunk-size change and cache-invalidation logic are well-implemented and tested. The only issues are a stale TODO comment (the "include in cache key" half is now done) and a slightly overstated line-count estimate in the MCP tool description ("~15-25 lines" for a 750-char chunk is high; ~10-20 lines is more accurate). Neither affects runtime behavior.

src/semble/chunking/chunking.py has a stale TODO; src/semble/mcp.py has a slightly inaccurate line-count estimate in the snippet_lines description.
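The gap between "~15-25 lines" and "~10-20 lines" comes down to the assumed average line length. A back-of-the-envelope sketch, assuming a fixed average line length that includes the newline; `estimated_lines` is a hypothetical helper, and neither estimate in the thread is stated to come from this calculation.

```python
import math


def estimated_lines(chunk_chars: int, avg_line_chars: float) -> int:
    """Rough line count for a chunk, assuming a fixed average line length (newline included)."""
    if avg_line_chars <= 0:
        raise ValueError("avg_line_chars must be positive")
    return math.ceil(chunk_chars / avg_line_chars)
```

At 30 to 50 characters per line, a 750-character chunk is about 15 to 25 lines; at roughly 38 to 75 characters per line, it is about 10 to 20.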

Reviews (1): last reviewed commit "Update instructions"

stephantul

Nice!

First off, the em-dash comment is not a real blocker (it can't hurt you); I just don't like em dashes.

I think the other two comments might require rerunning experiments, so feel free to ignore them. They are mostly about how a human, rather than an agent, might read or use semble.

stephantul · 2026-06-18T04:52:47Z

 1. Start with `semble search` to find relevant chunks. The index is built and cached automatically.
 2. Use `--content docs` for documentation, `--content config` for config files, or `--content all` for everything.
-3. Inspect full files only when the returned chunk does not give enough context.
+3. Navigate directly to the returned file and line — do not re-search or grep for the same content.


🛑 em-dash detected 🛑

stephantul · 2026-06-18T04:53:48Z

-semble search "authentication flow" ./my-project
-semble search "save_pretrained" ./my-project
-semble search "save model to disk" ./my-project --top-k 10
+semble search "authentication flow" ./my-project --snippet-lines 10  # signatures only, fast


Perhaps make this more literal, e.g.:

# first 10 lines only, fast

Also, "fast" is debatable: it's not necessarily faster. Maybe "concise" fits better?

stephantul · 2026-06-18T04:59:23Z

    search_p.add_argument("query", help="Natural language or code query.")
    search_p.add_argument("path", nargs="?", default=".", help="Local path or git URL (default: current directory).")
    search_p.add_argument("-k", "--top-k", type=int, default=5, help="Number of results (default: 5).")
+    search_p.add_argument(


The name of this argument, snippet_lines, is not very descriptive. Without knowing what it does, it's surprising that:

- it is a numeric argument
- it truncates a chunk to a specific number of lines

So something like n_snippet_lines or lines_per_chunk might be clearer.

Nevertheless, I'm OK with keeping it; I know you probably ran all the tests with this argument.
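Much of the naming concern can also be addressed in the help text, by stating that the value is a line count and that it truncates each chunk. A minimal argparse sketch: the `query`, `path`, and `--top-k` arguments mirror the diff above, while the help wording, the `N` metavar, and the `all` sentinel for "full chunk" are hypothetical, since the diff does not show how semble's CLI spells the full-chunk mode.

```python
import argparse
from typing import Optional


def parse_snippet_lines(value: str) -> Optional[int]:
    """Hypothetical converter: 'all' means the full chunk (None), else a non-negative int."""
    if value.lower() == "all":
        return None
    n = int(value)  # argparse turns a ValueError here into a usage error
    if n < 0:
        raise argparse.ArgumentTypeError("must be a non-negative integer or 'all'")
    return n


parser = argparse.ArgumentParser(prog="semble")
sub = parser.add_subparsers(dest="command", required=True)
search_p = sub.add_parser("search")
search_p.add_argument("query", help="Natural language or code query.")
search_p.add_argument("path", nargs="?", default=".", help="Local path or git URL (default: current directory).")
search_p.add_argument("-k", "--top-k", type=int, default=5, help="Number of results (default: 5).")
search_p.add_argument(
    "--snippet-lines",
    type=parse_snippet_lines,
    default=10,
    metavar="N",
    help="Show only the first N lines of each result chunk (0: file and line range only; "
    "'all': full chunk). Default: 10.",
)
```

For example, `semble search "validate email" --snippet-lines 0` would list only locations, while `--snippet-lines all` would restore the previous full-chunk output.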

Pringled added 11 commits June 13, 2026 12:27

docs: mention settings-change cache invalidation in README

480823d

Update instructions

5d5a766

Pringled marked this pull request as ready for review June 16, 2026 16:33

greptile-apps Bot reviewed Jun 16, 2026


Comment thread src/semble/chunking/chunking.py

Comment thread src/semble/mcp.py Outdated

Update

f02a570

Pringled requested a review from stephantul June 16, 2026 16:41

stephantul approved these changes Jun 18, 2026


Pringled added 2 commits June 18, 2026 07:33

Resolve comments

85e48db

Formatting

7ecc536

Pringled merged commit d561953 into main Jun 18, 2026
16 checks passed

Pringled deleted the optimizations branch June 18, 2026 05:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Optimize MCP and AGENTS.md instructions, add snippet_lines#198

feat: Optimize MCP and AGENTS.md instructions, add snippet_lines#198
Pringled merged 14 commits into main from optimizations
