
feat: Optimize MCP and AGENTS.md instructions, add snippet_lines #198

Merged
Pringled merged 14 commits into main from optimizations
Jun 18, 2026
Conversation


@Pringled Pringled commented Jun 16, 2026

Member

This PR adds several changes and optimizations:

  • chunk_size halved (1500 to 750): retrieval quality is essentially flat on our benchmark, with large token savings
  • snippet_lines=10 is now the default on the MCP tools, so agents get the function signature plus a few lines to confirm the location rather than the full 30–60 line chunk, which saves even more tokens
  • Explicit self-correction guidance in the MCP tool description: call again with snippet_lines=None if the snippet isn't enough
  • Navigation instructions updated across all agent files so agents go directly to the returned file and line and don't re-grep
  • Cache invalidation on chunk_size, so a stale index now rebuilds instead of silently returning wrong results

I tested these changes on SWE-bench Lite across 5 models (gpt-5.4, gpt-5.4-mini, deepseek-v4-pro, claude-sonnet-4.6, claude-haiku-4.5) and 3 harnesses (Claude Code, Codex, OpenCode). The biggest gains are for the MCP integration, which consistently gave a 50 to 70% cost reduction with a flat or improved gold hit rate. For the CLI integrations (AGENTS.md/sub-agent), the savings were more modest (5 to 15%).

Pringled added 11 commits June 13, 2026 12:27
Adds a `snippet_lines` parameter to both MCP tools and the CLI.

- Default (10): function/class signature + first lines of body
- 0: file path and line range only
- None: full chunk (~30-50 lines, previous behaviour)

Benchmarking on Django SWE-bench tasks showed 10 lines is the sweet
spot: agents make one semble call, get enough context to navigate
directly to the fix, and cost less overall than agents using grep.
Validated by SWE-bench agent experiments (6 Django tasks, gpt-5.4-mini):

- chunk_size 1500→750: improves top-1 retrieval hit rate (4/6 vs 3/6).
  Smaller chunks separate related files more cleanly; django-13315 switches
  from wrong (related.py) to correct (forms/models.py) at top-1.

- top_k 5→3, snippet_lines 10→5: reduces tokens per semble call by ~60%
  with no retrieval loss (gold files are at rank 1 when found).

- Prompt / instructions: replace "use grep for exhaustive literal matches"
  (too permissive) with "navigate directly to the returned line; do not grep
  for the same content". Applied to MCP server instructions, claude.md
  subagent, and AGENTS.md/CLAUDE.md installer snippet.

Combined effect on 5/6 benchmark tasks: WITH semble is now cheaper than
WITHOUT semble (-44% on django-13315, -29% on django-11999, -24% on
django-14534). Overhead on easy tasks dropped from 2-4x to <5%.

- Apply updated step 3 ("navigate directly, don't re-search or grep") and
  step 5 ("grep only for exhaustive literal matches across whole repo") to
  all 10 agent files: claude, cursor, gemini, kiro, opencode, copilot,
  commandcode, pi, antigravity, reasonix.

- Add --snippet-lines 5 token-efficiency tip to all 9 non-claude agent
  files (claude.md already had it from a previous commit).

- Improve MCP search tool docstring: "use function/class names or behavior
  descriptions, not error messages" for the query.

- Improve MCP find_related docstring: clarify its use for discovering
  all implementations, callers, or tests for a given location.

- Annotate chunk_size=750 with the SWE-bench validation result and a
  concrete TODO for making it configurable with cache key invalidation.

- Fix "before using Grep, Glob, or Read" → "instead of using Grep or Glob
  to discover files" — the old phrasing wrongly implied you need semble
  before reading a file whose path you already know.

- Rewrite workflow steps to lead with the action-oriented description:
  step 1 explains the query format and default parameters (top_k=3,
  snippet_lines=5 context lines); steps 2-3 focus on the navigate-and-edit
  flow; steps 4-5 cover advanced usage and the grep exception.

Update the manual AGENTS.md / CLAUDE.md snippet in docs/installation.md
to match the improved agent files:

- Add --snippet-lines 5 usage example for token-efficient searches
- Step 3: "Inspect full files" → "Navigate directly to the returned file
  and line — do not re-search or grep for the same content"
- Step 5: "quick confirmation" → "every occurrence across the whole repo"
  with a concrete example (all callers of a renamed function)

Indexes built with the old chunk_size (1500 chars) were silently reused
after the change to 750 chars, returning stale retrieval results.

- Save _DESIRED_CHUNK_LENGTH_CHARS in index metadata at build time.
- Validate it in _metadata_matches: a None (old/missing field) or a
  different value triggers a transparent rebuild on next search.
- Add two regression tests: chunk_size mismatch → None, and missing
  chunk_size field (old format) → None.
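A minimal sketch of the invalidation check described above. The names here are stand-ins: the real `_metadata_matches` likely validates more fields, and both the `"chunk_size"` metadata key and the `load_cached_index` helper are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional

# Assumed to equal the new chunk size from this PR.
_DESIRED_CHUNK_LENGTH_CHARS = 750


def metadata_matches(metadata: Mapping[str, Any]) -> bool:
    """Return True only if the cached index was built with the current chunk size."""
    # .get() yields None for indexes written before the field existed;
    # None never equals the current size, so old-format caches are rejected.
    return metadata.get("chunk_size") == _DESIRED_CHUNK_LENGTH_CHARS


def load_cached_index(metadata: Mapping[str, Any], index: object) -> Optional[object]:
    """Return the cached index, or None so that the caller rebuilds it."""
    if not metadata_matches(metadata):
        return None
    return index


print(load_cached_index({"chunk_size": 1500}, "cached"))  # None (size mismatch)
print(load_cached_index({}, "cached"))                    # None (old format)
print(load_cached_index({"chunk_size": 750}, "cached"))   # cached
```

The first two calls correspond to the two regression cases named in the commit message: a mismatched chunk size and a missing field.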
Replace Django-specific id_for_label/BoundWidget example with a generic
validate-email example across all agent instructions and docs.

Revert top_k default from 3 back to 5 — snippet_lines=5 already handles
token efficiency; dropping top_k risked missing relevant results at ranks
4-5 with negligible savings.

Remove stale "3 results by default" claim from workflow step wording.

Collapse format_results + format_results_snippet into a single
format_results(query, results, snippet_lines=None). All three modes now
return the same flat structure {file_path, start_line, end_line, score,
content?} — previously snippet_lines=None fell through to the old
format_results which returned a nested {"chunk": {...}} schema,
incompatible with the flat schema returned by snippet_lines=5.

Also renames the per-result field from "snippet" to "content" for
consistency with Chunk.content.

Update test to cover all three modes parametrically.
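The unified schema can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the `Chunk` dataclass is a simplified stand-in, results are assumed to be `(chunk, score)` pairs, and the `{"query": ..., "results": [...]}` wrapper is hypothetical. Only the `format_results(query, results, snippet_lines=None)` signature and the flat per-result fields come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Chunk:
    # Simplified stand-in for the library's Chunk; the fields are assumptions.
    file_path: str
    start_line: int
    end_line: int
    content: str


def format_results(
    query: str,
    results: List[Tuple[Chunk, float]],
    snippet_lines: Optional[int] = None,
) -> Dict[str, Any]:
    """Format results with one flat per-result schema for all three modes."""
    formatted = []
    for chunk, score in results:
        item: Dict[str, Any] = {
            "file_path": chunk.file_path,
            "start_line": chunk.start_line,
            "end_line": chunk.end_line,
            "score": score,
        }
        if snippet_lines is None:
            # Full chunk.
            item["content"] = chunk.content
        elif snippet_lines > 0:
            # First N lines only.
            item["content"] = "\n".join(chunk.content.splitlines()[:snippet_lines])
        # snippet_lines == 0: location only, so "content" is omitted.
        formatted.append(item)
    return {"query": query, "results": formatted}


example = Chunk("src/app.py", 10, 12, "def f():\n    x = 1\n    return x")
for mode in (None, 1, 0):
    result = format_results("f", [(example, 0.9)], snippet_lines=mode)["results"][0]
    print(mode, sorted(result))
```

In every mode the location and score keys are identical; only the optional `content` key varies, which is what lets a single parametrized test cover all three.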
The initial benchmark (feat: add snippet_lines) explicitly validated 10
as the sweet spot — enough to show the function/class signature and
first body lines. The reduction to 5 landed in the same commit as
chunk_size 1500→750 and top_k 5→3 with no isolated validation.

Empirical sampling across 5 codebases (pytest, astropy, requests, flask,
django, ~75 results) shows that with chunk_size=750, ~56% of chunks
start mid-function regardless of snippet length, ~19% would benefit from
lines 6-10 being visible, and only ~25% show a function name in the
first 5 lines. Raising to 10 recovers most of that 19% at negligible
token cost (~250 chars per search call).

codecov Bot commented Jun 16, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/semble/cache.py | 100.00% <100.00%> (ø) |
| src/semble/chunking/chunking.py | 100.00% <100.00%> (ø) |
| src/semble/cli.py | 100.00% <100.00%> (ø) |
| src/semble/index/index.py | 100.00% <100.00%> (ø) |
| src/semble/installer/agents.py | 100.00% <ø> (ø) |
| src/semble/mcp.py | 100.00% <100.00%> (ø) |
| src/semble/utils.py | 100.00% <100.00%> (ø) |

... and 1 file with indirect coverage changes


@Pringled Pringled marked this pull request as ready for review June 16, 2026 16:33

greptile-apps Bot commented Jun 16, 2026


Confidence Score: 4/5

Safe to merge — cache invalidation is correctly wired end-to-end, and the token-saving defaults are easily overridden by agents that need more context.

The chunk-size change and cache-invalidation logic are well-implemented and tested. The only issues are a stale TODO comment (the "include in cache key" half is now done) and a slightly overstated line-count estimate in the MCP tool description ("~15-25 lines" for a 750-char chunk is high; ~10-20 lines is more accurate). Neither affects runtime behavior.

src/semble/chunking/chunking.py has a stale TODO; src/semble/mcp.py has a slightly inaccurate line-count estimate in the snippet_lines description.

Reviews (1): Last reviewed commit: "Update instructions"

Comment thread src/semble/chunking/chunking.py
Comment thread src/semble/mcp.py Outdated
@Pringled Pringled requested a review from stephantul June 16, 2026 16:41

@stephantul stephantul left a comment

Contributor

Nice!

First off. The em dash comment is not real (it can't hurt you) (I just don't like em dashes)

I think both other comments might require rerunning experiments, so feel free to ignore them. They are mostly about how a human might read or use semble, not an agent.

Comment thread src/semble/agents/gemini.md Outdated
1. Start with `semble search` to find relevant chunks. The index is built and cached automatically.
2. Use `--content docs` for documentation, `--content config` for config files, or `--content all` for everything.
3. Inspect full files only when the returned chunk does not give enough context.
3. Navigate directly to the returned file and line — do not re-search or grep for the same content.


🛑 em-dash detected 🛑

Comment thread src/semble/agents/kiro.md Outdated
semble search "authentication flow" ./my-project
semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
semble search "authentication flow" ./my-project --snippet-lines 10 # signatures only, fast


perhaps make this more literal? e.g.,

# first 10 lines only, fast

Also "fast" is debatable, it's not necessarily faster. Maybe "concise" actually fits better?

Comment thread src/semble/cli.py
search_p.add_argument("query", help="Natural language or code query.")
search_p.add_argument("path", nargs="?", default=".", help="Local path or git URL (default: current directory).")
search_p.add_argument("-k", "--top-k", type=int, default=5, help="Number of results (default: 5).")
search_p.add_argument(


The name of this argument, snippet_lines, is actually not that descriptive. Without knowing what it does, it's kind of surprising that:

  1. this is a numeric argument
  2. it truncates a chunk to a specific line length

So something like n_snippet_lines is maybe clearer. Or lines_per_chunk.

Nevertheless I'm ok with keeping it, I know you did all the tests with this argument probably.
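One way to address the naming concern without renaming the flag is to spell out both points in the help text. The sketch below is hypothetical: the `metavar`, the help wording, and the `None` default are assumptions, and the actual definition in src/semble/cli.py may differ.

```python
import argparse

parser = argparse.ArgumentParser(prog="semble")
subparsers = parser.add_subparsers(dest="command")
search_p = subparsers.add_parser("search")
search_p.add_argument("query", help="Natural language or code query.")
search_p.add_argument(
    "--snippet-lines",
    type=int,
    default=None,  # assumed default; the real CLI default is not shown in this thread
    metavar="N",
    help="Show only the first N lines of each chunk; 0 shows locations only.",
)

args = parser.parse_args(["search", "authentication flow", "--snippet-lines", "10"])
print(args.snippet_lines)  # 10
```

With `metavar="N"` the usage line reads `--snippet-lines N`, which signals that the value is numeric, while the help string states that it truncates each chunk.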

@Pringled Pringled merged commit d561953 into main Jun 18, 2026
16 checks passed
@Pringled Pringled deleted the optimizations branch June 18, 2026 05:52