feat: add configurable chunk size#204

Closed
stephantul wants to merge 1 commit into main from add-configurable-chunk-size
Conversation

@stephantul

Contributor

This PR adds a configurable chunk size to semble. The chunk size defaults to 750, the same value used today, and can be overridden by setting the SEMBLE_CHUNK_SIZE environment variable. I have not documented this option yet because we don't have a great place to do so.

The PR also exposes the chunk size parameter as part of the Python API.
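For readers who want a picture of how such an override could be resolved, here is a minimal sketch. It is not the code from this PR: the helper name `resolve_chunk_size`, the precedence order (explicit argument, then environment variable, then default), and the validation rules are all assumptions made for illustration. Only the variable name SEMBLE_CHUNK_SIZE and the default of 750 come from the description above.

```python
import os

DEFAULT_CHUNK_SIZE = 750  # default stated in the PR description


def resolve_chunk_size(explicit=None, env=None):
    """Hypothetical helper: explicit argument wins, then the env var, then the default.

    The precedence and validation here are assumptions, not semble's actual behavior.
    """
    env = os.environ if env is None else env

    if explicit is not None:
        value = explicit
    else:
        raw = env.get("SEMBLE_CHUNK_SIZE")
        if raw is None or not raw.strip():
            # Unset or empty variable: fall back to the default.
            return DEFAULT_CHUNK_SIZE
        try:
            value = int(raw)
        except ValueError as exc:
            raise ValueError(
                f"SEMBLE_CHUNK_SIZE must be an integer, got {raw!r}"
            ) from exc

    if value <= 0:
        raise ValueError(f"chunk size must be positive, got {value}")
    return value
```

Passing the environment as a plain mapping keeps the helper easy to test without mutating `os.environ`.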

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
src/semble/cache.py 100.00% <100.00%> (ø)
src/semble/chunking/chunking.py 100.00% <100.00%> (ø)
src/semble/index/create.py 100.00% <100.00%> (ø)
src/semble/index/index.py 100.00% <100.00%> (ø)
src/semble/types.py 100.00% <100.00%> (+2.32%) ⬆️

@greptile-apps

greptile-apps Bot commented Jun 18, 2026


Confidence Score: 3/5

The indentation change in from_git causes token-savings stats to silently go empty for every git-indexed repository; merging as-is would ship a quiet regression on that code path.

The refactoring in from_git moved the return SembleIndex(...) outside the with tempfile.TemporaryDirectory() context manager. SembleIndex.__init__ calls _compute_file_sizes(root) synchronously, but the temp dir is already deleted at that point, so every file read silently fails and _file_sizes is always empty for git repos. This affects stats reporting on every from_git call. The rest of the changes — threading desired_chunk_length through the call stack, renaming the metadata key, env-var parsing — are clean and well-tested.

src/semble/index/index.py — the from_git method needs the return SembleIndex(...) block moved back inside the with tempfile.TemporaryDirectory() context.
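The lifetime issue described in this review can be reproduced in isolation. The sketch below is not semble's code: `ToyIndex`, `compute_file_sizes`, `build_outside`, and `build_inside` are hypothetical stand-ins, and the error-swallowing in `compute_file_sizes` is an assumption based on the review's statement that the reads fail silently. It only shows why constructing an object after a `tempfile.TemporaryDirectory()` block has exited leaves nothing on disk to read.

```python
import tempfile
from pathlib import Path


def compute_file_sizes(root, paths):
    """Stand-in for a size scan that skips unreadable files without raising."""
    sizes = {}
    for rel in paths:
        try:
            sizes[rel] = (root / rel).stat().st_size
        except OSError:
            # The file is missing or unreadable: skip it silently.
            continue
    return sizes


class ToyIndex:
    """Hypothetical index whose constructor reads from `root` synchronously."""

    def __init__(self, root, paths):
        self.file_sizes = compute_file_sizes(root, paths)


def build_outside():
    """Constructs the index after the temp dir has already been deleted."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("print('hi')\n")
        paths = ["a.py"]
    # The directory no longer exists here, so every stat() fails.
    return ToyIndex(root, paths)


def build_inside():
    """Constructs the index while the temp dir still exists."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("print('hi')\n")
        paths = ["a.py"]
        return ToyIndex(root, paths)
```

In this toy version, `build_outside().file_sizes` comes back empty while `build_inside().file_sizes` contains an entry for `a.py`, which matches the fix the review asks for: keep the construction inside the `with` block.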

Reviews (1): Last reviewed commit: "feat: add configurable chunk size"

Comment thread src/semble/index/index.py
Comment thread src/semble/types.py
@stephantul stephantul closed this Jun 18, 2026