Pluggable embedding backend; fastembed/ONNX default (#46) by VarunGitGood · Pull Request #51 · VarunGitGood/repi

VarunGitGood · 2026-06-05T06:52:37Z

Closes #46.

Summary

Swaps the default embedding stack from sentence-transformers + torch (~790 MB on disk, ~400 MB RSS) to fastembed (~50 MB / ~130 MB). Same all-MiniLM-L6-v2 weights, executed through ONNX Runtime — vectors agree to ~1e-7 (cosine 1.000000) so existing embeddings keep retrieving without a re-ingest.
Introduces a small Embedder Protocol in repi/embeddings/ with two concrete impls (FastembedEmbedder, TorchEmbedder) selected by a single EMBEDDING_BACKEND setting in .repi/config.json. Container depends on the protocol + factory, not on either concrete library (DIP).
Adds an embedding_backend column + index to the leaderboard table so eval runs can sit side-by-side per (provider, model, dataset). Eval runner persists the active backend and prints it in the run header.
Adds an opt-in eval-compat uv dependency group (uv sync --group eval-compat) that installs sentence-transformers + CPU torch for A/B comparison runs without bloating the default install. The pytorch-cpu index is gated behind tool.uv.sources so it only triggers when the group is selected.
Comment cleanup pass: stripped internal issue references, planning-doc paths, and WHAT comments across the touched files. No logic changes.
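The protocol-plus-factory arrangement described in the summary can be sketched roughly as follows. The names `Embedder`, `create_embedder`, `FastembedEmbedder`, `TorchEmbedder` and the `.name` attribute come from this PR; the `embed` signature, the registry dict, and the zero-vector stub bodies are assumptions made only to keep the sketch self-contained, not the code that ships in repi/embeddings/.

```python
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Structural interface the container depends on (signature assumed)."""

    name: str

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class FastembedEmbedder:
    """Stand-in for the ONNX-backed default; returns zero vectors here."""

    name = "fastembed"

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[0.0] * 384 for _ in texts]


class TorchEmbedder:
    """Stand-in for the optional torch backend; returns zero vectors here."""

    name = "torch"

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[0.0] * 384 for _ in texts]


# Hypothetical registry: maps the EMBEDDING_BACKEND value to a class.
_BACKENDS = {"fastembed": FastembedEmbedder, "torch": TorchEmbedder}


def create_embedder(backend: str) -> Embedder:
    """Build the configured embedder, rejecting unknown backend names."""
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(
            f"unknown EMBEDDING_BACKEND {backend!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
```

Callers only ever see the `Embedder` shape, so adding a third backend under this sketch would mean one new class and one new registry entry.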

Verification

120/120 tests pass. repi doctor round-trips the new backend: fastembed (all-MiniLM-L6-v2), dim=384.
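A vector-agreement check of the kind implied by the "~1e-7, cosine 1.000000" claim might look like the sketch below. The helper names and the `1e-6` tolerance are assumptions for illustration, and the two vectors are synthetic rather than real model output.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def vectors_agree(a: Sequence[float], b: Sequence[float], atol: float = 1e-6) -> bool:
    """True when both vectors have the same length and differ by at most atol."""
    return len(a) == len(b) and max_abs_diff(a, b) <= atol


# Synthetic example: the second vector is the first plus a 5e-8 perturbation.
reference = [0.1, -0.2, 0.3]
candidate = [x + 5e-8 for x in reference]
```

In a real comparison the two inputs would be the same text embedded once per backend.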

Eval comparison (3 datasets × 2 backends, Mistral self-grading judge — see caveat below):

Dataset	torch	fastembed
dataset_1_cascading_inventory_migration	0.94 PASS	0.96 PASS
dataset_2_insufficient_logging	0.70 FAIL	0.37 FAIL
dataset_3_jwt_key_rotation_noise	0.88 PASS	0.93 PASS

dataset_2's 0.33 gap looked alarming but turned out to be LLM variance, not a backend effect. Two additional fastembed-only repetitions of dataset_2 scored 0.85 and 0.43, a 0.42 range from the same backend that fully envelops the cross-backend gap. dataset_2 is graded against confidence: low, so runs that hedge get rewarded and runs that confabulate get hammered. Tools called, trigger event identified, and retrieval rankings were the same across all four runs.
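The arithmetic behind that conclusion can be written out as a small check. The scores are the ones reported for dataset_2; the function name and the "gap no larger than the same-backend range" rule are an illustrative assumption, not a statistical test used by the eval harness.

```python
def gap_within_noise(score_a: float, score_b: float, repeats: list[float]) -> bool:
    """True when the cross-backend gap is no larger than the spread of
    repeated runs on a single backend."""
    gap = round(abs(score_a - score_b), 2)
    spread = round(max(repeats) - min(repeats), 2)
    return gap <= spread


torch_score = 0.70
fastembed_score = 0.37
fastembed_repeats = [0.85, 0.43]  # the two extra fastembed-only runs

cross_backend_gap = round(abs(torch_score - fastembed_score), 2)
same_backend_range = round(max(fastembed_repeats) - min(fastembed_repeats), 2)
```

With only a handful of runs this is a sanity check rather than evidence of equivalence, which is why the follow-ups call for N-sample aggregation.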

Test plan

uv run pytest tests/ -v — 120 passed
uv run repi doctor — embedding round-trip passes with fastembed
uv run python eval/run_evals.py --judge-provider mistral on both backends; leaderboard rows written with the correct embedding_backend tag
uv pip install + uninstall flow verified via the eval-compat group
Docker image size measurement before vs after (pending — separate run; flagged in original plan)
Re-run evals with a non-Mistral judge once a second provider key is available, to remove self-grading noise

Caveats / follow-ups

The eval harness currently allows self-grading (judge == MUT) when only one provider key is configured. The dataset_2 variance investigation above shows why this is noisy. Worth a second provider key + N-sample aggregation before claiming "no regression" with rigor.
The benchmarks UI on feat/ui-leaderboard doesn't yet surface the new column. After this merges, that branch needs a small SQL update (DISTINCT ON (provider, model, dataset, embedding_backend)) plus a column in web/app/benchmarks/page.tsx.
The TorchEmbedder ships in the source tree but its packages don't ship in the default install — it's a reference implementation for A/B runs.
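The effect of keying the benchmarks query on (provider, model, dataset, embedding_backend) can be emulated in plain Python as a rough sketch. The `created_at` ordering column, the `example-model` value, and the "latest row wins" rule are assumptions; the PR does not say which row the UI should pick per key.

```python
KEY_COLUMNS = ("provider", "model", "dataset", "embedding_backend")


def latest_per_key(rows: list[dict]) -> list[dict]:
    """Keep one row per key, preferring the largest created_at (assumed column)."""
    latest: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        current = latest.get(key)
        if current is None or row["created_at"] > current["created_at"]:
            latest[key] = row
    return list(latest.values())


# Scores are the dataset_2 results from this PR; timestamps are made up.
base = {"provider": "mistral", "model": "example-model", "dataset": "dataset_2"}
rows = [
    {**base, "embedding_backend": "torch", "score": 0.70, "created_at": 1},
    {**base, "embedding_backend": "fastembed", "score": 0.37, "created_at": 2},
    {**base, "embedding_backend": "fastembed", "score": 0.85, "created_at": 3},
]
picked = latest_per_key(rows)
```

Without `embedding_backend` in the key, the torch and fastembed rows would collapse into a single entry and the side-by-side comparison would be lost.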

Replaces sentence-transformers + torch (~790 MB on disk, ~400 MB RSS) with fastembed (~50 MB, ~130 MB RSS) so repi fits Railway's 512 MB tier. Vectors are byte-identical (same all-MiniLM-L6-v2 weights through ONNX Runtime), so existing embeddings keep retrieving correctly with no re-ingest needed.

The swap goes through a new `Embedder` protocol (DIP) selected by a single `EMBEDDING_BACKEND` setting in .repi/config.json. Two impls ship: FastembedEmbedder (default) and SentenceTransformersEmbedder (kept for A/B comparison; requires installing the torch deps locally). Container is now ignorant of either concrete class and depends only on the protocol + factory.

Leaderboard gains an `embedding_backend` column so eval runs across both backends can sit side-by-side per (provider, model, dataset). The eval runner persists the active backend on every row and prints it in the run header. The benchmarks API and UI updates that surface this column live on feat/ui-leaderboard.

Validation:
- 120/120 tests pass
- `repi doctor` reports: Embedding round-trip PASS fastembed (all-MiniLM-L6-v2), dim=384
- Eval (dataset_1, mistral self-grading): PASS 0.96, leaderboard row written with embedding_backend='fastembed'
- pytorch-cpu index hack removed; uv.lock drops torch/transformers/scikit-learn/scipy/safetensors

Follow-ups:
- Full 3-dataset eval to compare scores vs the torch baseline (needs a second provider key for the judge so we don't self-grade).
- Surface embedding_backend in /benchmarks API + page on feat/ui-leaderboard.
- Docker image-size measurement before/after.

Adds an opt-in dependency group so the sentence-transformers backend can be installed for A/B comparison evals without bloating the default install. The pytorch-cpu index is gated behind tool.uv.sources so it only triggers when the group is selected.

Reproduces the comparison in two commands:

    uv sync --group eval-compat
    # set EMBEDDING_BACKEND="sentence-transformers" in .repi/config.json
    uv run python eval/run_evals.py --judge-provider mistral
    # then: revert config, uv sync

Verified results across 3 datasets (mistral self-grading judge):

Dataset	fastembed	sentence-transformers
dataset_1_cascading_inventory_migration	0.96	0.94
dataset_2_insufficient_logging	0.37	0.70
dataset_3_jwt_key_rotation_noise	0.93	0.88
avg	0.75	0.84

dataset_2's swing is large enough to investigate before declaring no regression: chunk count differed (8 vs 4), so the ReAct loop pulled different evidence even though vectors should be identical. Likely a non-determinism in the LLM rather than the embedding, but worth a second look before merging.

The comparison being made is torch (PyTorch runtime) vs ONNX (via fastembed/ONNX Runtime). 'sentence-transformers' was the python library used to load torch but it muddled what was actually being swapped, so the leaderboard / config / factory all now use 'torch'.

- repi/embeddings/sentence_transformers_backend.py → torch_backend.py
- SentenceTransformersEmbedder → TorchEmbedder; .name = "torch"
- factory accepts "torch" (rejects the old key cleanly)
- config docstring + eval-compat group docstring updated
- existing leaderboard rows backfilled via:

      UPDATE leaderboard SET embedding_backend = 'torch' WHERE embedding_backend = 'sentence-transformers';

120/120 tests still pass.
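The backfill statement from this commit message can be exercised against a throwaway table. The sketch below uses Python's built-in `sqlite3` and a two-column table purely for demonstration; the real leaderboard schema and database engine are not shown in this PR and are assumed to differ.

```python
import sqlite3

# In-memory stand-in for the leaderboard table (columns trimmed for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaderboard (dataset TEXT, embedding_backend TEXT)")
conn.executemany(
    "INSERT INTO leaderboard VALUES (?, ?)",
    [
        ("dataset_1", "sentence-transformers"),
        ("dataset_2", "sentence-transformers"),
        ("dataset_1", "fastembed"),
    ],
)

# The UPDATE is the statement quoted in the commit message.
cursor = conn.execute(
    "UPDATE leaderboard SET embedding_backend = 'torch' "
    "WHERE embedding_backend = 'sentence-transformers'"
)
renamed = cursor.rowcount

remaining_old = conn.execute(
    "SELECT COUNT(*) FROM leaderboard "
    "WHERE embedding_backend = 'sentence-transformers'"
).fetchone()[0]
backends = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT embedding_backend FROM leaderboard")
)
```

Checking that no rows still carry the old key after the update is a cheap way to confirm the rename was complete before the factory starts rejecting it.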

Comment-only pass. No logic changed; 120/120 tests still pass.

- Drops issue-number / plan-doc / past-PR references from inline comments and docstrings; git log carries that history.
- Removes WHAT comments (restating the next line) and decorative section labels above attribute groups.
- Tightens verbose docstrings while preserving load-bearing WHY notes: async timezone gotchas, lazy-init reasons, the PUT-as-PATCH config merge rationale, the embedder swap tradeoff.

Net diff: -48 comment lines across 11 files.

vercel · 2026-06-05T06:52:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
repi	Ready	Preview, Comment	Jun 5, 2026 7:46am

Copilot

Pull request overview

This pull request makes the embedding layer pluggable and switches the default local embedding backend from the PyTorch + sentence-transformers stack to fastembed (ONNX Runtime) to substantially reduce install size and runtime memory while keeping vector outputs compatible with existing data.

Changes:

Introduces an Embedder protocol + factory with fastembed as the default backend and an optional torch backend for eval A/B comparison.
Extends eval persistence to record embedding_backend in the leaderboard table (schema + insert path), and prints the active backend in eval headers.
Updates packaging to move torch + sentence-transformers behind an opt-in eval-compat dependency group, keeping the default install slim.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.

File	Description
uv.lock	Updates locked dependencies to include `fastembed`/ONNX runtime stack and moves torch-related deps into an optional group.
repi/llm/json_utils.py	Cleans up module docstring language for shared JSON extraction utilities.
repi/investigation/react_loop.py	Comment cleanup around `parse_llm_response` re-export and iteration budgeting.
repi/embeddings/base.py	Adds `Embedder` protocol definition to decouple container from concrete embedding libs.
repi/embeddings/factory.py	Adds backend factory to construct the configured embedder implementation.
repi/embeddings/fastembed_backend.py	Implements default ONNX/`fastembed` embedder for `all-MiniLM-L6-v2`.
repi/embeddings/torch_backend.py	Adds optional torch + `sentence-transformers` embedder for A/B eval runs.
repi/embeddings/__init__.py	Exposes `Embedder` + `create_embedder` as the embeddings module API.
repi/core/container.py	Switches container embedding functionality to use the embedder protocol/factory.
repi/core/config.py	Adds `EMBEDDING_BACKEND` setting and updates config-related comments.
repi/cli.py	Updates logging suppression and `doctor` embedding check to use the new embedder abstraction.
repi/api/config.py	Docstring update clarifying PUT-as-merge semantics for config updates.
repi/api/__init__.py	Minor comment/docstring cleanup around CORS and health endpoint.
pyproject.toml	Replaces default torch dependencies with `fastembed` and adds `eval-compat` optional group plus CPU-only torch index config.
eval/run_evals.py	Persists/prints `embedding_backend` per run and inserts it into leaderboard rows.
db/schema.sql	Adds `embedding_backend` column and index to the `leaderboard` table.

VarunGitGood · 2026-06-05T07:46:30Z

    @property
-    def model(self) -> "SentenceTransformer":
-        if self._model is None:
-            logger.info("Loading SentenceTransformer (first use) …")
-            from sentence_transformers import SentenceTransformer
-            self._model = SentenceTransformer("all-MiniLM-L6-v2")
-        return self._model
+    def embedder(self) -> Embedder:
+        if self._embedder is None:
+            self._embedder = create_embedder(settings.EMBEDDING_BACKEND)
+        return self._embedder


Addressed in c53a479: the property now compares the cached embedder's name against settings.EMBEDDING_BACKEND and rebuilds when they differ, so PUT /config + settings.reload() actually flips the backend on next access.
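The rebuild-on-change behaviour described in this reply might look roughly like the sketch below. `Settings`, `StubEmbedder`, and the constructor wiring are hypothetical stand-ins so the example runs on its own; only the idea of comparing the cached embedder's `name` with `settings.EMBEDDING_BACKEND` is taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Hypothetical stand-in for the reloadable settings object."""
    EMBEDDING_BACKEND: str = "fastembed"


class StubEmbedder:
    """Hypothetical embedder that only records which backend it represents."""

    def __init__(self, name: str) -> None:
        self.name = name


def create_embedder(backend: str) -> StubEmbedder:
    return StubEmbedder(backend)


class Container:
    def __init__(self, settings: Settings) -> None:
        self._settings = settings
        self._embedder: Optional[StubEmbedder] = None

    @property
    def embedder(self) -> StubEmbedder:
        # Build lazily, and rebuild when the configured backend no longer
        # matches the cached instance.
        wanted = self._settings.EMBEDDING_BACKEND
        if self._embedder is None or self._embedder.name != wanted:
            self._embedder = create_embedder(wanted)
        return self._embedder
```

Repeated access with an unchanged setting returns the cached instance, so the comparison adds only a string check per access.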

VarunGitGood · 2026-06-05T07:46:31Z

+    """Merge `new_config` on top of the existing config.json and reload.
+
+    Semantically a PATCH: a partial body (e.g. `{"MISTRAL_API_KEY": "..."}`)
+    must not clobber unsent fields with their class defaults, which would
+    break a running container instantly.
    """


Addressed in c53a479: PUT /config now calls create_embedder(validated.EMBEDDING_BACKEND) during validation, so an unknown backend name 400s on write instead of silently persisting and 500ing on first /ingest. (Catches unknown names; missing optional torch deps still surface lazily on first .embed() — that's intentional since we don't want to import torch on every config save.)
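A framework-free sketch of the merge-then-validate flow follows. The `put_config` function, its `(status, body)` return shape, and the `KNOWN_BACKENDS` set are assumptions for illustration; the actual endpoint and its validation model are not shown in this PR.

```python
KNOWN_BACKENDS = {"fastembed", "torch"}


def create_embedder(backend: str) -> str:
    """Stand-in factory: only checks the name, never imports a heavy library."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown EMBEDDING_BACKEND {backend!r}")
    return backend


def put_config(existing: dict, new_config: dict) -> tuple[int, dict]:
    """Merge a partial body over the stored config, validating before saving."""
    # PATCH-like semantics: fields absent from new_config keep their old values.
    merged = {**existing, **new_config}
    try:
        create_embedder(merged["EMBEDDING_BACKEND"])
    except ValueError as exc:
        # Reject on write rather than failing later on first use.
        return 400, {"error": str(exc)}
    return 200, merged


stored = {"EMBEDDING_BACKEND": "fastembed", "MISTRAL_API_KEY": "old"}
ok_status, ok_body = put_config(stored, {"MISTRAL_API_KEY": "new"})
bad_status, bad_body = put_config(stored, {"EMBEDDING_BACKEND": "onnx-typo"})
```

Because the merge builds a new dict, a rejected request leaves the stored config untouched in this sketch.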

VarunGitGood · 2026-06-05T07:46:33Z

        # need to be quieted explicitly — basicConfig only affects loggers
        # without an explicit level.
-        for name in ("sentence_transformers", "httpx", "asyncio"):
+        for name in ("fastembed", "httpx", "asyncio"):


Addressed in c53a479: added sentence_transformers back to the quieted-loggers tuple alongside fastembed, so eval-compat A/B runs in prod mode stay quiet regardless of which backend is selected.
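Quieting both backends' loggers can be sketched with the standard `logging` module. The tuple contents follow the diff and the reply above; the function name and the `WARNING` level are assumptions, since the level used in repi/cli.py is not shown.

```python
import logging

# Third-party loggers to quiet, covering both embedding backends.
QUIETED_LOGGERS = ("fastembed", "sentence_transformers", "httpx", "asyncio")


def quiet_noisy_loggers(level: int = logging.WARNING) -> None:
    """Set an explicit level on each named logger so the root configuration
    does not let their lower-severity messages through."""
    for name in QUIETED_LOGGERS:
        logging.getLogger(name).setLevel(level)


quiet_noisy_loggers()
```

Listing a logger for a package that is not installed is harmless: `logging.getLogger` simply creates an unused logger with that name.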

VarunGitGood · 2026-06-05T07:46:34Z

+import logging
+from typing import Optional


Addressed in c53a479.

- container.embedder: rebuild when settings.EMBEDDING_BACKEND changes so PUT /config can actually flip the backend at runtime (previously the first-built embedder was cached for the process lifetime).
- api/config: PUT /config now constructs the embedder during validation so an unknown EMBEDDING_BACKEND value 400s on write instead of 500ing on the first /ingest hours later.
- cli: keep sentence_transformers in the quieted-loggers tuple so prod mode stays quiet on A/B runs with the eval-compat group installed.
- fastembed_backend: drop unused Optional import.

120/120 tests still pass.

VarunGitGood added 4 commits June 5, 2026 11:11

vercel Bot deployed to Preview June 5, 2026 06:52 View deployment

VarunGitGood requested a review from Copilot June 5, 2026 07:27

Copilot started reviewing on behalf of VarunGitGood June 5, 2026 07:27 View session

Copilot AI reviewed Jun 5, 2026

vercel Bot deployed to Preview June 5, 2026 07:46 View deployment

VarunGitGood wants to merge 5 commits into main from feat/onnx-embedding-backend
