EdgeRAG is a reproducibility-first, publication-facing refactor of the local RAG experiment used in the paper "Resource-Constrained Evaluation of Quantized Local LLMs for Retrieval-Augmented Generation".
The experiment evaluates four pipelines under a fixed local-stack workflow:
- P0 — retrieval-only
- P1 — closed-book answering
- P2 — standard retrieve-then-answer RAG
- P3 — same-model query-rewrite RAG
The study is generator-centric. Retrieval still matters and is swept across multiple embedding models and retrieval depths, but the central question is how local quantized generators behave once lexical quality, grounding, latency, semantic adequacy, and runtime failures are measured together.
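As a rough illustration of how the four pipelines differ structurally, the sketch below dispatches one question through P0 to P3. It is not the repository's runner: `retrieve`, `generate`, `run_pipeline`, the prompt wording, and the `top_k` default are all hypothetical stand-ins chosen for this example.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins: a retriever maps (query, k) to passages and a
# generator maps a prompt to text. Neither is part of the edgerag package.
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str], str]


def run_pipeline(pipeline: str, question: str, retrieve: Retriever,
                 generate: Generator, top_k: int = 5) -> dict:
    """Run one question through P0, P1, P2, or P3 (illustrative only)."""
    passages: List[str] = []
    answer: Optional[str] = None
    query = question

    if pipeline == "P3":
        # Same-model query rewrite: the generator rewrites the query first.
        query = generate(f"Rewrite this question as a search query: {question}").strip()

    if pipeline in ("P0", "P2", "P3"):
        passages = retrieve(query, top_k)

    if pipeline == "P1":
        # Closed-book: no retrieved context in the prompt.
        answer = generate(f"Question: {question}\nAnswer:")
    elif pipeline in ("P2", "P3"):
        context = "\n".join(passages)
        answer = generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    elif pipeline != "P0":
        raise ValueError(f"unknown pipeline: {pipeline}")

    return {"pipeline": pipeline, "query": query, "passages": passages, "answer": answer}
```

P0 returns passages with no answer, P1 returns an answer with no passages, and P3 differs from P2 only in the extra rewrite call made with the same generator.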
This repository is the runnable companion to the paper’s workstation-scale evaluation story.
The manuscript describes a single personal workstation with:
- one NVIDIA GeForce RTX 3090 GPU with 24.0 GB VRAM
- Ollama 0.17.1
- FAISS 1.13.2
- Python 3.11.14
The purpose of this repository is therefore not to claim universal model rankings, but to make this exact local-stack workflow inspectable, repeatable, and extendable.
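Because the results are tied to the versions listed above, it can help to record what is installed before a run. The sketch below is one possible check using only the standard library. `report_versions` is a hypothetical helper, and the distribution names passed to it (`faiss-cpu`, `numpy`) are assumptions that may not match your install. Ollama is not a Python distribution, so it is not covered here.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def report_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return installed versions for the named distributions, or None if absent."""
    versions: Dict[str, Optional[str]] = {
        "python": ".".join(str(part) for part in sys.version_info[:3])
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "faiss-cpu" and "numpy" are assumed distribution names; adjust to your install.
    for name, version in report_versions(["faiss-cpu", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```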
Install the runner and base-analysis dependencies:

    pip install -e ".[runner,analysis]"

Start Ollama locally, then run the canonical paper-style command:

    python -m edgerag.cli.run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600

Regenerate the base analysis:

    python -m edgerag.analysis.base --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/base_analysis

Add SAS only when needed:

    pip install -e ".[runner,analysis,sas]"
    python -m edgerag.analysis.sas --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis

.
├── README.md
├── CITATION.cff
├── LICENSE-TODO.txt
├── pyproject.toml
├── configs/
├── docs/
├── artifacts/
├── legacy/
├── src/edgerag/
└── tests/
Key locations:
- configs/phase1.json: canonical stable config
- src/edgerag/pipelines/runner.py: main experiment orchestration
- src/edgerag/analysis/base.py: base analysis
- src/edgerag/analysis/sas.py: optional SAS extension
- artifacts/paper/: paper-facing tracked artifacts and sample output locations
- legacy/original_snapshot/: preserved uploaded originals for audit comparison
Core package only:

    pip install -e .

Experiment runner + base analysis:

    pip install -e ".[runner,analysis]"

Add optional SAS:

    pip install -e ".[runner,analysis,sas]"

Add test tooling:

    pip install -e ".[dev]"

The main experiment runner expects a local Ollama server. This dependency is deliberate and part of the scientific setup rather than an implementation detail.
The canonical config is:
configs/phase1.json
It preserves the original settings, including:
- 500 deterministic KILT-NQ development questions
- subset_mode=random
- subset_seed=123
- reduced gold_plus_random KB construction
- 100,000 random background passages plus all gold provenance passages
- three embedders
- twelve generator models in the current snapshot
- first-token timeout of 300 seconds
- stream timeout of 600 seconds
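To show what a seeded random subset and a gold_plus_random knowledge base mean in practice, here is a minimal sketch. It is not the repository's implementation: the function names, the use of Python's `random.Random`, the choice to exclude gold passages from the background pool, and the reuse of one seed for both steps are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Sequence, Set


def sample_questions(question_ids: Sequence[str], n: int, seed: int) -> List[str]:
    """Draw a reproducible random subset: same inputs and seed give the same subset."""
    rng = random.Random(seed)
    return rng.sample(list(question_ids), min(n, len(question_ids)))


def build_gold_plus_random_kb(all_passage_ids: Sequence[str],
                              gold_passage_ids: Iterable[str],
                              n_background: int, seed: int) -> List[str]:
    """Keep every gold passage and add n_background random non-gold passages."""
    gold: Set[str] = set(gold_passage_ids)
    # Assumption: background passages are drawn only from non-gold passages.
    candidates = [pid for pid in all_passage_ids if pid not in gold]
    rng = random.Random(seed)
    background = rng.sample(candidates, min(n_background, len(candidates)))
    return sorted(gold) + background
```

With the canonical settings this would correspond to `sample_questions(ids, 500, 123)` and a background size of 100,000. Fixing the seed is what makes the subset deterministic across reruns on the same input order.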
Dry run:

    python -m edgerag.cli.run --config configs/phase1.json --dry_run

A dry run avoids Ollama execution, FAISS building, and KILT downloads. If the local KILT files are already present, it also resolves the exact sampled question count and KB tag. If they are absent, it reports the planned paths and configuration without downloading the dataset.
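The dry-run behavior described above comes down to inspecting local files without fetching or building anything. A minimal sketch of that idea follows. `plan_dry_run` and its return shape are hypothetical, and the actual `--dry_run` output may differ.

```python
from pathlib import Path
from typing import Dict, Iterable


def plan_dry_run(planned_files: Iterable[str]) -> Dict[str, object]:
    """Report which planned local files exist, without downloading or building anything."""
    status = {path: Path(path).is_file() for path in planned_files}
    return {
        "files": status,
        # Exact question count and KB tag can only be resolved from local data.
        "can_resolve_sample": all(status.values()) and bool(status),
    }
```

When any planned file is missing, the sketch still returns the full list of paths, which mirrors the "report but do not download" behavior.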
Full run:

    python -m edgerag.cli.run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600

Installed console-script equivalent:

    edgerag-run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600

Base analysis:

    python -m edgerag.analysis.base --config configs/phase1.json --results results/results.jsonl --outdir artifacts/paper/sample_outputs/base_analysis

SAS analysis:

    python -m edgerag.analysis.sas --config configs/phase1.json --results results/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis

Skip SAS scoring but keep a SAS-shaped output path:

    python -m edgerag.analysis.sas --config configs/phase1.json --results results/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis --skip_sas

Rebuild the resume state:

    python -m edgerag.cli.rebuild_resume --config configs/phase1.json

Estimate the session runtime:

    python -m edgerag.cli.estimate_runtime --results_dir results_

The canonical run writes to the configured results_dir (default: results_phase1/). Important files include:
- results_phase1/results.jsonl: raw trial records, including failures
- results_phase1/resume_state.json: resume checkpoints keyed by run configuration
- results_phase1/run_metadata.json: sidecar runtime metadata for resumability and reporting
- results_phase1/live_streams/: verbose token-stream traces when --verbose_stream is enabled
- results_phase1/runtime_estimate.json: optional post-hoc session estimate from live stream timestamps
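Since results.jsonl holds one raw trial record per line, a quick sanity check needs only the standard library. The sketch below counts records per value of a caller-supplied field. The record schema is not documented here, so any field name you pass (for example `"pipeline"`) is an assumption to verify against your own file.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterator


def iter_records(results_path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(results_path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def tally_by_key(results_path: str, key: str) -> Counter:
    """Count records per value of `key`; records missing the key count as '<missing>'."""
    return Counter(str(rec.get(key, "<missing>")) for rec in iter_records(results_path))
```

A call such as `tally_by_key("results_phase1/results.jsonl", "pipeline")` would then show how many trials were recorded per value of that field, with records lacking it grouped under `<missing>`.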
Base analysis writes publication-style exports such as:
- canonicalized JSONL
- CSV summary tables
- publication CSV tables
- publication_tables.xlsx
- figure PNG / SVG / PDF files
- summary.txt
See docs/RESULTS.md for a fuller output map.
Tracked under artifacts/paper/:
- manuscript snapshots
- uploaded spreadsheets
- uploaded PDF reports
- sample output locations for regenerated analysis
Regenerated locally:
- results JSONL
- resume state
- runtime metadata
- live stream logs
- runtime estimates
- regenerated tables and figures
- KB SQLite files and FAISS indices
- downloaded KILT corpora
These are the legacy files:
- edge_rag_experiment_fix10.py
- edge_rag_final_analysis_v2_5.py
- edge_rag_final_analysis_v2_7_sas.py
- rebuild_resume_from_results.py
- estimate_gpu_runtime.py
- phase1_config_fix9_v4.json
They are preserved for migration and reproducibility.
This is a local-stack evaluation repository. Interpret the reported results as specific to:
- the reduced KILT-NQ setup
- the local Ollama serving stack
- the selected timeout policy
- the selected generators and embedders
- the single-workstation hardware budget
The rankings are not meant to be universal.
- If generation repeatedly stalls, restart ollama serve and rerun the same command.
- If a run stopped mid-sweep, rerun the same command first; use the rebuild utility only when resume_state.json and results.jsonl drift apart.
- If base analysis works but SAS fails, install the optional SAS dependencies and rerun.
- If you want to validate the resolved plan without touching Ollama or FAISS, use --dry_run.
- If --dry_run reports missing local KILT files, that is expected when the dataset has not been downloaded yet.
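Stalled generations are bounded by the two timeouts passed to the canonical command (300 seconds to the first token, 600 seconds for the whole stream). The sketch below shows one way such a two-deadline policy could be applied to a chunk iterator. It is an assumption-laden illustration, not the runner's code: `consume_stream` and `StreamTimeout` are hypothetical names, both deadlines are assumed to be measured from the start of the request, and the check only fires when a chunk arrives, so a fully blocked connection would still need a transport-level read timeout.

```python
import time
from typing import Callable, Iterable, List


class StreamTimeout(Exception):
    """Raised when a token stream exceeds one of its deadlines."""


def consume_stream(chunks: Iterable[str], first_token_timeout_s: float,
                   stream_timeout_s: float,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Collect streamed chunks, checking both deadlines as each chunk arrives."""
    start = clock()
    pieces: List[str] = []
    for chunk in chunks:
        elapsed = clock() - start
        # The first-token deadline applies only until one chunk has been accepted.
        if not pieces and elapsed > first_token_timeout_s:
            raise StreamTimeout(f"first token after {elapsed:.1f}s")
        if elapsed > stream_timeout_s:
            raise StreamTimeout(f"stream exceeded {stream_timeout_s:.0f}s")
        pieces.append(chunk)
    return "".join(pieces)
```

Passing the clock as a parameter keeps the function testable without real waiting.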
The manuscript files provided for the refactor use blinded author placeholders, so CITATION.cff keeps anonymized review metadata rather than inventing names. The source files also did not specify a final project license, so LICENSE-TODO.txt remains intentionally conservative until the authors choose a release license.
- docs/REPRODUCIBILITY.md: step-by-step paper-style workflow
- docs/COMMANDS.md: command reference with examples
- docs/PROJECT_STRUCTURE.md: module-level layout explanation
- docs/RESULTS.md: output files and artifact expectations
- docs/MIGRATION.md: old-to-new command mapping