EdgeRAG

EdgeRAG is a reproducibility-first, publication-facing refactor of the local RAG experiment used in the paper Resource-Constrained Evaluation of Quantized Local LLMs for Retrieval-Augmented Generation.

What this repository evaluates

The experiment evaluates four pipelines under a fixed local-stack workflow:

P0 — retrieval-only
P1 — closed-book answering
P2 — standard retrieve-then-answer RAG
P3 — same-model query-rewrite RAG

The study is generator-centric. Retrieval still matters and is swept across multiple embedding models and retrieval depths, but the central question is how local quantized generators behave once lexical quality, grounding, latency, semantic adequacy, and runtime failures are measured together.
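One way to read the four pipeline labels is as combinations of three stages: query rewriting, retrieval, and generation. The sketch below encodes that reading. The `PipelineStages` class, the `PIPELINES` table, and `generator_calls` are illustrative names that are not taken from the repository, and the stage flags are inferred only from the one-line descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineStages:
    """Which stages a pipeline runs (illustrative, not the repository's types)."""
    rewrite_query: bool
    retrieve: bool
    generate: bool


# Stage flags inferred from the pipeline descriptions (an assumption).
PIPELINES = {
    "P0": PipelineStages(rewrite_query=False, retrieve=True, generate=False),
    "P1": PipelineStages(rewrite_query=False, retrieve=False, generate=True),
    "P2": PipelineStages(rewrite_query=False, retrieve=True, generate=True),
    "P3": PipelineStages(rewrite_query=True, retrieve=True, generate=True),
}


def generator_calls(pipeline_id: str) -> int:
    """Count generator invocations per question under this reading.

    P3 is counted as two calls because the same model is assumed to
    rewrite the query and then answer.
    """
    stages = PIPELINES[pipeline_id]
    return int(stages.rewrite_query) + int(stages.generate)
```

Under this reading, P0 never calls the generator and P3 calls it twice per question, which is worth keeping in mind when comparing latency across pipelines.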

Relation to the paper

This repository is the runnable companion to the paper’s workstation-scale evaluation.

The manuscript describes a single personal workstation with:

one NVIDIA GeForce RTX 3090 GPU with 24.0 GB VRAM
Ollama 0.17.1
FAISS 1.13.2
Python 3.11.14

The purpose of this repository is therefore not to claim universal model rankings, but to make this exact local-stack workflow inspectable, repeatable, and extendable.

Quickstart

Install the runner and base-analysis dependencies:

pip install -e ".[runner,analysis]"

Start Ollama locally, then run the canonical paper-style command:

python -m edgerag.cli.run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600

Regenerate the base analysis:

python -m edgerag.analysis.base --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/base_analysis

Add SAS only when needed:

pip install -e ".[runner,analysis,sas]"
python -m edgerag.analysis.sas --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis

Repository layout

.
├── README.md
├── CITATION.cff
├── LICENSE-TODO.txt
├── pyproject.toml
├── configs/
├── docs/
├── artifacts/
├── legacy/
├── src/edgerag/
└── tests/

Key locations:

configs/phase1.json — canonical stable config
src/edgerag/pipelines/runner.py — main experiment orchestration
src/edgerag/analysis/base.py — base analysis
src/edgerag/analysis/sas.py — optional SAS extension
artifacts/paper/ — paper-facing tracked artifacts and sample output locations
legacy/original_snapshot/ — preserved uploaded originals for audit comparison

Installation

Core package only:

pip install -e .

Experiment runner + base analysis:

pip install -e ".[runner,analysis]"

Add optional SAS:

pip install -e ".[runner,analysis,sas]"

Add test tooling:

pip install -e ".[dev]"

External software

The main experiment runner expects a local Ollama server. This dependency is deliberate and part of the scientific setup rather than an implementation detail.
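Before starting a long sweep, it can help to confirm that the server is reachable. The following is a minimal sketch and not part of the repository: `ollama_is_reachable` is a hypothetical helper, and it assumes Ollama is listening on its default address, `http://localhost:11434`. If the host or port has been changed, pass the actual URL.

```python
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434",
                        timeout_s: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Connection refused, timeouts, and HTTP errors are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_reachable())
```

The check only confirms that something answers on the port; it does not verify that the generator models named in the config have been pulled.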

Canonical paper-style workflow

1. Inspect the stable config

The canonical config is:

configs/phase1.json

It preserves the original settings, including:

500 deterministic KILT-NQ development questions
subset_mode=random
subset_seed=123
reduced gold_plus_random KB construction
100,000 random background passages plus all gold provenance passages
three embedders
twelve generator models in the current snapshot
first-token timeout of 300 seconds
stream timeout of 600 seconds
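The subset and KB settings above can be illustrated with a short sketch. It is not the repository's implementation: `sample_questions` and `build_reduced_kb` are hypothetical names, the use of Python's `random.Random` is an assumption, and reusing the seed 123 for the background passages is also an assumption made only for illustration.

```python
import random


def sample_questions(question_ids, n=500, seed=123):
    """Draw a reproducible random subset; the same seed yields the same subset."""
    rng = random.Random(seed)       # local RNG, so global random state is untouched
    ordered = sorted(question_ids)  # fix the input order before sampling
    return rng.sample(ordered, min(n, len(ordered)))


def build_reduced_kb(all_passage_ids, gold_passage_ids, n_background=100_000, seed=123):
    """Keep every gold passage and add a seeded random sample of the rest."""
    gold = set(gold_passage_ids)
    candidates = sorted(pid for pid in all_passage_ids if pid not in gold)
    rng = random.Random(seed)
    background = rng.sample(candidates, min(n_background, len(candidates)))
    return sorted(gold | set(background))
```

Sorting the identifiers before sampling makes the result independent of the order in which they were read from disk, which is one common way to keep a seeded subset stable across machines.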

2. Resolve the plan safely before running

python -m edgerag.cli.run --config configs/phase1.json --dry_run

Dry-run avoids Ollama execution, FAISS building, and KILT downloads. If the local KILT files are already present, it will also resolve the exact sampled question count and KB tag. If the local KILT files are absent, it reports the planned paths and configuration without downloading the dataset.
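The dry-run behaviour described above amounts to checking for local files and reporting planned paths without fetching anything. A rough sketch of that idea follows. `describe_dry_run` and the dictionary keys it returns are hypothetical and do not reflect the actual output format of `--dry_run`.

```python
from pathlib import Path


def describe_dry_run(planned_paths):
    """Report which planned local files exist, without downloading anything.

    planned_paths maps a label to a file path; both are supplied by the caller.
    """
    files = {}
    for label, path in planned_paths.items():
        p = Path(path)
        files[label] = {"path": str(p), "present": p.is_file()}
    # The sampled question count can only be resolved if every file is present.
    can_resolve = bool(files) and all(entry["present"] for entry in files.values())
    return {"files": files, "can_resolve_sample": can_resolve}
```

When any file is missing, the report still lists the planned path, which mirrors the documented behaviour of reporting configuration without triggering a download.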

3. Run the experiment

python -m edgerag.cli.run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600

Installed console-script equivalent:

edgerag-run --config configs/phase1.json --verbose_stream --first_token_timeout_s 300 --stream_timeout_s 600
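The two timeout flags suggest a two-stage policy: one limit on the wait for the first token and another on the stream as a whole. The sketch below shows one way such a policy could be enforced around a blocking token iterator. It is an illustration and not the runner's code: `collect_stream` is a hypothetical helper, and it assumes the stream timeout bounds total wall-clock time from the start of the request, which the repository may define differently.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream


def collect_stream(token_iter, first_token_timeout_s=300.0, stream_timeout_s=600.0):
    """Collect tokens, raising TimeoutError if either limit is exceeded."""
    q = queue.Queue()

    def pump():
        # Run the blocking iterator in a worker so the caller can time out.
        try:
            for token in token_iter:
                q.put(token)
            q.put(_DONE)
        except Exception as exc:  # forward producer errors to the caller
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    start = time.monotonic()
    tokens = []
    while True:
        # Before the first token the first-token limit applies; afterwards the
        # whole-stream limit does (both measured from the start of the request).
        limit, label = ((stream_timeout_s, "stream") if tokens
                        else (first_token_timeout_s, "first-token"))
        remaining = limit - (time.monotonic() - start)
        if remaining <= 0:
            raise TimeoutError(f"{label} timeout exceeded")
        try:
            item = q.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"{label} timeout exceeded") from None
        if item is _DONE:
            return tokens
        if isinstance(item, Exception):
            raise item
        tokens.append(item)
```

Recording which of the two limits fired makes it possible to distinguish a model that never started answering from one that started but did not finish, which matters when runtime failures are reported alongside quality metrics.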

4. Regenerate the base analysis outputs

python -m edgerag.analysis.base --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/base_analysis

5. Regenerate SAS outputs only when needed

python -m edgerag.analysis.sas --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis

Skip SAS scoring but keep a SAS-shaped output path:

python -m edgerag.analysis.sas --config configs/phase1.json --results results_phase1/results.jsonl --outdir artifacts/paper/sample_outputs/sas_analysis --skip_sas

6. Rebuild resume state only if needed

python -m edgerag.cli.rebuild_resume --config configs/phase1.json

7. Estimate runtime from live stream logs

python -m edgerag.cli.estimate_runtime --results_dir results_

Expected outputs

The canonical run writes to the configured results_dir (default: results_phase1/). Important files include:

results_phase1/results.jsonl — raw trial records, including failures
results_phase1/resume_state.json — resume checkpoints keyed by run configuration
results_phase1/run_metadata.json — sidecar runtime metadata for resumability and reporting
results_phase1/live_streams/ — verbose token-stream traces when --verbose_stream is enabled
results_phase1/runtime_estimate.json — optional post-hoc session estimate from live stream timestamps
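Because results.jsonl holds one record per line, it can be inspected without the analysis modules. The sketch below is a generic JSONL reader and is not the repository's loader: `load_jsonl` and `count_by` are hypothetical helpers, and any field name passed to `count_by` (for example `"status"`) is an assumption to be checked against the actual records.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON value per line; return (records, malformed_line_numbers)."""
    records, malformed = [], []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # For example, a partial final line after an interrupted run.
                malformed.append(line_no)
    return records, malformed


def count_by(records, key):
    """Tally records by the value of one field; a missing field counts as None."""
    return Counter(r.get(key) for r in records if isinstance(r, dict))
```

A typical use would be `records, bad = load_jsonl("results_phase1/results.jsonl")`, followed by `count_by(records, ...)` with whichever field the records actually use to mark failures.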

Base analysis writes publication-style exports such as:

canonicalized JSONL
CSV summary tables
publication CSV tables
publication_tables.xlsx
figure PNG / SVG / PDF files
summary.txt

See docs/RESULTS.md for a fuller output map.

Tracked artifacts vs regenerated outputs

Tracked under artifacts/paper/:

manuscript snapshots
uploaded spreadsheets
uploaded PDF reports
sample output locations for regenerated analysis

Regenerated locally:

results JSONL
resume state
runtime metadata
live stream logs
runtime estimates
regenerated tables and figures
KB SQLite files and FAISS indices
downloaded KILT corpora

Legacy

The following original files are kept under legacy/:

edge_rag_experiment_fix10.py
edge_rag_final_analysis_v2_5.py
edge_rag_final_analysis_v2_7_sas.py
rebuild_resume_from_results.py
estimate_gpu_runtime.py
phase1_config_fix9_v4.json

They are preserved for migration and reproducibility.

Scope and limitations

This is a local-stack evaluation repository. Interpret the reported results as specific to:

the reduced KILT-NQ setup
the local Ollama serving stack
the selected timeout policy
the selected generators and embedders
the single-workstation hardware budget

The rankings are not meant to be universal.

Troubleshooting

If generation repeatedly stalls, restart ollama serve and rerun the same command.
If a run stopped mid-sweep, rerun the same command first; use the rebuild utility only when resume_state.json and results.jsonl drift apart.
If base analysis works but SAS fails, install the optional SAS dependencies and rerun.
If you want to validate the resolved plan without touching Ollama or FAISS, use --dry_run.
If --dry_run reports missing local KILT files, that is expected when the dataset has not been downloaded yet.

Citation and license status for this snapshot

The manuscript files provided for the refactor use blinded author placeholders, so CITATION.cff keeps anonymized review metadata rather than inventing names. The source files also did not specify a final project license, so LICENSE-TODO.txt remains intentionally conservative until the authors choose a release license.

Further documentation

docs/REPRODUCIBILITY.md — step-by-step paper-style workflow
docs/COMMANDS.md — command reference with examples
docs/PROJECT_STRUCTURE.md — module-level layout explanation
docs/RESULTS.md — output files and artifact expectations
docs/MIGRATION.md — old-to-new command mapping
