BrowseNet is a graph-based associative memory framework for Retrieval-Augmented Generation (RAG) that decomposes multi-hop queries into a directed acyclic graph (DAG), termed a query-subgraph, and traverses a Graph-of-Chunks to retrieve structured, reasoning-aligned context for LLMs.
Figure 1: BrowseNet overview.
BrowseNet achieves state-of-the-art performance while being highly cost-efficient, reducing LLM costs by ~33× compared to the previous SOTA, HippoRAG-2, without a significant latency trade-off. This makes BrowseNet well-suited for large-scale, cost-sensitive RAG scenarios.
Figure 2: (a) Average retrieval performance across the 2WikiMQA, HotpotQA, and MuSiQue datasets. (b) Latency and cost comparison between BrowseNet and HippoRAG-2.
> **Note:** Latency is measured as Average Time Per Query (ATPQ) over 50 sampled questions from the MuSiQue dataset and may vary with system hardware, runtime environment, and configuration settings. The cost analysis is based on the HotpotQA benchmark and reflects the full pipeline cost using gpt-4o-mini in both systems, computed with OpenAI's API pricing ($0.15 per 1M input tokens and $0.60 per 1M output tokens, as of Sept 24, 2025).
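Given the quoted rates, the cost arithmetic is mechanical; a minimal sketch (the token counts in the example are placeholders, not measured figures from the benchmark):

```python
# gpt-4o-mini rates quoted above (as of Sept 24, 2025)
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000   # $0.15 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000  # $0.60 per 1M output tokens

def pipeline_cost(input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost of a pipeline run given its token usage."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# Example: 1M input tokens and 200k output tokens -> $0.15 + $0.12
cost = pipeline_cost(1_000_000, 200_000)
```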
Set up the Conda environment and install the required dependencies:
```bash
conda env create -f environment.yml
```

To enable semantic keyword matching, ColBERTv2 is required. Download the pre-trained checkpoint and extract it into `src/indexer/exp/`:

```bash
cd src/indexer/exp
wget https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar -xvzf colbertv2.0.tar.gz
```

All benchmark datasets used in this study are available under `./datasets/`.
To evaluate BrowseNet on a new dataset, follow the steps below.

1. Create a new folder under `./datasets/` named `./datasets/<dataset_name>/`. The folder name must match the dataset name.
2. Upload the following files to the dataset folder:
   - `corpus.json` — document corpus
   - `questions.json` — multi-hop queries
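Given this layout, loading a dataset is a pair of JSON reads; a minimal sketch (the function name and error handling are illustrative, not BrowseNet's actual code):

```python
import json
from pathlib import Path

def load_dataset(dataset_name: str, root: str = "./datasets"):
    """Load corpus.json and questions.json for one dataset folder.

    Follows the ./datasets/<dataset_name>/ convention described above.
    """
    folder = Path(root) / dataset_name
    if not folder.is_dir():
        raise FileNotFoundError(f"No dataset folder at {folder}")
    with open(folder / "corpus.json") as f:
        corpus = json.load(f)
    with open(folder / "questions.json") as f:
        questions = json.load(f)
    return corpus, questions
```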
Each entry in `corpus.json` represents a passage in the corpus:

```json
[
  {
    "title": "<title of the passage>",
    "text": "<passage>"
  }
]
```

Each entry in `questions.json` corresponds to a multi-hop query.
The fields `gold_ids`, `edge_list`, and `answer` are optional; they are used to evaluate the retrieval, Graph-of-Chunks construction, and answer generation stages, respectively.

```json
[
  {
    "question": "<multi-hop query>",
    "gold_ids": "<list of corpus indices required to answer the question>",
    "edge_list": "<list of edges in the query-subgraph>",
    "answer": "<ground-truth answer>"
  }
]
```

All parameters must be defined as environment variables. A sample `.env` file is provided in the repository.
- `OPENAI_API_KEY` — API key for OpenAI-based models
- `DEEPSEEK_API_KEY` — API key for DeepSeek models (optional)
- `DEVICE` — Device for loading encoder models (`cuda` or `cpu`)
- `COLBERT_THRESHOLD` — Threshold for synonym matching using ColBERTv2 (default: `0.9`)
- `DATASET` — Dataset name (`2wikimqa`, `hotpotqa`, `musique`)
- `RETRIEVAL_METHOD` — Retrieval strategy (`browsenet` or `naiverag`)
- `N_SUBGRAPHS` — Number of subgraphs retrieved per query
- `N_CHUNKS` — Number of chunks used for answer generation (default: `5`)
- `ALPHA` — Sparse retriever weight in hybrid retrieval (0–1, set to `0` in this study)
- `NER_MODEL` — Named entity recognition model (`gliner` or `gpt-4o`)
- `SEM_MODEL` — Dense embedding model (`miniLM`, `stella`, `nvembedv2`, `granite`, `qwen2`)
- `SUBQUERY_MODEL` — LLM for query decomposition (`gpt-4o`, `o4-mini`, `deepseek-reasoner`)
- `LLM` — LLM provider (`openai` or `deepseek`)
- `MODEL` — LLM variant for QA (`gpt-4o`, `gpt-3.5-turbo`)
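Putting these together, an illustrative `.env` sketch (the values below are examples for a single configuration, not recommendations; see the repository's sample file for the authoritative template):

```shell
OPENAI_API_KEY=sk-...        # replace with your own key
DEVICE=cuda
COLBERT_THRESHOLD=0.9
DATASET=hotpotqa
RETRIEVAL_METHOD=browsenet
N_SUBGRAPHS=3                # example value; tune per dataset
N_CHUNKS=5
ALPHA=0
NER_MODEL=gliner
SEM_MODEL=miniLM
SUBQUERY_MODEL=gpt-4o
LLM=openai
MODEL=gpt-4o
```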
Once all environment variables are configured, running the `main.py` script indexes the corpus, performs retrieval, and generates answers for the selected dataset.
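As a sketch of how such settings are typically consumed at startup (the variable names match the list above, but the function and its behavior here are illustrative, not BrowseNet's actual code):

```python
import os

def read_settings() -> dict:
    """Read the pipeline configuration from environment variables.

    The defaults mirror the ones documented above (N_CHUNKS=5,
    COLBERT_THRESHOLD=0.9, ALPHA=0 in this study); DATASET has no
    documented default and is treated as required.
    """
    return {
        "dataset": os.environ["DATASET"],  # e.g. hotpotqa
        "retrieval_method": os.environ.get("RETRIEVAL_METHOD", "browsenet"),
        "n_chunks": int(os.environ.get("N_CHUNKS", "5")),
        "colbert_threshold": float(os.environ.get("COLBERT_THRESHOLD", "0.9")),
        "alpha": float(os.environ.get("ALPHA", "0")),
    }
```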
If you find BrowseNet useful in your research, please cite our work:
```bibtex
@inproceedings{s2026browsenet,
  title={BrowseNet: Knowledge Graph-Based Associative Memory for Contextual Information Retrieval},
  author={PAVAN KUMAR S and Kiran Kumar Nakka and C Vamshi Krishna Reddy and Divyateja Pasupuleti and Prakhar Agarwal and Harpinder Jot Singh and Anshu Avinash and Nirav Pravinbhai Bhatt},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=2q5CugVPoK}
}
```
