🤖 Agentic-RAG-RL: Agentic Retrieval-Augmented Generation with Reinforcement Learning 🚀

Introduction 🌟

Agentic-RAG-RL is a personal project by meghanaNanuvala to build an Agentic Retrieval-Augmented Generation (RAG) system with autonomous search and reasoning skills through reinforcement learning.

What is Agentic RAG? 💡

Agentic RAG combines two powerful concepts:

Retrieval‑Augmented Generation (RAG): Combines generative power with on‑the‑fly retrieval from external knowledge bases, ensuring factual and up‑to‑date answers.
Agentic AI: Gives the model the ability to decide when to retrieve, what to retrieve, and how to weave the retrieved evidence into its reasoning.

Architecture 🏗️

Our architecture is inspired by TC‑RAG and features an agent memory stack that orchestrates the full deliberation loop, supporting the following actions:

Plan (❌)
Reasoning (✅)
Backtrack (✅)
Summary (✅)
Tool Observation – wiki/document/knowledge‑graph search, etc. (✅)
Conclusion (✅)
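
As a rough illustration of how a memory stack could hold the supported actions listed above, here is a minimal sketch. The `Action` enum, the `MemoryStack` class, and all method names are hypothetical and are not taken from this repository or from TC‑RAG; Plan is omitted because it is marked as not yet supported.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Mirrors the supported actions in the list above (names are illustrative).
    REASONING = "reasoning"
    BACKTRACK = "backtrack"
    SUMMARY = "summary"
    TOOL_OBSERVATION = "tool_observation"
    CONCLUSION = "conclusion"


@dataclass
class MemoryStack:
    """Hypothetical stack of (action, text) entries; not the repository's class."""

    entries: list = field(default_factory=list)

    def push(self, action: Action, text: str) -> None:
        self.entries.append((action, text))

    def backtrack(self) -> None:
        # Discard the most recent entry, e.g. an unhelpful retrieval result.
        if self.entries:
            self.entries.pop()

    def summarize(self, summary: str) -> None:
        # Collapse the whole stack into a single summary entry.
        self.entries = [(Action.SUMMARY, summary)]

    def is_finished(self) -> bool:
        # The loop ends once a conclusion sits on top of the stack.
        return bool(self.entries) and self.entries[-1][0] is Action.CONCLUSION


stack = MemoryStack()
stack.push(Action.REASONING, "Need the first-line treatment for condition X.")
stack.push(Action.TOOL_OBSERVATION, "Irrelevant search result.")
stack.backtrack()  # drop the irrelevant observation
stack.push(Action.TOOL_OBSERVATION, "Relevant passage from the knowledge base.")
stack.summarize("The retrieved passage names drug Y as the first-line treatment.")
stack.push(Action.CONCLUSION, "Answer: drug Y.")
print(stack.is_finished())
```

In this sketch Backtrack is a plain pop and Summary replaces the whole stack; the real system may keep more history or summarize only part of it.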

Training Strategy 🧠

Motivated by DeepSeek-R1, we apply GRPO (Group Relative Policy Optimization) to reinforce the agent's choice of reasoning steps and retrieval actions, improving both search depth and answer quality.
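
GRPO scores each sampled rollout relative to the other rollouts generated for the same prompt, so no separate value model is needed. The sketch below shows only that normalization step. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; the training code in this repository may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some implementations use sample std
    # eps avoids division by zero when every rollout in the group gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards of four rollouts sampled for one question (made-up numbers).
rewards = [2.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat the group average receive a positive advantage and are reinforced; those below it are pushed down.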

Rollout Generation 🔄

Installation 🛠️

We use conda to manage the environment. Follow these steps to set up:

conda create -n AgenticRAG python=3.11 -y
conda activate AgenticRAG 
pip install -r requirements.txt

Tools Environment (Optional) 🧰

ArtSearch, our search tool repository, serves as the search engine and supports retrieval from Wikipedia. To deploy a local instance of the search system, follow the instructions in that repository.

Folder Structure 📁

.
├── ArtSearch                 # Search tool integration
├── checkpoints               # Model checkpoints
├── examples                  # Example use cases
├── experiments
│   ├── evaluation            # Evaluation scripts and results
│   └── training              # Training configurations
├── README.md
├── requirements.txt
├── script
│   ├── evaluation            # Evaluation scripts
│   ├── run_server.sh         # Server deployment script
│   └── training              # Training scripts
├── service
│   ├── chat_client.py        # Client for interacting with the model
│   └── chat_server.py        # Server for hosting the model
├── src
│   ├── config                # Configuration files
│   ├── data                  # Data processing utilities
│   ├── evaluation            # Evaluation metrics and tools
│   ├── models                # Model definitions
│   ├── train.py              # Main training script
│   └── utils                 # Utility functions

Quick Start ⚡

Follow the steps below to get up and running with Agentic-RAG-RL.

Before you start, rename the file ".env_format" to ".env" and fill in the required environment variables.

Training

Zero‑2 Mode

./script/training/train_zero2.sh

Zero‑3 Mode

./script/training/train_zero3.sh

Inference

Example Mode

Coming soon.

Server Mode

Launch the chat server:

./script/run_server.sh

Features ✨

LoRA Tuning Support 🔧: Fine-tune efficiently with Low-Rank Adaptation
Model Quantization Support 💻: Quantize models to nf4 and ..
Custom Agent Tools 🛠️: Integrate your own tools and personal RAG datasets
Distributed Training 🌐: Supports DeepSpeed ZeRO Stage 2 and ZeRO Stage 3
Efficient Resource Usage 💻: Support for models up to 32B parameters using only 2 A100 GPUs
Tool Calling Reward 🎯: Enhanced reward model that includes:
- Accuracy reward
- Format reward
- RAG accuracy reward using the RAGAS framework
The total reward is calculated as:

$$r_{total} = r_{accuracy} + r_{format} + r_{rag}$$
TC‑RAG Integration 🔗: Uses TC‑RAG as the rollout generator
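
The total reward from the Tool Calling Reward item above is an unweighted sum of three terms. A minimal sketch of that sum follows, under stated assumptions: the `<answer>` tag format, the exact-match accuracy check, and all function names are hypothetical, and the RAG term is passed in as a precomputed score rather than computed with RAGAS.

```python
import re

# Hypothetical output format: the final answer is wrapped in <answer>...</answer>.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response contains an <answer> span, else 0.0."""
    return 1.0 if ANSWER_PATTERN.search(response) else 0.0


def accuracy_reward(response: str, gold: str) -> float:
    """1.0 if the extracted answer matches the gold answer (case-insensitive)."""
    match = ANSWER_PATTERN.search(response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(response: str, gold: str, rag_score: float) -> float:
    # r_total = r_accuracy + r_format + r_rag, unweighted as in the formula.
    return accuracy_reward(response, gold) + format_reward(response) + rag_score


response = "The evidence points to option B. <answer>B</answer>"
print(total_reward(response, gold="B", rag_score=0.5))  # prints 2.5
```

Because the terms are summed without weights, a well-formatted but wrong answer still earns the format reward, which is one plausible reason format accuracy can rise quickly early in training.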

Results 📊

Experiment Log on Qwen 2.5-7B-Instruct

We have made our training logs publicly available at: SwanLab Training Log

Results on MedQA Test Set 🏥

Our Qwen 2.5-7B-Instruct model was evaluated on the MedQA test set using Qwen‑2.5‑72B as the judge:

Configuration	Format Accuracy	Answer Accuracy
Before fine-tuning	39%	84%
Before fine-tuning + search	56%	79%
After fine-tuning (200 steps) + search	92%	87%

Citation 📝

If you use this work in your research, please cite:

@misc{Agentic_RAG_RL,
  title       = {Agentic RAG-RL: Agentic Retrieval-Augmented Generation with Reinforcement Learning},
  author      = {Xinke Jiang and Jiaran Gao and Rihong Qiu and Wentao Zhang and Yue Fang and Hongxin Ding and Yifan Dai},
  year        = {2025},
  note        = {GitHub repository},
}

License 📄

This project is licensed under the Apache License. See the LICENSE file for details.
