Agentic-RAG-RL: Agentic Retrieval-Augmented Generation with Reinforcement Learning
Agentic-RAG-RL is a personal project by meghanaNanuvala to build an Agentic Retrieval-Augmented Generation (RAG) system with autonomous search and reasoning skills through reinforcement learning.
Agentic RAG combines two powerful concepts:
- Retrieval-Augmented Generation (RAG): Combines generative power with on-the-fly retrieval from external knowledge bases, ensuring factual and up-to-date answers.
- Agentic AI: Gives the model the ability to decide when to retrieve, what to retrieve, and how to weave the retrieved evidence into its reasoning.
Our architecture is inspired by TC-RAG and features an agent memory stack that orchestrates the full deliberation loop, supporting the following actions:
- Plan
- Reasoning
- Backtrack
- Summary
- Tool Observation: wiki/document/knowledge-graph search, etc.
- Conclusion
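As a rough illustration of how a memory stack could drive these actions, here is a minimal sketch. The class names, and the choice that Backtrack pops the latest step while Summary collapses the stack into one entry, are assumptions made for this example and are not taken from the repository's code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    PLAN = "plan"
    REASONING = "reasoning"
    BACKTRACK = "backtrack"
    SUMMARY = "summary"
    TOOL_OBSERVATION = "tool_observation"
    CONCLUSION = "conclusion"


@dataclass
class MemoryStack:
    """Hypothetical stack of (action, content) steps."""

    steps: list = field(default_factory=list)

    def apply(self, action: Action, content: str = "") -> None:
        if action is Action.BACKTRACK:
            # Assumed semantics: discard the most recent step, if any.
            if self.steps:
                self.steps.pop()
        elif action is Action.SUMMARY:
            # Assumed semantics: collapse the stack into a single summary step.
            self.steps = [(Action.SUMMARY, content)]
        else:
            self.steps.append((action, content))

    @property
    def finished(self) -> bool:
        # The loop ends once a Conclusion sits on top of the stack.
        return bool(self.steps) and self.steps[-1][0] is Action.CONCLUSION


stack = MemoryStack()
stack.apply(Action.PLAN, "Look up the condition, then answer.")
stack.apply(Action.TOOL_OBSERVATION, "wiki search result ...")
stack.apply(Action.REASONING, "An inference that turns out to be wrong.")
stack.apply(Action.BACKTRACK)  # removes the wrong inference
stack.apply(Action.REASONING, "The evidence supports option B.")
stack.apply(Action.CONCLUSION, "B")
print([action.value for action, _ in stack.steps])
# ['plan', 'tool_observation', 'reasoning', 'conclusion']
```

In this sketch the model would emit one action per turn, and the loop would stop as soon as `stack.finished` is true.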
Motivated by DeepSeek-R1, we apply GRPO (Group Relative Policy Optimization) to reinforce the agent's choice of reasoning steps and retrieval actions, with the aim of improving both search depth and answer quality.
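The core idea of GRPO is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value baseline. The sketch below shows only that normalization step; the function name is hypothetical, and the use of the population standard deviation plus a small `eps` is an assumption (implementations differ on both).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt, each with a scalar reward.
rewards = [2.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat the group mean get a positive advantage and are reinforced; those below it are pushed down.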
We use conda to manage the environment. Follow these steps to set up:
conda create -n AgenticRAG python=3.11 -y
conda activate AgenticRAG
pip install -r requirements.txt
We provide our search tool repository ArtSearch as the search engine, which supports retrieval of information from Wikipedia. You can follow the instructions in that repository to deploy a local instance of the search system.
.
├── ArtSearch              # Search tool integration
├── checkpoints            # Model checkpoints
├── examples               # Example use cases
├── experiments
│   ├── evaluation         # Evaluation scripts and results
│   └── training           # Training configurations
├── README.md
├── requirements.txt
├── script
│   ├── evaluation         # Evaluation scripts
│   ├── run_server.sh      # Server deployment script
│   └── training           # Training scripts
├── service
│   ├── chat_client.py     # Client for interacting with the model
│   └── chat_server.py     # Server for hosting the model
└── src
    ├── config             # Configuration files
    ├── data               # Data processing utilities
    ├── evaluation         # Evaluation metrics and tools
    ├── models             # Model definitions
    ├── train.py           # Main training script
    └── utils              # Utility functions
Follow the steps below to get up and running with Agentic-RAG-RL.
Before you start, rename the file ".env_format" to ".env" and fill in the required environment variables.
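For readers unfamiliar with ".env" files, the sketch below shows one way such a file can be read into the process environment using only the standard library. The key names `EXAMPLE_API_KEY` and `EXAMPLE_SEARCH_URL` are placeholders invented for this example; the actual variables are the ones listed in ".env_format", and whether the project uses a library for loading them is not stated here.

```python
import os
import tempfile


def load_env_file(path):
    """Parse KEY=VALUE lines into os.environ, skipping blanks and comments."""
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes from the value.
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


# Demo with a throwaway file; the keys below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write("# placeholder keys for illustration\n")
        fh.write('EXAMPLE_API_KEY="abc123"\n')
        fh.write("EXAMPLE_SEARCH_URL=http://localhost:8000\n")
    values = load_env_file(env_path)

print(values)
```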
- ZeRO-2 Mode
./script/training/train_zero2.sh
- ZeRO-3 Mode
./script/training/train_zero3.sh
- Example Mode
Coming soon.
- Server Mode
Launch the chat server:
./script/run_server.sh
- LoRA Tuning Support: Fine-tune efficiently with Low-Rank Adaptation
- Model Quantization Support: Quantize models to NF4 and ..
- Custom Agent Tools: Integrate your own tools and personal RAG datasets
- Distributed Training: Support for DeepSpeed ZeRO Stage 2 and Stage 3
- Efficient Resource Usage: Support for models up to 32B parameters using only 2 A100 GPUs
- Tool Calling Reward: Enhanced reward model that includes:
  - Accuracy reward
  - Format reward
  - RAG accuracy reward using the RAGAS framework

  The total reward is calculated as:
  $$r_{total} = r_{accuracy} + r_{format} + r_{rag}$$
- TC-RAG Integration: Use TC-RAG as the rollout generator
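The reward sum listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's reward code: the `<answer>...</answer>` tag format is hypothetical, exact string match stands in for whatever accuracy check the project uses, and the RAG score is passed in as a plain number rather than computed with RAGAS.

```python
import re

# Hypothetical output format: the final answer is wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed tag format, else 0.0."""
    return 1.0 if ANSWER_PATTERN.search(response) else 0.0


def accuracy_reward(response: str, gold: str) -> float:
    """1.0 if the tagged answer matches the gold answer (exact match stand-in)."""
    match = ANSWER_PATTERN.search(response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(response: str, gold: str, rag_score: float) -> float:
    """r_total = r_accuracy + r_format + r_rag, with r_rag supplied by the caller."""
    return accuracy_reward(response, gold) + format_reward(response) + rag_score


good = total_reward("<answer>B</answer>", "B", rag_score=0.5)
bad = total_reward("The answer is B", "B", rag_score=0.5)
print(good, bad)  # 2.5 0.5
```

Because the three terms are added without weights, a response that ignores the format loses both the format reward and, in this sketch, the accuracy reward, since the answer cannot be extracted.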
We have made our training logs publicly available at: SwanLab Training Log
Our Qwen2.5-7B-Instruct model was evaluated on the MedQA test set using Qwen2.5-72B as the judge:
| Configuration | Format Accuracy | Answer Accuracy |
|---|---|---|
| Before fine-tuning | 39% | 84% |
| Before fine-tuning + search | 56% | 79% |
| After fine-tuning (200 steps) + search | 92% | 87% |
If you use this work in your research, please cite:
@misc{Agentic_RAG_RL,
title = {Agentic RAG-RL: Agentic Retrieval-Augmented Generation with Reinforcement Learning},
author = {Xinke Jiang and Jiaran Gao and Rihong Qiu and Wentao Zhang and Yue Fang and Hongxin Ding and Yifan Dai},
year = {2025},
note = {GitHub repository},
}
This project is licensed under the Apache License. See the LICENSE file for details.






