RMSearch

RMSearch is a reward-model search toolkit that scores "keys" (documents, agents, tools, steps) against "queries" (questions, context) with graph acceleration, using a reward model instead of a semantic embedding model. Agentic search is a natural application: RMSearch enables step-by-step CoT reasoning and optimizes the reasoning path!
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Search The Best Agent For Deep Reasoning


[Figure: how SEIMEI works]

Here is an example of how SEIMEI works. Each agent interacts with the LLM and the documents and makes an inference. These inferences are automatically integrated by the search engine, which produces an answer to the question.

By training the search engine, we can optimize the thinking steps, similar to o1 or deepseek-r1!

(back to top)

The Most Intelligent Search Engine

[Figure: agent-retrieval performance of our 3B reward model vs. e5-mistral-7b during training]

Our proprietary search model performs better than a semantic embedding model (so-called vector search). The graph above shows the result of training our 3B model and the e5-mistral-7b model to search for the best agents. While the vector search model cannot reliably retrieve the best agents (because problems and agents do not share similar sentences), our proprietary search model learns which agents are needed to solve a question and retrieves the best ones!

See more details »
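Conceptually, a reward model scores each (query, key) pair jointly, so it can judge whether an agent actually helps with a question even when the two texts share no vocabulary, whereas vector search only compares two independently computed embeddings. Below is a minimal, hypothetical sketch of the two scoring styles (not the RMSearch internals; model, tokenizer, and encoder stand for any Hugging Face reward model and sentence-transformers encoder):

    import torch

    def reward_model_score(model, tokenizer, query: str, key: str) -> float:
        # The pair is encoded together, so the model can judge whether this
        # agent/document actually helps answer the query.
        inputs = tokenizer(query, key, return_tensors="pt", truncation=True)
        with torch.no_grad():
            return model(**inputs).logits[0, 0].item()

    def embedding_score(encoder, query: str, key: str) -> float:
        # Vector search only measures similarity between two texts that were
        # encoded independently of each other.
        q, k = encoder.encode([query, key], convert_to_tensor=True)
        return torch.nn.functional.cosine_similarity(q, k, dim=0).item()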

(back to top)

Achieves State-Of-The-Art Results

[Figure: BigCodeBench results with deepseek-r1 plus our search engine]

We achieved an improvement on BigCodeBench with deepseek-r1 using our search engine!

See more details »

(back to top)

Applications of SEIMEI

[Figures: example applications of SEIMEI]

SEIMEI can be applied to build these useful features!

See more details »

(back to top)

Built With

  • vLLM
  • Hugging Face

(back to top)

Getting Started

This is an example of how to set up RMSearch on a local GPU or a cloud GPU server. You can use it by downloading this repository into a local folder.

Prerequisites

This library requires a CUDA + PyTorch environment with a GPU. The GPU should have more than 12 GB of memory to run the sample.
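A quick way to confirm the environment meets these requirements (a minimal check, assuming a CUDA-enabled PyTorch build is installed):

    import torch

    # Verify a CUDA-capable GPU is visible and report its memory (should exceed 12 GB).
    assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")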

Installation

  • by pip install
  1. Install rmsearch (not yet published on PyPI)
    pip install rmsearch
  • by downloading the RMSearch repository
  1. Download the repo
    git clone https://github.com/kyotoai/RMSearch.git
    cd RMSearch
    pip install .

(back to top)

Usage

We are still developing this library; it will be completed soon!

Quick Start

  1. Define search instance

    from rmsearch import Search
    search = Search(model_name = "/workspace/llama3b-rm",
            tensor_parallel_size = 1,
            pipeline_parallel_size = 1,)
  2. Search the most relevant keys

    queries = ["How to make LLM?", "What's the capital of Japan?"] * 5
    keys = ["LLM is Large Language Model which can be made ..."*7, "Japanese capital is ..."*7] * 5
    output = await search(queries, keys)
    print(output)
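
The call to search is awaited, so outside a notebook or an existing event loop, wrap it in asyncio.run. A minimal standalone script (a sketch; the model path, queries, and keys are the placeholder values from above):

    import asyncio
    from rmsearch import Search

    async def main():
        # Placeholder reward-model path taken from the Quick Start above
        search = Search(model_name="/workspace/llama3b-rm",
                        tensor_parallel_size=1,
                        pipeline_parallel_size=1)
        queries = ["How to make LLM?", "What's the capital of Japan?"]
        keys = ["LLM is Large Language Model which can be made ...",
                "Japanese capital is ..."]
        output = await search(queries, keys)
        print(output)

    asyncio.run(main())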

(back to top)

Train A Reward Model For Your Own Tasks (For Developers)

See the full code in /examples/example_train2.ipynb
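
The steps below build a preference dataset for the reward model: pairs of a query plus a chosen and a rejected key. For orientation, a single (entirely hypothetical) entry produced by this pipeline looks like this:

    # One hypothetical preference pair: the reward model is trained to score the
    # chosen brainstorming output above the rejected one for the same query.
    example_pair = {
        "query": [
            {"role": "user",
             "content": "Give me a brainstorming sentence to solve the task below;\n\nTask: design a faster cache"}
        ],
        "chosen_key": [
            {"role": "assistant", "content": "We could precompute the hottest keys per tenant ..."}
        ],
        "rejected_key": [
            {"role": "assistant", "content": "Caching is when you store things somewhere ..."}
        ],
    }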

  1. Generate Tree-Structured LLM Output

    # This is an example of building an agentic tree to optimize brainstorming
    num_depth = 2
    num_branch = 2
    total_num_node = num_branch**(num_depth+1) - 2
    
    bs_agent_tree = {}

    # Recursive function to build the agentic tree
    def build_bs_agent_tree(**kwargs):

        kwargs_ = kwargs  # derive kwargs_ from kwargs as needed
    
        def _grow(node, depth, **kwargs_):
            if depth == num_depth:
                return
    
            for b in range(num_branch):
                child = {
                    "agent"    : "something",            # placeholder: agent assigned to this node
                    "node_ids" : node["node_ids"] + [b],
                    "children" : [],                     # needed so the walkers below can recurse
                }
                node["children"].append(child)
    
                _grow(child, depth + 1, **kwargs_)
    
        root = {"agent": "root", "node_ids": [], "children": []}
        _grow(root, 0, **kwargs_)
    
        return root
    
    # Get output from the LLM for a single node
    def get_assistant_msg(node, **llm_kwargs):
        output = get_output(node["agent"], **llm_kwargs)  # user-supplied LLM call
        return output

    # Walk the tree and attach an LLM output to every non-root node
    def populate_tree(node, **llm_kwargs):
        if node["agent"] != "root":             # skip dummy root
            node["output"] = get_assistant_msg(
                node,
                **llm_kwargs,
            )
    
        for child in node["children"]:
            populate_tree(child, **llm_kwargs)
    
    def check(output):
        # Placeholder: evaluate the output's novelty (e.g., with an LLM judge or a heuristic)
        evaluation = ...
        return evaluation

    # Walk the tree and attach an evaluation to each node's output
    def add_eval_to_tree(node, depth):

        if node.get("output") is not None:
            if node.get("eval") is None:
                node["eval"] = check(node["output"])
    
        for child in node["children"]:
            add_eval_to_tree(child, depth + 1)
            
    
    # Assign agents to each node in bs_agent_tree
    bs_agent_tree = build_bs_agent_tree(agents=agents, num_depth=num_depth, num_branch=num_branch)
    # bs_agent_tree : {"agent":"root", "node_ids":[], "children":[{"agent":"agent1", "node_ids":[0], "children":[...]}, ...]}

    # Get the LLM output for each node
    populate_tree(bs_agent_tree, **kwargs)
    # bs_agent_tree : {"agent":"root", ..., "children":[{"agent":"agent1", "output":"...", "node_ids":[0], "children":[...]}, ...]}

    # Get the evaluation of each node's output
    add_eval_to_tree(bs_agent_tree, 0)
    # bs_agent_tree : {"agent":"root", ..., "children":[{"agent":"agent1", "output":"...", "eval":"...", "node_ids":[0], "children":[...]}, ...]}
  2. Make dataset_list

    # dataset_list should be 
    # [{"query":str, "chosen_key":str, "rejected_key":str, **kwargs}, ...]
    # or
    # [{"query":[{"role":"user", "content":"context-and-query-to-get-key"}, ...], "chosen_key":[{"role":"assistant", "content":"chosen-LLM-agent"}, ...], "rejected_key":"chosen_key":[{"role":"assistant", "content":"rejected-LLM-agent"}, **kwargs}, ...]
    
    def get_dataset_dict(task, chosen_msg, rejected_msg):
        if len(chosen_msg) != len(rejected_msg):
            raise Exception("chosen_msg and rejected_msg must be the same size")
    
        if len(chosen_msg) == 2:
            return {
                "query":[{'role': 'user', 'content':f"Give me a brainstorming sentence to solve the task below;\n\nTask:{task}"}],
                "chosen_key":[{'role': 'assistant', 'content': chosen_msg[0]["content"]}],
                "rejected_key":[{'role': 'assistant', 'content': rejected_msg[0]["content"]}],
            }
        elif len(chosen_msg) < 2:
            return None
        else:
            return {
                "query": chosen_msg[:-2] + [{'role': 'user', 'content':f"Give me a brainstorming sentence to solve the task below;\n\nTask:{task}"}],
                "chosen_key": [{'role': 'assistant', 'content': chosen_msg[-2]["content"]}],
                "rejected_key": [{'role': 'assistant', 'content': rejected_msg[-2]["content"]}],
            }
    
    
    dataset_list = []
    def walk(node, depth):
        global dataset_list

        if node["children"] == []:
            return 0

        else:
            updated_score = 0
            scores = []
            bs_msgs = []
            for node_dict in node["children"]:
                try:
                    task = node_dict["task"]
                    evaluation = node_dict["eval"]    # novelty evaluation added by add_eval_to_tree
                    bs_msg = node_dict["output"]      # this node's brainstorming messages

                    novelty = evaluation
                    score = walk(node_dict, depth + 1)
                    if novelty: score += 1
                    updated_score += score

                    # Compare this node against its already-scored siblings to form preference pairs
                    for j, other_score in enumerate(scores):
                        if score < other_score:
                            dataset_dict = get_dataset_dict(task, chosen_msg=bs_msgs[j], rejected_msg=bs_msg)
                        elif score > other_score:
                            dataset_dict = get_dataset_dict(task, chosen_msg=bs_msg, rejected_msg=bs_msgs[j])
                        else:
                            continue

                        if dataset_dict is not None:
                            dataset_list.append(dataset_dict)

                    bs_msgs.append(bs_msg)
                    scores.append(score)
                except (KeyError, TypeError):
                    # skip nodes that are missing a task, output, or evaluation
                    continue

            updated_score = updated_score / len(node["children"])

        return updated_score
    
    for bs_agent_tree in bs_agent_trees:  # bs_agent_trees : list of trees built as in step 1
        walk(bs_agent_tree, 0)
    
    # dataset_list = [{"query":[{"role":"user", "content":"context-and-query-to-get-key"}, ...], "chosen_key":[{"role":"assistant", "content":"chosen-LLM-agent"}, ...], "rejected_key":[{"role":"assistant", "content":"rejected-LLM-agent"}, ...], **kwargs}, ...]
  3. Train Reward Model

    from rmsearch import RMTrainer
    from trl import RewardConfig
    from peft import LoraConfig, TaskType
    
    model_name = "/workspace/llama3b-rm"
    exp_dir = "/workspace/exp1"
    model_save_dir = f"{exp_dir}/model1"
    test_size = 6
    num_gpus = 1
    batch_size_per_device = 3
    eval_batch_size_per_device = 1
    
    rmtrainer = RMTrainer(model_name = model_name, num_gpus = num_gpus)
    formatted_dataset = rmtrainer.prepare_dataset(dataset_list, base_dir = exp_dir, test_size = test_size)
    
    training_args = RewardConfig(
        output_dir=model_save_dir,
        per_device_train_batch_size=batch_size_per_device,
        per_device_eval_batch_size=eval_batch_size_per_device,
        eval_strategy="steps",
        eval_steps=1,
        eval_on_start=True,
        save_steps=10,
        logging_steps=1,
        num_train_epochs = 50,
        report_to=None,
        remove_unused_columns=False,
    )
    
    peft_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        inference_mode=False,
        target_modules=["k_proj","q_proj","o_proj", "v_proj","down_proj","gate_proj","up_proj",],
        layers_to_transform=[25,26,27],
        r=16,
        lora_alpha=16,
        lora_dropout=0.1,
    )
    
    rmtrainer.train(formatted_dataset, training_args = training_args, peft_config = peft_config)
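
Once training finishes, the trained reward model can be used for search. This is a minimal sketch under stated assumptions: RMTrainer with a peft_config is assumed to save a LoRA adapter to model_save_dir, so the adapter is merged back into the base model first (merge_and_unload is a standard PEFT call), and the merged directory is then passed to Search as model_name; whether Search accepts such a directory directly is an assumption, not a documented guarantee.

    import asyncio
    from peft import AutoPeftModelForSequenceClassification
    from transformers import AutoTokenizer
    from rmsearch import Search

    # Merge the LoRA adapter (assumed to be saved in model_save_dir) into the base
    # reward model so the result is a plain checkpoint directory.
    merged_dir = f"{model_save_dir}/merged"
    peft_model = AutoPeftModelForSequenceClassification.from_pretrained(model_save_dir, num_labels=1)
    peft_model.merge_and_unload().save_pretrained(merged_dir)
    AutoTokenizer.from_pretrained(model_name).save_pretrained(merged_dir)

    async def main():
        search = Search(model_name=merged_dir,
                        tensor_parallel_size=1,
                        pipeline_parallel_size=1)
        queries = ["Give me a brainstorming sentence to solve the task below;\n\nTask: ..."]
        keys = ["brainstorming output from agent A ...", "brainstorming output from agent B ..."]
        output = await search(queries, keys)
        print(output)

    asyncio.run(main())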

(back to top)

Roadmap

Search

  • Async vLLM
    • Automatic Compatibility Solver

Train

  • Reward Trainer
  • Examples of Making MCTS

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

Top contributors:


License

Distributed under the Apache-2.0 License. See LICENSE.txt for more information.

(back to top)

Contact

KyotoAI homepage: https://kyotoai.org

Project Link: https://github.com/kyotoai/RMSearch

(back to top)

Acknowledgments

(back to top)
