Create a basic scheduler by correaswebert · Pull Request #84 · georgia-tech-db/TokenSmith

correaswebert · 2026-02-20T15:31:52Z

This PR reruns historical queries with higher RAG parameter values defined in a new config file. Apart from the normal retrieval flow with a cross-encoder, I am also looking into MMR reranking. MMR runs on top of the cross-encoder's chunk outputs to reduce chunk redundancy, which might arise from the higher number of chunks retrieved.
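To make the MMR step concrete, here is a minimal sketch of Maximal Marginal Relevance applied to cross-encoder outputs. It is not the code from this PR: the function names, the `lambda_` default, and the assumption that relevance scores are already normalized to roughly [0, 1] are all illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(
    relevance: Sequence[float],            # assumed: normalized cross-encoder score per chunk
    embeddings: Sequence[Sequence[float]],  # assumed: one embedding per chunk, same order
    k: int,
    lambda_: float = 0.7,                   # hypothetical default; 1.0 means pure relevance
) -> List[int]:
    """Return indices of up to k chunks, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(relevance)))

    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to any chunk already picked.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)

    return selected
```

With `lambda_=1.0` the ordering reduces to the plain cross-encoder ranking, so the same function can serve as the baseline when comparing the two.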

shahmeer99 · 2026-03-13T07:48:16Z

@correaswebert this PR is a bit confusing because I assume you forked the repo rather than creating a branch. Firstly, I like the idea of having a scheduler and comparator so we can replay questions across different configs and compare the responses. Secondly, I also see the potential value of having MMR on top of CE reranking for relatively vague/generic queries.

However, there are several issues here:

1. The logging mechanism has changed significantly since this PR was written, I assume. Log files are now made PER question, not per session. Thus, each log file has the info for one question and does NOT record the session info. You may fix this by adding some kind of "session start timestamp" at the start of each session (for both main and api_server). Ideally, you create the session ID once, when the logger is created, because it is only ever created once per session (confirm this though). After you add this session ID or session start timestamp to each log file, you can gather question logs per session and compare them.
2. Validation for using MMR. You may look at PR 76 and see how specific test cases for the semantic cache were added (in the tests folder, both the YAML and the .py file for it). Similarly, it is important to have specific test cases or questions for which MMR outperforms both plain CE reranking and no reranking. I imagine these will be high-level concept explanation questions rather than "what is X" type questions where X is a specific term with a one-line definition. Regardless, adding appropriate test cases to validate this is important.
3. IMPORTANT. These should be 2 separate branches (not PRs from forks): one branch for adding the scheduler + comparator and another for adding MMR reranking. For the former, think about how to restructure this use case. Assuming there are logs for N different sessions, you first get the queries and configs for each session and check whether their configs are different (the configs are in the log files currently). Then, assuming you have M sessions with different configs, you may display the results for the SAME questions across those M sessions. If the questions are not the same across those M sessions, you may have to rerun each of those M sessions individually with the superset of all questions across them and then show the results, or just do nothing if there are no overlapping questions. Regardless, it's important to think about how an end user (or, most likely, a dev) would easily view and compare different configs. Also, I don't see how the name "scheduler" fits here. Just call it a session/config comparator or something along those lines.
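As a rough sketch of the session ID and comparator points above: the field names `session_id`, `question`, `config`, and `answer` are assumptions made for illustration, not the actual TokenSmith log schema, and the helper names are hypothetical.

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone


def new_session_id() -> str:
    """Build an ID once per session, e.g. when the logger is constructed."""
    started = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{started}-{uuid.uuid4().hex[:8]}"


def group_by_session(records):
    """Map session_id -> {question: record} from per-question log records."""
    sessions = defaultdict(dict)
    for rec in records:
        sessions[rec["session_id"]][rec["question"]] = rec
    return sessions


def compare_sessions(records):
    """Return {question: {session_id: answer}} for questions shared by all
    sessions whose configs differ. Returns {} if fewer than two distinct
    configs exist or if no question overlaps."""
    sessions = group_by_session(records)

    # Keep the first session seen for each distinct config.
    distinct = {}
    for sid, by_question in sessions.items():
        config = next(iter(by_question.values()))["config"]
        distinct.setdefault(json.dumps(config, sort_keys=True), sid)

    chosen = list(distinct.values())
    if len(chosen) < 2:
        return {}

    shared = set.intersection(*(set(sessions[sid]) for sid in chosen))
    return {
        q: {sid: sessions[sid][q]["answer"] for sid in chosen}
        for q in sorted(shared)
    }
```

This only covers the "do nothing if there are no overlapping questions" branch. Rerunning sessions on the superset of questions would need the actual retrieval pipeline and is left out here.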

Overall, I am closing this PR because it has drifted too far from main to merge right now. Please review these points and create separate branches from the latest version of main, then open PRs from those branches after implementing both the code for those changes and appropriate test cases.

correaswebert added 4 commits February 13, 2026 10:43

Add basic scheduler logic with custom overrides

7fd6a40

Merge branch 'georgia-tech-db:main' into main

def4116

Enable replaying past queries and comparing outputs

9472f13

Add higher config

f9a324f

correaswebert self-assigned this Feb 20, 2026

Implement MMR reranking

22b1689

shahmeer99 closed this Mar 13, 2026

correaswebert wants to merge 5 commits into georgia-tech-db:main from correaswebert:main
