Semantic Caching with Bi-Direction Encoder and Cross Encoder by sureshkumarsrinath · Pull Request #76 · georgia-tech-db/TokenSmith

sureshkumarsrinath · 2026-02-06T06:48:32Z

Adding a simple Cross-Encoder similarity-based semantic cache with cache benchmark tests. This is a simple benchmark to establish the baseline for cache performance.

shahmeer99 · 2026-03-13T04:24:36Z

@@ -0,0 +1,159 @@
+from yaml import Node


shahmeer99 · 2026-03-13T04:24:45Z

+# Global cache and constants
+# -----------------------------
+SEMANTIC_CACHE: Dict[str, Deque[Dict[str, Any]]] = {}
+SEMANTIC_CACHE_THRESHOLD = 0.85


shahmeer99 · 2026-03-13T04:28:12Z

-
-    # Step 1: Get chunks (golden, retrieved, or none)
+
+    normalized_question = normalize_question(question)


Why are these definitions computed when cfg.semantic_cache_enabled is not True? This is wasted computation whenever semantic caching is turned off.
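A minimal sketch of the guard this comment asks for, where cache-related work runs only when the flag is on. Only `semantic_cache_enabled` and the name `normalize_question` come from the diff; the `Config` class, the `cache_lookup_key` helper, and the body of `normalize_question` are hypothetical stand-ins, not the PR's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Hypothetical stand-in for the project's config object.
    semantic_cache_enabled: bool = False


def normalize_question(question: str) -> str:
    # Assumed behavior: lowercase and collapse whitespace.
    return " ".join(question.lower().split())


def cache_lookup_key(question: str, cfg: Config) -> Optional[str]:
    """Return the normalized question, or None when caching is disabled."""
    if not cfg.semantic_cache_enabled:
        return None  # skip all cache-related work
    return normalize_question(question)
```

With this shape, the caller can treat `None` as "go straight to retrieval" and never pays for normalization when the cache is off.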

shahmeer99

I have merged the current main and added some comments in main to make it more clear (those were my second last and third last commits in this branch). I have updated the testing as well (this is my last commit) to cover the concern (CONCERN 1) I explain below. The testing (last commit) adds:

Adds "trick_variations" to the cache benchmark YAML. These were partly model-generated and partly written by me. They are not the best at the moment and you should update them later (as will I). The idea is to check whether the cache returns a hit for a variation that sounds very similar to the original question but in reality does NOT have the same answer (such a hit is a false positive and results in the wrong answer). The script has been updated to cover this part of the test as well. Your original hit rate metric remains as is (only the printed output has changed). I want to EMPHASIZE that the trick_variations were quickly created by me and Claude, and we need to come up with better trick_variation cases to really test this problem.
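One way the two metrics described above could be computed side by side is sketched below. The case layout (`variations` and `trick_variations` keys per question) and the `lookup` signature are assumptions for illustration; the actual YAML schema and benchmark script in the PR may differ.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical lookup signature: returns a cached answer on a hit, else None.
Lookup = Callable[[str], Optional[str]]


def evaluate_cache(lookup: Lookup, cases: List[Dict]) -> Dict[str, float]:
    """Hit rate on true paraphrases, false positive rate on trick variations."""
    hits = total_variations = 0
    false_hits = total_tricks = 0
    for case in cases:
        for question in case.get("variations", []):
            total_variations += 1
            hits += lookup(question) is not None
        for question in case.get("trick_variations", []):
            total_tricks += 1
            # Any hit here would serve the wrong answer.
            false_hits += lookup(question) is not None
    return {
        "hit_rate": hits / total_variations if total_variations else 0.0,
        "false_positive_rate": false_hits / total_tricks if total_tricks else 0.0,
    }
```

Reporting both numbers together keeps a high hit rate from hiding an unsafe threshold.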

CONCERN 1: how to choose optimal thresholds?

Where are these 2 values coming from:
BI_ENCODER_THRESHOLD = 0.40
CROSS_ENCODER_THRESHOLD = 0.75

These thresholds determine what counts as a hit in the semantic cache. Set them too low and we risk returning a completely irrelevant answer on a false positive hit (this is the worst case scenario!); set them too high and we get no hits at all (wasted computation). We need a comprehensive benchmark and suite of tests to make sure we hit a sweet spot between these 2 issues. Could you elaborate on how you chose these 2 values to begin with?
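To make the role of the two thresholds concrete, here is a minimal sketch that assumes the bi-encoder acts as a cheap first-stage filter and the cross-encoder as the final check. That arrangement is common but is not confirmed by this thread, and `two_stage_lookup` and the `Scorer` callables are hypothetical; real encoders would replace the scorers.

```python
from typing import Callable, Dict, Optional

# Values quoted in the review comment above.
BI_ENCODER_THRESHOLD = 0.40
CROSS_ENCODER_THRESHOLD = 0.75

# Hypothetical scorer type: similarity of two questions, higher is closer.
Scorer = Callable[[str, str], float]


def two_stage_lookup(
    query: str,
    cache: Dict[str, str],
    bi_score: Scorer,
    cross_score: Scorer,
) -> Optional[str]:
    """Return a cached answer only if both stages clear their thresholds."""
    # Stage 1: cheap filter over every cached question.
    candidates = [q for q in cache if bi_score(query, q) >= BI_ENCODER_THRESHOLD]
    if not candidates:
        return None
    # Stage 2: expensive scorer runs on the survivors only.
    best = max(candidates, key=lambda q: cross_score(query, q))
    if cross_score(query, best) >= CROSS_ENCODER_THRESHOLD:
        return cache[best]
    return None
```

In this arrangement the first threshold mostly controls cost (how many candidates reach the second stage), while the second threshold decides whether a wrong answer can be served.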

shahmeer99 · 2026-03-13T05:08:55Z

@sureshkumarsrinath The idea is actually good and I think this is a feature that will help immensely in the long run. But we need better testing of what makes a semantic cache good (threshold values, etc). I have left a detailed comment explaining my changes, my concerns, and some future steps (and also some small syntax comments here and there). One more thing beyond that: have you considered a ColBERT approach (MaxSim at the token level across 2 queries)? It is a bit expensive, but it can be heavily parallelized and could potentially be used at the final stage. I am not sure whether it will help or not.
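For reference, the token-level MaxSim idea mentioned here can be sketched in a few lines of plain Python. This is an illustration of the scoring rule only, assuming per-token embedding vectors are already available; it averages over query tokens (ColBERT itself sums) so that scores stay comparable across query lengths, and it is not part of the PR.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def maxsim(query_tokens: List[Vector], cached_tokens: List[Vector]) -> float:
    """For each query token take its best cosine match among the cached
    question's tokens, then average those best matches."""
    if not query_tokens or not cached_tokens:
        return 0.0
    best = [max(cosine(q, c) for c in cached_tokens) for q in query_tokens]
    return sum(best) / len(best)
```

Each query token's maximum is independent of the others, which is what makes the computation easy to parallelize.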

sureshkumarsrinath · 2026-03-13T05:38:44Z

@shahmeer99 Thanks. I will note your suggestions here and work on improving this feature.

sureshkumarsrinath · 2026-04-02T21:34:23Z

I performed an exhaustive benchmark across several combinations of the threshold values against an adversarial dataset to determine the sweet spot between Accuracy (correctly retrieving the paraphrased queries) and False Positive Rate (returning entirely wrong answers).

As serving an irrelevant answer due to a false positive match is absolutely not acceptable and goes against the fundamental user experience, I've intentionally optimized the system for safety over aggressive caching. I've turned both of these values up high to ensure a 0% false positive rate:

Bi-Encoder Threshold	Cross-Encoder Threshold	Accuracy	False Positive Rate
0.85	0.99	99.3%	1.3%
0.90	0.99	97.3%	0.7%
0.95	0.99	62.0%	0.0%
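The selection rule described above (maximize accuracy subject to a cap on false positives) can be written down directly. The rows below are copied from the table; the `SweepResult` type and `pick_thresholds` helper are hypothetical names for illustration, not the benchmark script from the PR.

```python
from typing import List, NamedTuple, Optional


class SweepResult(NamedTuple):
    bi_threshold: float
    cross_threshold: float
    accuracy: float             # percent
    false_positive_rate: float  # percent


# Rows copied from the table above.
RESULTS = [
    SweepResult(0.85, 0.99, 99.3, 1.3),
    SweepResult(0.90, 0.99, 97.3, 0.7),
    SweepResult(0.95, 0.99, 62.0, 0.0),
]


def pick_thresholds(
    results: List[SweepResult], max_fpr: float = 0.0
) -> Optional[SweepResult]:
    """Most accurate setting whose false positive rate is within max_fpr."""
    safe = [r for r in results if r.false_positive_rate <= max_fpr]
    return max(safe, key=lambda r: r.accuracy) if safe else None
```

With `max_fpr=0.0` this picks the 0.95 / 0.99 row (62.0% accuracy); relaxing the cap to 1.0 would pick the 0.90 / 0.99 row instead, which makes the safety versus hit-rate trade-off explicit.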

sureshkumarsrinath · 2026-04-02T23:26:23Z

This is just the baseline cache implementation for the system. Further improvements will be added as flag-enabled implementations.

shahmeer99 · 2026-04-09T05:30:45Z

 use_double_prompt: false
 enable_history: true
 max_history_turns: 3
+semantic_cache_enabled: true


default should be false
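A minimal sketch of an opt-in default paired with a no-op cache, so that disabled caching costs nothing at call sites. Only the flag name `semantic_cache_enabled` comes from the diff; `CacheSettings`, `AnswerCache`, `NoOpAnswerCache`, `ExactMatchAnswerCache`, and `build_cache` are hypothetical names, and the exact-match class is a toy stand-in for the semantic cache.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheSettings:
    # Hypothetical config holder; only the flag name comes from the diff.
    semantic_cache_enabled: bool = False  # caching is opt-in


class AnswerCache(ABC):
    """Hypothetical interface name, more specific than a bare `Cache`."""

    @abstractmethod
    def get(self, question: str) -> Optional[str]:
        ...

    @abstractmethod
    def put(self, question: str, answer: str) -> None:
        ...


class NoOpAnswerCache(AnswerCache):
    """Used when caching is off: never stores, never hits."""

    def get(self, question: str) -> Optional[str]:
        return None

    def put(self, question: str, answer: str) -> None:
        pass


class ExactMatchAnswerCache(AnswerCache):
    """Toy stand-in for the semantic cache; matches identical strings only."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, question: str) -> Optional[str]:
        return self._store.get(question)

    def put(self, question: str, answer: str) -> None:
        self._store[question] = answer


def build_cache(settings: CacheSettings) -> AnswerCache:
    if settings.semantic_cache_enabled:
        return ExactMatchAnswerCache()
    return NoOpAnswerCache()
```

Callers can then use `get` and `put` unconditionally, with the flag checked once at construction time.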

shahmeer99 · 2026-04-09T06:03:48Z

@sureshkumarsrinath This looks good to me. Thanks for considering the extra false positive testing. Like you said, it's a great start for this feature and can be improved upon in later iterations.

Here are some proposed changes:

MINOR - Change the ABC class name from Cache to something more specific. Cache by itself is too generic and may be confusing.
MINOR - Keep the default value for the semantic cache flag in config.yaml as false.
MAJOR - We still need a log file for queries that hit the cache. The log should have a bool (something like semantic_cache_hit = T/F) that captures whether the query hit the cache. It should still carry the same retrieved_chunks and config info from the original query whose answer is served from the cache. This requires some thought: (1) Do you copy the log of the original query and set semantic_cache_hit to T? (2) Do you just reference the log file of the query that was served from the cache? (3) Is there another, better way to do this? Whichever option is chosen, it is important that each query generates a log file regardless of whether it hits the cache.

Note: PR 102 changes the logging structure. I am hoping that will be reviewed and merged by tomorrow so you can resolve point 3 regarding log file creation after that PR is merged ideally.
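As one possible shape for the log entry discussed in the MAJOR point, the sketch below combines options (1) and (2): it copies `retrieved_chunks` and `config` from the cached query and also records a reference to that query's log. Apart from `semantic_cache_hit`, `retrieved_chunks`, and `config`, every field name and the `build_log_entry` helper are assumptions, and the final structure would need to follow whatever PR 102 settles on.

```python
import json
from typing import Any, Dict, List, Optional


def build_log_entry(
    query: str,
    answer: str,
    retrieved_chunks: List[str],
    config: Dict[str, Any],
    source_log_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one log record per query, whether or not it hit the cache.

    source_log_id is a hypothetical identifier of the original query's log;
    it is set only when the answer was served from the cache.
    """
    return {
        "query": query,
        "answer": answer,
        "semantic_cache_hit": source_log_id is not None,
        "source_log_id": source_log_id,
        # On a hit these are copied from the cached query's record.
        "retrieved_chunks": retrieved_chunks,
        "config": config,
    }


if __name__ == "__main__":
    entry = build_log_entry(
        "what is a b-tree", "A balanced search tree.",
        ["chunk-12"], {"top_k": 5}, source_log_id="log-001",
    )
    print(json.dumps(entry, indent=2))
```

Copying the fields keeps each log self-contained for analysis, while the reference preserves the link back to the query that actually ran retrieval.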

sureshkumarsrinath added 8 commits February 6, 2026 01:47

add semantic caching with Bi-Direction Encoder and Cross Encoder

2d68a87

initial cleanup

9cdf2a7

print the cache answer without straming

d07a908

code clean up

7e5af19

benchmark checkpoint

305b10d

fix benchmark variations

d8ee028

merge conflicts

f18adf9

add flag for semantic caching

e87e664

sureshkumarsrinath marked this pull request as ready for review March 6, 2026 03:07

sureshkumarsrinath requested review from RajShah-1, Rishit2000, correaswebert and jarulraj March 6, 2026 03:07

sureshkumarsrinath changed the title ~~[WIP] Semantic Caching with Bi-Direction Encoder and Cross Encoder~~ Semantic Caching with Bi-Direction Encoder and Cross Encoder Mar 6, 2026

Fix unit tests for semantic caching

76e02f5

correaswebert reviewed Mar 6, 2026

Comment thread src/cache.py Outdated

Comment thread src/cache.py Outdated

Comment thread src/cache.py Outdated

Comment thread src/cache.py Outdated

Comment thread tests/test_benchmarks.py Outdated

jarulraj requested a review from shahmeer99 March 6, 2026 16:26

sureshkumarsrinath and others added 5 commits March 11, 2026 02:16

resolve review comments

d539b0a

use RAGConfig methid get_config_state() to get payload for hashing key

8614e7d

add gpu support try for generator

d783bc3

Merge branch 'main' into semantic-caching

d28820c

added comments to get_answer() in mainso all steps are explained

1f27533

shahmeer99 reviewed Mar 13, 2026

Comment thread src/cache.py Outdated

@@ -0,0 +1,159 @@

from yaml import Node

shahmeer99 Mar 13, 2026

Not used

shahmeer99 reviewed Mar 13, 2026

Comment thread src/generator.py

updated semantic cache testing to include false positive hit checking

f433a41

shahmeer99 reviewed Mar 13, 2026

fix review comments

a477f41

merge main into feature branch for latest updates

c8f6fbb

shahmeer99 added the enhancement New feature or request label Mar 13, 2026

shahmeer99 mentioned this pull request Mar 13, 2026

Create a basic scheduler #84

Closed

sureshkumarsrinath added 2 commits April 1, 2026 10:01

resolve conflicts

7faef64

resolve merge conflicts

221e6f3

sureshkumarsrinath added 4 commits April 2, 2026 17:36

fix the cache threshold

af7d65e

fixed interface and add no-op cache

0e27032

use the same cache for every call

9a3f982

clean up PR

ce80f6e

Merge branch 'main' into semantic-caching

7a4003a

shahmeer99 reviewed Apr 9, 2026

sureshkumarsrinath added 4 commits April 9, 2026 08:33

fix review comments

bc765ba

resolve conflicts:

1ca44f2

fix the enabled flag

8315008

update with main

3743200

shahmeer99 approved these changes Apr 17, 2026

shahmeer99 merged commit f146e92 into main Apr 17, 2026
1 check passed

