- The contextual queries datasets are located in the `dataset_contextual_queries` directory.
- Additional datasets used in experiments are sourced from publicly available repositories. Please use them from their original locations to ensure compliance with their licenses and terms of use.
- `generate_scripts.py`: Generates the necessary scripts for training and evaluating the MeanCache model.
- `fl_sim_train.py`: Trains the MeanCache model using a Federated Learning simulation.
- `cache_comparison.py` and `eval.py`: Contain basic cache comparison functions.
- `utils.py`: Contains utility functions used across the codebase.
- `logs/`: Contains logs generated during the training and evaluation processes.
- `run.sh`: Contains configurations for running the training scripts.
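The actual functions in `cache_comparison.py` are not reproduced here. As a rough illustration of what a semantic-cache comparison typically involves, here is a minimal sketch. It assumes that embeddings are plain lists of floats and that a cache hit occurs when the cosine similarity between the new query and its closest cached query meets a threshold. The names `cosine_similarity` and `cache_lookup`, as well as the 0.8 threshold, are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cache_lookup(
    query_emb: Sequence[float],
    cache: List[Tuple[Sequence[float], str]],
    threshold: float = 0.8,  # hypothetical similarity threshold, not from the repo
) -> Optional[str]:
    """Return the cached response of the most similar entry if it clears the
    threshold (cache hit); otherwise return None (cache miss)."""
    best_response: Optional[str] = None
    best_score = -1.0
    for emb, response in cache:
        score = cosine_similarity(query_emb, emb)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real model outputs.
    toy_cache = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
    print(cache_lookup([0.9, 0.1], toy_cache))  # -> answer A (similarity ~0.994)
    print(cache_lookup([0.7, 0.7], toy_cache))  # -> None (best similarity ~0.707)
```

The threshold controls a trade-off. Raising it reduces false hits, where a cached answer is returned for a query that only looks similar, but it also causes more true duplicates to miss the cache.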
During my Redis internship, I trained embedding models for semantic caching (`redis/langcache-embed-v1` and `redis/langcache-embed-v2`), which have reached thousands of downloads on Hugging Face. The corresponding research paper is "Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data". It may be useful for those interested in semantic caching techniques and embedding models.