This project is a modularized version of the ChromaDB Retrieval-Augmented Generation (RAG) chat application based on the Hugging Face cookbook tutorial: Semantic Cache with Chroma Vector Database. The application combines semantic caching and language model-based text generation to provide relevant and contextual responses to user queries.
The ChromaDB RAG Chat Application utilizes the following components:
- Dataset: The application loads the `keivalya/MedQuad-MedicalQnADataset` dataset using the `datasets` library and prepares it for further processing.
- Vector Database: The loaded dataset is stored in a ChromaDB collection, which serves as a vector database for efficient retrieval of relevant documents based on user queries.
- Semantic Cache: The application implements a semantic cache (`SemanticCache`) that stores previously asked questions, their embeddings, answers, and response texts. The cache uses the FAISS library for efficient similarity search and the Sentence Transformers library for encoding questions into embeddings.
- Language Model: The application utilizes the `mistralai/Mistral-7B-Instruct-v0.1` language model for generating responses based on the retrieved context from the vector database or the semantic cache.
- Modularized Structure: The application follows a modularized structure, separating different functionalities into individual files. This modular approach enhances code organization, reusability, and maintainability.
- Semantic Caching: The application employs a semantic cache that stores previous user queries, their embeddings, and corresponding responses. When a new query is asked, the cache is searched for similar questions using the FAISS library. If a similar question is found, the cached response is returned, reducing the need for database retrieval and language model inference.
- Vector Database: The application uses ChromaDB, a vector database, to store and retrieve relevant documents based on user queries. ChromaDB enables efficient similarity search, allowing the application to find the most relevant context for generating responses.
- Language Model Integration: The application integrates the `mistralai/Mistral-7B-Instruct-v0.1` language model through the `LLMModule` class. The language model is used to generate responses based on the retrieved context from the vector database or the semantic cache.
- Install the required dependencies: `datasets`, `chromadb`, `faiss`, `sentence_transformers`, `transformers`.
- Run the `main.py` script to start the chat application.
- Enter user queries in the chat interface. The application will retrieve relevant context from the semantic cache or the ChromaDB vector database and generate responses using the language model.
- To exit the chat, type `quit`.
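The usage flow above can be sketched as a minimal chat loop. The `cache`, `collection`, and `llm` objects and their `lookup`/`store`, `query`, and `generate` methods are assumptions about the project's interfaces, not its actual `main.py`.

```python
# Hedged sketch of the chat loop: try the semantic cache first, and fall back
# to ChromaDB retrieval plus LLM generation. Interfaces are assumptions.
def chat_loop(cache, collection, llm):
    while True:
        query = input("You: ")
        if query.strip().lower() == "quit":
            break
        answer = cache.lookup(query)  # semantic-cache hit?
        if answer is None:
            # Cache miss: retrieve context from ChromaDB, then ask the LLM.
            results = collection.query(query_texts=[query], n_results=3)
            context = " ".join(results["documents"][0])
            answer = llm.generate(query, context)
            cache.store(query, answer)  # remember for similar future queries
        print("Bot:", answer)
```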
- `dataset.py`: Contains functions for loading and preparing the dataset.
- `vectordb.py`: Defines functions for creating and interacting with the ChromaDB vector database.
- `semantic_cache.py`: Implements the semantic cache functionality using FAISS and Sentence Transformers.
- `llm_module.py`: Defines the `LLMModule` class for loading and utilizing the language model.
- `main.py`: The main script that orchestrates the chat application by combining the dataset, vector database, semantic cache, and language model.
