This repository provides a Document Query System powered by Light-RAG (Lightweight Retrieval-Augmented Generation), Azure OpenAI APIs, and Streamlit. The application facilitates document ingestion and querying using advanced LLMs and embedding functions.
- Document Ingestion: Supports
.txtand.csvfiles for ingestion, with automatic summarization for structured data. - Querying: Allows users to perform local, global, and hybrid search queries across ingested documents.
- Azure OpenAI Integration: Utilizes Azure OpenAI for LLM-based summarization and embeddings.
- Streamlit Interface: Provides a user-friendly interface for uploading files and querying documents.
- Python 3.8 or higher.
- Azure OpenAI access with the following endpoints and keys configured in
.envfile:- Chat Completions
- Embeddings
Create a .env file with the following content:
AZURE_OPENAI_API_KEY=<your-api-key>
AZURE_OPENAI_API_VERSION=<api-version>
AZURE_OPENAI_ENDPOINT=<api-endpoint>
AZURE_OPENAI_DEPLOYMENT=<deployment-name>
AZURE_EMBEDDING_API_KEY=<your-embedding-api-key>
AZURE_EMBEDDING_API_VERSION=<api-version>
AZURE_EMBEDDING_ENDPOINT=<api-endpoint>
AZURE_EMBEDDING_DEPLOYMENT=<embedding-deployment-name>- Clone the repository:
git clone <repository-url> cd <repository-directory>
- Install dependencies:
pip install lightrag-hku
Start the Streamlit application:
streamlit run app.py- Upload
.txtor.csvfiles via the Document Ingestion section. - For
.csvfiles, the data is automatically summarized into concise paragraphs before ingestion.
- Enter a query in the Query Documents section.
- Choose a search type:
local: Search within specific documents.global: Search across all documents.hybrid: Combines local and global search results.
- View query results in a structured format.
-
Ingestion:
- CSV files are summarized into narrative paragraphs using Azure OpenAI.
- Text files are directly processed.
- Content is ingested into the Light-RAG system in the form of relationships in json format
- Note : You can also store the generated graphml & json files in MongoDB Cluster
-
Querying:
- Queries are executed using Light-RAG's
aqueryfunction. - Results are displayed in JSON or text format based on their structure.
- Queries are executed using Light-RAG's
-
Integration:
- Azure OpenAI APIs handle LLM queries and embeddings.
LightRAGprovides efficient document retrieval and query handling.
-
Streamlit Interface:
- File upload and query submission are handled through an intuitive web interface.
- Streamlit: For building the user interface.
- LightRAG: For lightweight RAG functionalities.
- Azure OpenAI: For LLM and embedding services.
- Pandas: For CSV processing.
- NumPy: For embedding array manipulation.
- Data Visualization using Neo4j-DB for Admin Purposes
- Add support for additional file types (e.g., PDFs).
- Include advanced query filters and ranking.
- Expand embedding support for multilingual documents.
This project uses LightRAG for efficient retrieval-augmented generation and Azure OpenAI for state-of-the-art LLM functionalities.