An assistant that provides constitutional analysis. This project uses Gradio to create a chatbot interface and leverages the OpenAI API (specifically the o4-mini model) for generating responses based on retrieved constitutional texts.
This is the main application file that runs the Gradio interface for the chatbot. It uses functions from the utils directory to retrieve relevant constitutional texts based on user queries. It then interacts with the OpenAI API (using the o4-mini model via the client.chat.completions.create method) to generate a legal analysis. The response function handles streaming the output from the API.
This text file contains the full text of the Constitution of India. It is used as the primary data source for the application.
This script is responsible for processing the Constitution_of_India.txt file and breaking it down into smaller, manageable chunks suitable for a Retrieval Augmented Generation (RAG) system.
Key functions:
count_tokens(text): Counts the number of tokens in a given text using thetiktokenlibrary with thecl100k_baseencoding.chunk_constitution(constitution_text): This is the main function that takes the entire constitution text as input. It splits the text hierarchically into parts, chapters, articles, and clauses. It aims to create chunks that are within aMAX_TOKENSlimit (currently 512).
The script reads from Constitution_of_India.txt and outputs a JSON file named rag_chunks_hierarchical.json containing the structured chunks.
To run this script:
python utils/create_rag_chunk.pyThis JSON file is generated by utils/create_rag_chunk.py. It stores the processed chunks of the Constitution, including metadata like part, chapter, article, and clause for each chunk. This file is used by utils/retrieve_context.py to find relevant context for user queries.
This script contains the retrieve_context function, which is responsible for finding and returning relevant sections of the constitution based on a user's query. It loads the pre-processed chunks from rag_chunks_hierarchical.json and uses sentence transformers (e.g., SentenceTransformer) to calculate semantic similarity between the query and the text chunks. It then returns the most relevant chunks, considering part-level relevance and specified limits on the number of parts and chunks.
This file lists all the Python dependencies required to run the project, such as gradio, openai, and sentence-transformers.
To install the dependencies, run:
pip install -r requirements.txtThis file is used to store environment variables, primarily the OPENAI_API.
- Set up Environment Variables: Create a
.envfile in the root directory and add your OpenAI API key:# .env OPENAI_API=your_openai_api_key - Install Dependencies:
pip install -r requirements.txt
- Generate RAG Chunks:
If
rag_chunks_hierarchical.jsonis not already present or needs to be updated:python utils/create_rag_chunk.py
- Run the Application:
This will start the Gradio application, accessible via a local URL displayed in the terminal.
python app.py