Skip to content

sreelekshman/constino

Repository files navigation

Constino 💬

An assistant that provides constitutional analysis. This project uses Gradio to create a chatbot interface and leverages the OpenAI API (specifically the o4-mini model) for generating responses based on retrieved constitutional texts.

Files

app.py

This is the main application file that runs the Gradio interface for the chatbot. It uses functions from the utils directory to retrieve relevant constitutional texts based on user queries. It then interacts with the OpenAI API (using the o4-mini model via the client.chat.completions.create method) to generate a legal analysis. The response function handles streaming the output from the API.

Constitution_of_India.txt

This text file contains the full text of the Constitution of India. It is used as the primary data source for the application.

utils/create_rag_chunk.py

This script is responsible for processing the Constitution_of_India.txt file and breaking it down into smaller, manageable chunks suitable for a Retrieval Augmented Generation (RAG) system.

Key functions:

  • count_tokens(text): Counts the number of tokens in a given text using the tiktoken library with the cl100k_base encoding.
  • chunk_constitution(constitution_text): This is the main function that takes the entire constitution text as input. It splits the text hierarchically into parts, chapters, articles, and clauses. It aims to create chunks that are within a MAX_TOKENS limit (currently 512).

The script reads from Constitution_of_India.txt and outputs a JSON file named rag_chunks_hierarchical.json containing the structured chunks.

To run this script:

python utils/create_rag_chunk.py

rag_chunks_hierarchical.json

This JSON file is generated by utils/create_rag_chunk.py. It stores the processed chunks of the Constitution, including metadata like part, chapter, article, and clause for each chunk. This file is used by utils/retrieve_context.py to find relevant context for user queries.

utils/retrieve_context.py

This script contains the retrieve_context function, which is responsible for finding and returning relevant sections of the constitution based on a user's query. It loads the pre-processed chunks from rag_chunks_hierarchical.json and uses sentence transformers (e.g., SentenceTransformer) to calculate semantic similarity between the query and the text chunks. It then returns the most relevant chunks, considering part-level relevance and specified limits on the number of parts and chunks.

requirements.txt

This file lists all the Python dependencies required to run the project, such as gradio, openai, and sentence-transformers.

To install the dependencies, run:

pip install -r requirements.txt

.env

This file is used to store environment variables, primarily the OPENAI_API.

How to Run

  1. Set up Environment Variables: Create a .env file in the root directory and add your OpenAI API key:
    # .env
    OPENAI_API=your_openai_api_key
    
  2. Install Dependencies:
    pip install -r requirements.txt
  3. Generate RAG Chunks: If rag_chunks_hierarchical.json is not already present or needs to be updated:
    python utils/create_rag_chunk.py
  4. Run the Application:
    python app.py
    This will start the Gradio application, accessible via a local URL displayed in the terminal.

About

A chatbot that provides constitutional analysis of given scenario.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors