AliSajid55/DevLens-Chat-with-Your-Codebase

🔍 DevLens: Chat with Your Codebase

AI-powered developer assistant that lets you ask natural language questions about any GitHub repository and get answers with exact file paths and line numbers.



📌 What is DevLens?

DevLens is a backend API that enables developers to chat with any GitHub codebase using natural language. Instead of manually searching through hundreds of files, you paste a repository URL, wait for indexing, and start asking questions.

Example:

❓ "Where is the database connection handled in this project?"

βœ… "The database connection is handled in app/database.py at lines 15–28. It uses SQLAlchemy's create_engine() with a connection pool..."

The system returns the exact file, the line numbers, and a code snippet, so each answer is grounded in the retrieved source code rather than hallucinated.


✨ Features

  • 🔗 Load any public GitHub repo: just paste the URL
  • 🧠 RAG pipeline: answers are grounded in actual code, not guessed
  • ⚡ SSE streaming: Gemini responses stream token-by-token
  • 📍 Source attribution: every answer includes file path + line numbers
  • 🗂️ Namespace isolation: multiple repos indexed independently in Pinecone
  • 🩺 Health check endpoint: live probes for Pinecone and Gemini
  • 🐳 Dockerized: runs with a single docker compose up command

🏗️ Architecture

User Question
     │
     ▼
POST /api/v1/ask-question
     │
     ▼
┌─────────────────────────────────────────────────┐
│                  RAG Pipeline                   │
│                                                 │
│  Question → Embed (text-embedding-004)          │
│           → Search Pinecone (cosine similarity) │
│           → Top-K code chunks retrieved         │
│           → Build grounded prompt               │
│           → Gemini 1.5 Pro generates answer     │
│           → SSE stream → client                 │
└─────────────────────────────────────────────────┘

POST /api/v1/load-repo
     │
     ▼
┌─────────────────────────────────────────────────┐
│               Indexing Pipeline                 │
│                                                 │
│  GitHub URL → GitPython shallow clone           │
│             → LangChain TextLoader              │
│             → RecursiveCharacterTextSplitter    │
│             → Google text-embedding-004         │
│             → Pinecone upsert (namespaced)      │
└─────────────────────────────────────────────────┘
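
The retrieval half of the RAG pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions: an in-memory list stands in for a Pinecone namespace, toy vectors stand in for text-embedding-004 output, and the prompt wording and the names `top_k` and `build_grounded_prompt` are hypothetical rather than taken from the project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[list[float], dict]],
          k: int = 6) -> list[tuple[float, dict]]:
    """Score every (vector, metadata) pair against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), meta) for vec, meta in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_grounded_prompt(question: str, hits: list[tuple[float, dict]]) -> str:
    """Put the retrieved snippets, labelled with their location, ahead of the question."""
    context = "\n\n".join(
        f"# {meta['file_path']} (lines {meta['start_line']}-{meta['end_line']})\n"
        f"{meta['snippet']}"
        for _, meta in hits
    )
    return (
        "Answer using only the code excerpts below and cite file paths "
        "and line numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Because the prompt contains only retrieved excerpts with their locations, the model has the file path and line range available to cite in its answer.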

🛠️ Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Web Framework | FastAPI 0.111 | Async API with auto OpenAPI docs |
| LLM | Google Gemini 1.5 Pro | Answer generation |
| Embeddings | text-embedding-004 | 768-dim semantic code vectors |
| Vector DB | Pinecone (Serverless) | Fast cosine similarity search |
| Orchestration | LangChain 0.2 | Chunking, loading, retrieval |
| Repo Loading | GitPython 3.1 | Shallow git clone at runtime |
| Streaming | SSE (sse-starlette) | Token-by-token streaming |
| Logging | structlog 24.2 | Structured JSON logs |
| Containerisation | Docker + Compose | One-command deployment |

📁 Project Structure

devlens/
│
├── backend/
│   ├── .env                      # Environment variables (never commit)
│   ├── Dockerfile                # Multi-stage production image
│   ├── requirements.txt          # Pinned Python dependencies
│   └── app/
│       ├── main.py               # FastAPI app factory + startup
│       ├── schema.py             # All Pydantic request/response models
│       ├── helpers.py            # Pure utility functions
│       ├── api/
│       │   ├── health.py         # GET  /health
│       │   └── routes.py         # POST /load-repo, /ask-question
│       ├── core/
│       │   ├── config.py         # Pydantic-settings env loader
│       │   └── rag_pipeline.py   # Central RAG orchestrator
│       └── services/
│           ├── repo_loader.py    # Clone + LangChain document loading
│           ├── embeddings.py     # Pinecone setup + Google embeddings
│           └── retriever.py      # Semantic search over namespace
│
├── docker-compose.yml
├── Makefile
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.11 (3.14 is not yet supported by all dependencies)
  • Git installed on your system
  • Google AI Studio API key (free tier works)
  • Pinecone account + API key (free starter plan works)

1. Clone the repository

git clone https://github.com/your-username/devlens.git
cd devlens/backend

2. Create a virtual environment with Python 3.11

# Windows
py -3.11 -m venv .venv
.venv\Scripts\activate

# macOS / Linux
python3.11 -m venv .venv
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

cp .env .env.local    # or just edit .env directly

Open .env and fill in your keys:

GOOGLE_API_KEY=your_google_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_INDEX_NAME=devlens-codebase
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1

5. Run the server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Open http://localhost:8000/docs to access the interactive Swagger UI.


🐳 Docker Setup (Recommended)

# Build and start
docker compose up --build

# Backend → http://localhost:8000
# Swagger → http://localhost:8000/docs

# Tail logs
docker compose logs -f backend

# Stop
docker compose down

📡 API Reference

GET /health

Check live connectivity status of Pinecone and Gemini.

{
  "status": "ok",
  "environment": "development",
  "version": "1.0.0",
  "services": {
    "pinecone": "ok",
    "gemini": "ok (47 models visible)"
  }
}

POST /api/v1/load-repo

Clone, chunk, embed, and index a GitHub repository into Pinecone.

Request:

{
  "repo_url": "https://github.com/tiangolo/fastapi",
  "branch": "master"
}

Response:

{
  "status": "ready",
  "namespace": "tiangolo_fastapi-a3f2c8d91b4e",
  "repo_name": "tiangolo_fastapi",
  "total_chunks": 842,
  "message": "Repository 'tiangolo_fastapi' indexed successfully. 842 code chunks are ready for querying."
}

⚠️ Save the namespace value β€” you need it for /ask-question.


POST /api/v1/ask-question (SSE Streaming)

Ask a natural language question about the indexed codebase.

Request:

{
  "question": "Where is the database connection handled?",
  "namespace": "tiangolo_fastapi-a3f2c8d91b4e"
}

SSE Event Stream:

event: token
data: The database connection is handled in

event: token
data:  app/db/session.py at lines 12–28...

event: sources
data: [{"file_path":"app/db/session.py","start_line":12,"end_line":28,"snippet":"..."}]

event: done
data: [DONE]
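
A client has to reassemble the answer from these events. The following is a minimal sketch of an SSE line parser built around the event names shown above (`token`, `sources`, `done`). `parse_sse` and `collect_answer` are hypothetical helpers, not part of the project, and the parser handles only the `event` and `data` fields.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event = "message"  # default event name in the SSE format
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_parts:
                yield event, "\n".join(data_parts)
            event, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # exactly one leading space is dropped
        if field == "event":
            event = value
        elif field == "data":
            data_parts.append(value)
    # Lenient choice: also emit a trailing event that lacks a final blank line.
    if data_parts:
        yield event, "\n".join(data_parts)


def collect_answer(lines: Iterable[str]) -> tuple[str, list[dict]]:
    """Concatenate token events into the answer and decode the sources event."""
    answer: list[str] = []
    sources: list[dict] = []
    for event, data in parse_sse(lines):
        if event == "token":
            answer.append(data)
        elif event == "sources":
            sources = json.loads(data)
        elif event == "done":
            break
    return "".join(answer), sources
```

Only one space after `data:` is removed, so the second token in the example keeps its leading space and the concatenated tokens read as a normal sentence.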

GET /api/v1/ask-question (JSON fallback)

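Accepts the same question and namespace as query parameters, but returns a single JSON response instead of an SSE stream. Useful for testing. The query string must be URL-encoded:

GET /api/v1/ask-question?question=Where+is+auth+handled%3F&namespace=your_namespace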
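
Questions usually contain spaces and punctuation, which must be percent-encoded before they go into a query string. A small sketch using the standard library, where `ask_question_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def ask_question_url(base_url: str, question: str, namespace: str) -> str:
    """Build the GET /api/v1/ask-question URL with a safely encoded query string."""
    query = urlencode({"question": question, "namespace": namespace})
    return f"{base_url.rstrip('/')}/api/v1/ask-question?{query}"


print(ask_question_url("http://localhost:8000", "Where is auth handled?", "your_namespace"))
# http://localhost:8000/api/v1/ask-question?question=Where+is+auth+handled%3F&namespace=your_namespace
```

`urlencode` turns spaces into `+` and the trailing `?` into `%3F`, so the question cannot be mistaken for the start of a second query string.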

⚙️ Configuration Reference

| Variable | Default | Description |
| --- | --- | --- |
| GOOGLE_API_KEY | (none) | Google AI Studio key (required) |
| PINECONE_API_KEY | (none) | Pinecone API key (required) |
| PINECONE_INDEX_NAME | devlens-codebase | Auto-created on first use |
| PINECONE_CLOUD | aws | Cloud provider for serverless index |
| PINECONE_REGION | us-east-1 | Must match your Pinecone project |
| APP_ENV | development | development or production |
| CHUNK_SIZE | 1000 | Max characters per code chunk |
| CHUNK_OVERLAP | 150 | Overlap between consecutive chunks |
| TOP_K_RESULTS | 6 | Chunks retrieved per question |
| EMBEDDING_DIMENSION | 768 | text-embedding-004 output size |
| REPO_CLONE_DIR | /tmp/devlens_repos | Temp dir for clones |
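
How these variables and defaults combine can be sketched with a plain dataclass. The project itself loads settings through pydantic-settings in core/config.py, so this standard-library version is only an illustrative stand-in. `Settings`, `load_settings`, and the overlap check are assumptions, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    pinecone_api_key: str
    pinecone_index_name: str = "devlens-codebase"
    app_env: str = "development"
    chunk_size: int = 1000
    chunk_overlap: int = 150
    top_k_results: int = 6


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in ("GOOGLE_API_KEY", "PINECONE_API_KEY") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    settings = Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "devlens-codebase"),
        app_env=env.get("APP_ENV", "development"),
        chunk_size=int(env.get("CHUNK_SIZE", 1000)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", 150)),
        top_k_results=int(env.get("TOP_K_RESULTS", 6)),
    )
    # Hypothetical sanity check: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on the two required keys surfaces a misconfigured .env at startup instead of on the first request.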

🧪 Quick Test Flow

Once the server is running, test the full pipeline in order:

# 1. Health check
curl http://localhost:8000/health

# 2. Index a repo (takes 30–90 s depending on repo size)
curl -X POST http://localhost:8000/api/v1/load-repo \
  -H "Content-Type: application/json" \
  -d '{"repo_url": "https://github.com/tiangolo/fastapi", "branch": "master"}'

# 3. Ask a question (copy namespace from step 2 response)
curl "http://localhost:8000/api/v1/ask-question?question=Where+is+routing+handled&namespace=YOUR_NAMESPACE"

🗺️ Roadmap

  • [x] GitHub repo loading with GitPython
  • [x] Code chunking with LangChain RecursiveCharacterTextSplitter
  • [x] Google text-embedding-004 vector embeddings
  • [x] Pinecone serverless vector store with namespace isolation
  • [x] Gemini 1.5 Pro grounded answer generation
  • [x] SSE streaming responses
  • [x] Source attribution (file path + line numbers)
  • [x] Health check with live service probes
  • [x] Docker + docker-compose setup
  • [ ] Next.js frontend with chat UI
  • [ ] File explorer sidebar with code preview pane
  • [ ] API key authentication
  • [ ] Namespace management (list / delete indexed repos)
  • [ ] Rate limiting

Author

Ali Sajid

AI Engineer | Deep Learning | Computer Vision | GEN AI


🀝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m 'Add some feature'
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


Built with ❤️ using FastAPI · LangChain · Google Gemini · Pinecone
