Build a knowledge-based AI support agent that uses Retrieval-Augmented Generation (RAG) to answer customer queries. The system should demonstrate your ability to implement efficient, accurate document retrieval.
Document Processing Pipeline
- Implement document ingestion for PDF and Markdown files
- Create efficient text chunking strategies
- Generate and store embeddings using a vector store
- Track document sources and maintain metadata
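One possible reading of the chunking and metadata bullets above is a sliding window with overlap, where every chunk remembers its source file and position. The sketch below is only an illustration: the `Chunk` fields, the `chunk_text` name, and the default `size` and `overlap` values are assumptions, and splitting on words rather than tokens or sentences is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str      # file the chunk came from
    index: int       # position of the chunk within that file
    start_word: int  # offset of the first word, useful for citations


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(" ".join(window), source, index, start))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice. The metadata fields are what later make source citations possible.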
Retrieval System
- Implement semantic search using embeddings
- Create relevance scoring for retrieved chunks
- Manage context window size effectively
- Handle cases with multiple relevant documents
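The retrieval bullets above can be sketched without any particular vector database: score each chunk by cosine similarity, drop weak matches, and pack the best chunks into a fixed context budget. Everything named here is an assumption for illustration, including the `retrieve` function, the `min_score` and `max_words` defaults, and the use of word counts as a rough stand-in for model tokens. Embeddings are plain lists of floats.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, indexed_chunks, min_score=0.3, max_words=300):
    """Rank chunks by similarity, drop weak matches, and fit a word budget.

    `indexed_chunks` is a list of (embedding, text, source) tuples.
    """
    scored = [(cosine(query_vec, emb), text, source) for emb, text, source in indexed_chunks]
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)

    selected, used = [], 0
    for score, text, source in scored:
        cost = len(text.split())
        if used + cost > max_words:
            continue  # skip chunks that would overflow the context budget
        selected.append({"score": round(score, 3), "text": text, "source": source})
        used += cost
    return selected
```

Because each result keeps its `source`, chunks from several documents can appear in the same result set. An empty list signals that nothing cleared the relevance threshold.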
Response Generation
- Generate coherent responses using retrieved context
- Include source citations in responses
- Handle cases where no relevant information is found
- Ensure response accuracy against source material
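The response generation requirements above (citations, a fallback when nothing relevant is found, answers grounded in the sources) might be handled along these lines. This is a sketch under assumptions: `build_prompt`, `answer`, the `NO_ANSWER` text, and the prompt wording are all hypothetical, and `generate` stands in for whatever LLM call the implementation uses.

```python
from typing import Optional

NO_ANSWER = "I could not find this in the knowledge base."


def build_prompt(question: str, retrieved: list) -> Optional[str]:
    """Return a grounded prompt, or None when nothing relevant was retrieved."""
    if not retrieved:
        return None
    context = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the numbered context below and cite passages "
        "as [n]. If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, retrieved: list, generate) -> dict:
    """`generate` is any callable that maps a prompt string to model text."""
    prompt = build_prompt(question, retrieved)
    if prompt is None:
        # Do not call the model at all when there is no supporting context.
        return {"answer": NO_ANSWER, "sources": []}
    # Deduplicate sources while keeping the ranking order.
    sources = list(dict.fromkeys(chunk["source"] for chunk in retrieved))
    return {"answer": generate(prompt), "sources": sources}
```

Returning the source list alongside the text lets a reviewer check the answer against the cited material, which is one simple way to approach the accuracy requirement.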
- Use a vector database (ChromaDB, Pinecone, Qdrant, or Weaviate)
- Implement efficient embedding generation
- Create proper indexing structure
- Handle document updates
- Use FastAPI or Django REST Framework
- Create endpoints for:
  - Document ingestion
  - Query processing
  - Knowledge base management
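For the "Handle document updates" requirement and the knowledge base management endpoint listed above, one common pattern is to hash each document's content and re-index only when the hash changes. The class below is an in-memory stand-in, not the API of any of the named vector databases; `KnowledgeBase`, its method names, and the `source::index` id scheme are assumptions made for the sketch.

```python
import hashlib


class KnowledgeBase:
    """In-memory stand-in for a vector store collection (hypothetical)."""

    def __init__(self):
        self.doc_hashes = {}  # source -> content hash of the indexed version
        self.chunk_ids = {}   # source -> ids of the chunks it produced
        self.chunks = {}      # chunk id -> chunk text

    def upsert_document(self, source: str, chunk_texts: list) -> str:
        digest = hashlib.sha256("\n".join(chunk_texts).encode("utf-8")).hexdigest()
        previous = self.doc_hashes.get(source)
        if previous == digest:
            return "unchanged"  # nothing to re-embed or re-index
        self.delete_document(source)  # drop stale chunks before re-indexing
        ids = [f"{source}::{i}" for i in range(len(chunk_texts))]
        for chunk_id, text in zip(ids, chunk_texts):
            self.chunks[chunk_id] = text
        self.chunk_ids[source] = ids
        self.doc_hashes[source] = digest
        return "created" if previous is None else "updated"

    def delete_document(self, source: str) -> int:
        """Remove every chunk of a document; returns how many were removed."""
        ids = self.chunk_ids.pop(source, [])
        for chunk_id in ids:
            del self.chunks[chunk_id]
        self.doc_hashes.pop(source, None)
        return len(ids)
```

With a real vector database, the same bookkeeping would sit in front of that store's own insert and delete operations, and the ingestion and management endpoints could return the status string to the caller.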
The system should handle:
- PDF documents
- Word documents
Source Code
- Documented code with a clear README
- Setup instructions
- Configuration examples
- Data preprocessing scripts
Technical Documentation
- RAG implementation details
- Email integration approach
- System architecture diagram
- API documentation
- One week for completion
- Submit to the provided GitHub repository
- Process and query documents
- Handle edge cases (no relevant information found, multiple relevant sources)
- Show error handling
GitHub repository with:
- Complete source code
- Full version history
- Instructions for running the code, in the submission section
- Loom video link in the submission section
Demo showing:
- Document ingestion process
- Query-response examples
- Error handling scenarios
- Use any publicly available test dataset
- Focus on RAG quality
- Document your chunking strategy
- Explain your context management approach
- Include ideas for future improvements
- Provide example queries and responses