Is your feature request related to a problem? Please describe.
The project has migrated its RAG backend to ChromaDB, which the active backend now uses for vector storage and retrieval. However, parts of the documentation and some legacy files still reference Pinecone.
For example:
README.md contains Pinecone setup references.
Some older Flask-based files still contain Pinecone-related code.
As a result, new contributors may be unsure which vector database is currently supported.
This inconsistency makes onboarding harder and can lead contributors to spend time configuring services that the active backend no longer requires.
Describe the solution you'd like
I would like the documentation and repository structure to clearly reflect the currently supported vector database.
Possible improvements:
Review README.md for outdated Pinecone references.
Clearly document that the FastAPI backend uses ChromaDB.
Add notes indicating which files belong to the legacy Flask implementation.
Remove or mark obsolete Pinecone-related setup instructions where appropriate.
Improve contributor onboarding by providing a single source of truth for vector database configuration.
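As a starting point for the review, the leftover references could be located with a small script. The following is only a sketch: the function name `find_pinecone_references`, the file extensions, and the skipped directories are assumptions chosen for illustration, not part of the project.

```python
import re
from pathlib import Path

# Hypothetical defaults: adjust the extensions and skipped directories to the repository.
TEXT_SUFFIXES = {".md", ".py", ".txt", ".toml", ".yml", ".yaml"}
SKIP_DIRS = {".git", "node_modules", "venv", ".venv", "__pycache__"}
PATTERN = re.compile(r"pinecone", re.IGNORECASE)


def find_pinecone_references(root):
    """Return (relative_path, line_number, line_text) for each line mentioning Pinecone."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip anything inside a directory we do not want to scan.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # unreadable or non-UTF-8 file: ignore it
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((rel.as_posix(), number, line.strip()))
    return hits
```

Calling `find_pinecone_references(".")` from the repository root returns a list that can be reviewed entry by entry to decide whether each reference should be removed, updated, or marked as legacy.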
Describe alternatives you've considered
Contributors can manually inspect the codebase to determine which vector database is active, but this takes extra effort and can still leave new contributors unsure.
Additional Context
During repository exploration, the active FastAPI RAG implementation was found to use ChromaDB in modules such as:
backend/app/rag/vectorstore.py
backend/app/rag/retriever.py
At the same time, Pinecone references still appear in documentation and legacy files, which may create ambiguity about the project's current architecture.
This issue focuses on improving documentation consistency and contributor experience.
GSSoC '26