This project allows you to upload your dataset, generate insights and recommendations, and interactively query your data using AI.
- Source: Google Play Store
- Raw Data: googleplaystore.csv
- Cleaned Data: cleaned_googleplaystore.csv
  The cleaning and formatting steps are documented in googleStoreData.ipynb.
- Creation of dataframe using Rapid API - Appstore Scrapper API: the code is in dataframeThruAPI.ipynb, and the dataset created through it is rapidAPIdataframe.csv.
- Combined Dataset: combined_dataframe.csv. The implementation for combining the datasets is in googleStoreData.ipynb.
We use a Retrieval-Augmented Generation (RAG) pipeline with:
- HuggingFace embeddings
- Gemini LLM
- FAISS vector store
- The cleaned dataset: cleaned_googleplaystore.csv
- Insights: app_insights.json
- Recommendations: recommendations.json
The main code for generating these outputs is in insights.ipynb and insights.py. For the best experience, run insights.ipynb.
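The retrieval half of such a pipeline can be sketched without any external services. The example below is a minimal, dependency-free illustration and not the code from insights.ipynb: a bag-of-words counter stands in for the HuggingFace embeddings, a plain Python list stands in for the FAISS vector store, and the sample rows and the function names (`embed`, `cosine`, `build_index`, `retrieve`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase token counts (stand-in for a HuggingFace model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """Pair each document with its vector (stand-in for a FAISS index)."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Hypothetical rows; the real columns of cleaned_googleplaystore.csv may differ.
SAMPLE_DOCS = [
    "Photo Editor Pro, category PHOTOGRAPHY, rating 4.5, free",
    "Budget Tracker, category FINANCE, rating 4.1, paid",
    "Puzzle Quest, category GAME, rating 4.7, free",
]

if __name__ == "__main__":
    index = build_index(SAMPLE_DOCS)
    for doc in retrieve(index, "finance budget", k=1):
        print(doc)
```

In a full pipeline the retrieved rows would be passed to the LLM as context; swapping `embed` for a real embedding model and the list for a vector store changes the quality of the matches, not the overall shape of the flow.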
You can query the insights or dataset using a chatbot, also built with the RAG pipeline.
See chatSystem.ipynb (recommended) or chatsystem.py for the implementation.
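As a rough illustration of how a RAG chatbot turns a question into an answer, the sketch below assembles a prompt from retrieved context and hands it to a language model. It is an assumption-laden outline, not the code in chatSystem.ipynb: `build_prompt`, `answer`, `stub_retriever`, and `stub_llm` are hypothetical names, the prompt wording is invented, and the stub model simply stands in for a Gemini call.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved context and the user question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, retriever, llm):
    """Retrieve context for the question, then ask the model."""
    chunks = retriever(question)
    return llm(build_prompt(question, chunks))


def stub_retriever(question):
    # Hypothetical retrieved row; a real retriever would query the vector store.
    return ["Puzzle Quest, category GAME, rating 4.7"]


def stub_llm(prompt):
    # Stand-in for the LLM: reports the prompt size instead of generating text.
    return f"[stub reply to a {len(prompt.splitlines())}-line prompt]"


if __name__ == "__main__":
    print(answer("Which game has the highest rating?", stub_retriever, stub_llm))
```

Passing the retriever and the model in as plain callables keeps the chat logic testable without network access or an API key.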