A Streamlit-ready, LangChain-powered agent that enables natural language analytics on structured sales data using large language models (LLMs), machine learning, and statistical tooling.
- 💬 Ask natural language questions about your sales data
- 📈 Forecast future sales using XGBoost
- ⚠️ Detect anomalies using the Z-score method
- 🌍 Break down regional sales performance
- 💰 Analyze profitability across product lines
- 📊 Decompose time series into trend/seasonality
- 📝 Generate natural language sales reports
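Here is a minimal pandas sketch of the Z-score method named above. It assumes a numeric `SALES` column, as in the public Kaggle sample-sales-data file, and a cut-off of 3 standard deviations. The function name `detect_anomalies_zscore` and these defaults are illustrative assumptions, not taken from the project's code.

```python
import pandas as pd


def detect_anomalies_zscore(df: pd.DataFrame, column: str = "SALES",
                            threshold: float = 3.0) -> pd.DataFrame:
    """Return the rows whose absolute z-score in `column` exceeds `threshold`.

    Hypothetical helper: the column name and default threshold are assumptions.
    """
    values = df[column].astype(float)
    # z = (x - mean) / std, using the population standard deviation (ddof=0).
    # A constant column yields 0/0 = NaN, which never passes the comparison below.
    z = (values - values.mean()) / values.std(ddof=0)
    mask = z.abs() > threshold
    return df.loc[mask].assign(zscore=z[mask])


# Toy data: ten ordinary orders and one obvious spike.
sales = pd.DataFrame({"SALES": [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250]})
print(detect_anomalies_zscore(sales, threshold=2.5))
```

With the population standard deviation, no single point in a sample of n values can have a z-score above sqrt(n - 1). On very small samples, a strict threshold of 3 may therefore never fire. The toy example passes `threshold=2.5` to stay clear of that limit.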
llm_data_analytics_agent/
├── llm_powered_data_analytics_agent.py # Main logic (LangChain + Streamlit + Tools)
├── cleaned_sales_data.csv # Cleaned dataset used by the agent
├── memory.json # Memory file for conversation history
├── README.md # Project documentation
Install all dependencies in a virtual environment or Google Colab:
pip install -U openai langchain streamlit pandas numpy matplotlib seaborn plotly
pip install -U langchain-community langchain-openai langchain-experimental
pip install -U kaggle kagglehub pyngrok xgboost statsmodels mlxtend
- Obtain Required Keys:
  - OpenAI API Key
  - Kaggle API Key (kaggle.json)
- Environment Setup:
export OPENAI_API_KEY=your_openai_key
export KAGGLE_USERNAME=your_kaggle_username
export KAGGLE_KEY=your_kaggle_key
- Kaggle Dataset Download (Optional):
kaggle datasets download -d kyanyoga/sample-sales-data
unzip sample-sales-data.zip
The system uses LangChain's create_pandas_dataframe_agent together with the following custom tools:
- SalesForecastingToolXGBoost
- AnomalyDetectionTool
- RegionSalesBreakdownTool
- ProfitabilityAnalysisTool
- TimeSeriesDecompositionTool
- ReportGeneratorTool
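The following sketch gives a rough idea of what a tool like RegionSalesBreakdownTool might compute, using a pandas group-by. The `TERRITORY`, `PRODUCTLINE` and `SALES` column names come from the public Kaggle sample-sales-data schema and are assumptions about cleaned_sales_data.csv. `region_sales_breakdown` is a hypothetical name, not the project's tool.

```python
from typing import Optional

import pandas as pd


def region_sales_breakdown(df: pd.DataFrame, region: Optional[str] = None,
                           region_col: str = "TERRITORY", by: str = "PRODUCTLINE",
                           sales_col: str = "SALES") -> pd.DataFrame:
    """Total sales per region, or per `by` category inside a single region."""
    if region is None:
        grouped = df.groupby(region_col)[sales_col].sum()
    else:
        # Case-insensitive match so "apac" and "APAC" behave the same.
        subset = df[df[region_col].str.upper() == region.upper()]
        grouped = subset.groupby(by)[sales_col].sum()
    out = grouped.sort_values(ascending=False).to_frame("total_sales")
    # Percentage share of the total, rounded to one decimal place.
    out["share_pct"] = (100 * out["total_sales"] / out["total_sales"].sum()).round(1)
    return out


orders = pd.DataFrame({
    "TERRITORY": ["APAC", "APAC", "EMEA", "EMEA", "Japan"],
    "PRODUCTLINE": ["Motorcycles", "Classic Cars", "Motorcycles", "Ships", "Trains"],
    "SALES": [300.0, 100.0, 500.0, 250.0, 50.0],
})
print(region_sales_breakdown(orders))
print(region_sales_breakdown(orders, region="apac"))
```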
Each tool is wrapped with wrap_tool() so that it accepts the single string input the LangChain agent passes to it.
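Below is a minimal sketch of the wrapping idea. It assumes the agent sends one of three things: a JSON object of keyword arguments, a bare value, or an empty string. The parsing rules are assumptions. `forecast_sales` is a hypothetical stand-in tool, not one of the project's functions.

```python
import json
from typing import Any, Callable


def wrap_tool(func: Callable[..., Any]) -> Callable[[str], str]:
    """Adapt a keyword-argument tool so it can be called with one string."""
    def wrapper(tool_input: str = "") -> str:
        # Agents often wrap the input in stray quotes or backticks.
        text = (tool_input or "").strip().strip("'\"`")
        if not text:
            return str(func())  # no input: use the tool's defaults
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return str(func(**parsed))  # JSON object -> keyword arguments
        return str(func(text))  # anything else -> first positional argument

    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper


def forecast_sales(product_line: str = "ALL", periods: int = 3) -> str:
    """Hypothetical tool used only to demonstrate the wrapper."""
    return f"forecast for {product_line} over {periods} periods"


tool = wrap_tool(forecast_sales)
print(tool('{"product_line": "Motorcycles", "periods": 6}'))
print(tool('"Motorcycles"'))
print(tool(""))
```

In LangChain, a wrapped callable like this would typically be registered with `Tool(name=..., func=tool, description=...)` from `langchain.agents`. Check the exact registration against the project script.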
- Open llm_powered_data_analytics_agent.py in Colab.
- Make sure the required userdata.get() entries (Colab secrets) are present.
- Run the cells sequentially.
- Use dc_agent.run("Your question") to interact with the agent.
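The keys can come from exported environment variables when running locally, or from Colab secrets read with `userdata.get()`. The helper below tries both, in that order. `get_secret` and `export_required_keys` are hypothetical names, and the fallback order is an assumption rather than what the script does.

```python
import os


def get_secret(name: str) -> str:
    """Return a key from the environment, falling back to Colab secrets."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)  # raises if the secret is missing or not shared
    except Exception:
        value = None
    if not value:
        raise KeyError(f"{name} is not set: export it or add it as a Colab secret")
    return value


def export_required_keys() -> None:
    """Copy the keys into os.environ, where the OpenAI and Kaggle clients look for them."""
    for name in ("OPENAI_API_KEY", "KAGGLE_USERNAME", "KAGGLE_KEY"):
        os.environ[name] = get_secret(name)
```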
You can also turn the script into a Streamlit app with a UI. First update the script to render its input and output through Streamlit, then launch it with:
streamlit run llm_powered_data_analytics_agent.py
Example questions you can ask the agent:
- "Forecast sales for the Motorcycles product line."
- "Detect any anomalies in the sales data."
- "What is the sales breakdown for APAC?"
- "Generate a summary report of performance."
- Handling single string input with wrap_tool()
- LLM tool selection using ZERO_SHOT_REACT_DESCRIPTION
- Formatting mixed outputs (tables, plots, markdown)
- Integrating memory (ConversationBufferMemory) across runs
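One way to carry conversation history across runs via memory.json is to store each turn as plain JSON and replay it into the agent's memory at start-up. The sketch assumes a simple list-of-messages file format, since the actual memory.json layout is not documented here. `load_history`, `save_turn` and `replay_into` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

History = List[Dict[str, str]]


def load_history(path: str = "memory.json") -> History:
    """Read saved messages; a missing file means an empty history."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def save_turn(question: str, answer: str, path: str = "memory.json") -> History:
    """Append one question/answer pair and rewrite the file."""
    history = load_history(path)
    history.append({"role": "human", "content": question})
    history.append({"role": "ai", "content": answer})
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")
    return history


def replay_into(memory: Any, history: History) -> None:
    """Feed saved turns into any object exposing save_context(inputs, outputs)."""
    # Messages are stored in human/ai pairs, so step through them two at a time.
    for human, ai in zip(history[0::2], history[1::2]):
        memory.save_context({"input": human["content"]}, {"output": ai["content"]})
```

With LangChain, `replay_into` could receive a `ConversationBufferMemory` instance. Its `save_context(inputs, outputs)` method takes input and output dictionaries shaped like the ones above.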
MIT License © 2025 — Anish Nilesh Rane