Replace single AgentExecutor with 4-step pipeline: filter_node → planner_node → download_node → analysis_node. Downloads (ERA5 + DestinE) now run in parallel via ThreadPoolExecutor instead of sequential LLM tool calls. Analysis agent receives pre-fetched data, freeing full tool budget for REPL/plotting. Add @traceable decorators for LangSmith visibility.
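A minimal sketch of the pipeline shape described above, assuming each node is a plain function that takes and returns a shared state dict. The node bodies, `fetch_variable`, the variable names, and the state keys are hypothetical placeholders, not the PR's actual code; the `@traceable` decorators are left out so the snippet runs without LangSmith installed.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 6  # assumption: mirrors the "up to 6 threads" in the PR description


def fetch_variable(source, name):
    """Hypothetical stand-in for one ERA5 or DestinE download."""
    return f"{source}:{name}:data"


def filter_node(state):
    # Placeholder: decide whether the query needs data analysis at all.
    state["needs_data"] = bool(state["query"].strip())
    return state


def planner_node(state):
    # Placeholder: a real planner would ask an LLM which variables to fetch.
    state["plan"] = [("ERA5", "t2m"), ("ERA5", "tp"), ("DestinE", "t2m")]
    return state


def download_node(state):
    plan = state["plan"]
    # One worker per planned download, capped at MAX_WORKERS (at least 1).
    workers = max(1, min(MAX_WORKERS, len(plan)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: fetch_variable(*job), plan))
    state["datasets"] = dict(zip(plan, results))
    return state


def analysis_node(state):
    # Placeholder: the real node runs an agent over the pre-fetched data.
    state["summary"] = f"analysed {len(state['datasets'])} pre-fetched datasets"
    return state


def run_pipeline(query):
    state = {"query": query}
    for node in (filter_node, planner_node, download_node, analysis_node):
        state = node(state)
    return state


final_state = run_pipeline("How will temperature change?")
```

Because the nodes only communicate through the dict, each one can be tested or traced in isolation.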
- Show per-variable start/done/fail progress via stream_handler
- Remove arraylake_api_key from config dict; set env var instead
- Add @traceable(hide_inputs=True) on pipeline nodes for LangSmith
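One way the per-variable start/done/fail reporting could look, as a sketch only. `download_all`, `fake_fetch`, and the message format are assumptions for illustration; the handler is treated as a plain callable that accepts a string, which may differ from the project's actual stream_handler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(variables, fetch, stream_handler, max_workers=6):
    """Fetch every variable in parallel and report progress per variable."""
    lock = threading.Lock()

    def report(message):
        # Serialize handler calls, since worker threads also report.
        with lock:
            stream_handler(message)

    def job(name):
        report(f"start {name}")
        return fetch(name)

    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, name): name for name in variables}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
                report(f"done {name}")
            except Exception as exc:
                # One failed variable does not abort the others.
                failures[name] = str(exc)
                report(f"fail {name}: {exc}")
    return results, failures


def fake_fetch(name):
    """Hypothetical download that fails for one variable."""
    if name == "bad_var":
        raise ValueError("not found")
    return name.upper()


messages = []
results, failures = download_all(["t2m", "tp", "bad_var"], fake_fetch, messages.append)
```

The order of the messages depends on thread scheduling, so callers should not rely on it.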
- Planner prompt now defaults to 1975-01-01–2024-12-31 for ERA5
- Explicitly close arraylake Client in finally block to avoid async finalizer error when running in ThreadPoolExecutor threads
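The close-in-finally pattern can be sketched as follows. `FakeClient` and `fetch_with_cleanup` are hypothetical stand-ins and do not reproduce the arraylake API; the only assumption is that the client exposes a `close()` method.

```python
class FakeClient:
    """Hypothetical stand-in for a network client with an explicit close()."""

    def __init__(self):
        self.closed = False

    def open_dataset(self, name):
        if name == "missing":
            raise KeyError(name)
        return {"name": name}

    def close(self):
        self.closed = True


def fetch_with_cleanup(name, client_factory=FakeClient):
    """Create a client per call and always close it, even on failure."""
    client = client_factory()
    try:
        return client.open_dataset(name)
    finally:
        # Runs on success and on error, so a worker thread never leaves
        # cleanup to a finalizer that may fire after the thread is gone.
        client.close()


created = []


def tracking_factory():
    """Record each client so the cleanup can be inspected afterwards."""
    client = FakeClient()
    created.append(client)
    return client


dataset = fetch_with_cleanup("era5", tracking_factory)
```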
…leaks
- Instruct agent to analyze full ERA5 (1975-2024) and DestinE (2020-2039) ranges independently, not just the overlap period
- Expand workflow from 2-3 plots to 4-6+ with trend analysis, extremes, distributions, seasonal decomposition, and anomaly plots
- Clarify predefined plots already cover basic climatology; do not recreate
- Strengthen output format: all computed numbers must appear in final text since raw REPL stdout is not forwarded to combine agent
- Remove file paths from output format to prevent sandbox path leakage
The dest_api_key in planner_node now checks the api_key argument (passed from Streamlit UI) before falling back to config/env. Without this, users entering their OpenAI key via the UI would get DestinE search failures.
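The lookup order described here (UI argument first, then config, then environment) can be sketched as below. `resolve_api_key`, the config key `openai_api_key`, and the default environment variable name are assumptions for illustration, not names confirmed by the PR.

```python
import os


def resolve_api_key(api_key=None, config=None, env_var="OPENAI_API_KEY"):
    """Return the first non-empty key: explicit argument, config, then env."""
    config = config or {}
    return (
        api_key
        or config.get("openai_api_key")  # hypothetical config key
        or os.environ.get(env_var, "")
    )


# A key typed into the UI wins over anything stored elsewhere.
ui_choice = resolve_api_key("from-ui", {"openai_api_key": "from-config"})
```

An empty string from the UI falls through to the next source, which is usually the desired behavior for a blank text field.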
dmpantiu approved these changes Mar 4, 2026
dmpantiu added a commit that referenced this pull request Mar 4, 2026
…in (PR #199)
- Complete CSS redesign: ultra-minimal scientific aesthetic
- Scrollable tables with sticky headers (max 45vh)
- Proper markdown rendering: bullet points, nested lists, headings
- Download PDF → Download Markdown (.md format)
- run.py auto-loads .env via python-dotenv
- Remove Tailwind utility classes from App.tsx
- Increase font sizes across all UI components
- Merge origin/main into react_ui (parallel downloads, improved prompts)
Restructure data_analysis_agent for parallel downloads
Problem
The data analysis agent downloaded ERA5 and DestinE variables sequentially: each variable was a separate LLM tool call processed one at a time by AgentExecutor. Downloading 3 ERA5 + 2 DestinE variables meant 5 sequential network calls, wasting minutes of wall time.
Solution
Replace the single AgentExecutor with a 4-step pipeline where downloads happen in parallel:
- ThreadPoolExecutor fetches all variables in parallel (up to 6 threads)
- AgentExecutor with Python REPL, no download tools (all data pre-fetched)
Pipeline steps are plain Python functions sharing a gstate dict, connected with @traceable for LangSmith visibility.
What this changes
- ARRAYLAKE_API_KEY env var only, never stored in config or state
- client.close() to fix async finalizer errors in threads
Files changed
- data_analysis_agent.py
- streamlit_interface.py: set ARRAYLAKE_API_KEY env var instead of storing in config
- tools/era5_retrieval_tool.py: close Client properly for threaded use