
Re agent #199

Merged
dmpantiu merged 10 commits into main from reAgent
Mar 4, 2026

kuivi (Collaborator) commented Mar 3, 2026

Restructure data_analysis_agent for parallel downloads

Problem

The data analysis agent downloaded ERA5 and DestinE variables sequentially — each variable was a separate LLM tool call processed one at a time by AgentExecutor. Downloading 3 ERA5 + 2 DestinE variables meant 5 sequential network calls, wasting minutes of wall time.

Solution

Replace the single AgentExecutor with a 4-step pipeline where downloads happen in parallel:

filter_node → planner_node → download_node → analysis_node
  1. filter_node: LLM condenses context into an analysis brief (same as before)
  2. planner_node: LLM decides which variables to download and returns structured JSON
  3. download_node: ThreadPoolExecutor fetches all variables in parallel (up to 6 threads)
  4. analysis_node: AgentExecutor with a Python REPL and no download tools (all data is pre-fetched)
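The download step can be sketched roughly as follows. This is a minimal illustration, not the code in data_analysis_agent.py: `download_all`, the request dict keys (`source`, `variable`), and the `fetch`/`report` callables are hypothetical stand-ins for the real ERA5/DestinE retrieval functions and the Streamlit stream handler. Only the 6-thread cap comes from the description above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

MAX_DOWNLOAD_THREADS = 6  # cap taken from the PR description


def download_all(
    requests: List[dict],
    fetch: Callable[[dict], str],
    report: Callable[[str], None] = print,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch every requested variable concurrently (hypothetical helper).

    Returns (results, errors), both keyed by "source:variable"."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    if not requests:
        return results, errors

    def timed_fetch(req: dict) -> Tuple[str, float]:
        # Runs in a worker thread; only the fetch itself is timed here.
        start = time.perf_counter()
        path = fetch(req)
        return path, time.perf_counter() - start

    workers = min(MAX_DOWNLOAD_THREADS, len(requests))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {}
        for req in requests:
            name = f"{req['source']}:{req['variable']}"
            report(f"start {name}")
            futures[pool.submit(timed_fetch, req)] = name
        # Progress is reported from the calling thread as results arrive.
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                path, elapsed = fut.result()
            except Exception as exc:  # one failed variable must not abort the rest
                errors[name] = str(exc)
                report(f"fail {name}: {exc}")
            else:
                results[name] = path
                report(f"done {name} ({elapsed:.1f}s)")
    return results, errors
```

Keeping all `report` calls in the calling thread is a deliberate choice in this sketch: UI callbacks such as Streamlit's are generally not meant to be invoked from arbitrary worker threads, so the workers only fetch and time, and the caller relays start/done/fail messages.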

Pipeline steps are plain Python functions that share a gstate dict and run in sequence; each is decorated with @traceable for LangSmith visibility.
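The shape of such a pipeline might look like the sketch below. The node bodies are placeholders (no LLM calls, no real downloads), and the gstate keys `user_query`, `brief`, `plan`, `downloads`, and `analysis` are hypothetical; only the four node names, the shared dict, and the use of LangSmith's `traceable` decorator come from the description. A no-op fallback is defined so the sketch runs without LangSmith installed.

```python
from typing import Any, Dict

try:
    from langsmith import traceable  # LangSmith's tracing decorator, if installed
except ImportError:
    def traceable(fn=None, **_kwargs):
        # No-op stand-in: supports both @traceable and @traceable(...).
        return fn if fn is not None else (lambda f: f)

GState = Dict[str, Any]


@traceable
def filter_node(gstate: GState) -> GState:
    # Placeholder for the LLM call that condenses context into a brief.
    gstate["brief"] = gstate["user_query"].strip()
    return gstate


@traceable
def planner_node(gstate: GState) -> GState:
    # Placeholder for the LLM planner that returns structured JSON.
    gstate["plan"] = [{"source": "era5", "variable": "t2m"}]
    return gstate


@traceable
def download_node(gstate: GState) -> GState:
    # Placeholder: the real step fetches all planned variables in parallel.
    gstate["downloads"] = {
        f"{r['source']}:{r['variable']}": f"/tmp/{r['source']}_{r['variable']}.nc"
        for r in gstate["plan"]
    }
    return gstate


@traceable
def analysis_node(gstate: GState) -> GState:
    # Placeholder for the REPL agent, which only sees pre-fetched data.
    gstate["analysis"] = f"analysed {sorted(gstate['downloads'])} for: {gstate['brief']}"
    return gstate


PIPELINE = [filter_node, planner_node, download_node, analysis_node]


def run_pipeline(user_query: str) -> GState:
    gstate: GState = {"user_query": user_query}
    for step in PIPELINE:
        gstate = step(gstate)
    return gstate
```

Because each step is an ordinary function over one dict, the steps can be tested in isolation and each call shows up as its own run in LangSmith when tracing is enabled.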

What this changes

  • Downloads don't count as tool calls: the full tool budget goes to analysis and plotting
  • ERA5 default range: 1975–2024 (was 2015–2024); the planner uses the full range unless the user asks otherwise
  • DestinE default range: 2020–2039
  • Download progress: per-variable timing shown in the Streamlit status area
  • API key security: the Arraylake key is read only from the ARRAYLAKE_API_KEY env var and is never stored in config or state
  • Arraylake client cleanup: an explicit client.close() fixes async finalizer errors in worker threads
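The last two points combine into a simple pattern: read the key from the environment at call time and always close the client, even on failure. In this sketch `make_client` and `work` are hypothetical callables standing in for the Arraylake client constructor and the retrieval logic; the real arraylake API is not reproduced here, only the `ARRAYLAKE_API_KEY` variable name and the `close()` call named above.

```python
import os
from typing import Callable, Protocol


class ClosableClient(Protocol):
    def close(self) -> None: ...


def fetch_with_client(
    make_client: Callable[[str], ClosableClient],
    work: Callable[[ClosableClient], object],
) -> object:
    # The key comes only from the environment; it is never put in config/state.
    api_key = os.environ.get("ARRAYLAKE_API_KEY")
    if not api_key:
        raise RuntimeError("ARRAYLAKE_API_KEY is not set")
    client = make_client(api_key)
    try:
        return work(client)
    finally:
        # Close explicitly inside the worker thread instead of leaving
        # cleanup to a finalizer that may run after the thread's loop is gone.
        client.close()
```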

Files changed

| File | Change |
|---|---|
| data_analysis_agent.py | Restructured as 4-step parallel pipeline |
| streamlit_interface.py | Set ARRAYLAKE_API_KEY env var instead of storing the key in config |
| tools/era5_retrieval_tool.py | Initialize and close the Arraylake Client properly for threaded use |

kuivi added 7 commits March 3, 2026 20:17
Replace single AgentExecutor with 4-step pipeline:
filter_node → planner_node → download_node → analysis_node.
Downloads (ERA5 + DestinE) now run in parallel via ThreadPoolExecutor
instead of sequential LLM tool calls. Analysis agent receives
pre-fetched data, freeing full tool budget for REPL/plotting.
Add @traceable decorators for LangSmith visibility.
- Show per-variable start/done/fail progress via stream_handler
- Remove arraylake_api_key from config dict; set env var instead
- Add @traceable(hide_inputs=True) on pipeline nodes for LangSmith
- Planner prompt now defaults to 1975-01-01–2024-12-31 for ERA5
- Explicitly close arraylake Client in finally block to avoid
  async finalizer error when running in ThreadPoolExecutor threads
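One way to make the planner defaults robust is to fill them in after parsing, so a plan that omits dates still gets the full ranges. This is a hypothetical sketch, not the PR's code (the PR sets the defaults in the planner prompt): the JSON shape (`downloads`, `source`, `variable`, `start`, `end`) is assumed, and the exact DestinE day boundaries (2020-01-01 to 2039-12-31) are an assumption based on the 2020–2039 range.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical defaults: ERA5 dates from the commit message, DestinE
# day boundaries assumed from the 2020–2039 range.
DEFAULT_RANGES: Dict[str, Tuple[str, str]] = {
    "era5": ("1975-01-01", "2024-12-31"),
    "destine": ("2020-01-01", "2039-12-31"),
}


def parse_plan(raw: str) -> List[dict]:
    """Turn assumed planner JSON into download requests with default ranges."""
    plan = json.loads(raw)
    requests = []
    for item in plan.get("downloads", []):
        source = item["source"].lower()
        default_start, default_end = DEFAULT_RANGES[source]
        requests.append({
            "source": source,
            "variable": item["variable"],
            # Missing or null dates fall back to the full default range.
            "start": item.get("start") or default_start,
            "end": item.get("end") or default_end,
        })
    return requests
```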
kuivi requested review from dmpantiu and koldunovn March 3, 2026 21:18
kuivi and others added 3 commits March 4, 2026 11:21
…leaks

- Instruct agent to analyze full ERA5 (1975-2024) and DestinE (2020-2039)
  ranges independently, not just the overlap period
- Expand workflow from 2-3 plots to 4-6+ with trend analysis, extremes,
  distributions, seasonal decomposition, and anomaly plots
- Clarify predefined plots already cover basic climatology — do not recreate
- Strengthen output format: all computed numbers must appear in the final text,
  since raw REPL stdout is not forwarded to the combine agent
- Remove file paths from output format to prevent sandbox path leakage
The dest_api_key in planner_node now checks the api_key argument
(passed from the Streamlit UI) before falling back to config/env.
Without this, users entering their OpenAI key via the UI would get
DestinE search failures.
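The fallback order described in this commit can be expressed as a small resolver. The function name, the `openai_api_key` config key, and the `OPENAI_API_KEY` env var name are assumptions for illustration; the PR only states the order: UI argument first, then config, then environment.

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(
    api_key: Optional[str],
    config: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
    env_var: str = "OPENAI_API_KEY",
) -> Optional[str]:
    # 1. Key entered in the Streamlit UI and passed in as an argument.
    if api_key:
        return api_key
    # 2. Key from the agent config (hypothetical key name).
    if config.get("openai_api_key"):
        return config["openai_api_key"]
    # 3. Environment variable; None if nothing is configured anywhere.
    return env.get(env_var)
```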
dmpantiu merged commit 9f04ee3 into main Mar 4, 2026
4 checks passed
dmpantiu added a commit that referenced this pull request Mar 4, 2026
…in (PR #199)

- Complete CSS redesign: ultra-minimal scientific aesthetic
- Scrollable tables with sticky headers (max 45vh)
- Proper markdown rendering: bullet points, nested lists, headings
- Download PDF → Download Markdown (.md format)
- run.py auto-loads .env via python-dotenv
- Remove Tailwind utility classes from App.tsx
- Increase font sizes across all UI components
- Merge origin/main into react_ui (parallel downloads, improved prompts)

2 participants