
+ Three Analysis Modes #197

Merged
koldunovn merged 9 commits into main from modeswitch on Mar 3, 2026

kuivi (Collaborator) commented Mar 2, 2026

Please accept PR 196 first.

Add DestinE Climate DT Data Retrieval + Three Analysis Modes

Summary

Two major additions to ClimSight's data analysis pipeline:

  1. DestinE Climate Digital Twin data retrieval — live download of high-resolution climate projection time series (SSP3-7.0, IFS-NEMO model, 2020-2039) for any point location, with 82 available parameters discoverable via RAG semantic search.

  2. Three analysis modes (fast / smart / deep) — configurable presets that control tool availability, budgets, and iteration limits, with per-toggle overrides in the UI.

DestinE retrieval tool

New file: src/climsight/tools/destine_retrieval_tool.py

  • Two-step workflow: search (RAG over 82 parameters via Chroma vector store) -> retrieve (download via earthkit.data + polytope API)
  • Point time series extraction with hourly resolution (24 timesteps/day)
  • Automatic caching — repeated requests return instantly from local Zarr store
  • Authentication via ~/.polytopeapirc token file (obtained by running desp-authentication.py)
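The caching behaviour described above could be sketched roughly as follows. This is not the code from destine_retrieval_tool.py: the function names, the key fields, and the file-name layout are assumptions made for illustration. The idea is only that identical requests map to the same store path, so a second call can skip the download.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir: Path, param: str, lat: float, lon: float,
               start: str, end: str) -> Path:
    """Map one request to a deterministic store path (hypothetical layout)."""
    # Round coordinates so tiny float differences hit the same cache entry.
    key = json.dumps(
        {"param": param, "lat": round(lat, 4), "lon": round(lon, 4),
         "start": start, "end": end},
        sort_keys=True,
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{param}_{digest}.zarr"


def is_cached(path: Path) -> bool:
    """Return True if a store already exists at this path."""
    # Assumes the store is written to local disk under the returned path.
    return path.exists()
```

A caller would compute cache_path(...) first and only trigger the polytope download when is_cached(...) returns False.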

Agent integration (data_analysis_agent.py)

  • Both tools (search_destine_parameters and retrieve_destine_data) are registered when use_destine_data: true is set in the config
  • Agent prompt describes the two-step workflow and available date range
  • Downloads default to full 2020-2039 period for maximum coverage
  • Prompt guides parallel downloads of multiple variables in a single response

UI integration (streamlit_interface.py)

  • Toggle to enable/disable DestinE data
  • Token file status indicator (found/not found)
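A token status check of this kind can be very small. The sketch below is an assumption about how such an indicator might be computed, not the code in streamlit_interface.py; token_status and its home parameter are hypothetical names, and only the ~/.polytopeapirc location comes from the description above.

```python
from pathlib import Path
from typing import Optional


def token_status(home: Optional[Path] = None) -> str:
    """Report whether the polytope token file exists ("found" / "not found")."""
    # Allow the home directory to be injected so the check is easy to test.
    home = home or Path.home()
    token_file = home / ".polytopeapirc"
    return "found" if token_file.is_file() else "not found"
```

In a Streamlit UI the returned string could be shown next to the DestinE toggle.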

State management (climsight_classes.py, sandbox_utils.py)

  • destine_data_dir and destine_tool_response fields in AgentState
  • Sandbox path for destine_data/ directory

Analysis modes

| Setting | fast | smart | deep |
|---|---|---|---|
| Python REPL | OFF | ON | ON |
| ERA5 download | OFF | ON | ON |
| DestinE download | OFF | OFF | ON |
| Extra search | OFF | ON | ON |
| Tool call limit | 10 | 50 | 150 |
| max_iterations | 8 | 30 | 80 |

  • fast: Pre-downloaded data + predefined plots only. No REPL, no downloads. Fastest response.
  • smart: Adds Python REPL + ERA5 download + extra search. Moderate budgets.
  • deep: Everything enabled including DestinE. High budgets for thorough multi-variable analysis.

Implementation:

  • ANALYSIS_MODES dict + resolve_analysis_config() in data_analysis_agent.py
  • Mode radio selector outside the Streamlit form with on_change callback — toggles visually sync when mode changes
  • Individual toggles can still override mode defaults
  • config.yml: analysis_mode: "smart" setting
  • route_after_prepare() in climsight_engine.py uses resolved config for routing
  • Prompt budgets (hard limit, max per response, reflect limit, ideal calls) all adapt per mode
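The preset-plus-override idea can be sketched as below. The numeric values and ON/OFF defaults are taken from the table above, and use_destine_data appears in the description, but the other key names, the function signature, and the merge rule (None means "no override") are assumptions. The actual ANALYSIS_MODES and resolve_analysis_config() in data_analysis_agent.py may differ.

```python
from typing import Any, Dict, Optional

# Presets mirroring the table; key names other than use_destine_data are hypothetical.
ANALYSIS_MODES: Dict[str, Dict[str, Any]] = {
    "fast": {"use_python_repl": False, "use_era5_data": False,
             "use_destine_data": False, "use_extra_search": False,
             "tool_call_limit": 10, "max_iterations": 8},
    "smart": {"use_python_repl": True, "use_era5_data": True,
              "use_destine_data": False, "use_extra_search": True,
              "tool_call_limit": 50, "max_iterations": 30},
    "deep": {"use_python_repl": True, "use_era5_data": True,
             "use_destine_data": True, "use_extra_search": True,
             "tool_call_limit": 150, "max_iterations": 80},
}


def resolve_analysis_config(mode: str = "smart",
                            overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge a mode preset with explicit per-toggle overrides."""
    if mode not in ANALYSIS_MODES:
        raise ValueError(f"unknown analysis mode: {mode!r}")
    # Copy the preset so the module-level defaults are never mutated.
    resolved = dict(ANALYSIS_MODES[mode])
    for key, value in (overrides or {}).items():
        # Treat None as "toggle not set", so the mode default stays in place.
        if value is not None:
            resolved[key] = value
    return resolved
```

For example, resolve_analysis_config("smart", {"use_destine_data": True}) keeps the smart budgets but enables DestinE downloads for that run.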

Other changes

  • intro_agent filtering relaxed — requests mentioning data analysis, tool usage, or statistics are no longer rejected as "technical/code" queries
  • Coordinate order fix — DestinE polytope API expects [lat, lon], not [lon, lat]
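One way to guard against this kind of swap is to build the point specification in a single helper that takes named arguments. The sketch below is illustrative only: point_feature is a hypothetical helper, and the shape of the returned dictionary is an assumption about how a point time series request might be expressed, not a confirmed copy of the request used in this PR.

```python
from typing import Any, Dict


def point_feature(lat: float, lon: float) -> Dict[str, Any]:
    """Build a point spec with coordinates in [lat, lon] order."""
    # Latitude outside [-90, 90] usually means the arguments were swapped.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Accept both the -180..180 and the 0..360 longitude conventions.
    if not -180.0 <= lon <= 360.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "timeseries", "points": [[lat, lon]]}
```

The range check only catches a swap when the longitude magnitude exceeds 90, so calling the helper with keyword arguments (lat=..., lon=...) remains the more reliable safeguard.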

Testing

  • test/test_destine_tool.py — dedicated test suite (RAG search, data retrieval, caching, error handling)
  • Tests marked with destine marker, skipped by default in normal pytest runs
  • Run with: pytest -m destine -v
  • test/conftest.py — auto-skip logic for destine-marked tests
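A common pytest pattern for this kind of auto-skip is sketched below. It is not the contents of test/conftest.py from this PR; the helper destine_requested and the simple string check on the -m expression are assumptions chosen to keep the example short.

```python
def destine_requested(markexpr: str) -> bool:
    """True when the -m expression explicitly selects destine tests."""
    return "destine" in markexpr and "not destine" not in markexpr


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "destine: needs DestinE credentials and network access"
    )


def pytest_collection_modifyitems(config, items):
    # Imported here so the pure helper above can be exercised without pytest.
    import pytest

    if destine_requested(config.getoption("markexpr", default="") or ""):
        return
    skip = pytest.mark.skip(reason="DestinE test; run with: pytest -m destine")
    for item in items:
        if "destine" in item.keywords:
            item.add_marker(skip)
```

With hooks like these in conftest.py, a plain pytest run reports the marked tests as skipped, while pytest -m destine -v runs them.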

Utility scripts

  • src/climsight/scripts/download_destine_simple.py — single-request download example
  • src/climsight/scripts/download_destine_example.py — parallel yearly download with timing
  • src/climsight/scripts/era5_fetch.py — ERA5 data fetch utility
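The parallel yearly download with timing mentioned above could look roughly like this. The sketch does not reproduce download_destine_example.py: yearly_ranges, download_parallel, and the fetch callback are hypothetical names, and the real script's request details are not shown here. The caller supplies the function that performs one download.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

DateRange = Tuple[str, str]


def yearly_ranges(first_year: int, last_year: int) -> List[DateRange]:
    """Split a period into (YYYY0101, YYYY1231) chunks, one per year."""
    return [(f"{y}0101", f"{y}1231") for y in range(first_year, last_year + 1)]


def download_parallel(fetch: Callable[[str, str], object],
                      ranges: List[DateRange],
                      max_workers: int = 4) -> Dict[DateRange, float]:
    """Run fetch(start, end) for every range in a thread pool and time each call."""
    def timed(rng: DateRange) -> float:
        t0 = time.perf_counter()
        fetch(*rng)
        return time.perf_counter() - t0

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so durations line up with ranges.
        durations = list(pool.map(timed, ranges))
    return dict(zip(ranges, durations))
```

Threads suit this workload because each request spends most of its time waiting on the network; yearly_ranges(2020, 2039) yields the 20 chunks covering the full period.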

Requirements

  • earthkit-data package (for polytope API access)
  • langchain-chroma, chromadb, langchain-openai (for RAG parameter search)
  • ~/.polytopeapirc token file (run desp-authentication.py to obtain)
  • OPENAI_API_KEY environment variable (for embedding-based parameter search)

- New tool: destine_retrieval_tool.py with two-step workflow:
  1. search_destine_parameters: RAG semantic search over 82 DestinE parameters via Chroma vector store
  2. retrieve_destine_data: download point time series via earthkit.data + polytope
- Authentication via ~/.polytopeapirc token (from desp-authentication.py)
- UI toggle for DestinE data with token file status check
- DestinE test suite (pytest -m destine), skipped by default
- Updated README with DestinE authentication instructions
Move os.chdir(REPO_ROOT) from module level to an autouse fixture that
restores the original cwd after each test, preventing side effects on
other test files that use relative paths.
… add utility scripts

- Fix lat/lon swap in polytope request (was [lon, lat], now [lat, lon])
- Remove "keep date ranges SHORT" limits — default to full 2020-2039 period
- Simplify intro_agent prompt
- Add standalone DestinE download scripts (simple + parallel yearly)
- Add ERA5 fetch script and test utilities
…ring

- Guide data_analysis_agent to download ERA5/DestinE variables in parallel (all in one response)
- Relax intro_agent exclusion rules to allow analysis instructions (download data, plot time series, compute statistics)
…gets

- ANALYSIS_MODES dict defines presets for tool limits, max_iterations, and toggle defaults
- resolve_analysis_config() merges mode defaults with explicit UI overrides
- Mode radio selector outside form with on_change callback syncs toggles immediately
- Prompt budgets (hard limit, max per response, reflect limit) adapt per mode
@kuivi kuivi requested review from dmpantiu and koldunovn March 2, 2026 16:35
@kuivi kuivi mentioned this pull request Mar 2, 2026
koldunovn (Collaborator) commented:

@kuivi can you, please, resolve conflicts?

kuivi (Collaborator, Author) commented Mar 3, 2026

> @kuivi can you, please, resolve conflicts?

Done

@koldunovn koldunovn merged commit 37616dc into main Mar 3, 2026
4 checks passed
@kuivi kuivi deleted the modeswitch branch March 3, 2026 19:18