A multi-agent AI framework for autonomous exoplanet atmospheric analysis. Detect molecular compositions from spectral data (FITS/CSV) and generate comparative spectral fingerprint visualizations (PKL) using a coordinated team of specialized AI agents with LLM-powered reasoning.
For first-time users, follow these steps:
- Install dependencies: `uv sync`
- Configure API keys: copy `.env.example` to `.env` and add your `GOOGLE_API_KEY`
- Start MLflow (Terminal 1): `uv run mlflow server --port 5000 --backend-store-uri sqlite:///mlruns.db`
- Start Spectral Service (Terminal 2): `cd "Spectral Service" && uv run uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload`
- Start Web UI (Terminal 3): `uv run streamlit run app.py`
- Upload test data at http://localhost:8501:
  - Spectral Analysis: upload FITS/CSV files from `test_data/spectral/` (e.g., `earth_ir.fits`)
  - Graphical Analysis: upload PKL files from `test_data/graphical/` (e.g., `jupiter_combined.pkl`)
Note: If models are missing, train them first using the instructions in the Training Models section.
Reference: See `Planetary Atmospheric Composition.pdf` for NASA/research-verified element probabilities used in model training.
This project uses LangGraph to coordinate 6 specialized agents:
- Orchestrator Agent: The brain. Analyzes input files and routes them to the correct analysis pipeline.
  - Routes: `spectral` (FITS/CSV → ML models) or `image` (PKL → visualizations).
- Spectral Model Agent: Specializes in molecular composition prediction from UV/IR spectral data (FITS/CSV).
- Image Model Agent: Generates comparative spectral fingerprint visualizations (barcodes, similarity heatmaps, dendrograms) from planetary PKL data.
- Inference Agent: The synthesizer. Consolidates predictions from spectral models and builds a dynamic Knowledge Base.
- Validator Agent: The quality control. Checks confidence thresholds and flags consistency issues.
- Reporter Agent: The communicator. Generates human-readable scientific reports using Google Gemini LLM.
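The Orchestrator's routing rule can be sketched in plain Python. This is a simplified stand-in for what the project implements as a LangGraph conditional edge; `route_upload` and the extension map are illustrative, not the actual agent code:

```python
from pathlib import Path

# Hypothetical routing table mirroring the routes described above:
# FITS/CSV files go to the spectral pipeline, PKL files to the image pipeline.
ROUTES = {".fits": "spectral", ".csv": "spectral", ".pkl": "image"}

def route_upload(filename: str) -> str:
    """Return the pipeline name for an uploaded file, or raise for unknown types."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ROUTES:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return ROUTES[suffix]
```

In the real system this decision also feeds LangGraph's state, so the downstream Spectral or Image agent receives the file along with the chosen route.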
- Molecular Detection: Identify 28 UV species and 22 IR species using trained MLP models
- Domain Flexibility: Works with UV-only, IR-only, or combined data
- Physics-Based Training: Models trained with 75x augmentation per planet (noise, baseline shift, resolution variation)
- Multi-Modal Validation: Cross-validates UV and IR predictions for overlapping molecules
- Confidence Scoring: Threshold-based filtering with validation flags
- LLM Reporting: Natural language scientific reports via Google Gemini
- Spectral Fingerprints: Combined UV+IR barcode visualizations showing absorption patterns
- Similarity Analysis: Cosine distance heatmap with numerical values for planet comparison
- Hierarchical Clustering: Dendrogram showing spectral groupings and relationships
- Interactive Chat: LLM-powered Q&A about visualization patterns and planetary similarities
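The similarity and clustering features above can be sketched with SciPy. The random fingerprints here are toy stand-ins for the real combined UV+IR spectra loaded from PKL files:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Toy fingerprints: one row per planet (hypothetical data; the real
# pipeline builds these from combined UV+IR spectra).
rng = np.random.default_rng(0)
planets = ["earth", "mars", "jupiter"]
fingerprints = rng.random((3, 100))

# Pairwise cosine distances -> square matrix, suitable for a heatmap.
dist_matrix = squareform(pdist(fingerprints, metric="cosine"))

# Average-linkage hierarchical clustering, suitable for a dendrogram.
Z = linkage(pdist(fingerprints, metric="cosine"), method="average")
```

Plotting the matrix with Seaborn's `heatmap` and `Z` with `scipy.cluster.hierarchy.dendrogram` yields the two visualizations described above.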
Training labels and validation data sourced from:
- NASA Planetary Fact Sheets: Verified atmospheric compositions
- Peer-reviewed spectroscopy papers: JWST, HST, and ground-based observations
- Reference Document: See `Planetary Atmospheric Composition.pdf` for complete element probability tables
Supported Molecules: CO₂, H₂O, CH₄, O₂, N₂, O₃, Ar, SO₂, H₂S, NH₃, HCl, and 18+ additional species
Core Stack:
- LangGraph - Multi-agent orchestration and state management
- Google Gemini LLM - Natural language reasoning and report generation
- PyTorch + Scikit-learn - ML model training and inference
- FastAPI - High-performance backend API
- Streamlit - Interactive web interface
- MLflow - Experiment tracking and model registry
Scientific Libraries:
- Astropy - FITS file handling and astronomical data processing
- NumPy/Pandas - Numerical computing and data manipulation
- Matplotlib/Seaborn - Scientific visualization
- SciPy - Signal processing (Savitzky-Golay filtering, hierarchical clustering)
- Python 3.11+
- `uv` (fast Python package manager)

  ```
  # Install uv
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Clone the repository:

  ```
  git clone <repo-url>
  cd ad-astrAI
  ```

- Install dependencies:

  ```
  uv sync
  ```

- Configure API keys in `.env` (see `.env.example`):

  ```
  GOOGLE_API_KEY=your_gemini_key
  MLFLOW_TRACKING_URI=http://127.0.0.1:5000
  SPECTRAL_SERVICE_URL=http://localhost:8001
  ```
Create a `.env` file in the project root with the following variables:
| Variable | Description | Required | Default |
|---|---|---|---|
| `GOOGLE_API_KEY` | Gemini API key for LLM-powered agents | Yes | - |
| `MLFLOW_TRACKING_URI` | MLflow server URL for experiment tracking | No | `http://127.0.0.1:5000` |
| `SPECTRAL_SERVICE_URL` | Spectral Service backend URL | No | `http://localhost:8001` |
How to get a Gemini API key:
- Visit Google AI Studio
- Sign in with your Google account
- Create a new API key
- Copy the key to your `.env` file
The application consists of three main services that need to be started in separate terminal windows:
Start the local MLflow server to track agent execution traces and model training experiments.
```
uv run mlflow server --port 5000 --backend-store-uri sqlite:///mlruns.db
```

View the Trace UI at: http://127.0.0.1:5000
The Spectral Service provides the machine learning backend for spectral analysis. It runs a FastAPI server that hosts the trained UV and IR spectral models.
Navigate to the Spectral Service directory and start the server:
```
cd "Spectral Service"
uv run uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```

API documentation is available at: http://localhost:8001/docs
Note: The service will automatically load trained models from `Spectral Service/models/`:

- `uv_mlp.pt` - UV spectral model (28 species)
- `ir_mlp.pt` - IR spectral model (22 species)
If models are not found, you will need to train them first (see Training Models section below).
Launch the Streamlit interface for interactive analysis.
In a new terminal, navigate back to the project root:
```
cd ..
uv run streamlit run app.py
```

The Web UI will open at: http://localhost:8501
🔬 Spectral Analysis (Recommended for Unknown Exoplanets)
- Upload: Single FITS or CSV file containing spectral data
- Works with: UV-only OR IR-only data
- Output: Molecular composition predictions (CO₂, H₂O, CH₄, O₂, etc.)
- Test Files: Use files from `test_data/spectral/` (e.g., `earth_ir.fits`, `mars_uv.csv`)
🎨 Graphical Analysis (Known Planets Only)
- Upload: PKL files containing both UV and IR spectral data
- Requires: Complete UV+IR data for comparative analysis
- Output: Spectral fingerprint visualizations (barcodes, similarity heatmaps, hierarchical clustering)
- Test Files: Use files from `test_data/graphical/` (e.g., `jupiter_combined.pkl`, `earth_uv.pkl` + `earth_ir.pkl`)
For Spectral Analysis:
- Mission Report: AI-generated scientific summary
- Agent Trace: Execution path (Orchestrator → Spectral → Inference → Validator → Reporter)
- Consolidated Predictions: Element detection table with confidence scores
- Knowledge Base: Multi-modal element validation across UV/IR domains
- Chat with Data: Ask questions about molecular composition (e.g., "Which molecules were detected with high confidence?")
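The Knowledge Base's cross-domain validation can be sketched as merging per-domain probabilities and flagging molecules the UV and IR models disagree on. `consolidate` and the 0.5 threshold are hypothetical, not the actual Inference/Validator agent code:

```python
THRESHOLD = 0.5  # assumed confidence cutoff, not the project's real value

def consolidate(uv_preds: dict, ir_preds: dict, threshold: float = THRESHOLD) -> dict:
    """Merge UV and IR detection probabilities into a small knowledge base."""
    kb = {}
    for mol in set(uv_preds) | set(ir_preds):
        scores = [p for p in (uv_preds.get(mol), ir_preds.get(mol)) if p is not None]
        detected = [s >= threshold for s in scores]
        kb[mol] = {
            "confidence": max(scores),
            "domains": len(scores),           # 1 = single-domain, 2 = seen in UV and IR
            # Only molecules seen in both domains can be inconsistent:
            # one model detects it while the other does not.
            "consistent": len(set(detected)) == 1,
        }
    return kb

kb = consolidate({"H2O": 0.92, "O3": 0.40}, {"H2O": 0.88, "CO2": 0.75})
```

Entries with `domains == 2` and `consistent == False` are the ones the Validator Agent would flag for review.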
For Graphical Analysis:
- Spectral Fingerprints: UV+IR barcode visualizations for each planet
- Similarity Matrix: Cosine distance heatmap with values showing spectral similarity
- Hierarchical Clustering: Dendrogram grouping planets by spectral patterns
- Chat with Visualizations: Ask about patterns, planet similarities, and molecular fingerprints
The `test_data/` directory contains sample files for both analysis modes:
- FITS format: `earth_ir.fits`, `mars_uv.fits` - standard astronomical spectral data
- CSV format: `jupiter_ir.csv`, `venus_uv.csv` - tabular wavelength/flux data
- Use case: Test molecular composition prediction on unknown exoplanets
- Combined files: `jupiter_combined.pkl`, `mars_combined.pkl` - single file with UV+IR data
- Separate files: `earth_uv.pkl` + `earth_ir.pkl` - upload both for complete analysis
- Use case: Generate comparative spectral fingerprint visualizations
File Format Requirements:
- FITS: Must contain wavelength and flux columns
- CSV: Requires `wavelength` (or `wave`) and `flux` (or `radiance`) columns
- PKL: Python pickle format with structured NumPy arrays or dictionaries
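A quick pre-check for CSV files before uploading can be written with the standard library, assuming the column aliases listed above. This helper is illustrative, not part of the service:

```python
import csv
from io import StringIO

# Column aliases accepted per the requirements above (case-insensitive).
WAVELENGTH_COLS = {"wavelength", "wave"}
FLUX_COLS = {"flux", "radiance"}

def validate_csv_header(text: str) -> bool:
    """Return True if the CSV header carries both a wavelength and a flux column."""
    header = next(csv.reader(StringIO(text)))
    cols = {c.strip().lower() for c in header}
    return bool(cols & WAVELENGTH_COLS) and bool(cols & FLUX_COLS)
```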
The spectral analysis system uses two separate machine learning models trained on real planetary spectroscopic data with physics-based augmentation.
- Real spectral data files must be present in `Spectral Service/data/real/`
  - UV spectra: `*_uv.pkl` files
  - IR spectra: `*_ir.pkl` files
```
cd "Spectral Service/training"
uv run python train_uv.py
```

Model specifications:
- Input: UV spectral data (3-channel preprocessed: normalized, 1st derivative, 2nd derivative)
- Output: 28 molecular species detection probabilities
- Architecture: Multi-Layer Perceptron with Optuna hyperparameter optimization
- Training: Physics-based augmentation (75 variations per planet) with planet-level validation split
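The 3-channel preprocessing can be sketched with SciPy's Savitzky-Golay filter (part of the tech stack listed above). The window length and polynomial order here are illustrative, not the project's actual hyperparameters:

```python
import numpy as np
from scipy.signal import savgol_filter

def three_channel(flux: np.ndarray, window: int = 11, poly: int = 3) -> np.ndarray:
    """Stack a normalized spectrum with its smoothed 1st and 2nd derivatives.

    Returns an array of shape (3, len(flux)): normalized flux, first
    derivative, second derivative.
    """
    norm = (flux - flux.mean()) / (flux.std() + 1e-8)  # z-score normalization
    d1 = savgol_filter(norm, window, poly, deriv=1)    # smoothed 1st derivative
    d2 = savgol_filter(norm, window, poly, deriv=2)    # smoothed 2nd derivative
    return np.stack([norm, d1, d2])
```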
```
cd "Spectral Service/training"
uv run python train_ir.py
```

Model specifications:
- Input: IR spectral data (3-channel preprocessed)
- Output: 22 molecular species detection probabilities
- Architecture: Multi-Layer Perceptron with Optuna hyperparameter optimization
- Training: Physics-based augmentation (75 variations per planet) with planet-level validation split
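The three augmentation operations named above (noise, baseline shift, resolution variation) can be sketched as follows; the magnitudes are illustrative, not the values used in `training/augmentation.py`:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def augment(flux: np.ndarray, n: int = 75, rng=None) -> np.ndarray:
    """Generate n physics-style variants of a spectrum."""
    rng = rng or np.random.default_rng()
    out = np.empty((n, flux.size))
    for i in range(n):
        # Additive detector noise, scaled to the spectrum's own variability.
        noisy = flux + rng.normal(0.0, 0.01 * flux.std(), flux.size)
        # Constant baseline offset.
        shifted = noisy + rng.uniform(-0.05, 0.05)
        # Resolution degradation via Gaussian smoothing.
        out[i] = gaussian_filter1d(shifted, sigma=rng.uniform(0.5, 2.0))
    return out
```

With the default `n=75` this matches the "75 variations per planet" figure quoted above.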
Note: Training uses MLflow for experiment tracking. Ensure MLflow server is running to view training metrics, hyperparameters, and model artifacts.
Expected Training Time:
- UV Model: ~30-60 minutes (depends on number of Optuna trials)
- IR Model: ~30-60 minutes
Trained models will be saved to:
- `Spectral Service/models/uv_mlp.pt` + `uv_config.json`
- `Spectral Service/models/ir_mlp.pt` + `ir_config.json`
```
astraAI/
├── agent/                       # Multi-agent system
│   ├── agents/                  # Source code for all 6 agents
│   │   ├── orchestrator.py      # Routing agent
│   │   ├── spectral_model.py    # Spectral model inference agent
│   │   ├── image_model.py       # Spectral fingerprint visualization agent
│   │   ├── inference_agent.py   # Prediction consolidation
│   │   ├── validator_agent.py   # Quality control
│   │   └── reporter_agent.py    # LLM-powered report generation
│   ├── graph.py                 # LangGraph definitions and routing logic
│   └── state.py                 # Shared state schema
│
├── Spectral Service/            # Backend ML service
│   ├── app/                     # FastAPI application
│   │   ├── main.py              # API entry point
│   │   ├── routers/
│   │   │   └── analyze.py       # Spectral analysis endpoints
│   │   └── utils/
│   │       └── io.py            # FITS/CSV/PKL data loaders
│   ├── training/                # Model training pipeline
│   │   ├── train_uv.py          # UV model training
│   │   ├── train_ir.py          # IR model training
│   │   ├── augmentation.py      # Physics-based augmentation
│   │   ├── expanded_species.py  # Species definitions & planet labels
│   │   └── mlflow_utils.py      # MLflow integration
│   ├── data/
│   │   └── real/                # Real planetary spectra (training data)
│   │       ├── earth_uv.pkl     # Earth UV spectrum
│   │       ├── earth_ir.pkl     # Earth IR spectrum
│   │       ├── jupiter_uv.pkl   # Jupiter UV spectrum
│   │       ├── jupiter_ir.pkl   # Jupiter IR spectrum
│   │       └── ...              # Other planets (Mars, Venus, etc.)
│   └── models/                  # Trained models (generated after training)
│       ├── uv_mlp.pt            # UV model weights
│       ├── uv_config.json       # UV model configuration
│       ├── ir_mlp.pt            # IR model weights
│       └── ir_config.json       # IR model configuration
│
├── test_data/                   # Sample test files for users
│   ├── spectral/                # Test files for Spectral Analysis
│   │   ├── earth_ir.fits        # Earth IR spectrum (FITS)
│   │   ├── mars_uv.csv          # Mars UV spectrum (CSV)
│   │   └── ...                  # Other test spectra
│   └── graphical/               # Test files for Graphical Analysis
│       ├── jupiter_combined.pkl # Jupiter combined UV+IR
│       ├── earth_uv.pkl         # Earth UV only (requires pair)
│       ├── earth_ir.pkl         # Earth IR only (requires pair)
│       └── ...                  # Other planet PKL files
│
├── app.py                       # Streamlit frontend application
├── experiments/                 # Jupyter notebooks for prototyping
├── pyproject.toml               # Project dependencies (uv package manager)
├── .env                         # Environment configuration (API keys)
├── GCP_DEPLOY.md                # Google Cloud Platform deployment guide
├── Planetary Atmospheric Composition.pdf  # NASA/research reference data
└── README.md                    # This file
```
This project uses MLflow Tracing for deep observability.
- Spans: Track every agent's execution time and inputs/outputs.
- Metrics: Monitor token usage, latency, and tool calls.
- Artifacts: Store generated reports and data snapshots.
Follow the Quick Start section above.
For production deployment on GCP Virtual Machines, see the comprehensive guide:
📖 GCP_DEPLOY.md - Complete deployment instructions including:
- VM setup with firewall configuration
- UV package manager installation
- tmux-based service management
- Cost optimization (starts at $10/month with spot instances)
- Troubleshooting and monitoring
Quick Deploy Summary:
- Create GCP VM (n1-standard-2 recommended)
- Open firewall ports: 5000 (MLflow), 8001 (Spectral Service), 8501 (Streamlit)
- SSH into VM and clone repository
- Install UV: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Install dependencies: `uv sync`
- Start services in tmux (see GCP_DEPLOY.md for commands)
- Access via `http://YOUR_VM_IP:8501`
Issue: Web UI shows "Failed to connect to Spectral Service"
Solution:
- Ensure Spectral Service is running on port 8001
- Check that the `.env` file has `SPECTRAL_SERVICE_URL=http://localhost:8001`
- Verify models are trained and present in `Spectral Service/models/`
Issue: Spectral Service returns "Model not found" error
Solution:
- Train the models using `train_uv.py` and `train_ir.py`
- Verify `.pt` and `.json` files exist in `Spectral Service/models/`
- Restart the Spectral Service after training
Issue: During training, validation loss shows exactly 0.0000
Solution: This indicates data leakage or overfitting:
- For planet-level split: Need 4+ planets minimum
- For β€3 planets: System uses sample-level split (expected behavior)
- Add more real planetary spectra to `Spectral Service/data/real/`
Issue: Validation loss is very high during training
Solution:
- Check if you have enough training data (recommended: 4+ planets)
- Verify spectral data quality in the `.pkl` files
- Increase the `N_AUGMENT_PER_PLANET` parameter in the training scripts
Issue: Agent fails to route correctly or returns JSON parsing errors
Solution:
- Verify `GOOGLE_API_KEY` is valid and active
- Check Gemini API quota limits
- Review MLflow traces to see exact LLM responses
Issue: "Address already in use" error when starting services
Solution:
- MLflow (5000): Change the port in the command: `mlflow server --port 5001`
- Spectral Service (8001): Change the port in the command and update `.env`
- Streamlit (8501): Streamlit will auto-increment to 8502
Issue: "Missing: planet_ir.pkl" error when uploading a single PKL file
Solution:
- Option 1: Upload a combined file (e.g., `jupiter_combined.pkl`)
- Option 2: Upload both UV and IR files together (e.g., `jupiter_uv.pkl` + `jupiter_ir.pkl`)
- Option 3: For unknown exoplanets with incomplete data, use Spectral Analysis mode instead
Note: Graphical Analysis is designed for comparative visualization of known planets with complete UV+IR data. For molecular composition prediction on single-domain data, use Spectral Analysis mode.
Contributions are welcome! Please ensure:
- Code follows existing patterns and structure
- All tests pass before submitting
- Documentation is updated for new features
- Commit messages are clear and descriptive
This project is part of an academic research initiative.

