Merged
15 changes: 15 additions & 0 deletions .gitignore
@@ -25,5 +25,20 @@ Thumbs.db
docs/candidates.json
docs/validation_results.json

# Unpublished submission materials (keep local)
rnaas_submission/
METHODOLOGY.md

# New sector data (re-downloadable/re-generable)
data/lightcurves_new_sectors/
candidates_new_sectors.json
results/new_sectors/

# Temporary analysis scripts
python/tls_remaining.py

# TRILEGAL cache files
*_TRILEGAL.csv

# Keep the data directory structure
!data/lightcurves/.gitkeep
36 changes: 35 additions & 1 deletion CLAUDE.md
@@ -57,12 +57,14 @@ python3.11 -m pytest tests/ -v # Python only (32 tests)
```

### Rust test modules:

- `bls::tests` — BLS period recovery, SNR estimation, phase math, median (18 tests)
- `validate::tests` — All 5 false-positive tests, scoring, integration (23 tests)
- `crossmatch::tests` — Catalog indexing, lookup, CSV loading (13 tests)
- `io::tests` — CSV parsing, NaN handling, file discovery (8 tests)

### Python test files:

- `tests/test_validate_candidates.py` — Validation functions, scoring (24 tests)
- `tests/test_analyze_candidates.py` — Plotting, cross-matching, binning (8 tests)

@@ -231,7 +233,39 @@ lc.to_csv('data/lightcurves/custom_target.csv')
5. **Reproducibility matters** — BLS results are deterministic; running 4x gives same top candidates within ~1% SNR
6. **Cross-reference ExoFOP** before making any claims about a specific target
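
The reproducibility claim in item 5 can be checked mechanically. The following is a minimal sketch, assuming each run is summarized as a mapping from target ID to SNR. The function name `snr_agrees`, the dictionary layout, and the sample values are hypothetical and are not part of the pipeline.

```python
def snr_agrees(run_a, run_b, rel_tol=0.01):
    """Return True if both runs report the same targets and every SNR
    pair agrees within rel_tol (1% by default), relative to the larger value."""
    if run_a.keys() != run_b.keys():
        return False
    return all(
        abs(run_a[t] - run_b[t]) <= rel_tol * max(abs(run_a[t]), abs(run_b[t]))
        for t in run_a
    )


# Illustrative values only, not pipeline output.
first = {"TIC 1": 12.40, "TIC 2": 8.10}
second = {"TIC 1": 12.45, "TIC 2": 8.05}
print(snr_agrees(first, second))
```

A relative tolerance is used here because SNR values span a wide range; an absolute threshold would be too strict for strong signals and too loose for weak ones.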

## Project Roadmap

### Phase 1: Independent Validation Pipeline + RNAAS (DONE)

- Pipeline built: download → BLS (Rust) → validate (Rust) → deep validate (Python)
- 200 unconfirmed TOIs → 197 detections → 17 high-confidence → 3 deep-validated
- TOI 133.01 passes all tests (TLS SDE=28.4, no centroid offset, no Gaia contaminants, no secondary eclipse)
- TOI 210.01 upgraded to Strong (52-sector secondary eclipse = no detection)
- TRICERATOPS run on TOI 133.01: FPP=0.566, TP most probable scenario (35.7%)
- RNAAS LaTeX note ready in `rnaas_submission/` (local, not committed)
- **TODO:** Submit RNAAS note at https://aas.msubmit.net, merge PR #4

### Phase 2: New Planet Discovery (NEXT)

- Current pipeline only re-validates existing TOIs (stars TESS already flagged)
- To discover NEW planets: download full TESS sectors (all stars, not just TOI list)
- Target: stars with no existing TOI designation
- Any BLS detection on a non-TOI star = potentially new → submit as community TOI (cTOI) to ExoFOP
- Focus on less-studied sectors or southern continuous viewing zone
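
The core filter in Phase 2 (keep only detections on stars with no TOI designation) could look roughly like the sketch below. It assumes BLS detections are available as dictionaries with a `tic_id` key and that the TOI catalog has been reduced to a list of TIC IDs. Both layouts, the function name `non_toi_detections`, and the sample IDs are hypothetical.

```python
def non_toi_detections(detections, toi_tic_ids):
    """Keep detections whose TIC ID has no existing TOI designation."""
    known = set(toi_tic_ids)  # set lookup keeps the filter O(n)
    return [d for d in detections if d["tic_id"] not in known]


# Illustrative IDs only.
detections = [
    {"tic_id": 111, "period_days": 3.2, "snr": 9.1},
    {"tic_id": 222, "period_days": 7.8, "snr": 6.5},
]
toi_tic_ids = [222, 333]

new_candidates = non_toi_detections(detections, toi_tic_ids)
print([d["tic_id"] for d in new_candidates])
```

Anything that survives this filter is only potentially new; it would still need the validation steps from Phase 1 before being submitted as a cTOI.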

### Phase 3: Follow-up + Collaboration (LATER)

- Run VESPA/TRICERATOPS with real TRILEGAL data on Phase 2 discoveries
- Contact Planet Hunters TESS team (Nora Eisner) for collaboration
- Coordinate ground-based RV follow-up for mass determination
- Write discovery paper if new candidates are found

## ExoFOP vs RNAAS

- **RNAAS** (Research Notes of the AAS): Publish methodology papers. 1500-word limit, gets a DOI, editor-reviewed. For reporting what the pipeline does and its results. https://aas.msubmit.net
- **ExoFOP-TESS**: Submit candidate observations. Upload phase-folded LCs, validation results, plots as supporting observations for existing TOIs, or submit new cTOIs. https://exofop.ipac.caltech.edu/tess/

## Dependencies

- **Rust** 1.75+ with: rayon, serde, serde_json, csv, clap, indicatif, anyhow, ordered-float
- **Python** 3.11+ with: lightkurve, astroquery, pandas, numpy, matplotlib, tqdm, scipy
- **Python** 3.11+ with: lightkurve, astroquery, pandas, numpy, matplotlib, tqdm, scipy, transitleastsquares, triceratops
15 changes: 8 additions & 7 deletions README.md
@@ -42,13 +42,14 @@ Applied to 200 unconfirmed TESS Objects of Interest (TOIs), the pipeline:

After deep validation (centroid analysis, Gaia DR3 contamination check, TLS independent confirmation, multi-sector secondary eclipse search), three candidates remain as physically plausible planet signals:

| Target | Period | Rp (R⊕) | TLS SDE | Centroid | Gaia | Sec. Eclipse | Assessment |
| ------------------------------ | -------- | ------------- | -------- | -------- | ---------------- | ---------------- | ------------- |
| **TOI 133.01** / TIC 219338557 | 8.2065 d | **1.9** | **28.4** | Pass | Clear | Pass | **Strong** |
| **TOI 155.01** / TIC 129637892 | 5.4504 d | **5.3** | **20.1** | Pass | Clear | Marginal† | **Strong** |
| **TOI 210.01** / TIC 141608198 | 8.9884 d | **2.2** | **7.1** | Pass | 1 faint neighbor | Marginal† | **Promising** |

<sup>&dagger; Secondary eclipse depths of 0.002&ndash;0.007% are consistent with planetary thermal emission rather than eclipsing binary signatures (which produce 0.1&ndash;10% depths).</sup>
| Target | Period | Rp (R&#8853;) | TLS SDE | Centroid | Gaia | Sec. Eclipse | Assessment |
| ------------------------------ | -------- | ------------- | -------- | -------- | ---------------- | ---------------- | ---------- |
| **TOI 133.01** / TIC 219338557 | 8.2065 d | **1.9** | **28.4** | Pass | Clear | Pass | **Strong** |
| **TOI 155.01** / TIC 129637892 | 5.4504 d | **5.3** | **20.1** | Pass | Clear | Marginal&dagger; | **Strong** |
| **TOI 210.01** / TIC 141608198 | 8.9884 d | **2.2** | **7.1** | Pass | 1 faint neighbor | Pass&Dagger; | **Strong** |

<sup>&dagger; Secondary eclipse depth of 0.002% is consistent with planetary thermal emission rather than eclipsing binary signatures (which produce 0.1&ndash;10% depths).</sup>
<sup>&Dagger; An initial 10-sector analysis showed a marginal 3.9&sigma; secondary eclipse. A follow-up analysis using all 52 available sectors (1.8M data points) yielded no detection (&minus;2.8&sigma;), confirming the earlier signal was noise.</sup>

**Pipeline validation:** TOI 125.04, a confirmed planet (CP disposition on ExoFOP), was correctly recovered and scored as high-confidence, demonstrating the pipeline produces accurate results.

30 changes: 15 additions & 15 deletions docs/index.html
@@ -3,10 +3,10 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Exohuntr - We Pointed a Laptop at NASA Data and Found Planet Candidates</title>
<meta name="description" content="197 exoplanet transit candidates discovered using Rust-powered BLS detection on NASA TESS data. Open-source citizen science.">
<meta property="og:title" content="Exohuntr - Hunting Exoplanets with Rust and NASA Data">
<meta property="og:description" content="We analyzed 200+ TESS light curves and found 197 transit candidates. Built with Rust for speed, Python for visualization.">
<title>Exohuntr - Open-Source Transit Detection and Validation for NASA TESS Data</title>
<meta name="description" content="Independent transit detection and validation of 200 TESS Objects of Interest. 3 deep-validated candidates including a 1.9 R_Earth super-Earth.">
<meta property="og:title" content="Exohuntr - Independent Transit Detection Pipeline for NASA TESS">
<meta property="og:description" content="BLS detection in Rust, 5-test validation, deep analysis with TLS + centroid + Gaia DR3. 3 validated candidates from 200 unconfirmed TOIs.">
<meta property="og:type" content="website">
<meta name="twitter:card" content="summary_large_image">
<style>
@@ -282,16 +282,16 @@
<div class="badge"><span class="dot"></span> Live Results from NASA TESS Data</div>
<h1>Exohuntr</h1>
<p class="tagline">
An open-source pipeline that downloads NASA satellite data, scans for planetary transits
at <strong>50x the speed of Python</strong>, and independently rediscovered signals that
match NASA's own detections &mdash; including <strong>17 validated candidates</strong>
orbiting distant stars.
An open-source transit detection and validation pipeline for NASA TESS data.
BLS detection in Rust, false-positive validation, and deep analysis with TLS, centroid,
and Gaia DR3 checks &mdash; producing <strong>3 deep-validated planet candidates</strong>
from 200 unconfirmed TESS Objects of Interest.
</p>
<div class="stats-bar">
<div class="stat"><div class="number">197</div><div class="label">Signals Detected</div></div>
<div class="stat"><div class="number orange">17</div><div class="label">Validated Candidates</div></div>
<div class="stat"><div class="number purple">5</div><div class="label">Tests Per Candidate</div></div>
<div class="stat"><div class="number green">70%+</div><div class="label">Period Match with NASA</div></div>
<div class="stat"><div class="number orange">3</div><div class="label">Deep-Validated Candidates</div></div>
<div class="stat"><div class="number purple">5+4</div><div class="label">Validation + Deep Tests</div></div>
<div class="stat"><div class="number green">200</div><div class="label">TOIs Analyzed</div></div>
</div>
<div class="scroll-hint">
<svg viewBox="0 0 24 24"><path d="M12 5v14M5 12l7 7 7-7"/></svg>
@@ -301,7 +301,7 @@ <h1>Exohuntr</h1>
<section id="how">
<div class="container">
<div class="section-header">
<h2>How We Hunt <span class="accent">Planets</span></h2>
<h2>How It <span class="accent">Works</span></h2>
<p>When a planet crosses in front of its star, the star's brightness dips. We detect these dips algorithmically in NASA data.</p>
</div>
<div class="transit-demo">
@@ -321,7 +321,7 @@ <h3>Download <span class="tech python">Python</span></h3>
<div class="pipeline-step">
<div class="step-num">02</div>
<h3>Detect <span class="tech rust">Rust</span></h3>
<p>Parallel BLS (Box-fitting Least Squares) scans 15,000 trial periods per star using Rayon. <strong>10-50x faster</strong> than Python.</p>
<p>Parallel BLS (Box-fitting Least Squares) scans 15,000 trial periods per star using Rayon for multi-core parallel processing.</p>
</div>
<div class="pipeline-step">
<div class="step-num">03</div>
@@ -422,7 +422,7 @@ <h4>Data Source</h4>
<div class="method-card">
<h4>Performance</h4>
<p>Rust + Rayon parallel processing. The entire 200-star dataset was scanned in under 30 seconds on 8 CPU cores.</p>
<div class="mono">~10-50x faster than pure Python</div>
<div class="mono">Parallel via Rayon (Rust)</div>
</div>
</div>
</div>
@@ -467,7 +467,7 @@ <h2>Run Your Own Hunt</h2>
<footer>
<div class="container">
<p>
Built with Rust, Python, and Claude Code. Data from NASA TESS via MAST.<br>
Built with Rust and Python. Data from NASA TESS via MAST.<br>
<a href="https://github.com/humancto/exohuntr">Exohuntr</a> &mdash; Open-source exoplanet hunting.
</p>
</div>
136 changes: 136 additions & 0 deletions python/download_new_sectors.py
@@ -0,0 +1,136 @@
#!/usr/bin/env python3.11
"""Download light curves from less-studied TESS sectors (80-96) for new detections.

Strategy: Query ExoFOP TOI list, filter for TOIs observed in sectors 80-96,
and download those that are still unconfirmed (PC disposition).
"""
if __name__ == '__main__':
    import warnings
    warnings.filterwarnings('ignore')
    import os
    import sys
    import numpy as np
    import pandas as pd
    from pathlib import Path
    from tqdm import tqdm

    OUTPUT_DIR = Path('data/lightcurves_new_sectors')
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

    SECTOR_MIN = 80
    SECTOR_MAX = 96
    LIMIT = 200

    print("=" * 60)
    print(f"Downloading TOIs from TESS Sectors {SECTOR_MIN}-{SECTOR_MAX}")
    print("=" * 60)

    # Step 1: Get TOI catalog from ExoFOP
    print("\n[1/3] Fetching TOI catalog from ExoFOP...", flush=True)
    try:
        toi_url = "https://exofop.ipac.caltech.edu/tess/download_toi.php?sort=toi&output=csv"
        tois = pd.read_csv(toi_url, comment="#")
        print(f" Total TOIs in catalog: {len(tois)}", flush=True)
    except Exception as e:
        print(f" ERROR: Could not fetch TOI list: {e}", flush=True)
        sys.exit(1)

    # Step 2: Filter for unconfirmed candidates in target sectors
    print(f"\n[2/3] Filtering for unconfirmed TOIs in sectors {SECTOR_MIN}-{SECTOR_MAX}...", flush=True)

    # Keep only Planet Candidates
    candidates = tois[tois["TFOPWG Disposition"].isin(["PC", ""])]
    print(f" Unconfirmed candidates: {len(candidates)}", flush=True)

    # Filter by sector — the "Sectors" column contains comma-separated sector numbers
    def in_target_sectors(sectors_str):
        try:
            sectors = [int(s.strip()) for s in str(sectors_str).split(',')]
            return any(SECTOR_MIN <= s <= SECTOR_MAX for s in sectors)
        except (ValueError, AttributeError):
            return False

    if 'Sectors' in candidates.columns:
        target_candidates = candidates[candidates['Sectors'].apply(in_target_sectors)]
    elif 'Sector' in candidates.columns:
        target_candidates = candidates[candidates['Sector'].apply(in_target_sectors)]
    else:
        # Try to find the right column
        print(f" Available columns: {list(candidates.columns)}", flush=True)
        print(" WARNING: No 'Sectors' column found. Downloading general unconfirmed TOIs.", flush=True)
        # Fall back: take TOIs with high TOI numbers (newer, likely from later sectors)
        candidates_sorted = candidates.sort_values('TOI', ascending=False)
        target_candidates = candidates_sorted.head(LIMIT)

    target_candidates = target_candidates.head(LIMIT)
    print(f" TOIs in sectors {SECTOR_MIN}-{SECTOR_MAX}: {len(target_candidates)}", flush=True)

    if len(target_candidates) == 0:
        print(" No TOIs found in target sectors. Trying newest unconfirmed TOIs instead...", flush=True)
        candidates_sorted = candidates.sort_values('TOI', ascending=False)
        target_candidates = candidates_sorted.head(LIMIT)
        print(f" Using {len(target_candidates)} newest unconfirmed TOIs", flush=True)

    # Step 3: Download light curves
    print(f"\n[3/3] Downloading {len(target_candidates)} light curves...", flush=True)
    import lightkurve as lk

    downloaded = 0
    failed = 0
    already_have = 0

    for _, row in tqdm(target_candidates.iterrows(), total=len(target_candidates), desc=" Downloading"):
        try:
            tic_id = f"TIC {int(row['TIC ID'])}"
            toi_num = str(row.get('TOI', 'unknown'))

            # Check if we already have this
            filename = f"TOI_{toi_num}_{tic_id.replace(' ', '_')}.csv"
            filepath = OUTPUT_DIR / filename

            # Also check the original data dir
            orig_path = Path('data/lightcurves') / filename
            if filepath.exists() or orig_path.exists():
                already_have += 1
                continue

            search = lk.search_lightcurve(tic_id, mission='TESS', author='SPOC')
            if len(search) == 0:
                failed += 1
                continue

            lc = search[0].download(quality_bitmask='hardest')
            if lc is None:
                failed += 1
                continue

            lc = lc.remove_nans().remove_outliers(sigma=5).normalize()
            if len(lc.time.value) < 100:
                failed += 1
                continue

            flux_err = lc.flux_err.value if lc.flux_err is not None else np.full(len(lc.time.value), 0.001)
            df = pd.DataFrame({
                'time': lc.time.value,
                'flux': lc.flux.value,
                'flux_err': flux_err,
            })
            df.to_csv(filepath, index=False)
            downloaded += 1

        except Exception as e:
            failed += 1
            continue

    print(f"\n Downloaded: {downloaded}")
    print(f" Already had: {already_have}")
    print(f" Failed: {failed}")
    print(f" Output: {OUTPUT_DIR}/")

    if downloaded > 0:
        print(f"\n Next: Run BLS on these:")
        print(f" cargo build --release")
        print(f" ./target/release/hunt search -i {OUTPUT_DIR} -o candidates_new_sectors.json --snr-threshold 6.0")
        print(f" ./target/release/hunt validate -i candidates_new_sectors.json -l {OUTPUT_DIR} -o results/")

    print("\nDone!", flush=True)
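
The sector filter is the step most dependent on the catalog's column format, so it may be worth exercising in isolation. The sketch below lifts `in_target_sectors` out of the script and passes the sector bounds as parameters; that signature change is an assumption made for testability and is not how the script defines it. The sample inputs are illustrative.

```python
def in_target_sectors(sectors_str, sector_min=80, sector_max=96):
    """Return True if any sector in a comma-separated string falls
    within [sector_min, sector_max]; unparseable input yields False."""
    try:
        sectors = [int(s.strip()) for s in str(sectors_str).split(',')]
    except ValueError:
        # Covers empty strings, NaN (stringified as 'nan'), and stray text.
        return False
    return any(sector_min <= s <= sector_max for s in sectors)


print(in_target_sectors("12,45,81"))    # one sector inside the range
print(in_target_sectors("1, 2, 3"))     # all sectors outside the range
print(in_target_sectors(float("nan")))  # missing value from pandas
```

As in the script, a single malformed entry (for example `"81,x"`) rejects the whole row rather than skipping the bad token, which errs on the side of downloading less.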