
HowardLiYH/GAUSE


🧬 GAUSE: Emergent Specialization in Learner Populations

Generalist-averse Affinity Update for Specialist Emergence

Emergent Specialization - 6 Domain Learners



Python 3.9+ • License: MIT • Data: 100% Real

Niche Partitioning Without Explicit Coordination

Paper • Installation • Quick Start • Experiments • Results • Citation


📖 Abstract

GAUSE (Generalist-averse Affinity Update for Specialist Emergence)† is a population-based learning system in which learners spontaneously specialize in different environmental regimes without explicit supervision. Drawing on ecological niche theory, it couples competitive exclusion with niche affinity to create evolutionary pressure for strategy-space partitioning.

† The name also honors G. F. Gause, whose competitive exclusion principle (1934) the mechanism operationalizes.

Core Thesis (v4.1): Under bounded per-agent capacity and non-stationarity, retention of dormant-regime knowledge tracks a single property: whether capacity assignment is independent of current task reward. Reward-chasing allocations (a capacity-bounded monolith, a learned Mixture-of-Experts router) forget dormant regimes and relearn them on reactivation; reward-independent assignments (fixed random niches, an EOI/CDS-style intrinsic-diversity objective, converged competitive exclusion) retain them. Competition is the most parsimonious route to reward-independent specialization: there is no gate to train, no diversity objective to tune, and no freezing schedule to pick, because the assignment is simply the equilibrium the dynamics converge to. Supporting claim (unchanged): competition alone, without explicit diversity incentives, suffices to induce emergent specialization (mean SI = 0.65 at λ = 0).

Note on Terminology: We use "learner" for the individual units in the population; each learner implements Thompson Sampling over prediction methods. This distinguishes our approach from LLM-based "agents", which are autoregressive language models.

Validated on 6 domains (100% REAL DATA):

  • 📈 Crypto - Bybit Exchange (44,000+ bars) ✅ Real
  • 📊 Commodities - FRED US Government (5,630 daily prices) ✅ Real
  • 🌤️ Weather - Open-Meteo (9,105 observations) ✅ Real
  • ☀️ Solar - Open-Meteo Satellite (116,834 hourly readings) ✅ Real
  • 🚕 Traffic - NYC TLC Yellow Taxi (2,879 hourly trip counts) ✅ Real
  • 🌬️ Air Quality - Open-Meteo PM2.5 (2,880 hourly readings) ✅ Real

🎯 Key Results (v4.3, real-data validated)

Changelog (v4.2.0, June 2026): Addressed three reviewer concerns by adding three items. The first is the oracle fixed-assignment skyline: GAUSE matches it without being given the assignment (within 0.030 at K=1, and indistinguishable for K≥3 and at every K under the soft model). The second is a proved reallocation-rate proposition, replacing the hand-wave necessity claim. The third is a class-incremental (label-free) variant: the retention result survives dropping the regime label (SI 0.75, post-react 0.38 vs. 1.07 for a label-free monolith). New tooling: experiments/exp_oracle_fixed.py, exp_latent_regime.py, exp_real_data_gas.py (UCI Gas Sensor Array Drift, real data, --data PATH), exp_robustness_sweep.py. Explainer tightened (soft-model, population-sizing, and per-domain SI moved to an appendix). Full detail in CHANGELOG.md.

Changelog (v4.3.0, June 2026): REAL DATA, the honest split. Ran the full retention + label-free + CL pipeline on the real UCI Gas Sensor Array Drift stream (13,910 samples, 128-d, 6 classes, 10 batches / 36 months). The result moves the contribution off the performance axis and onto the mechanism/lens axis, and we report it straight:

(1) A full-capacity online linear classifier does not catastrophically forget on real gas features (naive/replay 0.68 post-react, 0.93 overall; EWC is worse at 0.44 because its anchor fights drift). The forgetting dissociation is therefore relocated to representation-sharing neural experts (permuted-digits: router 0.576 vs GAUSE 0.218, +62%), not arbitrary streams.

(2) Label-free recovery collapses under real overlap (GAUSE-LF post-react 0.00, SI 0.53, coverage 0.83). The label-free monolith is also at 0.00, so the dissociation vanishes. The synthetic 1:1 method↔regime signature was load-bearing, which bounds the class-incremental claim to separable-signal regimes.

(3) Framing confirmed: on real data this is a mechanism/parsimony/lens paper, not a performance paper.

Full numbers, figure (paper/figures/fig_real_data_gas.pdf), and per-concern verdict are in CHANGELOG.md (v4.3.0).

Note (v4.0.0): As of June 2026, the niche affinity update has been upgraded from the V3 additive heuristic to the canonical exponentiated-gradient (Hedge / multiplicative-weights) update. The numbers below are V4 (rescaled-η). For the V3-era numbers and the rationale for the transition, see CHANGELOG.md and docs/V4_FINAL_REPORT.md. All qualitative findings are preserved; quantitative magnitudes are substantially strengthened.
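
The Hedge / multiplicative-weights step can be sketched in a few lines. Each affinity is multiplied by exp(η · reward), and the vector is then renormalized. This is a minimal illustration of the textbook update, assuming a per-regime reward vector as the driving signal. It is not the code in experiments/_affinity_update.py or src/agents/niche_population.py, and the names `eg_update`, `rewards`, and `eta` are hypothetical.

```python
import math


def eg_update(weights, rewards, eta):
    """One exponentiated-gradient (Hedge / multiplicative-weights) step.

    weights: current affinities, non-negative and summing to 1
    rewards: per-regime reward observed this step (same length)
    eta:     step size
    """
    # Subtracting the max reward leaves the normalized result unchanged
    # but keeps exp() from overflowing for large eta * reward.
    top = max(rewards)
    scaled = [w * math.exp(eta * (r - top)) for w, r in zip(weights, rewards)]
    # Renormalize: the output stays on the probability simplex, and strictly
    # positive weights stay strictly positive (up to floating-point underflow).
    total = sum(scaled)
    return [s / total for s in scaled]


affinity = [0.25, 0.25, 0.25, 0.25]
for _ in range(20):
    # Hypothetical stream in which only regime 0 keeps paying off.
    affinity = eg_update(affinity, [1.0, 0.0, 0.0, 0.0], eta=0.3)
print([round(a, 3) for a in affinity])
```

The step is multiplicative and then renormalized, so it never needs a clamp to keep the affinities valid. The V3 additive heuristic lacked exactly this property (mass drift, eventual negativity).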

⭐ Headline (v4.1): Coverage and Retention Under Bounded Capacity

Five capacity-allocation mechanisms compared at matched per-agent capacity K (each agent can master only K of R regimes), on a non-stationary stream where regimes go dormant and reactivate:

| Arm (K=1, hard LRU model) | Assignment signal | Post-reactivation error | Retains dormant regimes? |
|---|---|---|---|
| Monolith (capacity K) | current active regime | 1.015 | ❌ |
| MoE learned router | current task reward | 0.928 | ❌ |
| Random fixed niches | none (frozen) | 0.603 | ⚠️ (coverage gaps) |
| EOI/CDS-style diversity | intrinsic identity reward | 0.322 | ✅ |
| GAUSE (ours) | converged identity | 0.283 | ✅ |

  • Retention tracks reward-independence five-for-five. The specialized population beats the capacity-matched monolith by +33% overall / +71% post-reactivation (p < 10⁻³⁶ at K=3); the reward-driven router fails like the monolith (p ~ 10⁻³⁵ vs. ours at K=1).
  • Why: a dormant regime emits no reward gradient, so a reward-driven gate gets no signal to reserve capacity for it (idealized Observation in the paper). A converged specialist simply idles through dormancy and retains its niche structurally.
  • Robust to the memory model: the same dissociation holds when hard LRU eviction is replaced by soft interference decay (--soft; the monolith forgets at 0.85–0.92, ours retains at ≈0.25, p ~ 10⁻⁴⁰).
  • Spatial coverage is a commodity: competition, explicit diversity, and the router all achieve it. At K=1 competition beats random diversity (+57% synthetic / +8.7% traffic), but for K≥2 it only matches the purpose-built baselines. Competition's value there is parsimony, not dominance.
  • Corroborates MoE continual-learning theory (ICLR'25, arXiv:2406.16437). Their proof that the gate must be frozen for CL convergence amounts, in our terms, to making the assignment reward-independent; competition reaches that state emergently.
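
The "Why" bullet can be illustrated with a toy counting model of the hard-LRU setting. In this sketch, a recency-driven monolith holds at most K regimes and must relearn any regime it has evicted. A fixed one-owner-per-regime population, by contrast, learns each regime once and idles through dormancy. The sketch is an illustrative assumption, not exp_nonstationary_capacity.py. It counts relearning events only and does not model errors or rewards. Both function names are hypothetical.

```python
from collections import OrderedDict


def relearn_count_lru(stream, capacity):
    """Relearning events for a capacity-bounded monolith (hypothetical hard-LRU model).

    The learner holds at most `capacity` regimes; meeting a regime it does
    not hold counts as a relearn and evicts the least recently used one.
    """
    held = OrderedDict()
    relearns = 0
    for regime in stream:
        if regime in held:
            held.move_to_end(regime)      # refresh recency
        else:
            relearns += 1                 # forgotten, or never learned
            held[regime] = True
            if len(held) > capacity:
                held.popitem(last=False)  # evict least recently used
    return relearns


def relearn_count_fixed(stream):
    """Fixed one-owner-per-regime assignment: each regime is learned exactly once."""
    return len(set(stream))


# Regimes 0..3 are active in blocks; 0 and 1 go dormant, then reactivate.
stream = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [0] * 5 + [1] * 5
print(relearn_count_lru(stream, capacity=1), relearn_count_fixed(stream))
```

Even with K = 3 of R = 4 regimes, the LRU monolith still relearns both reactivated regimes. Its eviction follows whatever was recently active rather than reserving a slot for the dormant regime.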

Reproduce: python experiments/exp_capacity_division.py and python experiments/exp_nonstationary_capacity.py --soft.

🧪 Closing the Design Space (v4.1): Function Approximation, a Hybrid Router, Drift, Sizing, and the Label-Free Variant

Six further experiments stress-test the reward-independence story across architectures, fixes, failure modes, and the harder dropped-label setting:

| Experiment | Script | Headline finding |
|---|---|---|
| Function-approx CL (gradient-trained MLP experts, permuted-digits, R=5) | exp_function_approx_cl.py | Router forgetting is architecture-agnostic: at E=R, GAUSE post-reactivation test error is 0.218 vs 0.576 for the MoE router (62% lower, p ~ 10⁻⁹); the router does not improve with more experts |
| Hybrid router + reservation term | exp_hybrid_router.py | Reservation recovers most retention at K≥2 (+48% at K=3, p ~ 10⁻¹¹) but cannot help at K=1 (no spare slot). GAUSE retains even at K=1, so reward-independence of protection is the operative property |
| Intra-regime concept drift | exp_intra_regime_drift.py | Stale specialists quantified: standard GAUSE overall error is 0.90 (worse than a relearning K=3 monolith, 0.35); a lightweight staleness trigger recovers most of the gap (0.43, ~53% lower) |
| Off-diagonal population sizing | exp_population_sizing.py | Scarce agents (N<R) under-cover, governed by N rather than N·K; surplus agents (N≫R) idle harmlessly (redundant = N−R). Rule: provision N ≳ R |
| Oracle fixed-assignment skyline | exp_oracle_fixed.py | GAUSE recovers the hand-assigned skyline without being given it: within 0.030 at K=1 (0.283 vs 0.253 floor), statistically indistinguishable by K=3 |
| Class-incremental (label-free) GAUSE | exp_latent_regime.py | On synthetic streams, the central retention result survives dropping the regime label. It reaches SI 0.75, coverage 0.86, and post-reactivation error 0.38 vs 1.07 for a label-free forgetting monolith (+65%), with reactivation detected from input similarity alone |

Reproduce: python experiments/exp_function_approx_cl.py (needs torch), then exp_hybrid_router.py, exp_intra_regime_drift.py, exp_population_sizing.py, exp_oracle_fixed.py, exp_latent_regime.py.

Cross-Domain Experimental Results (Unified Pipeline, 30 Trials Each, V4)

All experiments run with identical configuration across all 6 domains:

  • 30 independent trials per experiment
  • 500 iterations per trial
  • 8 learners per population
  • Same random seeds for reproducibility
  • V4 (EG) affinity update with rate rescaling η_V4(R) = η_V3·(R²−R+1)/(R−1)
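
The rate rescaling in the last bullet is plain arithmetic. The helper below (hypothetical name `rescale_eta`) evaluates it so that the multiplier for each domain is explicit. For R = 4 the factor is 13/3 ≈ 4.33, and for Traffic's R = 6 it is 31/5 = 6.2. The V3 base rate is not stated here, so 1.0 is used only to expose the multiplier.

```python
def rescale_eta(eta_v3, num_regimes):
    """V4 step size from the config list: eta_V3 * (R^2 - R + 1) / (R - 1)."""
    R = num_regimes
    if R < 2:
        raise ValueError("the rescaling divides by R - 1, so R must be at least 2")
    return eta_v3 * (R * R - R + 1) / (R - 1)


for R in (4, 6):
    print(R, rescale_eta(1.0, R))
```
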

| Domain | Data Source | Records | Regimes | SI (Niche) | SI (Homo) | Cohen's d | p-value |
|---|---|---|---|---|---|---|---|
| Crypto | Bybit Exchange | 8,766 | 4 | 0.991±0.02 | 0.002 | 58.29 | <0.001*** |
| Commodities | FRED (US Gov) | 5,630 | 4 | 0.990±0.02 | 0.002 | 70.48 | <0.001*** |
| Weather | Open-Meteo | 9,105 | 4 | 0.991±0.02 | 0.002 | 60.91 | <0.001*** |
| Solar | Open-Meteo | 116,834 | 4 | 0.995±0.01 | 0.002 | 100.93 | <0.001*** |
| Traffic | NYC TLC | 2,879 | 6 | 0.995±0.02 | 0.003 | 95.56 | <0.001*** |
| Air Quality | Open-Meteo | 2,880 | 4 | 0.987±0.03 | 0.002 | 54.35 | <0.001*** |
| Average (Records: total) | | 146,094 | | 0.992 | 0.002 | 73.4 | ✅ All |

Key Findings (V4):

  • All 6 domains converge to SI ≥ 0.98 with statistically significant specialization (p < 0.001)
  • Mean SI = 0.992; mean Cohen's d = 73.4 (every domain ≥ 54)
  • Std across seeds halved relative to V3 (no clamp-driven drag)
  • Traffic (R = 6) is no longer the lowest-SI domain: under V4 it reaches 0.995, on par with the R = 4 domains
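
The Cohen's d column is the standardized mean difference between the niche and homogeneous arms. Below is a minimal sketch using a pooled sample standard deviation. The repository's exact variant (pooled vs. Welch-style denominator) is not stated here, so the pooling choice is an assumption, and the example values are made up. The sketch also shows why d runs so high in this table. When SI clusters tightly near 1 in one arm and near 0 in the other, the pooled spread is tiny, so a gap of about 0.99 can yield d in the tens or hundreds.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference (mean(a) - mean(b)) / pooled sample SD."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    va = statistics.variance(a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Hypothetical per-trial SI values for the two arms.
niche = [0.98, 0.99, 1.00]
homo = [0.001, 0.002, 0.003]
print(round(cohens_d(niche, homo), 1))
```
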

Lambda Ablation Study (All 6 Domains, 30 Trials Each, V4)

| λ | Crypto | Commodities | Weather | Solar | Traffic | Air Quality | Avg |
|---|---|---|---|---|---|---|---|
| 0.0 | 0.613 | 0.588 | 0.614 | 0.499 | 0.739 | 0.844 | 0.650 |
| 0.1 | 0.887 | 0.862 | 0.915 | 0.841 | 0.903 | 0.980 | 0.898 |
| 0.2 | 0.979 | 0.983 | 0.982 | 0.976 | 0.984 | 0.999 | 0.984 |
| 0.3 | 0.991 | 0.990 | 0.991 | 0.995 | 0.995 | 0.987 | 0.992 |
| 0.4 | 0.995 | 0.983 | 0.988 | 0.992 | 0.996 | 0.964 | 0.986 |
| 0.5 | 0.956 | 0.952 | 0.968 | 0.970 | 0.981 | 0.879 | 0.951 |

Key Finding (V4): Even with λ = 0 (no niche bonus), competition alone induces a mean SI of 0.650 across all domains, and every domain exceeds SI = 0.49. This confirms our core thesis that competition is sufficient for emergent specialization. SI peaks at λ ∈ [0.2, 0.4], and λ = 0.5 shows mild over-specialization in Air Quality.

Task Performance Metrics (Illustrative)

⚠️ The illustrative metrics below come from experiments/exp_task_performance.py, which is a synthetic Monte-Carlo with hardcoded per-domain base rates and does not exercise the GAUSE algorithm. They are retained here only as a rough visualization. The honest task-level performance numbers are in the Method Specialization table below (which does run the real algorithm).

Method Specialization Experiment (V4)

Learners choose among 5 prediction methods per domain and specialize through competition. The per-regime method-preference update uses the same V4 EG (multiplicative + renormalize) rule described in the v4.0.0 note above.

| Domain | Methods | MSI | Coverage | Niche Perf | Homo Perf | Δ% | p-value |
|---|---|---|---|---|---|---|---|
| Crypto | 5 | 0.388 | 79% | 0.883 | 0.626 | +41.2% | <0.001*** |
| Commodities | 5 | 0.393 | 75% | 0.886 | 0.648 | +36.7% | <0.001*** |
| Weather | 5 | 0.426 | 99% | 0.863 | 0.675 | +27.9% | <0.001*** |
| Solar | 5 | 0.375 | 93% | 0.919 | 0.786 | +16.9% | <0.001*** |
| Traffic | 5 | 0.331 | 99% | 0.915 | 0.740 | +23.6% | <0.001*** |
| Air Quality | 5 | 0.384 | 69% | 0.912 | 0.834 | +9.3% | <0.001*** |
| Average | 5 | 0.383 | 86% | | | +25.9% | ✅ All |

Key Findings (V4):

  1. Emergent Method Specialization: Learners develop preferences for specific prediction methods (MSI = 0.383)
  2. Division of Labor: Population uses 86% of available methods on average
  3. Performance Benefit: Diverse populations outperform homogeneous by +25.9% on average
  4. Robust to update-rule choice: V4 numbers are within ±2% of V3-era numbers on every metric, confirming that method specialization is not an artifact of the affinity-update implementation.

MARL Head-to-Head (V4, 4 Domains, 5000 Episodes × 10 Trials)

Direct comparison against IQL, VDN, QMIX, and MAPPO under V4. All methods use 8 learners and identical state/action spaces; table entries are SI.

| Method | Crypto | Commodities | Weather | Traffic |
|---|---|---|---|---|
| GAUSE (Ours) | 1.000 | 1.000 | 1.000 | 1.000 |
| IQL | 0.008 | 0.007 | 0.016 | 0.011 |
| VDN | 0.009 | 0.007 | 0.015 | 0.011 |
| QMIX | 0.009 | 0.007 | 0.014 | 0.011 |
| MAPPO | 0.000 | 0.000 | 0.000 | 0.000 |

Key Finding: GAUSE reaches the maximum SI (1.000) in every domain, while every MARL baseline stays at ≤ 0.02. Even the highest baseline score (IQL, 0.016 on Weather) is more than 60× lower. On rare-regime task rewards, GAUSE also beats the closest MARL method (IQL) by +5.1% to +8.3% per regime (+6.7% averaged).

Method Distribution Examples

Crypto Domain:

  • mean_revert: 45.8% of learners
  • momentum_long: 38.3% of learners
  • trend: 10.8% of learners
  • momentum_short: 4.6% of learners
  • naive: 0.4% of learners

Traffic Domain (best diversity):

  • rush_hour: 33.8% of learners
  • exp_smooth: 20.4% of learners
  • weekly_pattern: 19.2% of learners
  • hourly_avg: 13.8% of learners
  • persistence: 12.9% of learners

📐 Prediction Methods (Mathematical Formulas)

Each domain has 5 prediction methods. Learners learn which method works best for each regime through Thompson sampling.
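
The likelihood model behind the Thompson sampling is not spelled out here, so the sketch below assumes the common Beta-Bernoulli form. Each (regime, method) pair keeps a Beta posterior over its probability of "winning" a round (for example, producing the lowest error). The learner samples once from every method's posterior for the current regime, plays the argmax, and then updates only the chosen pair. The class name, the binary win signal, the win rates, and the regime labels are all hypothetical.

```python
import random


class RegimeThompsonSampler:
    """Beta-Bernoulli Thompson sampling over methods, one posterior per regime."""

    def __init__(self, methods, regimes, seed=0):
        self.methods = list(methods)
        self.rng = random.Random(seed)
        # Beta(1, 1) (uniform) prior for every (regime, method) pair.
        self.alpha = {(r, m): 1.0 for r in regimes for m in self.methods}
        self.beta = {(r, m): 1.0 for r in regimes for m in self.methods}

    def choose(self, regime):
        # One posterior draw per method; play the method with the largest draw.
        draws = {
            m: self.rng.betavariate(self.alpha[(regime, m)], self.beta[(regime, m)])
            for m in self.methods
        }
        return max(draws, key=draws.get)

    def update(self, regime, method, won):
        # A win increments alpha, a loss increments beta, for the chosen pair only.
        if won:
            self.alpha[(regime, method)] += 1.0
        else:
            self.beta[(regime, method)] += 1.0


methods = ["naive", "momentum_short", "momentum_long", "mean_revert", "trend"]
sampler = RegimeThompsonSampler(methods, regimes=["calm", "trending"], seed=0)
env = random.Random(1)
win_prob = {m: 0.8 if m == "trend" else 0.2 for m in methods}  # made-up rates
picks = []
for _ in range(500):
    m = sampler.choose("trending")
    sampler.update("trending", m, env.random() < win_prob[m])
    picks.append(m)
print(picks[-100:].count("trend"))
```

Keeping a separate posterior per regime is what lets the same learner prefer different methods in different regimes. In the sketch, the "calm" posteriors stay at their prior because only "trending" rounds are played.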

📈 Crypto Domain

| Method | Description | Formula |
|---|---|---|
| naive | Persistence | p̂ₜ = pₜ₋₁ |
| momentum_short | 5-period momentum | p̂ₜ = pₜ₋₁ + 0.1 × (pₜ₋₁ - pₜ₋₅) |
| momentum_long | 20-period momentum | p̂ₜ = pₜ₋₁ + 0.05 × (pₜ₋₁ - pₜ₋₂₀) |
| mean_revert | Mean reversion to MA20 | p̂ₜ = pₜ₋₁ + 0.2 × (MA₂₀ - pₜ₋₁) |
| trend | Linear trend extrapolation | p̂ₜ = pₜ₋₁ + slope(pₜ₋₁₀:ₜ) |
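
The Crypto formulas translate directly into one-step forecasters over a price history. In this sketch, `p` is assumed to be a list of past prices ending at pₜ₋₁, so `p[-k]` is pₜ₋ₖ. The table does not say how MA₂₀ or slope(...) are computed. The sketch therefore assumes MA₂₀ is the mean of the last 20 prices and that the slope is an ordinary least-squares fit over the last 10. The function names are hypothetical. The Commodities mean reversion has the same shape with α = 0.3.

```python
def naive(p):
    return p[-1]


def momentum(p, lag, k):
    # p̂_t = p_{t-1} + k * (p_{t-1} - p_{t-lag})
    return p[-1] + k * (p[-1] - p[-lag])


def mean_revert(p, alpha=0.2, window=20):
    # p̂_t = p_{t-1} + alpha * (MA_window - p_{t-1})
    ma = sum(p[-window:]) / window
    return p[-1] + alpha * (ma - p[-1])


def ols_slope(ys):
    # Least-squares slope of ys against x = 0, 1, ..., len(ys) - 1.
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def trend(p, window=10):
    # p̂_t = p_{t-1} + slope over the last `window` prices
    return p[-1] + ols_slope(p[-window:])


prices = [float(i) for i in range(30)]  # a perfectly linear toy series
forecasts = {
    "naive": naive(prices),
    "momentum_short": momentum(prices, lag=5, k=0.1),
    "momentum_long": momentum(prices, lag=20, k=0.05),
    "mean_revert": mean_revert(prices),
    "trend": trend(prices),
}
print(forecasts)
```

On a perfectly linear series, naive lags by one step, trend extrapolates exactly (30.0), and mean_revert pulls back toward the moving average. This is why the methods suit different regimes (trending vs. volatile).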

📊 Commodities Domain

| Method | Description | Formula |
|---|---|---|
| naive | Persistence | p̂ₜ = pₜ₋₁ |
| ma5 | 5-day moving average | p̂ₜ = (1/5) × Σᵢ₌₁⁵ pₜ₋ᵢ |
| ma20 | 20-day moving average | p̂ₜ = (1/20) × Σᵢ₌₁²⁰ pₜ₋ᵢ |
| mean_revert | Mean reversion (α=0.3) | p̂ₜ = pₜ₋₁ + 0.3 × (MA₂₀ - pₜ₋₁) |
| trend | 5-day trend extrapolation | p̂ₜ = pₜ₋₁ + (pₜ₋₁ - pₜ₋₅)/5 |

🌀️ Weather Domain

| Method | Description | Formula |
|---|---|---|
| naive | Persistence | T̂ₜ = Tₜ₋₁ |
| ma3 | 3-day moving average | T̂ₜ = (1/3) × Σᵢ₌₁³ Tₜ₋ᵢ |
| ma7 | 7-day moving average | T̂ₜ = (1/7) × Σᵢ₌₁⁷ Tₜ₋ᵢ |
| seasonal | Same day last week | T̂ₜ = Tₜ₋₇ |
| trend | 3-day trend extrapolation | T̂ₜ = Tₜ₋₁ + (Tₜ₋₁ - Tₜ₋₃)/3 |
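
The smoothing, seasonal, and trend rows for Weather and Commodities share a few shapes: a k-step moving average, a fixed-lag lookup, and a "last value plus averaged recent step" extrapolation. In the sketch, `x` is assumed to be a history list ending at xₜ₋₁, so `x[-k]` is xₜ₋ₖ, and the function names are hypothetical. Weather uses k = 3 and 7 with a 3-day trend; Commodities uses k = 5 and 20 with a 5-day trend.

```python
def moving_average(x, k):
    # x̂_t = (1/k) * sum_{i=1..k} x_{t-i}
    return sum(x[-k:]) / k


def seasonal(x, period):
    # x̂_t = x_{t-period}, e.g. period=7 for "same day last week" on daily data
    return x[-period]


def span_trend(x, span):
    # x̂_t = x_{t-1} + (x_{t-1} - x_{t-span}) / span, as in the trend rows
    return x[-1] + (x[-1] - x[-span]) / span


temps = [float(t) for t in range(14)]  # toy daily series 0.0 .. 13.0
print(moving_average(temps, 3), moving_average(temps, 7))
print(seasonal(temps, 7), span_trend(temps, 3))
```
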

☀️ Solar Domain

| Method | Description | Formula |
|---|---|---|
| naive | Persistence | Ĝₜ = Gₜ₋₁ |
| ma6 | 6-hour moving average | Ĝₜ = (1/6) × Σᵢ₌₁⁶ Gₜ₋ᵢ |
| clear_sky | Clear sky model | Ĝₜ = G_clear(t) (theoretical max) |
| seasonal | Same hour yesterday | Ĝₜ = Gₜ₋₂₄ |
| hybrid | Weighted blend | Ĝₜ = 0.6 × Gₜ₋₁ + 0.4 × G_clear(t) |

🚕 Traffic Domain

| Method | Description | Formula |
|---|---|---|
| persistence | Last value | v̂ₜ = vₜ₋₁ |
| hourly_average | Historical hourly mean | v̂ₜ = v̄_h(t), where h(t) = hour of day |
| weekly_pattern | Same hour last week | v̂ₜ = vₜ₋₁₆₈ (168 = 24×7 hours) |
| rush_hour_model | Regime-based prediction | v̂ₜ = v̄_regime(t) |
| exponential_smoothing | EMA (α=0.3) | v̂ₜ = 0.3·vₜ₋₁ + 0.7·v̂ₜ₋₁ |
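
Two of the Traffic recipes index by calendar position rather than by recency, and the EMA is recursive. The sketch below assumes hourly data in a list `v` ending at vₜ₋₁, together with a parallel list of hour-of-day labels. It seeds the exponential smoothing with the first observation, which is an assumption the table does not specify. The function names are hypothetical.

```python
def hourly_average(v, hours, hour_now):
    # v̄_h(t): mean of past values recorded at the same hour of day.
    same_hour = [value for value, h in zip(v, hours) if h == hour_now]
    return sum(same_hour) / len(same_hour)


def weekly_pattern(v):
    # v_{t-168}: same hour one week earlier (168 = 24 x 7 hourly steps).
    return v[-168]


def exponential_smoothing(v, alpha=0.3):
    # Runs v̂_t = alpha * v_{t-1} + (1 - alpha) * v̂_{t-1} through the history.
    forecast = v[0]  # assumption: seed the recursion with the first value
    for value in v:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


history = [float(i % 24) for i in range(24 * 8)]  # toy: 8 days, value = hour
hours = [i % 24 for i in range(24 * 8)]
print(hourly_average(history, hours, hour_now=8), weekly_pattern(history))
print(exponential_smoothing([0.0, 10.0]))
```
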

🌬️ Air Quality Domain

| Method | Description | Formula |
|---|---|---|
| persistence | Last value | q̂ₜ = qₜ₋₁ |
| hourly_average | Historical hourly mean | q̂ₜ = q̄_h(t) |
| moving_average | 24-hour MA | q̂ₜ = (1/24) × Σᵢ₌₁²⁴ qₜ₋ᵢ |
| regime_average | AQI regime-based | q̂ₜ = q̄_regime(qₜ₋₁) |
| exponential_smoothing | EMA (α=0.3) | q̂ₜ = 0.3·qₜ₋₁ + 0.7·q̂ₜ₋₁ |
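
The `regime_average` row conditions on the regime of the latest reading. The table does not list the AQI regime boundaries, so the breakpoints below are hypothetical placeholders, and so are the function names. `q` is assumed to be a list of past PM2.5 readings ending at qₜ₋₁.

```python
import bisect

# Hypothetical PM2.5 cut points separating regimes 0, 1, 2, 3.
BREAKPOINTS = [12.0, 35.4, 55.4]


def regime_of(q):
    # bisect_right returns how many cut points are <= q, i.e. the regime index.
    return bisect.bisect_right(BREAKPOINTS, q)


def regime_average(q):
    # q̄_regime(q_{t-1}): mean of past readings in the same regime as q_{t-1}.
    current = regime_of(q[-1])
    same = [value for value in q if regime_of(value) == current]
    return sum(same) / len(same)


def moving_average_24h(q):
    # q̂_t = (1/24) * sum_{i=1..24} q_{t-i}
    return sum(q[-24:]) / 24


readings = [5.0, 8.0, 20.0, 30.0, 10.0]
print(regime_of(10.0), regime_average(readings))
```
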

Method Categories

| Category | Methods | Best For |
|---|---|---|
| Baseline | naive, persistence | Stable regimes; hard to beat |
| Smoothing | ma3, ma5, ma7, ma20, moving_average | Noisy data (reduces variance) |
| Momentum | momentum_short, momentum_long, trend | Trending regimes |
| Mean Reversion | mean_revert | Volatile regimes with overshoots |
| Seasonal | seasonal, weekly_pattern, hourly_average | Predictable patterns |
| Adaptive | exponential_smoothing, hybrid | Balancing recent values against history |

Experimental Rigor Checklist

| Requirement | Status |
|---|---|
| Same trials across all domains | ✅ 30 trials |
| Same iterations per trial | ✅ 500 iterations |
| Same number of learners | ✅ 8 learners |
| Same methods per domain | ✅ 5 methods |
| Lambda ablation on ALL domains | ✅ 6 λ values × 6 domains |
| Method specialization on ALL domains | ✅ 8 learners × 5 methods × 6 domains |
| Statistical tests on ALL domains | ✅ t-test, Cohen's d, p-value |
| Random baseline on ALL domains | ✅ 30 trials each |
| Homogeneous baseline on ALL domains | ✅ 30 trials each |
| 100% Real data | ✅ All 6 domains |

Data Source Verification

| Domain | Source | Verification |
|---|---|---|
| 📈 Crypto | Bybit Exchange | ✅ Real exchange data with funding rates, OI, basis |
| 📊 Commodities | fred.stlouisfed.org | ✅ US Government official data (captured -$36.98 oil on 2020-04-20) |
| 🌤️ Weather | Open-Meteo API | ✅ ERA5 reanalysis + weather stations |
| ☀️ Solar | Open-Meteo Solar | ✅ CAMS satellite-derived irradiance |

πŸ—οΈ Architecture

```
GAUSE/
├── 📁 src/                           # Core implementation
│   ├── agents/                       # ⭐ Core algorithm (GAUSE)
│   │   └── niche_population.py       # NicheAgent + NichePopulation class
│   │                                 #   (implements GAUSE; V4 EG default, V3 legacy)
│   ├── domains/                      # Multi-domain data adapters
│   │   ├── crypto.py / commodities.py
│   │   ├── weather.py / solar.py
│   │   └── traffic.py / air_quality.py
│   ├── baselines/                    # Comparison baselines (IQL, VDN, QMIX, MAPPO)
│   ├── analysis/                     # SI, regret, diagnostic helpers
│   └── theory/                       # Formal propositions (Python form)
├── 📁 experiments/                   # Reproducible experiments
│   ├── _affinity_update.py           # ⭐ Shared V3/V4 update helper
│   ├── exp_unified_pipeline.py       # ⭐ Main 6-domain pipeline (V4)
│   ├── exp_capacity_division.py      # ⭐ Coverage under bounded capacity (5 arms + overlap sweep)
│   ├── exp_nonstationary_capacity.py # ⭐ Retention / catastrophic forgetting (5 arms; --soft model)
│   ├── exp_function_approx_cl.py     # ⭐ Architecture-agnostic forgetting (gradient-trained MLP experts)
│   ├── exp_hybrid_router.py          # Reward-driven router + reservation term
│   ├── exp_intra_regime_drift.py     # Stale-specialist drift + staleness trigger
│   ├── exp_oracle_fixed.py           # Oracle fixed-assignment skyline
│   ├── exp_latent_regime.py          # Class-incremental (label-free) GAUSE
│   ├── exp_population_sizing.py      # Off-diagonal N≠R coverage/retention sweep
│   ├── download_gas_data.py          # ⭐ Fetch UCI Gas Sensor Array Drift (v4.3)
│   ├── exp_real_data_gas.py          # ⭐ Real-data retention + label-free + CL baselines (v4.3)
│   ├── exp_robustness_sweep.py       # Robustness sweep (--data for real gas) (v4.3)
│   ├── exp_split_cifar_cl.py         # ⭐ Split-CIFAR-100 CNN experts (v4.3)
│   ├── plot_real_data_gas.py         # Render fig_real_data_gas.pdf
│   ├── exp_method_specialization.py  # Method specialization (V4)
│   ├── exp_marl_comparison.py        # MARL head-to-head (V4)
│   ├── exp_lambda_ablation.py        # λ ablation (V4)
│   ├── exp_lambda_zero_real.py       # λ = 0 emergence on real data (V4)
│   ├── exp_v4_v3_comparison.py       # V3 vs V4 diagnostic ablation
│   └── exp_task_performance.py       # (synthetic / illustrative)
├── 📁 tests/                         # Unit tests
│   └── test_eg_update.py             # 19 tests for V4 EG properties
├── 📁 data/                          # Datasets (committed: 6 domains; downloaded: gas, cifar)
│   ├── bybit/         commodities/   weather/
│   ├── solar/         traffic/       air_quality/
│   ├── gas_sensor/                   # UCI Gas Sensor Array Drift (download_gas_data.py; gitignored)
│   └── cifar/                        # CIFAR-100 (auto-downloaded by exp_split_cifar_cl.py; gitignored)
├── 📁 results/                       # Experiment outputs
│   ├── unified_pipeline/             # Main pipeline outputs (V4)
│   ├── capacity_division/            # Coverage results (results.json + overlap sweep)
│   ├── nonstationary_capacity/       # Retention results (+ oracle-fixed, latent-regime JSON)
│   ├── function_approx_cl/           # Gradient-trained MLP experts (permuted-digits)
│   ├── split_cifar_cl/               # ⭐ Split-CIFAR-100 CNN experts (v4.3)
│   ├── real_data/                    # ⭐ Real gas retention + robustness JSONs (v4.3)
│   ├── hybrid_router/                # Reward-driven router + reservation term
│   ├── intra_regime_drift/           # Drift + staleness trigger
│   ├── population_sizing/            # Off-diagonal N≠R sweep
│   ├── v4_v3_comparison_matched_rate/  # V3 vs V4 ablation
│   ├── real_marl_comparison/         # MARL head-to-head outputs
│   └── method_specialization/        # Method specialization outputs
├── 📁 paper/                         # LaTeX paper sources (build: latexmk -pdf in paper/)
│   ├── main.tex                      # Canonical paper (42 pages, v4.3)
│   ├── method_deep_dive.tex          # Deep-dive companion (93 pages)
│   ├── gause_explainer.tex           # System explainer (29 pages)
│   ├── figures/                      # All paper figures (committed PDFs)
│   └── references.bib
├── 📁 docs/                          # Reports + research docs
│   ├── V4_FINAL_REPORT.md            # Comprehensive V4 renovation report
│   ├── V4_EG_RENOVATION_AUDIT.md     # V3 defect audit + V4 derivation
│   ├── AUDIT_REPORT.md               # Repo-wide audit report
│   └── ARXIV_SUBMISSION_GUIDE.md     # arXiv packaging instructions
└── 📁 scripts/                       # Data download + plotting utilities
    ├── download_real_*.py            # Data downloaders
    ├── plot_v4_v3_comparison.py      # V4 vs V3 plots
    └── generate_neurips_figures.py
```

🚀 Quick Start

Installation

```bash
# Clone repository
git clone https://github.com/HowardLiYH/GAUSE.git
cd GAUSE

# Create conda environment
conda create -n emergent python=3.10
conda activate emergent

# Install dependencies
pip install -e .
```

Download Real Data

```bash
# Weather (Open-Meteo - no API key needed)
python scripts/download_real_weather.py

# Solar (Open-Meteo - no API key needed)
python scripts/download_real_solar.py

# Commodities (FRED - no API key needed)
python scripts/download_fred_commodities_real.py
```

Run Experiments

```bash
# Main 6-domain pipeline (Table 1 in the paper, V4)
python experiments/exp_unified_pipeline.py

# Headline: coverage under bounded capacity (5 arms + method-overlap sweep, fig6)
python experiments/exp_capacity_division.py

# Headline: retention under non-stationarity (5 arms, fig7; --soft adds the
# interference-model robustness check, fig8)
python experiments/exp_nonstationary_capacity.py --soft

# Closing the design space (v4.1):
python experiments/exp_function_approx_cl.py   # architecture-agnostic forgetting (needs torch)
python experiments/exp_hybrid_router.py        # router + reservation term
python experiments/exp_intra_regime_drift.py   # stale specialists + staleness trigger
python experiments/exp_oracle_fixed.py         # oracle fixed-assignment skyline
python experiments/exp_latent_regime.py        # class-incremental (label-free) GAUSE
python experiments/exp_population_sizing.py    # off-diagonal N≠R sizing

# Real-data validation (v4.3); data is downloaded on demand, not committed:
python experiments/download_gas_data.py                          # -> data/gas_sensor/batch1..10.dat (UCI, ~5s)
python experiments/exp_real_data_gas.py --data data/gas_sensor   # retention + label-free + CL baselines on real gas
python experiments/exp_robustness_sweep.py --data data/gas_sensor # robustness sweep on the real stream
python experiments/plot_real_data_gas.py                         # -> paper/figures/fig_real_data_gas.pdf
python experiments/exp_split_cifar_cl.py                          # Split-CIFAR-100 CNN experts (needs torch+torchvision; auto-downloads CIFAR)

# Method specialization (Table 2 in the paper, V4)
python experiments/exp_method_specialization.py

# MARL head-to-head (Table 3 in the paper, V4)
python experiments/exp_marl_comparison.py

# Lambda ablation (V4)
python experiments/exp_lambda_ablation.py

# V3 vs V4 diagnostic ablation (clamp invocations, mass drift, etc.)
python experiments/exp_v4_v3_comparison.py --matched-rate

# Generate publication figures
python scripts/generate_neurips_figures.py
```

Unit Tests

```bash
python -m pytest tests/test_eg_update.py -v
# 19/19 passing: simplex preservation, interior preservation,
# no-clamp invariance, V3/V4 first-order step-size ratio.
```

📈 SI-Performance Correlation (V3-era numbers; V4 re-derivation pending)

The correlation analysis below was computed under V3. Because V4 collapses the SI distribution close to 1.0 in nearly every trial, a direct re-run of the same Pearson correlation under V4 is dominated by ceiling effects and is less informative. The qualitative conclusion (higher SI → better task performance) is preserved; a more diagnostic V4 version using λ-swept SI (where SI varies in [0.5, 1.0]) is on the v4.1 roadmap.

| Metric | Value (V3) | Interpretation |
|---|---|---|
| Pearson r | 0.525 | Moderate-strong positive correlation |
| p-value | < 0.0001 | Highly significant |
| Regression | Δ% = 52.9 × SI − 14.2 | Higher SI → better performance |
| R² | 0.276 | SI explains 28% of performance variance |

Per-Domain Correlation (V3):

| Domain | r | p-value | Interpretation |
|---|---|---|---|
| Crypto | +0.411 | 0.024* | Moderate |
| Commodities | +0.591 | 0.0006*** | Strong |
| Weather | +0.349 | 0.059 | Boundary condition (P3) |
| Solar | +0.515 | 0.004** | Strong |

Note on Weather under V3: Weather was reported in v1.0–v3.x as a Proposition-3 boundary condition (mono-regime collapse) with the lowest SI. Under V4, Weather reaches SI = 0.991 (matching the other R = 4 domains), so the "boundary condition" framing applies to the V3 implementation rather than to the underlying competitive-specialization mechanism.


🔬 Theoretical Foundation (Formal Proofs)

Core Propositions

Proposition 1: Competitive Exclusion (Game-Theoretic Proof)

In a winner-take-all game with n learners competing across k regimes, complete competitors cannot coexist at Nash equilibrium.

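Proof: When identical strategies yield payoff V/n − c, deviating to an empty niche (one exists whenever k ≥ 2 and all learners share a niche) yields V − c > V/n − c for n ≥ 2. Hence no symmetric pure-strategy profile in which all learners occupy the same niche is a Nash equilibrium.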
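
The deviation argument reduces to one inequality. The check below evaluates it for several population sizes under the proof's own payoff model: each niche is worth V, entry costs c, and identical strategies share V equally. It is a numerical sanity check of the algebra, not a proof, and the function name is hypothetical.

```python
def deviation_gain(V, c, n):
    """Payoff gained by one of n identical learners moving to an empty niche."""
    crowded = V / n - c   # equal share of the contested niche's value
    alone = V - c         # full value of an unoccupied niche
    return alone - crowded


for n in range(1, 6):
    print(n, deviation_gain(V=1.0, c=0.1, n=n))
```

The gain equals V(1 − 1/n). The cost c cancels, so the conclusion does not depend on it, and for V > 0 the gain is strictly positive exactly when n ≥ 2.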

Proposition 2: SI Lower Bound (Optimization Proof)

For niche bonus λ > 0 and k regimes: E[SI] ≥ λ/(1+λ) · (1 − 1/k)

Proof: By Lagrangian optimization of the learner's reward function under an entropy constraint. For λ = 0.3, k = 4: SI ≥ 0.173. Our V4 observed SI (≈ 0.99) exceeds this bound by a large margin (the bound is conservative).
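
Evaluating the bound makes its scope explicit. In the sketch below (function name hypothetical), λ = 0.3 and k = 4 reproduce the 0.173 quoted above. At λ = 0 the bound is zero, so the λ = 0 emergence reported in the ablation (mean SI = 0.650) is an empirical finding that this proposition does not cover.

```python
def si_lower_bound(lam, k):
    """Proposition 2 bound: E[SI] >= lam / (1 + lam) * (1 - 1/k)."""
    return lam / (1 + lam) * (1 - 1 / k)


print(round(si_lower_bound(0.3, 4), 3))  # 0.173, matching the text
print(si_lower_bound(0.0, 4))            # 0.0: the bound is vacuous at lambda = 0
```
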

Proposition 3: Mono-Regime Collapse (Limit Analysis)

As the dominant-regime fraction η → 1, meaningful SI → 0.

Proof: k_eff = exp(H(regime_dist)). As η → 1, k_eff → 1, leaving nothing to specialize between.
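
The effective number of regimes in the proof is the exponential of the regime-distribution entropy. The short sketch below (function name hypothetical, natural-log entropy assumed) shows k_eff sliding from 4 toward 1 as one regime takes over, which is the limit the proposition describes.

```python
import math


def effective_regimes(dist):
    """k_eff = exp(H), with H the Shannon entropy (natural log) of the regime distribution."""
    h = -sum(p * math.log(p) for p in dist if p > 0)
    return math.exp(h)


print(effective_regimes([0.25, 0.25, 0.25, 0.25]))  # ~4: four equally likely regimes
print(effective_regimes([0.97, 0.01, 0.01, 0.01]))  # ~1.18: one regime dominates
```
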

Additional V4-era propositions (deep-dive companion)

The full mathematical treatment is in paper/method_deep_dive.tex (93 pages, compiled to method_deep_dive.pdf):

  • Prop 9.1–9.3: structural defects of the V3 additive heuristic (mass drift, eventual negativity, state-dependent effective rate).
  • Prop 9.4–9.6: the V4 EG update preserves the simplex by construction, preserves the interior strictly, and reduces to replicator dynamics in the small-η limit.
  • Theorem 9.1 (Hedge regret bound): the V4 update inherits the canonical $O(\sqrt{T \log R})$ regret guarantee via the Arora–Hazan–Kale potential-function argument.

📊 Figures

Five publication-quality figures in results/figures/:

  1. fig1_cross_domain_si.pdf - Cross-domain SI comparison
  2. fig2_marl_comparison.pdf - MARL baseline comparison
  3. fig3_improvement_scatter.pdf - SI vs improvement correlation
  4. fig4_regime_distribution.pdf - Regime distributions by domain
  5. fig5_summary_heatmap.pdf - Summary heatmap

📋 Changelog

v4.3.0 (2026-06-11) · Real Data: The Honest Split ⭐⭐⭐

Major Update: synthetic-only validation replaced with real data; the contribution is reframed as a mechanism/lens, not a performance win

  • ✅ Real retention test on UCI Gas Sensor Array Drift (13,910 samples, 128-d, 6 classes, 10 batches / 36 months; exp_real_data_gas.py --data). The honest three-way split:
    1. CL forgetting: refuted for linear-on-features, relocated to the neural regime. A full-capacity online linear classifier does not catastrophically forget on real gas features (naive/replay 0.68 post-react, 0.93 overall; EWC is worse at 0.44 because its anchor fights drift). The dissociation lives in representation-sharing neural experts.
    2. Label-free recovery: collapses under real overlap. GAUSE-LF post-react 0.00 (SI 0.53, cov 0.83); the label-free monolith is also 0.00, so the dissociation vanishes. The synthetic 1:1 method↔regime signature was load-bearing → class-incremental claim bounded to separable-signal regimes.
    3. Framing. On real data the contribution is a mechanism and lens, not a performance win.
  • ✅ Split-CIFAR-100 with CNN experts (exp_split_cifar_cl.py, MPS/CUDA/CPU): the neural router-forgetting dissociation reproduces on a standard benchmark (GAUSE 0.581 vs router 0.748 post-react at E=R, +22%, p~10⁻³), honestly attenuated vs permuted-digits (+62%).
  • ✅ Robustness sweep on real drift (--data flag): EWC degrades monotonically with its anchor (0.68→0.44→diverges); GAUSE has no knob.
  • ✅ Papers + figures updated, abstracts/limitations aligned to the real outcome; reviewer math fixes (reallocation tail-bound factor K; soft-model decay exponent). All three PDFs recompiled (explainer 29pp, main 42pp, deep-dive 93pp).

v4.2.0 (2026-06-10) · Oracle Skyline, Class-Incremental Variant, Real-Data Scaffolding ⭐⭐

  • ✅ Oracle fixed-assignment skyline (exp_oracle_fixed.py): GAUSE recovers the hand-assigned partition without being given it (within 0.030 at K=1, indistinguishable for K≥3).
  • ✅ Class-incremental (label-free) variant (exp_latent_regime.py): the retention result survives dropping the regime label on synthetic streams (SI 0.75, post-react 0.38 vs 1.07).
  • ✅ Reallocation-rate proposition replacing the hand-wave necessity claim; scope made explicitly task-incremental in all three papers.
  • ✅ Real-data tooling scaffolded (validated on a faithful surrogate; run for real in v4.3).

v4.1.0 (2026-06-09) · Reward-Independence Reframe + Purpose-Built Baselines ⭐⭐⭐

Major Update: the thesis is reframed around retention under bounded capacity, benchmarked against purpose-built baselines

  • βœ… New headline result: across five capacity-allocation arms, retention of dormant regimes tracks reward-independence of assignment (monolith and learned MoE router forget; random/EOI-diversity/competition retain). +71% post-reactivation vs. monolith (p < 10⁻³⁢); router fails at p ~ 10⁻³⁡.
  • βœ… Two new purpose-built baselines in the capacity experiments: an EOI/CDS-style learned-diversity arm and a Mixture-of-Experts learned gating router.
  • βœ… Method-overlap sweep: competition's edge over learned diversity grows monotonically with method exclusivity (βˆ’4.8% β†’ +29.3%).
  • βœ… Idealized Observation (paper): why a reward-driven router forgets β€” dormant regimes emit no protective reward signal.
  • βœ… Soft interference capacity model (--soft): the dissociation survives removing LRU eviction entirely (not an artifact of discrete eviction).
  • βœ… Catastrophic-forgetting framing with continual-learning citations; engagement with MoE-CL theory (ICLR'25, arXiv:2406.16437) β€” gate-freezing ⇔ reward-independent assignment.
  • βœ… Paper restructure: new title (Reward-Independent Capacity Assignment as a Defense Against Catastrophic Forgetting); coverage + retention promoted to Main Results; 95% CI error bars on figs 6–8; honest-claim softening in the intro.
  • βœ… New explainer document: paper/gause_explainer.pdf (17 pp) β€” full-system walkthrough of architecture (with diagrams), mechanisms, the reward-independence principle, experiments, and potential applications.
  • βœ… Six design-space experiments added (exp_function_approx_cl.py, exp_hybrid_router.py, exp_intra_regime_drift.py, exp_oracle_fixed.py, exp_latent_regime.py, exp_population_sizing.py): the router's forgetting is architecture-agnostic (gradient-trained MLP experts, +62% at E=R); a reservation term recovers retention only with spare capacity (reward-independence of protection is the operative property); intra-regime drift quantifies stale specialists and a staleness-trigger remedy; an oracle fixed-assignment skyline shows GAUSE recovers the hand-assigned partition without being given it (within 0.030 at K=1); a class-incremental (label-free) variant retains (0.38 vs 1.07) with the regime label removed; and a population-sizing sweep yields the rule provision N ≳ R.
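The retention claim can be illustrated with a deliberately crude cartoon that is not one of the repository's experiments: each learner holds at most K regime "skills" and evicts the least-recently-used one when full, loosely echoing the LRU capacity model mentioned above. `SlotLearner`, `run`, and `retained` are hypothetical names. In the reward-chasing arm every learner trains on whatever regime is currently active; in the reward-independent arm regime r is pinned to learner r % N. Total capacity N·K equals R in both arms.

```python
from collections import OrderedDict


class SlotLearner:
    """Hypothetical learner that holds at most `capacity` regime skills (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.skills = OrderedDict()  # regime id -> True, least recently used first

    def train(self, regime):
        if regime in self.skills:
            self.skills.move_to_end(regime)  # refresh recency
            return
        if len(self.skills) >= self.capacity:
            self.skills.popitem(last=False)  # evict the least-recently-used skill
        self.skills[regime] = True


def run(stream, n_learners, capacity, assign):
    """Feed a regime stream to a population; assign(regime) picks which learners train."""
    pop = [SlotLearner(capacity) for _ in range(n_learners)]
    for regime in stream:
        for i in assign(regime):
            pop[i].train(regime)
    return pop


def retained(pop, regimes):
    """Regimes still held by at least one learner."""
    return sorted(r for r in regimes if any(r in p.skills for p in pop))


R, N, K = 6, 3, 2                            # regimes, learners, slots per learner (N*K == R)
stream = list(range(R)) * 5 + [0, 1] * 50    # all regimes active, then only 0 and 1

# Reward-chasing cartoon: all capacity follows the currently rewarded regime.
chasing = run(stream, N, K, assign=lambda r: range(N))
# Reward-independent cartoon: regime r is pinned to learner r % N regardless of reward.
fixed = run(stream, N, K, assign=lambda r: [r % N])

print("reward-chasing retains:    ", retained(chasing, range(R)))
print("reward-independent retains:", retained(fixed, range(R)))
```

After the long stretch in which only regimes 0 and 1 are active, the reward-chasing population holds only regimes 0 and 1 and would have to relearn the rest on reactivation, while the pinned population still holds all six.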

v4.0.0 (2026-06-04) - Exponentiated-Gradient Canonical Renovation ⭐⭐⭐

Major Update: replace the V3 additive + clamp heuristic with the canonical Hedge / multiplicative-weights update

  • ✅ Algorithm: the niche-affinity update is now the canonical exponentiated-gradient (EG) update on the regime simplex. It preserves the simplex by construction (no clamp needed) and carries the $O(\sqrt{T \log R})$ Hedge regret bound.
  • ✅ Theory: full derivation, structural proofs of V3's mass-drift / negativity / state-dependent-rate defects, the Hedge regret-bound derivation, and the small-η replicator-dynamics limit (paper/method_deep_dive.tex, 72 pages).
  • ✅ Headline numbers strengthened (V3 → V4):
    • Mean SI: 0.747 → 0.992
    • Mean Cohen's d vs. homogeneous: ≈23 → ≈73
    • Mean SI at λ = 0: 0.329 → 0.650
    • GAUSE vs. MARL SI gap: 4.3× → ≥100× (1.000 vs. ≤ 0.02)
    • Traffic (R = 6): 0.573 (lowest) → 0.995 (no longer an outlier)
  • ✅ Tests: 19/19 passing in tests/test_eg_update.py.
  • ✅ All experiments converted to V4; V3 retained behind update_rule="v3_additive" for ablation and comparison.
  • ✅ Reports: docs/V4_FINAL_REPORT.md, docs/V4_EG_RENOVATION_AUDIT.md.
  • ✅ Release: tagged v4.0.0 with main.pdf and method_deep_dive.pdf attached.
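The EG step named in the v4.0.0 notes is the textbook Hedge update $w_{t+1,r} = w_{t,r} e^{\eta g_{t,r}} / \sum_{r'} w_{t,r'} e^{\eta g_{t,r'}}$. A minimal NumPy sketch follows; it is not the repository's implementation. The per-regime gain vector `gains` is an assumption (how GAUSE derives it from rewards is not described here), and `additive_clamp_update` is a generic additive-plus-clamp stand-in for contrast, not the actual V3 code. With a uniform start, the EG iterate equals a softmax of η times the cumulative gains.

```python
import numpy as np


def eg_update(w, gains, eta):
    """One exponentiated-gradient (Hedge) step on the probability simplex.

    w     : current affinity vector over R regimes (positive entries summing to 1)
    gains : per-regime reward signal for this round (hypothetical input)
    eta   : learning rate
    """
    logits = np.log(w) + eta * gains
    logits -= logits.max()          # stabilizes exp(); the shift cancels when normalizing
    new_w = np.exp(logits)
    return new_w / new_w.sum()      # renormalizing keeps w on the simplex, no clamp needed


def additive_clamp_update(w, gains, eta):
    """Illustrative additive step plus clamp (a stand-in, not the repository's V3 code)."""
    return np.clip(w + eta * gains, 0.0, 1.0)


ETA, R, ROUNDS = 0.1, 4, 50
rng = np.random.default_rng(0)
w_eg = np.full(R, 1.0 / R)
w_add = np.full(R, 1.0 / R)
cum = np.zeros(R)                   # cumulative gains, used only to check the Hedge form
for _ in range(ROUNDS):
    gains = rng.normal(loc=[0.2, 0.0, 0.0, 0.0], scale=0.5)  # synthetic: regime 0 pays most on average
    cum += gains
    w_eg = eg_update(w_eg, gains, ETA)
    w_add = additive_clamp_update(w_add, gains, ETA)

print("EG      :", np.round(w_eg, 3), "sum =", round(float(w_eg.sum()), 6))
print("additive:", np.round(w_add, 3), "sum =", round(float(w_add.sum()), 6))
```

The EG weights always sum to 1, while the additive-plus-clamp weights generally do not; that is the kind of mass drift the renovation removes.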

v3.0.0 (2026-01-16) - Learner Populations Reframing ⭐

Major Update: Reframed from "Multi-Agent Systems" to "Learner Populations"

  • ✅ Terminology Update: "agents" → "learners" throughout
  • ✅ Paper Title: "Emergent Specialization in Learner Populations"
  • ✅ Clearer Positioning: Distinguishes from LLM-based agents
  • ✅ arXiv Ready: Updated paper ready for submission

v2.0.0 (2024-12-23) - Real Data Validation

Major Update: All experiments now use 100% verified real data

  • ✅ 4 Real Data Domains: Crypto, Commodities, Weather, Solar
  • ✅ 175K+ real records across all domains
  • ✅ MARL Comparison: GAUSE beats IQL by 2-4×
  • ✅ 5 Publication Figures generated
  • ✅ 3 Theoretical Propositions with proof sketches
  • ✅ Limitations Section for honest assessment

v1.7.0 (2024-12-22) - Unified Prediction & Mechanistic Analysis

  • 📊 Unified prediction experiment across domains
  • 🔬 Mechanistic analysis: why specialization works
  • ⚡ Computational benchmarks: 2-4× faster than MARL

v1.6.0 (2024-12-22) - Multi-Domain Validation

  • 🚕 NYC Taxi (Traffic): SI = 0.73
  • ⚡ EIA Energy: SI = 0.88
  • 📈 Bybit Finance: SI = 0.86

🔬 Reproducibility

Setting                Value
Random seeds           0-29 (30 trials per experiment)
Statistical tests      Bonferroni-corrected (α = 0.05/k for k comparisons)
Confidence intervals   95% bootstrap CI
Effect sizes           Cohen's d reported
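The statistics in the table above can be computed along these lines. This is a minimal NumPy sketch, not the repository's analysis code: it assumes a percentile bootstrap over per-seed scores and Cohen's d with a pooled standard deviation (the README does not say which variants are used), and the scores are synthetic placeholders, not reported results.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (ddof=1 sample variances)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def bootstrap_ci(x, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))  # resample indices with replacement
    boot_means = x[idx].mean(axis=1)
    tail = (1.0 - level) / 2
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return float(lo), float(hi)


def bonferroni_alpha(alpha, k):
    """Per-test significance threshold for k comparisons at family-wise level alpha."""
    return alpha / k


# Synthetic placeholder scores for seeds 0-29 (not results from this repository).
rng = np.random.default_rng(42)
method = rng.normal(0.9, 0.05, size=30)
baseline = rng.normal(0.5, 0.05, size=30)

print("Cohen's d:", round(cohens_d(method, baseline), 2))
print("95% bootstrap CI (method mean):", bootstrap_ci(method))
print("Per-test alpha, k = 5:", bonferroni_alpha(0.05, 5))
```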

All data sources are free and publicly accessible without API keys.


📚 Citation

@misc{li2026gause,
  title     = {{GAUSE}: Emergent Specialization in Learner Populations ---
               Reward-Independent Capacity Assignment as a Defense
               Against Catastrophic Forgetting},
  author    = {Li, Yuhao},
  year      = {2026},
  howpublished = {\url{https://github.com/HowardLiYH/GAUSE}},
  note      = {arXiv preprint}
}

📄 License

MIT License - See LICENSE for details.


⭐ Star this repo if you find it useful!

Report Bug • Request Feature

About

A router-free, reward-independent alternative to MoE gating that resists catastrophic forgetting in continual/non-stationary settings.
