Merged
Commits
28 commits
f6ab75e
upgrade
Mar 13, 2026
ecec105
Merge pull request #384 from centreformicrosimulation/improve-documen…
hk-2029 Mar 13, 2026
84e404c
update
Mar 13, 2026
4a0bb37
fix: set heading colours to navy, fix grey faded headings
Mar 14, 2026
b90c278
docs: trim redundancy, fix accuracy issues, improve navigation
Mar 14, 2026
1bf0811
docs: annotate default.yml and expand configuration.md
Mar 14, 2026
83af776
docs: fix factual errors, add validation and data-pipeline guides
Mar 14, 2026
24e06c6
docs: rename scenario-cookbook.md to run-configuration.md
Mar 14, 2026
9c757d2
docs: write file-organisation page for Developer Guide internals
Mar 16, 2026
75d611f
docs: fix IO description in file-organisation, remove incorrect modul…
Mar 16, 2026
ebad3db
docs: correct training data description — de-identified synthetic, no…
Mar 16, 2026
dd994aa
docs: fix script/class counts, remove stale sentence, add missing uti…
Mar 16, 2026
a41a221
docs: condense overlong table entries in file-organisation
Mar 16, 2026
00d3a0a
docs: delete architecture.md, merge run-configuration into configuration
Mar 16, 2026
b6b0f4e
docs: remove country selection references, UK only
Mar 16, 2026
70c3d34
docs: replace data-and-outputs with repository-structure, remove gett…
Mar 16, 2026
c4374ba
Enhance file organization documentation with details
hk-2029 Mar 16, 2026
6963f52
Delete documentation/repository-structure.md
hk-2029 Mar 16, 2026
54110cf
Update recommended reading order in README
hk-2029 Mar 16, 2026
3f42a5d
docs: restructure file-organisation, remove cli-reference and reposit…
Mar 16, 2026
2b3a8ac
Condense file organiation
hk-2029 Mar 17, 2026
434fb06
docs: fold troubleshooting into configuration, delete troubleshooting.md
Mar 17, 2026
b256711
docs: add mkdocs.yml and fix deploy workflow for automatic site builds
Mar 17, 2026
0ff255f
docs: consolidate into single README.md, remove redundant files
Mar 17, 2026
99fc809
docs: move quick start to root README, slim documentation/README to d…
Mar 17, 2026
ed6e4cd
Merge pull request #386 from centreformicrosimulation/improve-documen…
dav-sonn Mar 17, 2026
c8c8c97
Documentation harmonisation and repo guide
dav-sonn Mar 17, 2026
d25b046
Merge branch 'main' into develop
dav-sonn Mar 17, 2026
Binary file modified .DS_Store
Binary file not shown.
94 changes: 94 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,94 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build Commands

```bash
# Build (skip tests)
mvn clean package -DskipTests

# Run unit tests
mvn test

# Run a single test class
mvn test -Dtest=PersonTest

# Run all tests including integration tests
mvn verify
```

The build produces two runnable JARs:
- `target/singlerun.jar` — single simulation run (GUI or headless)
- `target/multirun.jar` — batch runs from a YAML config file

## Running the Simulation

```bash
# Single run (headless, UK, setup from scratch)
java -jar target/singlerun.jar -g false -c UK -Setup

# Multi-run batch from config
java -jar target/multirun.jar -config config/default.yml -g false
```

Key CLI flags: `-c` (country), `-s` (start year), `-e` (end year), `-g` (GUI true/false), `-Setup` (rebuild database), `-r` (random seed), `-p` (population size).

## Architecture

SimPaths is a discrete-time (annual steps) agent-based microsimulation framework built on the [JAS-mine](https://www.jas-mine.net/) engine. It projects life histories forward across labour, family, health, and financial domains.

### Agent Hierarchy

```
Household → BenefitUnit(s) → Person(s)
```

- **Person** (`simpaths/model/Person.java`) — individual agent; carries all demographics, health, education, labour, and income state.
- **BenefitUnit** (`simpaths/model/BenefitUnit.java`) — tax/benefit assessment unit (one or two adults + dependents).
- **Household** (`simpaths/model/Household.java`) — grouping of benefit units at the same address.
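
The containment described in this list can be pictured with a small Java sketch. The records below are hypothetical, heavily simplified stand-ins and are not the real classes in `simpaths/model/`; the field names and the adult age threshold of 18 are assumptions made only for illustration.

```java
import java.util.List;

// Hypothetical stand-in for an individual agent (not the real Person class).
record Person(String id, int age) {}

// Hypothetical stand-in for a tax/benefit assessment unit.
record BenefitUnit(List<Person> members) {
    // Count adults using an illustrative threshold of 18 (an assumption).
    long adults() {
        return members.stream().filter(p -> p.age() >= 18).count();
    }
}

// Hypothetical stand-in for a household: benefit units at the same address.
record Household(List<BenefitUnit> benefitUnits) {
    // Total number of persons across all benefit units.
    int size() {
        return benefitUnits.stream().mapToInt(bu -> bu.members().size()).sum();
    }
}

final class AgentHierarchySketch {
    // Builds one household holding a couple with a child and a single pensioner.
    static Household example() {
        BenefitUnit family = new BenefitUnit(List.of(
                new Person("p1", 40), new Person("p2", 38), new Person("p3", 7)));
        BenefitUnit pensioner = new BenefitUnit(List.of(new Person("p4", 70)));
        return new Household(List.of(family, pensioner));
    }

    public static void main(String[] args) {
        Household h = example();
        System.out.println(h.benefitUnits().size() + " benefit units, " + h.size() + " persons");
    }
}
```

The sketch only shows the nesting direction (household, then benefit unit, then person); the real agents carry far more state than is modelled here.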

### Package Map

| Package | Responsibility |
|---|---|
| `simpaths/experiment/` | Entry points and orchestration: `SimPathsStart`, `SimPathsMultiRun`, `SimPathsCollector`, `SimPathsObserver` |
| `simpaths/model/` | Core simulation logic: agent classes, annual process methods, alignment, labour market, tax evaluation, intertemporal decisions |
| `simpaths/data/` | Parameters, setup routines, input parsers, filters, statistics helpers, regression managers, EUROMOD donor matching |

### Simulation Engine

`SimPathsModel.java` is the central manager registered with JAS-mine. It owns all agent collections and builds the ordered event schedule. Each simulated year runs **44 ordered processes** covering:
1. Year setup / parameter updates
2. Demographic events (ageing, mortality, fertility, education)
3. Labour market transitions
4. Partnership dynamics (cohabitation, separation, union matching via `UnionMatching.java`)
5. Health and wellbeing
6. Tax-benefit evaluation (via EUROMOD donor matching in `TaxEvaluation.java`)
7. Financial outcomes and aggregate alignment to calibration targets
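
As a rough illustration of what an ordered annual schedule means, the sketch below runs a fixed list of steps once per simulated year. It does not use the JAS-mine scheduling API, and the class, the `Step` record, and the seven step names are hypothetical labels that paraphrase the groups listed above rather than the 44 actual processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class AnnualScheduleSketch {
    // A named step that receives the current simulated year.
    record Step(String name, IntConsumer action) {}

    // Runs every step, in order, for each year from startYear to endYear inclusive.
    static List<String> simulate(int startYear, int endYear) {
        List<String> log = new ArrayList<>();
        List<Step> schedule = new ArrayList<>();
        // Hypothetical step names standing in for the seven groups of processes.
        for (String name : List.of("yearSetup", "demography", "labourMarket",
                "partnerships", "health", "taxBenefit", "finance")) {
            schedule.add(new Step(name, year -> log.add(year + ":" + name)));
        }
        for (int year = startYear; year <= endYear; year++) {
            for (Step step : schedule) {
                step.action().accept(year); // order within a year never changes
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(simulate(2019, 2022).size() + " step executions");
    }
}
```

The point of the sketch is that later steps in a year always see the state left by earlier ones, which is why the ordering of the schedule matters.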

### Configuration System

Runtime parameters live in `config/default.yml` (template) and are loaded by `SimPathsMultiRun`. The layered override order is: **class defaults → YAML values → CLI flags**.
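
The layered override order can be sketched as a sequence of map merges in which later layers win. This is a minimal illustration under stated assumptions, not the logic in `SimPathsMultiRun`: the class name, the `resolve` helper, and the example values for each layer are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LayeredConfigSketch {
    // Merges layers left to right; a key in a later layer replaces earlier values.
    @SafeVarargs
    static Map<String, Integer> resolve(Map<String, Integer>... layers) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    // Illustrative values only; the real class defaults may differ.
    static Map<String, Integer> example() {
        Map<String, Integer> classDefaults = Map.of("startYear", 2019, "endYear", 2022, "popSize", 50000);
        Map<String, Integer> yamlValues = Map.of("endYear", 2025, "popSize", 20000);
        Map<String, Integer> cliFlags = Map.of("popSize", 10000);
        return resolve(classDefaults, yamlValues, cliFlags);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this example `startYear` keeps its class default, `endYear` comes from the YAML layer, and `popSize` comes from the CLI layer because it is applied last.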

Key top-level YAML keys: `maxNumberOfRuns`, `executeWithGui`, `randomSeed`, `startYear`, `endYear`, `popSize`. Model-specific keys toggle alignment, time-trend controls, and individual module switches.
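
The comments in `config/default.yml` say that `randomSeed` seeds the first run and is incremented by 1 for each successive run when `randomSeedInnov` is true. A minimal sketch of that rule, assuming runs are indexed from 0 and using a hypothetical helper name, might look like this:

```java
final class RunSeedSketch {
    // Returns the seed for a given run; runIndex 0 is the first run (an assumption).
    static long seedForRun(long baseSeed, int runIndex, boolean randomSeedInnov) {
        // With innovation on, each successive run adds 1; otherwise the seed is reused.
        return randomSeedInnov ? baseSeed + runIndex : baseSeed;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 3; run++) {
            System.out.println("run " + run + " seed " + seedForRun(606L, run, true));
        }
    }
}
```

With `randomSeed: 606` and `maxNumberOfRuns: 3`, this sketch would give seeds 606, 607 and 608 when innovation is on, and 606 for every run when it is off.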

### Data / Database

The initial population and EUROMOD donor data are stored in an embedded **H2 database** built during the `-Setup` phase. Integration tests that rebuild or query the database are in `src/test/java/simpaths/integrationtest/`.

## Key Tech

- **Java 19**, Maven 3.x
- **JAS-mine 4.3.25** — microsimulation engine and GUI
- **JUnit 5 + Mockito 5** for tests
- **Apache Commons Math3, CLI, CSV** and **SnakeYAML** for utilities

## Documentation

Detailed guides are in `documentation/`:
- `model-concepts.md` — agent lifecycle and annual-cycle detail
- `configuration.md` — YAML structure, config keys, and how to write your own
- `data-pipeline.md` — how input data is prepared and loaded
- `validation-guide.md` — model validation procedures
- `cli-reference.md` — full CLI argument reference
248 changes: 168 additions & 80 deletions config/default.yml
@@ -1,89 +1,177 @@
# This file can be used to override defaults for multirun arguments.
# Arguments of the SimPathsMultiRun object overridden by the command-line

maxNumberOfRuns: 1
executeWithGui: false
randomSeed: 606
startYear: 2019
endYear: 2022
popSize: 50000
# countryString: "United Kingdom"
# integrationTest: false

# Arguments passed to the SimPathsModel
# SimPaths multi-run configuration file.
# Uncomment and edit any field to override its default value.
# CLI flags take final precedence over anything set here.

# ── Top-level run arguments ────────────────────────────────────────────────────

maxNumberOfRuns: 1 # number of sequential simulation runs
executeWithGui: false # true = launch JAS-mine GUI; false = headless (required on servers/CI)
randomSeed: 606 # seed for the first run; incremented automatically if randomSeedInnov is true
startYear: 2019 # first year of simulation (must have matching input/donor data)
endYear: 2022 # last year of simulation (inclusive)
popSize: 50000 # simulated population size (larger = more accurate, slower)
# countryString: "United Kingdom" # "United Kingdom" or "Italy" (auto-detected from donor DB if omitted)
# integrationTest: false # true = write output to a fixed folder for comparison in CI tests


# ── model_args: passed to SimPathsModel ───────────────────────────────────────
# All keys map directly to @GUIparameter fields on SimPathsModel.
# Values shown are the class defaults.

model_args:
# maxAge: 130
# fixTimeTrend: true
# timeTrendStopsIn: 2017
# timeTrendStopsInMonetaryProcesses: 2017
# fixRandomSeed: true
# sIndexTimeWindow: 5
# sIndexAlpha: 2
# sIndexDelta: 0
# savingRate: 0
# initialisePotentialEarningsFromDatabase: true
# useWeights: false
# useSBAMMatching:
# projectMortality: true
# alignPopulation: true
# alignFertility: true
# alignEducation: false
# alignInSchool: false
# alignCohabitation: false
# labourMarketCovid19On: false
# projectFormalChildcare: true
# donorPoolAveraging: true
# alignEmployment: false
# projectSocialCare: false
# addRegressionStochasticComponent: true
# fixRegressionStochasticComponent: false
# flagSuppressChildcareCosts: false
# flagSuppressSocialCareCosts: false

# --- Time trend controls ---
# maxAge: 130 # maximum age kept in simulation; persons above this are removed
# fixTimeTrend: true # if true, freezes the time trend in regression equations
# timeTrendStopsIn: 2017 # year at which the time trend is frozen (if fixTimeTrend: true)
# timeTrendStopsInMonetaryProcesses: 2017 # same freeze year applied to monetary/income regressions only

# --- Random number controls ---
# fixRandomSeed: true # if true, each run uses the same fixed seed (randomSeedIfFixed)

# --- Income security (S-Index) ---
# The S-Index is an economic (in)security index computed from a rolling window of
# equivalised consumption, discounted and weighted by a risk-aversion parameter.
# SIndex_p50 is reported in Statistics1.csv each year.
# sIndexTimeWindow: 5 # length of rolling window in years (default 5)
# sIndexAlpha: 2 # coefficient of relative risk aversion (higher = more sensitivity to drops)
# sIndexDelta: 0.98 # annual discount factor applied to past consumption observations

# --- Savings ---
# savingRate: 0.056 # fraction of equivalised disposable income saved (used when IO is disabled);
# default is OECD average UK household saving rate 2000–2019

# --- Wage initialisation ---
# initialisePotentialEarningsFromDatabase: true # initialise wage potential from donor DB rather than input CSV

# --- Population weighting ---
# useWeights: false # if true, apply survey weights in alignment and statistics calculations

# --- Matching method ---
# useSBAMMatching: # if true, use SBAM instead of standard union-matching algorithm

# --- Demographic projections ---
# projectMortality: true # if false, disables stochastic mortality (population does not die)

# --- Alignment flags ---
# See model-concepts.md for a full explanation of what alignment does.
# alignPopulation: true # align age-sex-region totals to official population projections
# alignFertility: true # scale birth probabilities to match projected fertility rates
# alignEducation: false # align completed education distribution to targets
# alignInSchool: false # align school participation rate (age 16–29) to targets
# alignCohabitation: false # align share of cohabiting individuals to targets
# alignEmployment: false # align employment share to targets

# --- Labour market modules ---
# labourMarketCovid19On: false # enable reduced-form month-by-month COVID-19 labour market module
# (applies to years 2020–2021 in the baseline parameterisation)

# --- Social care and childcare ---
# projectFormalChildcare: true # simulate formal childcare costs
# projectSocialCare: false # simulate social care receipt and provision module
# flagSuppressChildcareCosts: false # if true, set formal childcare costs to zero (scenario use)
# flagSuppressSocialCareCosts: false # if true, set social care costs to zero (scenario use)

# --- Tax-benefit imputation ---
# donorPoolAveraging: true # if true, average disposable income over k nearest-neighbour donors
# rather than using the single closest donor; reduces imputation volatility

# --- Regression stochasticity ---
# addRegressionStochasticComponent: true # include the residual draw in regression predictions
# fixRegressionStochasticComponent: false # if true, draw the residual once and hold it fixed
# across years (currently applies to wage equations only)

# --- Time-series defaults ---
# flagDefaultToTimeSeriesAverages: false # if true, use the sample average of time-series variables
# rather than the year-specific value when data is unavailable

# --- Intertemporal optimisation (IO) ---
# Enables backward-induction life-cycle solution for consumption and labour supply.
# Decision grids are pre-computed in year 0; agents look up optimal choices each year.
# Computationally intensive — disabled by default.
# enableIntertemporalOptimisations: true
# flagDefaultToTimeSeriesAverages: true
# responsesToLowWageOffer: true
# responsesToPension: false
# saveImperfectTaxDBMatches: false
# useSavedBehaviour: false
# readGrid: "laptop serial"
# saveBehaviour: true
# employmentOptionsOfPrincipalWorker: 3
# employmentOptionsOfSecondaryWorker: 3
# responsesToEducation: true
# responsesToRetirement: false
# responsesToHealth: true
# responsesToDisability: false
# minAgeForPoorHealth: 50
# responsesToRegion: false
# ignoreTargetsAtPopulationLoad: false

# Arguments that alter processing of the SimPathsMultiRun object

# IO state-space: which characteristics agents respond to when choosing labour/consumption.
# Each flag adds a dimension to the grid and increases solve time.
# responsesToHealth: true # include physical health in IO state space
# responsesToDisability: false # include disability status in IO state space
# responsesToEducation: true # include student and education level in IO state space
# responsesToPension: false # include private pension wealth in IO state space
# responsesToRetirement: false # include retirement state (and private pension) in IO state space
# responsesToLowWageOffer: true # include unemployment/low-wage-offer risk in IO state space
# responsesToRegion: false # include geographic region in IO state space
# minAgeForPoorHealth: 45 # minimum age from which less-than-perfect health enters state space

# IO employment options
# employmentOptionsOfPrincipalWorker: 3 # number of discrete hours options for the principal earner
# employmentOptionsOfSecondaryWorker: 3 # number of discrete hours options for the secondary earner

# IO grid persistence — save/reuse pre-computed grids across runs
# saveBehaviour: true # save decision grids to output folder after solving
# useSavedBehaviour: false # load grids from a previous run instead of recomputing
# readGrid: "test1" # name of the run whose grids to load (must match a folder in output/)

# IO diagnostics
# saveImperfectTaxDBMatches: false # log cases where tax-benefit donor matching falls back to a coarser regime

# --- Population load ---
# ignoreTargetsAtPopulationLoad: false # if true, skip alignment-target checks when loading the initial population


# ── innovation_args: parameter variation across sequential runs ────────────────
# These flags control how parameters change between run 0, run 1, run 2, etc.
# Useful for sensitivity analysis and uncertainty quantification.

innovation_args:
# randomSeedInnov: false
# flagDatabaseSetup: false
# intertemporalElasticityInnov: false
# labourSupplyElasticityInnov: true
# randomSeedInnov: true # if true, increment randomSeed by 1 for each successive run
# (default true — each run gets a distinct seed)
# flagDatabaseSetup: false # if true, run database setup instead of simulation
# (equivalent to -DBSetup on the command line)
# intertemporalElasticityInnov: false # if true, applies interest rate shocks across runs:
# run 1: +0.0075 (higher return to saving)
# run 2: -0.0075 (lower return to saving)
# requires maxNumberOfRuns >= 3 to see all variants
# labourSupplyElasticityInnov: false # if true, applies disposable income shocks across runs:
# run 1: +0.01 (higher net labour income)
# run 2: -0.01 (lower net labour income)
# requires maxNumberOfRuns >= 3 to see all variants


# ── collector_args: output collection and export ───────────────────────────────
# Controls what SimPathsCollector writes to CSV / database each year.
#
# Output files:
# Statistics1.csv — income distribution: Gini coefficients, income percentiles, median EDI, S-Index
# Statistics2.csv — demographic validation: partnership rates, employment, health, disability by age/gender
# Statistics3.csv — alignment diagnostics: simulated vs target rates and adjustment factors
# EmploymentStatistics.csv — labour market transitions and participation rates
# HealthStatistics.csv — health measures (SF-12, GHQ-12, EQ-5D) by age/gender

collector_args:
# calculateGiniCoefficients: false
# exportToDatabase: false
# exportToCSV: true
# persistStatistics: true
# persistStatistics2: true
# persistStatistics3: true
# persistPersons: false
# persistBenefitUnits: false
# persistHouseholds: false
# persistEmploymentStatistics: false
# dataDumpStartTime: 0L
# dataDumpTimePeriod: 1.0
# calculateGiniCoefficients: false # compute Gini coefficients (also populates GUI charts); off by default for speed
# exportToDatabase: false # write outputs to H2 database (in addition to or instead of CSV)
# exportToCSV: true # write outputs to CSV files under output/<run>/csv/
# persistStatistics: true # write Statistics1.csv (income distribution)
# persistStatistics2: true # write Statistics2.csv (demographic validation outputs)
# persistStatistics3: true # write Statistics3.csv (alignment diagnostics)
# persistPersons: false # write one row per person per year (large files)
# persistBenefitUnits: false # write one row per benefit unit per year (large files)
# persistHouseholds: false # write one row per household per year
# persistEmploymentStatistics: false # write EmploymentStatistics.csv
# dataDumpStartTime: 0L # first year to write output (0 = startYear)
# dataDumpTimePeriod: 1.0 # output frequency in years (1.0 = every year)


# ── parameter_args: file paths and global flags ───────────────────────────────

parameter_args:
# input_directory: input
# input_directory_initial_populations: input/InitialPopulations
# euromod_output_directory: input/EUROMODoutput
# trainingFlag: false
# includeYears:
# input_directory: input # path to input data folder
# input_directory_initial_populations: input/InitialPopulations # path to initial population CSVs
# euromod_output_directory: input/EUROMODoutput # path to EUROMOD/UKMOD output files
# trainingFlag: false # if true, use training data from input/…/training/ subfolders
# (set automatically by test configs; do not set for research runs)
# includeYears: # list of policy years for which EUROMOD donor data is available;
# only these years will be included in the donor database
# - 2011
# - 2012
# - 2013
@@ -96,4 +184,4 @@ parameter_args:
# - 2020
# - 2021
# - 2022
# - 2023
# - 2023
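
The `innovation_args` comments in `config/default.yml` describe shocks that differ by run: run 1 applies a positive shock, run 2 the matching negative shock, and at least three runs are needed to see all variants. The Java sketch below encodes only that pattern. The class and method names are hypothetical, treating run 0 as an unshocked baseline is inferred from the three-run requirement, and returning zero for runs beyond 2 is an assumption rather than documented behaviour.

```java
final class InnovationShockSketch {
    // Magnitudes taken from the comments in config/default.yml.
    static final double INTEREST_RATE_SHOCK = 0.0075;
    static final double DISPOSABLE_INCOME_SHOCK = 0.01;

    // Maps a run index to the signed shock applied in that run.
    static double shockForRun(int runIndex, double magnitude) {
        return switch (runIndex) {
            case 1 -> magnitude;   // run 1: positive shock
            case 2 -> -magnitude;  // run 2: negative shock
            default -> 0.0;        // run 0 baseline; later runs are an assumption
        };
    }

    public static void main(String[] args) {
        for (int run = 0; run < 3; run++) {
            System.out.println("run " + run + " interest rate shock "
                    + shockForRun(run, INTEREST_RATE_SHOCK));
        }
    }
}
```

Under these assumptions, setting `maxNumberOfRuns` below 3 would leave one or both shocked variants unsimulated, which matches the note in the config comments.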