diff --git a/.DS_Store b/.DS_Store index f18e97d97..1a9d76103 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/.github/workflows/deploy-docs.yml b/.github/workflows/deploy-docs.yml new file mode 100644 index 000000000..66dce2744 --- /dev/null +++ b/.github/workflows/deploy-docs.yml @@ -0,0 +1,48 @@ +name: Deploy Documentation + +on: + push: + branches: + - main + paths: + - 'documentation/wiki/**' + - 'mkdocs.yml' + workflow_dispatch: + +permissions: + contents: read + pages: write + id-token: write + +concurrency: + group: "pages" + cancel-in-progress: false + +jobs: + build-and-deploy: + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.x' + + - name: Install MkDocs and dependencies + run: pip install "mkdocs>=1.6,<2.0" mkdocs-material + + - name: Build documentation site + run: mkdocs build --strict + + - name: Upload Pages artifact + uses: actions/upload-pages-artifact@v3 + with: + path: site/ + + - name: Deploy to GitHub Pages + id: deployment + uses: actions/deploy-pages@v4 diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000..fbfd20763 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,94 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. 
+ +## Build Commands + +```bash +# Build (skip tests) +mvn clean package -DskipTests + +# Run unit tests +mvn test + +# Run a single test class +mvn test -Dtest=PersonTest + +# Run all tests including integration tests +mvn verify +``` + +The build produces two runnable JARs: +- `target/singlerun.jar` — single simulation run (GUI or headless) +- `target/multirun.jar` — batch runs from a YAML config file + +## Running the Simulation + +```bash +# Single run (headless, UK, setup from scratch) +java -jar target/singlerun.jar -g false -c UK -Setup + +# Multi-run batch from config +java -jar target/multirun.jar -config config/default.yml -g false +``` + +Key CLI flags: `-c` (country), `-s` (start year), `-e` (end year), `-g` (GUI true/false), `-Setup` (rebuild database), `-r` (random seed), `-p` (population size). + +## Architecture + +SimPaths is a discrete-time (annual steps) agent-based microsimulation framework built on the [JAS-mine](https://www.jas-mine.net/) engine. It projects life histories forward across labour, family, health, and financial domains. + +### Agent Hierarchy + +``` +Household → BenefitUnit(s) → Person(s) +``` + +- **Person** (`simpaths/model/Person.java`) — individual agent; carries all demographics, health, education, labour, and income state. +- **BenefitUnit** (`simpaths/model/BenefitUnit.java`) — tax/benefit assessment unit (one or two adults + dependents). +- **Household** (`simpaths/model/Household.java`) — grouping of benefit units at the same address. 
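The containment in the Agent Hierarchy above can be sketched with a few plain classes. This is a minimal illustration only: `SketchPerson`, `SketchBenefitUnit`, `SketchHousehold`, and their fields are hypothetical stand-ins and do not reflect the actual fields or constructors of the classes in `simpaths/model/`.

```java
import java.util.ArrayList;
import java.util.List;

class AgentHierarchySketch {
    // One address shared by a couple with a child and a single adult lodger.
    static SketchHousehold exampleHousehold() {
        SketchBenefitUnit family = new SketchBenefitUnit();
        family.members.add(new SketchPerson(41));
        family.members.add(new SketchPerson(39));
        family.members.add(new SketchPerson(7));

        SketchBenefitUnit lodger = new SketchBenefitUnit();
        lodger.members.add(new SketchPerson(26));

        SketchHousehold household = new SketchHousehold();
        household.benefitUnits.add(family);
        household.benefitUnits.add(lodger);
        return household;
    }

    public static void main(String[] args) {
        System.out.println("Persons in household: " + exampleHousehold().countPersons());
    }
}

// Hypothetical stand-in for the individual agent; the age field is an assumption.
class SketchPerson {
    final int age;

    SketchPerson(int age) {
        this.age = age;
    }
}

// Hypothetical stand-in for the tax/benefit assessment unit.
class SketchBenefitUnit {
    // Adults and dependents assessed together, held in one list for brevity.
    final List<SketchPerson> members = new ArrayList<>();
}

// Hypothetical stand-in for the grouping of benefit units at one address.
class SketchHousehold {
    final List<SketchBenefitUnit> benefitUnits = new ArrayList<>();

    // Walk Household -> BenefitUnit(s) -> Person(s) and count individuals.
    int countPersons() {
        int total = 0;
        for (SketchBenefitUnit unit : benefitUnits) {
            total += unit.members.size();
        }
        return total;
    }
}
```

The point of the sketch is that one household can hold several benefit units, so anything assessed at benefit-unit level (such as tax-benefit evaluation) need not coincide with the address-level grouping.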
+ +### Package Map + +| Package | Responsibility | +|---|---| +| `simpaths/experiment/` | Entry points and orchestration: `SimPathsStart`, `SimPathsMultiRun`, `SimPathsCollector`, `SimPathsObserver` | +| `simpaths/model/` | Core simulation logic: agent classes, annual process methods, alignment, labour market, tax evaluation, intertemporal decisions | +| `simpaths/data/` | Parameters, setup routines, input parsers, filters, statistics helpers, regression managers, EUROMOD donor matching | + +### Simulation Engine + +`SimPathsModel.java` is the central manager registered with JAS-mine. It owns all agent collections and builds the ordered event schedule. Each simulated year runs **44 ordered processes** covering: +1. Year setup / parameter updates +2. Demographic events (ageing, mortality, fertility, education) +3. Labour market transitions +4. Partnership dynamics (cohabitation, separation, union matching via `UnionMatching.java`) +5. Health and wellbeing +6. Tax-benefit evaluation (via EUROMOD donor matching in `TaxEvaluation.java`) +7. Financial outcomes and aggregate alignment to calibration targets + +### Configuration System + +Runtime parameters live in `config/default.yml` (template) and are loaded by `SimPathsMultiRun`. The layered override order is: **class defaults → YAML values → CLI flags**. + +Key top-level YAML keys: `maxNumberOfRuns`, `executeWithGui`, `randomSeed`, `startYear`, `endYear`, `popSize`. Model-specific keys toggle alignment, time-trend controls, and individual module switches. + +### Data / Database + +The initial population and EUROMOD donor data are stored in an embedded **H2 database** built during the `-Setup` phase. Integration tests that rebuild or query the database are in `src/test/java/simpaths/integrationtest/`. 
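The layered override order described under Configuration System above (class defaults, then YAML values, then CLI flags) amounts to overlaying maps in sequence, with later layers winning. The sketch below assumes integer-valued settings only; `LayeredConfigSketch`, `resolve`, and the numeric values are hypothetical and are not taken from `SimPathsMultiRun`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LayeredConfigSketch {
    // Later layers win: class defaults, then YAML values, then CLI flags.
    static Map<String, Integer> resolve(Map<String, Integer> classDefaults,
                                        Map<String, Integer> yamlValues,
                                        Map<String, Integer> cliFlags) {
        Map<String, Integer> resolved = new LinkedHashMap<>(classDefaults);
        resolved.putAll(yamlValues);
        resolved.putAll(cliFlags);
        return resolved;
    }

    // Illustrative values only; the real class defaults are assumptions here.
    static Map<String, Integer> example() {
        Map<String, Integer> classDefaults =
                Map.of("startYear", 2019, "endYear", 2020, "popSize", 10000);
        Map<String, Integer> yamlValues = Map.of("endYear", 2022, "popSize", 50000);
        Map<String, Integer> cliFlags = Map.of("popSize", 20000); // e.g. a population-size flag
        return resolve(classDefaults, yamlValues, cliFlags);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this example `startYear` keeps its class default, `endYear` comes from the YAML layer, and `popSize` comes from the CLI layer, because each key is resolved by the last layer that sets it.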
+ +## Key Tech + +- **Java 19**, Maven 3.x +- **JAS-mine 4.3.25** — microsimulation engine and GUI +- **JUnit 5 + Mockito 5** for tests +- **Apache Commons Math3, CLI, CSV** and **SnakeYAML** for utilities + +## Documentation + +Detailed guides are in `documentation/`: +- `model-concepts.md` — agent lifecycle and annual-cycle detail +- `configuration.md` — YAML structure, config keys, and how to write your own +- `data-pipeline.md` — how input data is prepared and loaded +- `validation-guide.md` — model validation procedures +- `cli-reference.md` — full CLI argument reference \ No newline at end of file diff --git a/README.md b/README.md index 6417b97a6..bf5c76ffe 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,33 @@ SimPaths is an open-source framework for modelling individual and household life SimPaths models currently exist for the UK, Greece, Hungary, Italy, and Poland. This page refers to the UK model; the other European models are available at the corresponding [SimPathsEU](https://github.com/centreformicrosimulation/SimPathsEU) page. -The entire SimPaths documentation is available on its [WikiPage](https://github.com/centreformicrosimulation/SimPaths/wiki), which includes: a detailed description of its building blocks; instructions on how to set up and run the model; information about contributing to the model's development. +The entire SimPaths documentation is available on its [website](https://centreformicrosimulation.github.io/SimPaths/), which includes: a detailed description of its building blocks; instructions on how to set up and run the model; information about contributing to the model's development. + +## Quick start + +### Prerequisites + +- Java 19 +- Maven 3.8+ +- Optional IDE: IntelliJ IDEA (import as a Maven project) + +### Build and run + +```bash +mvn clean package +java -jar multirun.jar -DBSetup +java -jar multirun.jar +``` + +The first command builds the JARs. The second creates the H2 donor database from the input data. 
The third runs the simulation using `default.yml`. + +To use a different config file: + +```bash +java -jar multirun.jar -config my_run.yml +``` + +For configuration options, see the annotated `config/default.yml`. For the data pipeline and further reference, see [`documentation/`](documentation/README.md). diff --git a/config/default.yml b/config/default.yml index 631b016c3..ca449ce17 100644 --- a/config/default.yml +++ b/config/default.yml @@ -1,89 +1,177 @@ -# This file can be used to override defaults for multirun arguments. -# Arguments of the SimPathsMultiRun object overridden by the command-line - -maxNumberOfRuns: 1 -executeWithGui: false -randomSeed: 606 -startYear: 2019 -endYear: 2022 -popSize: 50000 -# countryString: "United Kingdom" -# integrationTest: false - -# Arguments passed to the SimPathsModel +# SimPaths multi-run configuration file. +# Uncomment and edit any field to override its default value. +# CLI flags take final precedence over anything set here. + +# ── Top-level run arguments ──────────────────────────────────────────────────── + +maxNumberOfRuns: 1 # number of sequential simulation runs +executeWithGui: false # true = launch JAS-mine GUI; false = headless (required on servers/CI) +randomSeed: 606 # seed for the first run; incremented automatically if randomSeedInnov is true +startYear: 2019 # first year of simulation (must have matching input/donor data) +endYear: 2022 # last year of simulation (inclusive) +popSize: 50000 # simulated population size (larger = more accurate, slower) +# countryString: "United Kingdom" # "United Kingdom" or "Italy" (auto-detected from donor DB if omitted) +# integrationTest: false # true = write output to a fixed folder for comparison in CI tests + + +# ── model_args: passed to SimPathsModel ─────────────────────────────────────── +# All keys map directly to @GUIparameter fields on SimPathsModel. +# Values shown are the class defaults. 
+ model_args: -# maxAge: 130 -# fixTimeTrend: true -# timeTrendStopsIn: 2017 -# timeTrendStopsInMonetaryProcesses: 2017 -# fixRandomSeed: true -# sIndexTimeWindow: 5 -# sIndexAlpha: 2 -# sIndexDelta: 0 -# savingRate: 0 -# initialisePotentialEarningsFromDatabase: true -# useWeights: false -# useSBAMMatching: -# projectMortality: true -# alignPopulation: true -# alignFertility: true -# alignEducation: false -# alignInSchool: false -# alignCohabitation: false -# labourMarketCovid19On: false -# projectFormalChildcare: true -# donorPoolAveraging: true -# alignEmployment: false -# projectSocialCare: false -# addRegressionStochasticComponent: true -# fixRegressionStochasticComponent: false -# flagSuppressChildcareCosts: false -# flagSuppressSocialCareCosts: false + + # --- Time trend controls --- +# maxAge: 130 # maximum age kept in simulation; persons above this are removed +# fixTimeTrend: true # if true, freezes the time trend in regression equations +# timeTrendStopsIn: 2017 # year at which the time trend is frozen (if fixTimeTrend: true) +# timeTrendStopsInMonetaryProcesses: 2017 # same freeze year applied to monetary/income regressions only + + # --- Random number controls --- +# fixRandomSeed: true # if true, each run uses the same fixed seed (randomSeedIfFixed) + + # --- Income security (S-Index) --- + # The S-Index is an economic (in)security index computed from a rolling window of + # equivalised consumption, discounted and weighted by a risk-aversion parameter. + # SIndex_p50 is reported in Statistics1.csv each year. 
+# sIndexTimeWindow: 5 # length of rolling window in years (default 5) +# sIndexAlpha: 2 # coefficient of relative risk aversion (higher = more sensitivity to drops) +# sIndexDelta: 0.98 # annual discount factor applied to past consumption observations + + # --- Savings --- +# savingRate: 0.056 # fraction of equivalised disposable income saved (used when IO is disabled); + # default is OECD average UK household saving rate 2000–2019 + + # --- Wage initialisation --- +# initialisePotentialEarningsFromDatabase: true # initialise wage potential from donor DB rather than input CSV + + # --- Population weighting --- +# useWeights: false # if true, apply survey weights in alignment and statistics calculations + + # --- Matching method --- +# useSBAMMatching: # if true, use SBAM instead of standard union-matching algorithm + + # --- Demographic projections --- +# projectMortality: true # if false, disables stochastic mortality (population does not die) + + # --- Alignment flags --- + # See model-concepts.md for a full explanation of what alignment does. 
+# alignPopulation: true # align age-sex-region totals to official population projections +# alignFertility: true # scale birth probabilities to match projected fertility rates +# alignEducation: false # align completed education distribution to targets +# alignInSchool: false # align school participation rate (age 16–29) to targets +# alignCohabitation: false # align share of cohabiting individuals to targets +# alignEmployment: false # align employment share to targets + + # --- Labour market modules --- +# labourMarketCovid19On: false # enable reduced-form month-by-month COVID-19 labour market module + # (applies to years 2020–2021 in the baseline parameterisation) + + # --- Social care and childcare --- +# projectFormalChildcare: true # simulate formal childcare costs +# projectSocialCare: false # simulate social care receipt and provision module +# flagSuppressChildcareCosts: false # if true, set formal childcare costs to zero (scenario use) +# flagSuppressSocialCareCosts: false # if true, set social care costs to zero (scenario use) + + # --- Tax-benefit imputation --- +# donorPoolAveraging: true # if true, average disposable income over k nearest-neighbour donors + # rather than using the single closest donor; reduces imputation volatility + + # --- Regression stochasticity --- +# addRegressionStochasticComponent: true # include the residual draw in regression predictions +# fixRegressionStochasticComponent: false # if true, draw the residual once and hold it fixed + # across years (currently applies to wage equations only) + + # --- Time-series defaults --- +# flagDefaultToTimeSeriesAverages: false # if true, use the sample average of time-series variables + # rather than the year-specific value when data is unavailable + + # --- Intertemporal optimisation (IO) --- + # Enables backward-induction life-cycle solution for consumption and labour supply. + # Decision grids are pre-computed in year 0; agents look up optimal choices each year. 
+ # Computationally intensive — disabled by default. # enableIntertemporalOptimisations: true -# flagDefaultToTimeSeriesAverages: true -# responsesToLowWageOffer: true -# responsesToPension: false -# saveImperfectTaxDBMatches: false -# useSavedBehaviour: false -# readGrid: "laptop serial" -# saveBehaviour: true -# employmentOptionsOfPrincipalWorker: 3 -# employmentOptionsOfSecondaryWorker: 3 -# responsesToEducation: true -# responsesToRetirement: false -# responsesToHealth: true -# responsesToDisability: false -# minAgeForPoorHealth: 50 -# responsesToRegion: false -# ignoreTargetsAtPopulationLoad: false - -# Arguments that alter processing of the SimPathsMultiRun object + + # IO state-space: which characteristics agents respond to when choosing labour/consumption. + # Each flag adds a dimension to the grid and increases solve time. +# responsesToHealth: true # include physical health in IO state space +# responsesToDisability: false # include disability status in IO state space +# responsesToEducation: true # include student and education level in IO state space +# responsesToPension: false # include private pension wealth in IO state space +# responsesToRetirement: false # include retirement state (and private pension) in IO state space +# responsesToLowWageOffer: true # include unemployment/low-wage-offer risk in IO state space +# responsesToRegion: false # include geographic region in IO state space +# minAgeForPoorHealth: 45 # minimum age from which less-than-perfect health enters state space + + # IO employment options +# employmentOptionsOfPrincipalWorker: 3 # number of discrete hours options for the principal earner +# employmentOptionsOfSecondaryWorker: 3 # number of discrete hours options for the secondary earner + + # IO grid persistence — save/reuse pre-computed grids across runs +# saveBehaviour: true # save decision grids to output folder after solving +# useSavedBehaviour: false # load grids from a previous run instead of recomputing +# readGrid: 
"test1" # name of the run whose grids to load (must match a folder in output/) + + # IO diagnostics +# saveImperfectTaxDBMatches: false # log cases where tax-benefit donor matching falls back to a coarser regime + + # --- Population load --- +# ignoreTargetsAtPopulationLoad: false # if true, skip alignment-target checks when loading the initial population + + +# ── innovation_args: parameter variation across sequential runs ──────────────── +# These flags control how parameters change between run 0, run 1, run 2, etc. +# Useful for sensitivity analysis and uncertainty quantification. + innovation_args: -# randomSeedInnov: false -# flagDatabaseSetup: false -# intertemporalElasticityInnov: false -# labourSupplyElasticityInnov: true +# randomSeedInnov: true # if true, increment randomSeed by 1 for each successive run + # (default true — each run gets a distinct seed) +# flagDatabaseSetup: false # if true, run database setup instead of simulation + # (equivalent to -DBSetup on the command line) +# intertemporalElasticityInnov: false # if true, applies interest rate shocks across runs: + # run 1: +0.0075 (higher return to saving) + # run 2: -0.0075 (lower return to saving) + # requires maxNumberOfRuns >= 3 to see all variants +# labourSupplyElasticityInnov: false # if true, applies disposable income shocks across runs: + # run 1: +0.01 (higher net labour income) + # run 2: -0.01 (lower net labour income) + # requires maxNumberOfRuns >= 3 to see all variants + + +# ── collector_args: output collection and export ─────────────────────────────── +# Controls what SimPathsCollector writes to CSV / database each year. 
+# +# Output files: +# Statistics1.csv — income distribution: Gini coefficients, income percentiles, median EDI, S-Index +# Statistics2.csv — demographic validation: partnership rates, employment, health, disability by age/gender +# Statistics3.csv — alignment diagnostics: simulated vs target rates and adjustment factors +# EmploymentStatistics.csv — labour market transitions and participation rates +# HealthStatistics.csv — health measures (SF-12, GHQ-12, EQ-5D) by age/gender collector_args: -# calculateGiniCoefficients: false -# exportToDatabase: false -# exportToCSV: true -# persistStatistics: true -# persistStatistics2: true -# persistStatistics3: true -# persistPersons: false -# persistBenefitUnits: false -# persistHouseholds: false -# persistEmploymentStatistics: false -# dataDumpStartTime: 0L -# dataDumpTimePeriod: 1.0 +# calculateGiniCoefficients: false # compute Gini coefficients (also populates GUI charts); off by default for speed +# exportToDatabase: false # write outputs to H2 database (in addition to or instead of CSV) +# exportToCSV: true # write outputs to CSV files under output//csv/ +# persistStatistics: true # write Statistics1.csv (income distribution) +# persistStatistics2: true # write Statistics2.csv (demographic validation outputs) +# persistStatistics3: true # write Statistics3.csv (alignment diagnostics) +# persistPersons: false # write one row per person per year (large files) +# persistBenefitUnits: false # write one row per benefit unit per year (large files) +# persistHouseholds: false # write one row per household per year +# persistEmploymentStatistics: false # write EmploymentStatistics.csv +# dataDumpStartTime: 0L # first year to write output (0 = startYear) +# dataDumpTimePeriod: 1.0 # output frequency in years (1.0 = every year) + + +# ── parameter_args: file paths and global flags ─────────────────────────────── parameter_args: -# input_directory: input -# input_directory_initial_populations: input/InitialPopulations -# 
euromod_output_directory: input/EUROMODoutput -# trainingFlag: false -# includeYears: +# input_directory: input # path to input data folder +# input_directory_initial_populations: input/InitialPopulations # path to initial population CSVs +# euromod_output_directory: input/EUROMODoutput # path to EUROMOD/UKMOD output files +# trainingFlag: false # if true, use training data from input/…/training/ subfolders + # (set automatically by test configs; do not set for research runs) +# includeYears: # list of policy years for which EUROMOD donor data is available; + # only these years will be included in the donor database # - 2011 # - 2012 # - 2013 @@ -96,4 +184,4 @@ parameter_args: # - 2020 # - 2021 # - 2022 -# - 2023 \ No newline at end of file +# - 2023 diff --git a/documentation/README.md b/documentation/README.md index b36ace796..c9756cd50 100644 --- a/documentation/README.md +++ b/documentation/README.md @@ -1,34 +1,105 @@ -# SimPaths Documentation +# Data Pipeline Reference -This documentation is structured to support both first-time users and contributors. +For building and running SimPaths, see the [root README](../README.md). For the full model documentation, see the [website](https://centreformicrosimulation.github.io/SimPaths/). -## Recommended reading order +--- -1. [Getting Started](getting-started.md) -2. [CLI Reference](cli-reference.md) -3. [Configuration](configuration.md) -4. [Scenario Cookbook](scenario-cookbook.md) -5. [Data and Outputs](data-and-outputs.md) -6. [Troubleshooting](troubleshooting.md) +This section explains how the simulation-ready input files in `input/` are generated from raw survey data, and what to do if you need to update or extend them. -For contributors and advanced users: +The pipeline has three independent parts: (1) initial populations, (2) regression coefficients, (3) alignment targets. Each can be re-run separately. 
-- [Architecture](architecture.md) -- [Development and Testing](development.md) -- [GUI Guide](gui-guide.md) +### Data sources -## Scope +| Source | Description | Access | +|--------|-------------|--------| +| **UKHLS** (Understanding Society) | Main household panel survey; waves 1 to O (UKDA-6614-stata) | Requires EUL licence from UK Data Service | +| **BHPS** (British Household Panel Survey) | Historical predecessor to UKHLS; used for pre-2009 employment history | Bundled with UKHLS EUL | +| **WAS** (Wealth and Assets Survey) | Biennial survey of household wealth; waves 1 to 7 (UKDA-7215-stata) | Requires EUL licence from UK Data Service | +| **EUROMOD / UKMOD** | Tax-benefit microsimulation system | See [Tax-Benefit Donors (UK)](wiki/getting-started/data/tax-benefit-donors-uk.md) on the website | -These guides cover: +### Part 1 — Initial populations (`input/InitialPopulations/compile/`) -- Building SimPaths with Maven -- Running single-run and multi-run workflows -- Configuring model, collector, and runtime behavior via YAML -- Understanding expected input/output files and generated artifacts -- Running unit and integration tests locally and in CI +**What it produces:** Annual CSV files `population_initial_UK_.csv` used as the starting population for each simulation run. -## Conventions +**Master script:** `input/InitialPopulations/compile/00_master.do` -- Commands are shown from the repository root. -- Paths are relative to the repository root. -- `default.yml` refers to `config/default.yml`. 
+The pipeline runs in numbered stages: + +| Script | What it does | +|--------|-------------| +| `01_prepare_UKHLS_pooled_data.do` | Pools and standardises UKHLS waves | +| `02_create_UKHLS_variables.do` | Constructs all required variables (demographics, labour, health, income, wealth flags) and applies simulation-consistency rules (retirement as absorbing state, education age bounds, work/hours consistency) | +| `02_01_checks.do` | Data quality checks | +| `03_social_care_received.do` | Social care receipt variables | +| `04_social_care_provided.do` | Informal care provision variables | +| `05_create_benefit_units.do` | Groups individuals into benefit units (tax units) following UK tax-benefit rules | +| `06_reweight_and_slice.do` | Reweighting and year-specific slicing | +| `07_was_wealth_data.do` | Prepares Wealth and Assets Survey data | +| `08_wealth_to_ukhls.do` | Merges WAS wealth into UKHLS records | +| `09_finalise_input_data.do` | Final cleaning and formatting | +| `10_check_yearly_data.do` | Per-year consistency checks | +| `99_training_data.do` | Produces the de-identified training population committed to `input/InitialPopulations/training/` | + +#### Employment history sub-pipeline (`compile/do_emphist/`) + +Reconstructs each respondent's monthly employment history from January 2007 onwards by combining UKHLS and BHPS interview records. The output variable `liwwh` (months employed since Jan 2007) feeds into the labour supply models. + +| Script | Purpose | +|--------|---------| +| `00_Master_emphist.do` | Master; sets parameters and calls sub-scripts | +| `01_Intdate.do` – `07_Empcal1a.do` | Sequential stages: interview dating, BHPS linkage, employment spell reconstruction, new-entrant identification | + +### Part 2 — Regression coefficients (`input/InitialPopulations/compile/RegressionEstimates/`) + +**What it produces:** The `reg_*.xlsx` coefficient tables read by `Parameters.java` at simulation startup. 
+ +**Master script:** `input/InitialPopulations/compile/RegressionEstimates/master.do` + +> **Note:** Income and union-formation regressions depend on predicted wages, so `reg_wages.do` must complete before `reg_income.do` and `reg_partnership.do`. All other scripts can run in any order. + +**Required Stata packages:** `fre`, `tsspell`, `carryforward`, `outreg2`, `oparallel`, `gologit2`, `winsor`, `reghdfe`, `ftools`, `require` + +| Script | Module | Method | +|--------|--------|--------| +| `reg_wages.do` | Hourly wages | Heckman selection model (males and females separately) | +| `reg_income.do` | Non-labour income | Hurdle model (selection + amount); requires predicted wages | +| `reg_partnership.do` | Partnership formation/dissolution | Probit; requires predicted wages | +| `reg_education.do` | Education transitions | Generalised ordered logit | +| `reg_fertility.do` | Fertility | Probit | +| `reg_health.do` | Physical health (SF-12 PCS) | Linear regression | +| `reg_health_mental.do` | Mental health (GHQ-12, SF-12 MCS) | Linear regression | +| `reg_health_wellbeing.do` | Life satisfaction | Linear regression | +| `reg_home_ownership.do` | Homeownership transitions | Probit | +| `reg_retirement.do` | Retirement | Probit | +| `reg_leave_parental_home.do` | Leaving parental home | Probit | +| `reg_socialcare.do` | Social care receipt and provision | Probit / ordered logit | +| `reg_unemployment.do` | Unemployment transitions | Probit | +| `reg_financial_distress.do` | Financial distress | Probit | +| `programs.do` | Shared utility programs called by the estimation scripts | — | +| `variable_update.do` | Prepares and recodes variables before estimation | — | + +After running, output Excel files are placed in `input/` (overwriting the existing `reg_*.xlsx` files). + +### Part 3 — Alignment targets (`input/DoFilesTarget/`) + +**What it produces:** The `align_*.xlsx` and `*_targets.xlsx` files that the alignment modules use to rescale simulated rates. 
+ +| Script | Output file | +|--------|------------| +| `01_employment_shares_initpopdata.do` | `input/employment_targets.xlsx` — employment shares by benefit-unit subgroup and year | +| `01_inSchool_targets_initpopdata.do` | `input/inSchool_targets.xlsx` — school participation rates by year | +| `03_calculate_partneredShare_initialPop_BUlogic.do` | `input/partnered_share_targets.xlsx` — partnership shares by year | +| `03_calculate_partnership_target.do` | Supplementary partnership targets | +| `02_person_risk_employment_stats.do` | `employment_risk_emp_stats.csv` — person-level at-risk diagnostics used for employment alignment group construction | + +Population projection targets (`align_popProjections.xlsx`) and fertility/mortality projections (`projections_*.xlsx`) come from ONS published projections and are not generated by these scripts. + +### When to re-run each part + +| Situation | What to re-run | +|-----------|---------------| +| Adding a new data year to the simulation | Part 1 (re-slice the population for the new year) + Part 3 (update alignment targets) | +| Re-estimating a behavioural module | Part 2 (the affected `reg_*.do` script only) + Stage 1 validation | +| Updating employment alignment targets | Part 3 (`01_employment_shares_initpopdata.do`) | + +After re-running any part, re-run setup (`singlerun -Setup` or `multirun -DBSetup`) to rebuild `input/input.mv.db` before running the simulation. 
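The re-run table above can be read as a lookup from situation to pipeline parts. The following sketch encodes exactly those three rows; the `RerunPlanSketch` class, its enums, and the method name are hypothetical and are not part of the SimPaths codebase. It does not model the additional validation step or the database setup rebuild that follows any re-run.

```java
import java.util.EnumSet;
import java.util.Set;

class RerunPlanSketch {
    // The three independent parts of the data pipeline.
    enum Part { INITIAL_POPULATIONS, REGRESSION_COEFFICIENTS, ALIGNMENT_TARGETS }

    // The situations listed in the re-run table (names are assumptions).
    enum Situation { NEW_DATA_YEAR, REESTIMATE_MODULE, UPDATE_EMPLOYMENT_TARGETS }

    // Return the pipeline parts to re-run for a given situation.
    static Set<Part> partsToRerun(Situation situation) {
        switch (situation) {
            case NEW_DATA_YEAR:
                return EnumSet.of(Part.INITIAL_POPULATIONS, Part.ALIGNMENT_TARGETS);
            case REESTIMATE_MODULE:
                return EnumSet.of(Part.REGRESSION_COEFFICIENTS);
            case UPDATE_EMPLOYMENT_TARGETS:
                return EnumSet.of(Part.ALIGNMENT_TARGETS);
            default:
                throw new IllegalArgumentException("Unknown situation: " + situation);
        }
    }

    public static void main(String[] args) {
        for (Situation situation : Situation.values()) {
            System.out.println(situation + " -> " + partsToRerun(situation));
        }
    }
}
```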
diff --git a/documentation/architecture.md b/documentation/architecture.md deleted file mode 100644 index a0e168edf..000000000 --- a/documentation/architecture.md +++ /dev/null @@ -1,44 +0,0 @@ -# Architecture - -## High-level module map - -Core package layout under `src/main/java/simpaths/`: - -- `experiment/`: simulation entry points and orchestration -- `model/`: core simulation entities and yearly process logic -- `data/`: parameters, setup routines, filters, statistics helpers - -## Primary entry points - -- `simpaths.experiment.SimPathsStart` - - Builds/refreshes setup artifacts - - Launches single simulation run (GUI or headless) -- `simpaths.experiment.SimPathsMultiRun` - - Loads YAML config - - Iterates runs with optional seed/innovation logic - - Supports persistence mode switching - -## Runtime managers - -The simulation engine registers: - -- `SimPathsModel`: state evolution and process scheduling -- `SimPathsCollector`: statistics computation and export -- `SimPathsObserver`: GUI observation layer (when GUI is enabled) - -## Data flow - -1. Setup stage prepares policy schedule and input database. -2. Runtime model loads parameters and input maps. -3. Collector computes and exports statistics at scheduled intervals. -4. Output files are written to run folders under `output/`. - -## Configuration flow - -`SimPathsMultiRun` combines: - -- defaults in class fields -- overrides from `config/.yml` -- final CLI overrides at invocation time - -This layered strategy supports reproducible batch runs with targeted command-line changes. 
diff --git a/documentation/cli-reference.md b/documentation/cli-reference.md deleted file mode 100644 index 7535ba00b..000000000 --- a/documentation/cli-reference.md +++ /dev/null @@ -1,90 +0,0 @@ -# CLI Reference - -## `singlerun.jar` (`SimPathsStart`) - -Usage: - -```bash -java -jar singlerun.jar [options] -``` - -### Options - -| Option | Meaning | -|---|---| -| `-c`, `--country ` | Country code (`UK` or `IT`) | -| `-s`, `--startYear ` | Simulation start year | -| `-Setup` | Setup only (do not run simulation) | -| `-Run` | Run only (skip setup) | -| `-r`, `--rewrite-policy-schedule` | Rebuild policy schedule from policy files | -| `-g`, `--showGui ` | Enable or disable GUI | -| `-h`, `--help` | Print help | - -Notes: - -- `-Setup` and `-Run` are mutually exclusive. -- For non-GUI environments, use `-g false`. - -### Examples - -Setup only: - -```bash -java -jar singlerun.jar -c UK -s 2019 -g false -Setup --rewrite-policy-schedule -``` - -Run only (after setup exists): - -```bash -java -jar singlerun.jar -g false -Run -``` - -## `multirun.jar` (`SimPathsMultiRun`) - -Usage: - -```bash -java -jar multirun.jar [options] -``` - -### Options - -| Option | Meaning | -|---|---| -| `-p`, `--popSize ` | Simulated population size | -| `-s`, `--startYear ` | Start year | -| `-e`, `--endYear ` | End year | -| `-DBSetup` | Database setup mode | -| `-n`, `--maxNumberOfRuns ` | Number of sequential runs | -| `-r`, `--randomSeed ` | Seed for first run | -| `-g`, `--executeWithGui ` | Enable or disable GUI | -| `-config ` | Config file in `config/` (default: `default.yml`) | -| `-f` | Write stdout and logs to `output/logs/` | -| `-P`, `--persist ` | Persistence strategy for processed dataset | -| `-h`, `--help` | Print help | - -Persistence modes: - -- `root` (default): persist to root input area for reuse -- `run`: persist per run output folder -- `none`: no processed-data persistence - -### Examples - -Create setup database using config: - -```bash -java -jar multirun.jar 
-DBSetup -config test_create_database.yml -``` - -Run two simulations with root persistence: - -```bash -java -jar multirun.jar -config test_run.yml -P root -``` - -Run without persistence and with file logging: - -```bash -java -jar multirun.jar -config default.yml -P none -f -``` diff --git a/documentation/configuration.md b/documentation/configuration.md deleted file mode 100644 index 4e8a1426a..000000000 --- a/documentation/configuration.md +++ /dev/null @@ -1,101 +0,0 @@ -# Configuration - -SimPaths multi-run behavior is controlled by YAML files in `config/`. - -Examples in this repository include: - -- `default.yml` -- `test_create_database.yml` -- `test_run.yml` -- `create database.yml` -- `sc analysis*.yml` -- `intertemporal elasticity.yml` -- `labour supply elasticity.yml` - -For command-by-command guidance for each provided config, see [Scenario Cookbook](scenario-cookbook.md). - -## How config is applied - -`SimPathsMultiRun` loads `config/` and applies values in two stages: - -1. YAML values initialize runtime fields and argument maps. -2. CLI flags override those values if provided. - -## Top-level keys - -### Core run arguments - -Common fields: - -- `countryString` -- `maxNumberOfRuns` -- `executeWithGui` -- `randomSeed` -- `startYear` -- `endYear` -- `popSize` -- `integrationTest` - -### `model_args` - -Passed into `SimPathsModel` via reflection. - -Typical toggles include: - -- alignment flags (`alignPopulation`, `alignFertility`, `alignEmployment`, ...) -- behavioral switches (`enableIntertemporalOptimisations`, `responsesToHealth`, ...) 
-- persistence of behavioral grids (`saveBehaviour`, `useSavedBehaviour`, `readGrid`) - -### `collector_args` - -Controls output collection and export behavior (via `SimPathsCollector`), including: - -- `persistStatistics`, `persistStatistics2`, `persistStatistics3` -- `persistPersons`, `persistBenefitUnits`, `persistHouseholds` -- `exportToCSV`, `exportToDatabase` - -### `innovation_args` - -Controls iteration logic across runs, such as: - -- `randomSeedInnov` -- `intertemporalElasticityInnov` -- `labourSupplyElasticityInnov` -- `flagDatabaseSetup` - -### `parameter_args` - -Overrides values from `Parameters` (paths and model-global flags). - -Common examples: - -- `trainingFlag` -- `working_directory` -- `input_directory` -- `input_directory_initial_populations` -- `euromod_output_directory` - -## Minimal example - -```yaml -maxNumberOfRuns: 2 -executeWithGui: false -randomSeed: 100 -startYear: 2019 -endYear: 2022 -popSize: 20000 - -collector_args: - persistStatistics: true - persistStatistics2: true - persistStatistics3: true - persistPersons: false - persistBenefitUnits: false - persistHouseholds: false -``` - -Run it: - -```bash -java -jar multirun.jar -config test_run.yml -``` diff --git a/documentation/data-and-outputs.md b/documentation/data-and-outputs.md deleted file mode 100644 index 0e7ef0d13..000000000 --- a/documentation/data-and-outputs.md +++ /dev/null @@ -1,56 +0,0 @@ -# Data and Outputs - -## Data availability model - -- Source code and documentation are open. -- Full research input datasets are not freely redistributable. -- Training data is included to support development, local testing, and CI. 
- -## Input directory layout - -Key paths: - -- `input/`: - - regression and scenario Excel files (`reg_*.xlsx`, `scenario_*.xlsx`, `align_*.xlsx`) - - generated setup files (`input.mv.db`, `EUROMODpolicySchedule.xlsx`, `DatabaseCountryYear.xlsx`) -- `input/InitialPopulations/`: - - `training/population_initial_UK_2019.csv` - - `compile/` scripts for preparing initial-population inputs -- `input/EUROMODoutput/`: - - `training/*.txt` policy outputs and schedule artifacts - -## Setup-generated artifacts - -Running setup mode (`singlerun` setup or `multirun -DBSetup`) creates or refreshes: - -- `input/input.mv.db` -- `input/EUROMODpolicySchedule.xlsx` -- `input/DatabaseCountryYear.xlsx` - -## Output directory layout - -Simulation runs produce timestamped folders under `output/`, typically with: - -- `csv/` generated statistics and exported entities -- `database/` run-specific persistence output -- `input/` copied or persisted run input artifacts - -Common CSV files include: - -- `Statistics1.csv` -- `Statistics21.csv` -- `Statistics31.csv` -- `EmploymentStatistics1.csv` -- `HealthStatistics1.csv` - -## Logging output - -If `-f` is enabled with `multirun.jar`, logs are written to: - -- `output/logs/run_.txt` (stdout capture) -- `output/logs/run_.log` (log4j output) - -## Validation and analysis assets - -- `validation/` contains validation artifacts and graph assets. -- `analysis/` contains `.do` scripts and spreadsheets used for downstream analysis. 
diff --git a/documentation/development.md b/documentation/development.md deleted file mode 100644 index c5f5c4da9..000000000 --- a/documentation/development.md +++ /dev/null @@ -1,61 +0,0 @@ -# Development and Testing - -## Build - -Compile and package: - -```bash -mvn clean package -``` - -## Tests - -### Unit tests - -Run unit tests (Surefire): - -```bash -mvn test -``` - -### Integration tests - -Run integration tests (Failsafe): - -```bash -mvn verify -``` - -Integration tests exercise setup and run flows and compare generated CSV outputs to expected files in: - -- `src/test/java/simpaths/integrationtest/expected/` - -## CI workflows - -GitHub workflows in `.github/workflows/` run: - -- build and package on pull requests to `main` and `develop` -- integration tests (`mvn verify`) -- smoke runs for `singlerun.jar` and `multirun.jar` with persistence variants -- Javadoc generation and publish (on `develop` pushes) - -## Javadoc - -Generate locally: - -```bash -mvn javadoc:javadoc -``` - -## Typical contributor flow - -1. Create a feature branch in your fork. -2. Implement and test changes. -3. Run `mvn verify` before opening a PR. -4. Open a PR against `develop` (or `main` for stable fixes, when appropriate). - -## Debugging tips - -- Use `-g false` on headless systems. -- Use `-f` with `multirun.jar` to capture logs in `output/logs/`. -- Start from `config/test_create_database.yml` and `config/test_run.yml` when reproducing CI behavior. 
diff --git a/documentation/getting-started.md b/documentation/getting-started.md deleted file mode 100644 index 6a93e977d..000000000 --- a/documentation/getting-started.md +++ /dev/null @@ -1,65 +0,0 @@ -# Getting Started - -## Prerequisites - -- Java 19 -- Maven 3.8+ -- Optional IDE: IntelliJ IDEA (import as a Maven project) - -## Build - -From repository root: - -```bash -mvn clean package -``` - -Artifacts produced at the root: - -- `singlerun.jar` -- `multirun.jar` - -## Understand run modes - -SimPaths supports two entry points: - -- `singlerun.jar` (`SimPathsStart`): setup and single simulation execution -- `multirun.jar` (`SimPathsMultiRun`): repeated runs across seeds/scenarios - -## First run (headless) - -### Step 1: setup input artifacts - -```bash -java -jar singlerun.jar -c UK -s 2019 -g false -Setup --rewrite-policy-schedule -``` - -This prepares required setup files such as: - -- `input/input.mv.db` -- `input/EUROMODpolicySchedule.xlsx` -- `input/DatabaseCountryYear.xlsx` - -### Step 2: execute a multi-run configuration - -```bash -java -jar multirun.jar -config default.yml -g false -``` - -Results are written under `output//`. - -## Training vs full data mode - -- The repository includes training data under: - - `input/InitialPopulations/training/` - - `input/EUROMODoutput/training/` -- If no initial-population CSV files are found in the main input location, SimPaths automatically switches to training mode. -- Training mode supports development and CI, but is not intended for research interpretation. - -## GUI usage - -Use `-g true` (default behavior in several flows) to run with GUI components. - -In headless/remote environments, set `-g false`. - -See [GUI Guide](gui-guide.md) for screenshots. 
diff --git a/documentation/gui-guide.md b/documentation/gui-guide.md deleted file mode 100644 index 40ad53d96..000000000 --- a/documentation/gui-guide.md +++ /dev/null @@ -1,51 +0,0 @@ -# GUI Guide - -The GUI is available in single-run and multi-run workflows when enabled. - -## Enable GUI - -Single run: - -```bash -java -jar singlerun.jar -g true -``` - -Multi run: - -```bash -java -jar multirun.jar -config default.yml -g true -``` - -## Screenshots - -Main GUI: - -![SimPaths GUI](figures/SimPaths%20GUI.png) - -Control buttons: - -![SimPaths Buttons](figures/SimPaths-Buttons.png) - -Parameter selection: - -![SimPaths Parameters](figures/SimPaths%20parameters.png) - -Charts overview: - -![Charts](figures/Charts.png) - -Chart properties: - -![Chart Properties](figures/Chart%20Properties.png) - -Chart zoom example: - -![Chart Zoom](figures/SimPaths-Chart-Zoom.png) - -Output stream panel: - -![Output Stream](figures/Output%20stream.png) - -## Headless note - -In remote servers or CI, run with `-g false`. diff --git a/documentation/scenario-cookbook.md b/documentation/scenario-cookbook.md deleted file mode 100644 index 1d8576068..000000000 --- a/documentation/scenario-cookbook.md +++ /dev/null @@ -1,171 +0,0 @@ -# Scenario Cookbook - -This guide maps every provided YAML scenario in `config/` to its intended use. - -All commands below assume you are running from repository root after building jars. - -## Baseline and testing scenarios - -### `default.yml` - -Use when you want the standard baseline run with conservative defaults. - -Command: - -```bash -java -jar multirun.jar -config default.yml -g false -``` - -### `test_create_database.yml` - -Use for test-oriented database setup with training data (`trainingFlag: true`). - -Command: - -```bash -java -jar multirun.jar -DBSetup -config test_create_database.yml -``` - -### `test_run.yml` - -Use for integration-style short runs (2 runs, test settings). 
- -Command: - -```bash -java -jar multirun.jar -config test_run.yml -P root -``` - -### `programming test.yml` - -Use for quick developer smoke runs with smaller population and simplified behavior flags. - -Command: - -```bash -java -jar multirun.jar -config "programming test.yml" -g false -``` - -## Setup-focused scenario - -### `create database.yml` - -Use to build a full database object set for UK long-horizon work. This file sets `flagDatabaseSetup: true` in `innovation_args`, so it runs setup mode. - -Command: - -```bash -java -jar multirun.jar -config "create database.yml" -``` - -## Sensitivity and robustness scenarios - -### `random seed.yml` - -Use to run multiple replications with random-seed iteration enabled. - -Command: - -```bash -java -jar multirun.jar -config "random seed.yml" -g false -``` - -### `intertemporal elasticity.yml` - -Use for intertemporal elasticity sensitivity (3 runs with interest-rate innovation pattern). - -Command: - -```bash -java -jar multirun.jar -config "intertemporal elasticity.yml" -g false -``` - -### `labour supply elasticity.yml` - -Use for labour-supply elasticity sensitivity (3 runs with labour-income innovation pattern). - -Command: - -```bash -java -jar multirun.jar -config "labour supply elasticity.yml" -g false -``` - -## Targeted output scenarios - -### `employmentTransStats.yml` - -Use when you mainly want employment transition statistics and minimal other persisted outputs. - -Command: - -```bash -java -jar multirun.jar -config employmentTransStats.yml -g false -``` - -## Social care scenario family - -### `sc calibration.yml` - -Use to calibrate preference parameters for social care analysis. - -Command: - -```bash -java -jar multirun.jar -config "sc calibration.yml" -g false -``` - -### `sc analysis0.yml` - -Base social care analysis run with social care enabled and alignment on. 
- -Command: - -```bash -java -jar multirun.jar -config "sc analysis0.yml" -g false -``` - -### `sc analysis1.yml` - -Main social care analysis run with named behavioral grid output (`saveBehaviour: true`, `readGrid: "sc analysis1"`). - -Command: - -```bash -java -jar multirun.jar -config "sc analysis1.yml" -g false -``` - -### `sc analysis1b.yml` - -Variant of analysis1 with `alignPopulation: false` and `useSavedBehaviour: true` for comparison. - -Command: - -```bash -java -jar multirun.jar -config "sc analysis1b.yml" -g false -``` - -### `sc analysis2.yml` - -Zero-costs social care scenario (`flagSuppressChildcareCosts: true`, `flagSuppressSocialCareCosts: true`). - -Command: - -```bash -java -jar multirun.jar -config "sc analysis2.yml" -g false -``` - -### `sc analysis3.yml` - -Ignore-costs response scenario that reuses behavior from analysis2 (`useSavedBehaviour: true`, `readGrid: "sc analysis2"`). - -Command: - -```bash -java -jar multirun.jar -config "sc analysis3.yml" -g false -``` - -## Practical notes - -- Use quotes around config filenames that contain spaces. -- Add `-f` to write run logs to `output/logs/`. -- Override config values via CLI flags when needed (for example `-n`, `-r`, `-P`, `-g`). diff --git a/documentation/troubleshooting.md b/documentation/troubleshooting.md deleted file mode 100644 index d9e69082c..000000000 --- a/documentation/troubleshooting.md +++ /dev/null @@ -1,83 +0,0 @@ -# Troubleshooting - -## `Config file not found` - -Cause: - -- `-config` points to a file not present in `config/`. - -Fix: - -- Verify filename and extension. -- Example: - -```bash -java -jar multirun.jar -config default.yml -``` - -## Missing `EUROMODpolicySchedule.xlsx` - -Cause: - -- Setup has not generated schedule files yet. - -Fix: - -- Re-run setup with rewrite enabled: - -```bash -java -jar singlerun.jar -c UK -s 2019 -g false --rewrite-policy-schedule -Setup -``` - -## GUI errors on server or CI - -Cause: - -- Running GUI mode in headless environment. 
- -Fix: - -- Disable GUI: - -```bash --g false -``` - -## Start year rejected or inconsistent - -Cause: - -- Chosen year is outside available input/training data bounds. - -Fix: - -- Use a year covered by available input files. -- For training-only mode, use the provided training start year (2019 in this repository setup). - -## Expected CSV files not found after run - -Cause: - -- Collector settings disabled certain exports. -- Run failed before collector dump phase. - -Fix: - -- Check `collector_args` in YAML. -- Re-run with `-f` and inspect `output/logs/run_.txt` and `.log`. - -## Integration test output mismatch - -Cause: - -- Simulation behavior changed or output schema changed. - -Fix: - -1. Confirm differences are intended. -2. Replace expected files in `src/test/java/simpaths/integrationtest/expected/` with verified new outputs. -3. Re-run: - -```bash -mvn verify -``` diff --git a/documentation/wiki/assets/css/extra.css b/documentation/wiki/assets/css/extra.css index 04fef6eca..94ee99161 100644 --- a/documentation/wiki/assets/css/extra.css +++ b/documentation/wiki/assets/css/extra.css @@ -190,10 +190,6 @@ CONTENT — TYPOGRAPHY ═══════════════════════════════════════════════ */ -.md-content { - max-width: 820px; -} - .md-typeset { font-size: 0.82rem; line-height: 1.6; @@ -203,6 +199,7 @@ font-weight: 600; font-size: 1.45rem; letter-spacing: -0.015em; + color: var(--sp-primary); border-bottom: 2px solid transparent; border-image: var(--sp-gradient) 1; padding-bottom: 0.3rem; @@ -215,7 +212,7 @@ letter-spacing: -0.01em; margin-top: 1.4rem; margin-bottom: 0.45rem; - color: var(--md-default-fg-color); + color: var(--sp-primary); } .md-typeset h3 { @@ -223,7 +220,13 @@ font-size: 1.1rem; margin-top: 1rem; margin-bottom: 0.3rem; - color: var(--md-default-fg-color); + color: var(--sp-primary); +} + +[data-md-color-scheme="slate"] .md-typeset h1, +[data-md-color-scheme="slate"] .md-typeset h2, +[data-md-color-scheme="slate"] .md-typeset h3 { + color: #a8c8e8; } /* 
Paragraph justification */ diff --git a/documentation/wiki/developer-guide/internals/file-organisation.md b/documentation/wiki/developer-guide/internals/file-organisation.md index 742a1ef3a..0720bb9d1 100644 --- a/documentation/wiki/developer-guide/internals/file-organisation.md +++ b/documentation/wiki/developer-guide/internals/file-organisation.md @@ -1,5 +1,142 @@ # File Organisation -!!! warning "In progress" - This page is under development. Contributions welcome — - see the [Developer Guide](../index.md) for how to contribute. +This page describes the directory and package layout of the SimPaths repository. For the generic JAS-mine project structure, see [Project Structure](../jasmine/project-structure.md). + +## Repository Structure + +``` +SimPaths/ +├── config/ # YAML configuration files for simulation runs +│ ├── default.yml # Default simulation parameters (fully annotated) +│ ├── test_create_database.yml # Database creation config (CI) +│ └── test_run.yml # Test run config (CI) +│ +├── documentation/ # Quick-reference docs (this folder) +│ ├── wiki/ # Website source (model description, guides, research) +│ ├── SimPaths_Variable_Codebook.xlsx # Variable definitions for output CSVs +│ ├── SimPaths Stata Parameters.xlsx # Parameter comparison: Stata do-files vs Java +│ └── SimPathsUK_Schedule.xlsx # Event schedule with corresponding Java classes +│ +├── input/ # Input data and parameters +│ ├── InitialPopulations/ +│ │ ├── training/ # De-identified training population (included in repo) +│ │ └── compile/ # Stata pipeline: builds populations, estimates regressions +│ │ ├── do_emphist/ # Employment history reconstruction sub-pipeline +│ │ └── RegressionEstimates/ # Regression coefficient estimation scripts +│ ├── EUROMODoutput/ +│ │ └── training/ # Training UKMOD outputs (included in repo) +│ ├── DoFilesTarget/ # Stata scripts that generate alignment targets +│ ├── reg_*.xlsx # Regression coefficient tables +│ ├── align_*.xlsx # Alignment targets +│ ├── 
projections_*.xlsx # ONS demographic projections +│ ├── scenario_*.xlsx # Scenario-specific parameter overrides +│ ├── policy parameters.xlsx # Tax-benefit policy parameters +│ ├── validation_statistics.xlsx # Validation targets +│ ├── input.mv.db # H2 donor database (generated by setup) +│ ├── EUROMODpolicySchedule.xlsx # Policy year mapping (generated by setup) +│ └── DatabaseCountryYear.xlsx # Macro parameters (generated by setup) +│ +├── output/ # Simulation outputs (created at runtime) +│ └── / +│ ├── csv/ +│ │ ├── Statistics1.csv # Income distribution, Gini, S-Index +│ │ ├── Statistics2.csv # Demographics by age and gender +│ │ ├── Statistics3.csv # Alignment diagnostics +│ │ ├── Person.csv # Person-level output +│ │ ├── BenefitUnit.csv # Benefit-unit-level output +│ │ └── Household.csv # Household-level output +│ ├── database/ # Run-specific persistence output +│ └── input/ # Copied run input artifacts +│ +├── src/ +│ ├── main/java/simpaths/ +│ │ ├── data/ # Parameters, input parsing, filters, statistics +│ │ ├── experiment/ # Entry points: SimPathsStart, SimPathsMultiRun, +│ │ │ # SimPathsCollector, SimPathsObserver +│ │ └── model/ # Core simulation: Person, BenefitUnit, Household, +│ │ ├── decisions/ # intertemporal optimisation grids +│ │ ├── enums/ # categorical variable definitions +│ │ ├── taxes/ # EUROMOD donor matching +│ │ └── lifetime_incomes/ # synthetic income trajectory generation +│ └── test/java/simpaths/ # Unit and integration tests +│ +├── validation/ # Stata validation scripts and reference graphs +│ ├── 01_estimate_validation/ # Predicted vs observed for each regression module +│ └── 02_simulated_output_validation/ # Simulated output vs UKHLS survey data +│ +├── pom.xml # Maven build configuration +├── singlerun.jar # Single-run executable +└── multirun.jar # Multi-run executable +``` + + +## Sub-package detail + +The following sub-packages are self-contained subsystems whose internals are not obvious from the class names alone. 
+ +### `model/decisions/` — IO engine + +When IO is enabled, computing optimal consumption–labour choices for every agent at every time step would be prohibitively slow. This package solves the problem once before the simulation runs: it constructs a grid covering all meaningful combinations of state variables (wealth, age, health, family status, etc.), then works backwards from the end of life to find the optimal choice at each grid point (backward induction). During the simulation, agents simply look up their current state in the pre-computed grid. + +| Class | Purpose | +| --- | --- | +| `DecisionParams` | Defines the state-space dimensions and grid parameters for the optimisation problem. | +| `ManagerPopulateGrids` | Populates the state-space grid points and evaluates value functions by backward induction. | +| `ManagerSolveGrids` | Solves for optimal policy at each grid point. | +| `ManagerFileGrids` | Reads and writes pre-computed grids to disk, so they can be reused across runs. | +| `Grids` | Container for the set of solved decision grids. | +| `States` | Enumerates the state variables that define each grid point. | +| `Expectations` / `LocalExpectations` | Computes expected future values over stochastic transitions. | +| `CESUtility` | CES utility function used in the optimisation. | + +### `model/taxes/` — EUROMOD donor matching + +Imputes taxes and benefits onto simulated benefit units by matching them to pre-computed EUROMOD donor records. + +| Class | Purpose | +| --- | --- | +| `DonorTaxImputation` | Main entry point. Implements the three-step matching process: coarse-exact matching on characteristics, income proximity filtering, and candidate selection/averaging. | +| `KeyFunction` / `KeyFunction1`–`4` | Four progressively relaxed matching-key definitions. The system tries the tightest key first and falls back through wider keys if no donors are found. | +| `DonorKeys` | Builds composite matching keys from benefit-unit characteristics. 
| +| `DonorTaxUnit` / `DonorPerson` | Represent the pre-computed EUROMOD donor records loaded from the database. | +| `CandidateList` | Ranked list of donor matches for a given benefit unit, sorted by income proximity. | +| `Match` / `Matches` | Store the final selected donor(s) and their imputed tax-benefit values. | + +The `taxes/database/` sub-package handles loading donor data from the H2 database into memory (`TaxDonorDataParser`, `DatabaseExtension`, `MatchIndices`). + +### `model/lifetime_incomes/` — synthetic income trajectories + +When IO is enabled, this package creates projected income paths for birth cohorts using an AR(2) process anchored to age-gender geometric means, and matches simulated persons to donor income profiles. + +| Class | Purpose | +| --- | --- | +| `ManagerProjectLifetimeIncomes` | Generates the synthetic income trajectory database for all birth cohorts in the simulation horizon. | +| `LifetimeIncomeImputation` | Matches each simulated person to a donor income trajectory via binary search on the income CDF. | +| `AnnualIncome` | Implements the AR(2) income process with age-gender anchoring. | +| `BirthCohort` | Groups individuals by birth year for cohort-level income projection. | +| `Individual` | Entity carrying age dummies and log GDP per capita for income regression. | + +CSV filenames follow the pattern `.csv`. With a single run the suffix is `1`; with multiple runs each run produces its own numbered file. + +For a description of the variables in output CSV files, see `documentation/SimPaths_Variable_Codebook.xlsx`. For a description of each `reg_*`, `align_*`, and `scenario_*` input file, see [Model Parameterisation](../../overview/parameterisation.md) on the website. 
+ +## Setup-generated artifacts + +Running setup (`multirun -DBSetup`) creates or refreshes three files in `input/`: + +- `input.mv.db` — H2 database of EUROMOD donor tax-benefit outcomes +- `EUROMODpolicySchedule.xlsx` — maps simulation years to EUROMOD policy systems +- `DatabaseCountryYear.xlsx` — year-specific macro parameters + +These must exist before any simulation run. If they are missing, re-run setup. + +## Training mode + +The repository includes de-identified training data under `input/InitialPopulations/training/` and `input/EUROMODoutput/training/`. If no initial-population CSV files are found in the main input location, SimPaths automatically switches to training mode. Training mode supports development and CI but is not intended for research interpretation. + +## Logging + +With `-f` on `multirun.jar`, logs are written to `output/logs/run_.txt` (stdout) and `output/logs/run_.log` (log4j). + +--- + diff --git a/documentation/wiki/getting-started/environment-setup.md b/documentation/wiki/getting-started/environment-setup.md index d121cab7c..e3731590a 100644 --- a/documentation/wiki/getting-started/environment-setup.md +++ b/documentation/wiki/getting-started/environment-setup.md @@ -6,8 +6,8 @@ ## Requirements -- Java Development Kit (JDK) 11 or later -- Apache Maven 3.6 or later +- Java Development Kit (JDK) 19 (the project targets Java 19 — earlier versions will not compile) +- Apache Maven 3.8 or later - Git ## Cloning the repository @@ -20,7 +20,9 @@ cd SimPaths ## Building the project ```bash -mvn clean install -DskipTests +mvn clean package ``` +This produces `singlerun.jar` and `multirun.jar` at the repository root. + Refer to the [Working in GitHub](../developer-guide/working-in-github.md) guide for the full development workflow. 
diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 000000000..31873be04 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,157 @@ +site_name: SimPaths Documentation +site_description: >- + An open-source microsimulation framework for modelling individual + and household life course events across the UK and Europe. +site_url: https://centreformicrosimulation.github.io/SimPaths/ +repo_url: https://github.com/centreformicrosimulation/SimPaths +repo_name: centreformicrosimulation/SimPaths + +docs_dir: documentation/wiki + +copyright: >- + Copyright © Matteo Richiardi, Patryk Bronka, Justin van de Ven — + Centre for Microsimulation and Policy Analysis + +theme: + name: material + palette: + - scheme: default + primary: custom + accent: custom + toggle: + icon: material/weather-night + name: Switch to dark mode + - scheme: slate + primary: custom + accent: custom + toggle: + icon: material/weather-sunny + name: Switch to light mode + font: false + icon: + repo: fontawesome/brands/github + logo: material/chart-timeline-variant-shimmer + features: + - navigation.tabs + - navigation.tabs.sticky + - navigation.sections + - navigation.indexes + - navigation.top + - navigation.footer + - navigation.tracking + - toc.follow + - search.suggest + - search.highlight + - search.share + - content.code.copy + - content.code.annotate + +extra_css: + - assets/css/extra.css + +plugins: + - search: + lang: en + +markdown_extensions: + - admonition + - pymdownx.details + - pymdownx.superfences + - pymdownx.tabbed: + alternate_style: true + - pymdownx.highlight: + anchor_linenums: true + line_spans: __span + pygments_lang_class: true + - pymdownx.inlinehilite + - pymdownx.snippets + - pymdownx.emoji: + emoji_index: !!python/name:material.extensions.emoji.twemoji + emoji_generator: !!python/name:material.extensions.emoji.to_svg + - toc: + permalink: true + toc_depth: 3 + - attr_list + - md_in_html + - tables + - footnotes + - def_list + +extra: + social: + - icon: 
fontawesome/brands/github + link: https://github.com/centreformicrosimulation/SimPaths + name: GitHub + - icon: fontawesome/solid/globe + link: https://www.microsimulation.ac.uk/ + name: Centre for Microsimulation and Policy Analysis + +nav: + - Home: index.md + + - Overview: + - overview/index.md + - Model Description: overview/model-description.md + - Simulated Modules: overview/simulated-modules.md + - Model Parameterisation: overview/parameterisation.md + - Country Variants: overview/country-variants.md + - How to Cite: overview/how-to-cite.md + + - Getting Started: + - getting-started/index.md + - Environment Setup: getting-started/environment-setup.md + - Input Data: + - getting-started/data/index.md + - Initial Population (UK): getting-started/data/initial-population-uk.md + - Tax-Benefit Donors (UK): getting-started/data/tax-benefit-donors-uk.md + - Running Your First Simulation: getting-started/first-simulation.md + - Video Tutorials: getting-started/video-tutorials.md + + - User Guide: + - user-guide/index.md + - Single Runs: user-guide/single-runs.md + - Multiple Runs: user-guide/multiple-runs.md + - Graphical User Interface: user-guide/gui.md + - Modifying Parameters: user-guide/modifying-parameters.md + - Modifying Tax-Benefit Settings: user-guide/tax-benefit-parameters.md + - Uncertainty Analysis: user-guide/uncertainty-analysis.md + + - Developer Guide: + - developer-guide/index.md + - Working in GitHub: developer-guide/working-in-github.md + - JAS-mine Architecture: + - developer-guide/jasmine/index.md + - Project Structure: developer-guide/jasmine/project-structure.md + - The Model and the Schedule: developer-guide/jasmine/model-and-schedule.md + - The Start Class: developer-guide/jasmine/start-class.md + - The MultiRun Class: developer-guide/jasmine/multirun-class.md + - Updating JAS-mine: developer-guide/jasmine/updating-jasmine.md + - SimPaths Internals: + - developer-guide/internals/index.md + - SimPaths API: developer-guide/internals/api.md + 
- File Organisation: developer-guide/internals/file-organisation.md + - The SimPathsModel Class: developer-guide/internals/simpaths-model.md + - Start Class Implementation: developer-guide/internals/start-class-implementation.md + - MultiRun Implementation: developer-guide/internals/multirun-implementation.md + - How-To Guides: + - developer-guide/how-to/index.md + - Introduce a New Variable: developer-guide/how-to/new-variable.md + - Add Parameters to the GUI: developer-guide/how-to/add-gui-parameters.md + - Perform MultiRun Simulations: developer-guide/how-to/multirun-simulations.md + - JAS-mine Reference: + - jasmine-reference/index.md + - Statistical Package: jasmine-reference/statistical-package.md + - Collection Filters: jasmine-reference/collection-filters.md + - Alignment Library: jasmine-reference/alignment-library.md + - Matching Library: jasmine-reference/matching-library.md + - Regression Library: jasmine-reference/regression-library.md + - Saving Outputs: jasmine-reference/saving-outputs.md + - Querying the Database: jasmine-reference/querying-database.md + - Links and Resources: jasmine-reference/links.md + - Enums: jasmine-reference/enums.md + + - Model Validation: + - validation/index.md + + - Research: + - research/index.md