Read retirement limits from policyengine-us parameters#566

Merged
MaxGhenis merged 6 commits into main from use-pe-us-retirement-limits on Mar 4, 2026
@MaxGhenis

Summary

Test plan

  • Verified output matches the old hard-coded values for all years (2020-2025)
  • CI passes

🤖 Generated with Claude Code

MaxGhenis and others added 6 commits March 3, 2026 14:35
Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that
reads from policyengine-us's parameter tree at runtime. This ensures
limits stay in sync as policyengine-us is updated, and eliminates a
maintenance risk when the same dict is duplicated in puf_impute.py
(PR #554).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
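
As a rough illustration of the lookup described above, the sketch below resolves year-specific limits from a date-keyed parameter tree instead of a per-year dict. `PARAMETER_TREE`, `value_at`, and `lookup_limits` are hypothetical stand-ins; they are not the policyengine-us API and not this repository's `get_retirement_limits`. The dollar amounts are the published IRS elective-deferral and IRA limits for 2020-2025.

```python
from bisect import bisect_right

# Hypothetical stand-in for a parameter tree: each leaf maps an effective
# date (ISO string) to the value in force from that date onward.
PARAMETER_TREE = {
    "k401_limit": {
        "2020-01-01": 19_500,
        "2022-01-01": 20_500,
        "2023-01-01": 22_500,
        "2024-01-01": 23_000,
        "2025-01-01": 23_500,
    },
    "ira_limit": {
        "2020-01-01": 6_000,
        "2023-01-01": 6_500,
        "2024-01-01": 7_000,
    },
}


def value_at(leaf, instant):
    """Return the value in force at `instant` (an ISO date string)."""
    dates = sorted(leaf)  # ISO dates sort chronologically as strings
    index = bisect_right(dates, instant) - 1
    if index < 0:
        raise KeyError(f"no value defined on or before {instant}")
    return leaf[dates[index]]


def lookup_limits(year):
    """Resolve every limit in the tree as of January 1 of `year`."""
    instant = f"{year}-01-01"
    return {name: value_at(leaf, instant) for name, leaf in PARAMETER_TREE.items()}
```

Because values carry forward from their effective date, a year with no new entry (for example 2021) resolves to the most recent earlier value rather than requiring its own row.
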
The fixup reformatted SQL strings in test_database_build.py using
a newer Black version that disagrees with CI's lgeiger/black-action.
Revert to the original formatting that CI accepts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies get_retirement_limits returns correct IRS contribution
limits for 2020, 2023, and 2025 by comparing against known values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SECURE 2.0 update (merged to pe-us main, not yet released)
renames children["401k"] to children["k401"] and changes it from
a simple int to an age-bracketed scale. Handle both layouts so the
code works with the current release and future releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
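
A minimal sketch of what handling both layouts could look like, using plain dicts in place of the real parameter nodes. The key names `"401k"` and `"k401"` come from the commit message; the shape of the age-bracketed scale (ascending `(age_threshold, amount)` pairs), the helper name `read_401k_limit`, and the placeholder amounts are assumptions for illustration only.

```python
def read_401k_limit(children, age):
    """Return the 401(k) amount from either parameter layout.

    `children` is a plain dict standing in for a parameter node's children.
    """
    if "k401" in children:
        # Newer layout (assumed shape): ascending (age_threshold, amount) pairs.
        amount = None
        for threshold, value in children["k401"]:
            if age >= threshold:
                amount = value
        if amount is None:
            raise ValueError(f"no bracket covers age {age}")
        return amount
    # Older layout: a single value stored under the "401k" key.
    return children["401k"]


# Placeholder amounts, not real limits.
old_layout = {"401k": 100}
new_layout = {"k401": [(0, 100), (50, 150)]}
```
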
Drop dual-path handling in favor of the current parameter layout
(k401 scale with age brackets from SECURE 2.0). Bump the minimum
policyengine-us version to 1.572.5 which introduced this parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis merged commit cf3c940 into main on Mar 4, 2026
6 checks passed
PavelMakarchuk added a commit that referenced this pull request Mar 4, 2026
- Replace _get_retirement_limits() YAML loader with wrapper that
  merges policyengine-us params (401k/IRA) + YAML SE pension params
- Remove retirement_contribution_limits block from YAML (now sourced
  from policyengine-us parameter tree via utils/retirement_limits.py)
- Update validation script to use get_retirement_limits() directly
- Update tests for new structure (no more year clamping for 401k/IRA)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis added a commit that referenced this pull request Mar 4, 2026
…RF imputation (#554)

* Fix and add retirement contribution calibration targets

- Correct traditional_ira_contributions: $25B → $13.2B (SOI 1304
  Table 1.4 actual deduction, not total contributions)
- Add traditional_401k_contributions: $567.9B (BEA/FRED employee
  DC contributions)
- Add self_employed_pension_contribution_ald: $29.5B (SOI 1304
  Table 1.4 Keogh plan deduction)
- Remove roth_ira_contributions: structurally $0 due to CPS
  allocation bug (#553)
- Update both loss.py and etl_national_targets.py

Closes #553

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix RETCB_VAL allocation to use proportional split

Replace sequential waterfall with proportional allocation based on
administrative data. The old waterfall gave 401(k) first priority,
consuming all of RETCB_VAL and leaving IRA contributions at $0 for
every record. The Roth IRA allocation was also mathematically
guaranteed to produce $0.

The new approach splits RETCB_VAL proportionally:
- DC vs IRA: 90.8% / 9.2% (BEA/FRED vs IRS SOI)
- Within DC: 85% traditional / 15% Roth (Vanguard/PSCA)
- Within IRA: 39.2% traditional / 60.8% Roth (IRS SOI Tables 5 & 6)

All fractions are stored in imputation_parameters.yaml with sources.
Contribution limits are still enforced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
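
The proportional split above can be worked through as a short sketch. The shares are the ones quoted in the commit message; the function name `split_retcb` and the output keys are hypothetical, and contribution-limit enforcement is deliberately left out.

```python
# Shares quoted in the commit message.
DC_SHARE = 0.908            # DC vs IRA: 90.8% / 9.2%
TRAD_SHARE_OF_DC = 0.85     # within DC: 85% traditional / 15% Roth
TRAD_SHARE_OF_IRA = 0.392   # within IRA: 39.2% traditional / 60.8% Roth


def split_retcb(retcb_val):
    """Split a reported contribution total across four account types.

    Remainders are computed by subtraction so the parts always sum to the
    input. Contribution limits are not applied in this sketch.
    """
    dc = retcb_val * DC_SHARE
    ira = retcb_val - dc
    traditional_401k = dc * TRAD_SHARE_OF_DC
    traditional_ira = ira * TRAD_SHARE_OF_IRA
    return {
        "traditional_401k": traditional_401k,
        "roth_401k": dc - traditional_401k,
        "traditional_ira": traditional_ira,
        "roth_ira": ira - traditional_ira,
    }
```

For a record reporting $1,000, this gives roughly $771.80 traditional 401(k), $136.20 Roth 401(k), $36.06 traditional IRA, and $55.94 Roth IRA, so no account type is structurally zero.
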

* Add Roth calibration targets and fix traditional 401(k) target

- Split 401(k) target: $567.9B total employee DC deferrals (BEA/FRED)
  into traditional $482.7B (85%) and Roth $85.2B (15%) using Vanguard
  How America Saves 2024 dollar share estimate
- Add roth_ira_contributions target: $35.0B from IRS SOI Accumulation
  Tables 5 & 6 (TY 2022) — direct administrative source
- Update etl_national_targets.py in parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Reconcile SS sub-components after PUF imputation

When PUF imputation replaces social_security values, the sub-components
(retirement, disability, survivors, dependents) were left unchanged,
creating a mismatch. This caused a base-year discontinuity where projected
years had ~3x more SS recipients than the base year, producing artificial
9-point Gini swings.

The new reconcile_ss_subcomponents() function rescales sub-components
proportionally after PUF imputation. New recipients (CPS had zero SS
but PUF imputed positive) default to retirement.

Fixes #551

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
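
A per-record sketch of the proportional rescaling described above, assuming each record's sub-components are held in a dict. `rescale_subcomponents` is a hypothetical helper, not the repository's `reconcile_ss_subcomponents()`, which presumably operates on whole arrays.

```python
SUBCOMPONENTS = ("retirement", "disability", "survivors", "dependents")


def rescale_subcomponents(old_subs, new_total):
    """Rescale one record's SS sub-components to sum to `new_total`."""
    old_total = sum(old_subs[name] for name in SUBCOMPONENTS)
    result = {name: 0.0 for name in SUBCOMPONENTS}
    if new_total <= 0:
        return result  # imputation removed SS entirely
    if old_total <= 0:
        # New recipient: CPS had zero SS, so default to retirement.
        result["retirement"] = float(new_total)
        return result
    scale = new_total / old_total
    for name in SUBCOMPONENTS:
        result[name] = old_subs[name] * scale
    return result
```
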

* Use age heuristic for new SS recipients in reconciliation

New PUF recipients (CPS had zero SS) now get assigned based on age:
>= 62 -> retirement, < 62 -> disability. Matches the CPS fallback
logic. Falls back to retirement if age is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
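
The age rule above is small enough to state directly. The function name and the use of `None` for a missing age are assumptions for this sketch.

```python
RETIREMENT_AGE_THRESHOLD = 62  # threshold quoted in the commit message


def classify_new_recipient(age):
    """Pick the SS sub-component for a record with no CPS-reported SS."""
    if age is None or age >= RETIREMENT_AGE_THRESHOLD:
        return "retirement"  # also the fallback when age is unavailable
    return "disability"
```
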

* Use QRF to predict SS sub-component split for new recipients

Instead of the simple age >= 62 heuristic, train a QRF on CPS records
(where the reason-code split is known) to predict shares for new PUF
recipients. Uses age, gender, marital status, and other demographics.
Falls back to age heuristic when microimpute is unavailable or
training data is insufficient (< 100 records).

14 tests covering:
- Proportional rescaling (existing recipients)
- Age heuristic fallback (4 tests)
- QRF share prediction (4 tests including sum-to-one and
  elderly-predicted-as-retirement)
- Edge cases (zero imputation, missing subs, no SS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use QRF for all PUF SS records, not just new recipients

CPS-PUF link is statistical (not identity-based), so the paired CPS
record's sub-component split is just one noisy draw. A QRF trained
on all CPS SS recipients gives a better expected prediction. Also
removes unnecessary try/except ImportError guard for microimpute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add towncrier changelog fragment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Drop formula variables from dataset, add integrity tests

Variables with formulas in policyengine-us are recomputed by the
simulation engine, so storing them wastes space and can mislead
validation. This removes 9 such variables from the extended CPS
output (saving ~15MB).

Also adds tests verifying:
- No formula variables are stored (except person_id)
- Stored input values match what the simulation computes
- SS sub-components sum to total social_security per person

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Format test file with black

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix black formatting for CI (26.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fixup! Fix black formatting for CI (26.1.0)

* Also drop adds/subtracts variables from dataset

Update _drop_formula_variables and tests to catch variables that use
`adds` or `subtracts` (e.g. social_security), not just explicit
formulas. These are also recomputed by the simulation engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
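
One way to picture the filter described above is the sketch below. `VariableMeta` is a hypothetical stand-in for whatever variable metadata the tax-benefit system exposes, and `drop_computed` is not the repository's `_drop_formula_variables`; only the rule itself (drop anything with a formula, `adds`, or `subtracts`, but keep `person_id`) comes from the commit messages.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMeta:
    """Hypothetical metadata record for one variable."""
    has_formula: bool = False
    adds: list = field(default_factory=list)
    subtracts: list = field(default_factory=list)


def is_computed(meta):
    """True when the simulation engine would recompute this variable."""
    return meta.has_formula or bool(meta.adds) or bool(meta.subtracts)


def drop_computed(data, metadata, keep=("person_id",)):
    """Return `data` without variables the engine recomputes.

    Variables listed in `keep` or missing from `metadata` are retained.
    """
    return {
        name: values
        for name, values in data.items()
        if name in keep
        or name not in metadata
        or not is_computed(metadata[name])
    }
```
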

* Train QRF for retirement contributions on PUF clone half

PUF clones previously copied CPS retirement contributions blindly,
so a record with $0 wages could have $50k in 401(k) contributions.

Train a QRF on CPS data (which has realistic income-to-contribution
relationships) and predict onto the PUF half using PUF-imputed income.
Post-prediction constraints enforce contribution caps, zero-out rules
for records with no wages/SE income, and non-negativity.

- Remove traditional_ira_contributions from IMPUTED_VARIABLES
- Add CPS_RETIREMENT_VARIABLES, RETIREMENT_PREDICTORS constants
- Add _get_retirement_limits() with year-specific 401k/IRA caps
- Add _impute_retirement_contributions() following _impute_weeks_unemployed pattern
- Integrate into puf_clone_dataset() variable routing loop
- Add 34 unit tests covering constraints, routing, and limits

Closes #561

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
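
The three post-prediction constraints named above (non-negativity, the contribution cap, and zeroing out records without earned income) can be sketched for a single record as follows. The function name and argument names are hypothetical; the cap is passed in rather than looked up.

```python
def constrain_contribution(predicted, earned_income, cap):
    """Apply post-prediction constraints to one predicted contribution."""
    if earned_income <= 0:
        return 0.0  # no wages or SE income: contribution is zeroed out
    non_negative = max(predicted, 0.0)
    return min(non_negative, cap)
```

This addresses the case from the commit message: a record with $0 wages can no longer carry a $50k contribution, whatever the model predicts.
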

* Expand retirement QRF predictors and add validation script

Add income predictors (interest, dividends, pension, SS) to the
retirement contribution QRF, matching issue #561's recommendation.
Split RETIREMENT_PREDICTORS into demographic and income sublists so
the test side correctly sources income from PUF imputations.

Also add validation/validate_retirement_imputation.py for post-build
constraint checking and aggregate comparison against calibration targets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add QRF failure safety net and edge-case tests

Wrap QRF fit/predict in try/except so a crash returns zeros instead
of blowing up the entire puf_clone_dataset build. Document the
pre_tax_contributions inconsistency (OVERRIDDEN vs CPS-trained
sub-components) for future reconciliation.

Tests: add test_qrf_failure_returns_zeros and
test_training_data_failure_returns_zeros (50 total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
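
A minimal sketch of the safety-net pattern, assuming the model step can be wrapped in a zero-argument callable that returns one list of predictions per variable. `predict_or_zeros` and its arguments are hypothetical names; the real code wraps the QRF calls inline.

```python
import logging

logger = logging.getLogger(__name__)


def predict_or_zeros(fit_predict, variables, n_records):
    """Run `fit_predict()`; on any failure, return zeros for every variable."""
    try:
        return fit_predict()
    except Exception:
        # exc_info=True keeps the full traceback in the log.
        logger.warning(
            "Retirement QRF failed; returning zeros", exc_info=True
        )
        return {name: [0.0] * n_records for name in variables}
```

The trade-off is that a silent fallback to zeros hides modeling failures from anyone who does not read the logs, which is why the traceback is logged rather than discarded.
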

* Fix conftest mocking microimpute when it is installed

The conftest.py files used `if "microimpute" not in sys.modules` to
decide whether to install a MagicMock. On CI (Python 3.13, where
microimpute is fully installed), this condition was True at pytest
startup because no code had imported microimpute yet. The mock
replaced the real package, so tests that triggered CPS generation
(add_rent → QRF imputation) silently got a MagicMock whose
__len__ returns 0, causing the "0 input values" ValueError.

Fix: attempt the import inside try/except ImportError instead of
checking sys.modules, so the mock is only installed when microimpute
genuinely cannot be imported.

Also restores policyengine-us to 1.587.0 (the revert in 2566ac1
was a misdiagnosis — the conftest mock was the real root cause).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
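
The difference between the two checks can be sketched with a small helper. `install_mock_if_missing` is a hypothetical name; the actual conftest.py code is not shown in this PR.

```python
import importlib
import sys
from unittest.mock import MagicMock


def install_mock_if_missing(module_name):
    """Install a MagicMock for `module_name` only if it cannot be imported.

    Returns True when a mock was installed, False when the real module
    imported successfully. Checking `module_name not in sys.modules`
    instead would be True for any installed-but-not-yet-imported package.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        sys.modules[module_name] = MagicMock()
        return True
    return False
```
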

* Address PR review: consolidate limits, improve logging, de-duplicate

- Move retirement contribution limits to imputation_parameters.yaml
  (single source of truth for cps.py and puf_impute.py)
- Add exc_info=True to QRF exception handlers for full tracebacks
- Replace TestLimitsMatchCps with TestLimitsMatchYaml that actually
  cross-checks against the shared YAML source
- Remove redundant microimpute mock from test_calibration/conftest.py
  (root conftest already propagates)
- Document QRF training subsample of 5000 rationale
- Import targets/limits in validation script from loss.py and YAML
  instead of hardcoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Cap SE pension and fix IRA gate for dual-income / SE-only filers

Previously, filers with SE income had ALL of RETCB_VAL allocated to
self_employed_pension_contributions, leaving $0 for 401(k)/IRA even
when they also had wages. SE-only filers also got $0 in IRA despite
being eligible.

Changes:
- Cap SE pension at min(25% of SE income, IRS dollar limit) so
  dual-income filers retain a remainder for 401(k) and IRA
- Gate IRA pool on has_earned_income (wages OR SE) instead of
  has_wages only, so SE-only filers can receive IRA contributions
- IRA pool = remaining - dc_pool (absorbs all non-DC remainder)
- Add SE pension rate and dollar limits to imputation_parameters.yaml
  (2020-2025, sourced from IRS one-participant 401k rules)
- Apply matching SE pension cap in PUF QRF constraints
- Add test_se_pension_capped_at_rate_times_income

No change to calibration targets (independently sourced from
BEA/FRED and IRS SOI). Pre-calibration aggregates should shift
closer to targets since dual-income filers now contribute to all
account types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
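
The SE pension cap can be illustrated for a single record as below. The 25% rate is from the commit message; the function name, the argument names, and the dollar limit used in the usage line are illustrative, since the real values live in imputation_parameters.yaml.

```python
def allocate_se_pension(retcb_val, se_income, dollar_limit, rate=0.25):
    """Cap the SE pension share of a contribution total.

    Returns (se_pension, remaining), where `remaining` is what is left
    for 401(k) and IRA allocation.
    """
    cap = min(rate * max(se_income, 0.0), dollar_limit)
    se_pension = min(retcb_val, cap)
    return se_pension, retcb_val - se_pension


# A filer with $20,000 of SE income and $10,000 of reported contributions;
# the dollar limit here is an illustrative argument.
se_pension, remaining = allocate_se_pension(10_000.0, 20_000.0, 66_000.0)
```

Under the previous behavior the full $10,000 would have gone to the SE pension; with the cap, $5,000 does and $5,000 remains for the other account types.
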

* Rebase on main: use shared get_retirement_limits() from PR #566

- Replace _get_retirement_limits() YAML loader with wrapper that
  merges policyengine-us params (401k/IRA) + YAML SE pension params
- Remove retirement_contribution_limits block from YAML (now sourced
  from policyengine-us parameter tree via utils/retirement_limits.py)
- Update validation script to use get_retirement_limits() directly
- Update tests for new structure (no more year clamping for 401k/IRA)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update uv.lock for version freshness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>