Read retirement limits from policyengine-us parameters by MaxGhenis · Pull Request #566 · PolicyEngine/policyengine-us-data

MaxGhenis · 2026-03-03T19:35:31Z

Summary

Replaces the hard-coded RETIREMENT_LIMITS dict in cps.py with a shared utility (utils/retirement_limits.py) that reads from policyengine-us's parameter tree at runtime
The parameters already exist at gov.irs.gross_income.retirement_contributions.{limit, catch_up.limit}
Eliminates a maintenance risk: PR #554 (Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation) duplicates this dict in puf_impute.py. After both PRs merge, #554 can import from the shared utility too.
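
A minimal sketch of what such a lookup could look like. The real utility reads from the policyengine-us parameter tree; here a small in-memory dict stands in for it so the example runs on its own. The `PARAMETER_TREE` layout, the `value_at` helper, the `ira` leaf name, and the keys of the returned dict are assumptions made for illustration, not the contents of utils/retirement_limits.py. The dollar amounts are the published IRS limits for 2020-2025.

```python
from bisect import bisect_right

BASE = "gov.irs.gross_income.retirement_contributions"

# Hypothetical stand-in for the policyengine-us parameter tree: each leaf maps
# an effective date (ISO string) to the value in force from that date onward.
PARAMETER_TREE = {
    f"{BASE}.limit.401k": {
        "2020-01-01": 19_500,
        "2022-01-01": 20_500,
        "2023-01-01": 22_500,
        "2024-01-01": 23_000,
        "2025-01-01": 23_500,
    },
    f"{BASE}.limit.ira": {
        "2020-01-01": 6_000,
        "2023-01-01": 6_500,
        "2024-01-01": 7_000,
    },
    f"{BASE}.catch_up.limit.401k": {"2020-01-01": 6_500, "2023-01-01": 7_500},
    f"{BASE}.catch_up.limit.ira": {"2020-01-01": 1_000},
}


def value_at(leaf, instant):
    """Return the most recent value whose effective date is <= instant."""
    dates = sorted(leaf)  # ISO dates sort chronologically as strings
    index = bisect_right(dates, instant) - 1
    if index < 0:
        raise KeyError(f"no value in force at {instant}")
    return leaf[dates[index]]


def get_retirement_limits(year, tree=PARAMETER_TREE):
    """Collect the contribution limits in force on January 1 of `year`."""
    instant = f"{year}-01-01"
    return {
        "401k": value_at(tree[f"{BASE}.limit.401k"], instant),
        "401k_catch_up": value_at(tree[f"{BASE}.catch_up.limit.401k"], instant),
        "ira": value_at(tree[f"{BASE}.limit.ira"], instant),
        "ira_catch_up": value_at(tree[f"{BASE}.catch_up.limit.ira"], instant),
    }


if __name__ == "__main__":
    for year in (2020, 2023, 2025):
        print(year, get_retirement_limits(year))
```

Because values are resolved by effective date rather than by an explicit per-year table, a year with no change of its own (2021 for the 401(k) limit, for example) falls back to the latest earlier entry.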

Test plan

Verified output matches the old hard-coded values for all years (2020-2025)
CI passes

🤖 Generated with Claude Code

Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that reads from policyengine-us's parameter tree at runtime. This ensures limits stay in sync as policyengine-us is updated, and eliminates a maintenance risk when the same dict is duplicated in puf_impute.py (PR #554).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The fixup reformatted SQL strings in test_database_build.py using a newer Black version that disagrees with CI's lgeiger/black-action. Revert to the original formatting that CI accepts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verifies get_retirement_limits returns correct IRS contribution limits for 2020, 2023, and 2025 by comparing against known values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The SECURE 2.0 update (merged to pe-us main, not yet released) renames children["401k"] to children["k401"] and changes it from a simple int to an age-bracketed scale. Handle both layouts so the code works with the current release and future releases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
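
To illustrate the dual-layout handling described in this commit, here is a rough sketch using plain dicts in place of policyengine-us parameter nodes. The function name `read_401k_catch_up`, the representation of the age-bracketed scale as a sorted list of `(minimum_age, amount)` pairs, and the assumption that the renamed child is the catch-up limit are all hypothetical. Only the `"401k"` and `"k401"` child names come from the commit message. The amounts are the 2025 IRS catch-up figures.

```python
CATCH_UP_AGE = 50  # age at which 401(k) catch-up contributions become available


def read_401k_catch_up(children, age):
    """Return the 401(k) catch-up limit for `age` under either layout.

    Old layout: children["401k"] is a single number (age test applied here).
    New layout: children["k401"] is a list of (minimum_age, amount) brackets.
    """
    if "k401" in children:
        amount = 0
        # Brackets are sorted by minimum age; keep the last one that applies.
        for minimum_age, value in children["k401"]:
            if age >= minimum_age:
                amount = value
        return amount
    # Fall back to the pre-SECURE 2.0 layout: one flat amount from age 50.
    return children["401k"] if age >= CATCH_UP_AGE else 0


if __name__ == "__main__":
    old_layout = {"401k": 7_500}
    new_layout = {"k401": [(0, 0), (50, 7_500), (60, 11_250), (64, 7_500)]}
    for age in (40, 55, 61, 64):
        print(age, read_401k_catch_up(old_layout, age), read_401k_catch_up(new_layout, age))
```

The two layouts agree except for ages 60 to 63, where the bracketed scale carries the higher SECURE 2.0 amount that a single flat number cannot express.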

Drop dual-path handling in favor of the current parameter layout (k401 scale with age brackets from SECURE 2.0). Bump the minimum policyengine-us version to 1.572.5, which introduced this parameter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Replace _get_retirement_limits() YAML loader with wrapper that merges policyengine-us params (401k/IRA) + YAML SE pension params
- Remove retirement_contribution_limits block from YAML (now sourced from policyengine-us parameter tree via utils/retirement_limits.py)
- Update validation script to use get_retirement_limits() directly
- Update tests for new structure (no more year clamping for 401k/IRA)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
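
One way the wrapper described above might be shaped, shown as a sketch rather than the code in puf_impute.py. The `SE_PENSION_PARAMS` structure, the output key names, the injected `pe_us_lookup` callable, and the choice to clamp out-of-range years for the YAML-sourced values are assumptions. The 25% rate and the 2020-2025 range come from the commit messages, and the dollar amounts are the IRS annual-additions limits for those years.

```python
# Hypothetical result of loading the SE pension block from
# imputation_parameters.yaml (for example with yaml.safe_load).
SE_PENSION_PARAMS = {
    "rate": 0.25,
    "dollar_limit": {
        2020: 57_000,
        2021: 58_000,
        2022: 61_000,
        2023: 66_000,
        2024: 69_000,
        2025: 70_000,
    },
}


def get_limits_with_se_pension(year, pe_us_lookup, se_params=SE_PENSION_PARAMS):
    """Merge 401(k)/IRA limits from a lookup with YAML-sourced SE pension caps.

    `pe_us_lookup` is any callable mapping a year to a dict of limits; in the
    PR this role is played by the shared get_retirement_limits() utility.
    """
    limits = dict(pe_us_lookup(year))  # copy so the caller's dict is untouched
    by_year = se_params["dollar_limit"]
    # Assumption: years outside the YAML range reuse the nearest available year.
    clamped_year = min(max(year, min(by_year)), max(by_year))
    limits["se_pension_rate"] = se_params["rate"]
    limits["se_pension_dollar_limit"] = by_year[clamped_year]
    return limits


if __name__ == "__main__":
    def stub_lookup(year):
        # Stand-in for the policyengine-us backed lookup.
        return {"401k": 23_000, "ira": 7_000}

    print(get_limits_with_se_pension(2024, stub_lookup))
```

Passing the lookup in as an argument keeps the merge logic testable without policyengine-us installed, which is also why the example uses a stub.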

…RF imputation (#554)

* Fix and add retirement contribution calibration targets

- Correct traditional_ira_contributions: $25B → $13.2B (SOI 1304 Table 1.4 actual deduction, not total contributions)
- Add traditional_401k_contributions: $567.9B (BEA/FRED employee DC contributions)
- Add self_employed_pension_contribution_ald: $29.5B (SOI 1304 Table 1.4 Keogh plan deduction)
- Remove roth_ira_contributions: structurally $0 due to CPS allocation bug (#553)
- Update both loss.py and etl_national_targets.py

Closes #553
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix RETCB_VAL allocation to use proportional split

Replace sequential waterfall with proportional allocation based on administrative data. The old waterfall gave 401(k) first priority, consuming all of RETCB_VAL and leaving IRA contributions at $0 for every record. The Roth IRA allocation was also mathematically guaranteed to produce $0.

The new approach splits RETCB_VAL proportionally:
- DC vs IRA: 90.8% / 9.2% (BEA/FRED vs IRS SOI)
- Within DC: 85% traditional / 15% Roth (Vanguard/PSCA)
- Within IRA: 39.2% traditional / 60.8% Roth (IRS SOI Tables 5 & 6)

All fractions are stored in imputation_parameters.yaml with sources. Contribution limits are still enforced.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Roth calibration targets and fix traditional 401(k) target

- Split 401(k) target: $567.9B total employee DC deferrals (BEA/FRED) into traditional $482.7B (85%) and Roth $85.2B (15%) using Vanguard How America Saves 2024 dollar share estimate
- Add roth_ira_contributions target: $35.0B from IRS SOI Accumulation Tables 5 & 6 (TY 2022), a direct administrative source
- Update etl_national_targets.py in parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Reconcile SS sub-components after PUF imputation

When PUF imputation replaces social_security values, the sub-components (retirement, disability, survivors, dependents) were left unchanged, creating a mismatch.
This caused a base-year discontinuity where projected years had ~3x more SS recipients than the base year, producing artificial 9-point Gini swings.

The new reconcile_ss_subcomponents() function rescales sub-components proportionally after PUF imputation. New recipients (CPS had zero SS but PUF imputed positive) default to retirement.

Fixes #551
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use age heuristic for new SS recipients in reconciliation

New PUF recipients (CPS had zero SS) now get assigned based on age: >= 62 -> retirement, < 62 -> disability. Matches the CPS fallback logic. Falls back to retirement if age is unavailable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use QRF to predict SS sub-component split for new recipients

Instead of the simple age >= 62 heuristic, train a QRF on CPS records (where the reason-code split is known) to predict shares for new PUF recipients. Uses age, gender, marital status, and other demographics. Falls back to age heuristic when microimpute is unavailable or training data is insufficient (< 100 records).

14 tests covering:
- Proportional rescaling (existing recipients)
- Age heuristic fallback (4 tests)
- QRF share prediction (4 tests including sum-to-one and elderly-predicted-as-retirement)
- Edge cases (zero imputation, missing subs, no SS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use QRF for all PUF SS records, not just new recipients

CPS-PUF link is statistical (not identity-based), so the paired CPS record's sub-component split is just one noisy draw. A QRF trained on all CPS SS recipients gives a better expected prediction. Also removes unnecessary try/except ImportError guard for microimpute.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add towncrier changelog fragment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Drop formula variables from dataset, add integrity tests

Variables with formulas in policyengine-us are recomputed by the simulation engine, so storing them wastes space and can mislead validation. This removes 9 such variables from the extended CPS output (saving ~15MB).

Also adds tests verifying:
- No formula variables are stored (except person_id)
- Stored input values match what the simulation computes
- SS sub-components sum to total social_security per person

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Format test file with black

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix black formatting for CI (26.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fixup! Fix black formatting for CI (26.1.0)

* Also drop adds/subtracts variables from dataset

Update _drop_formula_variables and tests to catch variables that use `adds` or `subtracts` (e.g. social_security), not just explicit formulas. These are also recomputed by the simulation engine.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Train QRF for retirement contributions on PUF clone half

PUF clones previously copied CPS retirement contributions blindly, so a record with $0 wages could have $50k in 401(k) contributions. Train a QRF on CPS data (which has realistic income-to-contribution relationships) and predict onto the PUF half using PUF-imputed income. Post-prediction constraints enforce contribution caps, zero-out rules for records with no wages/SE income, and non-negativity.
- Remove traditional_ira_contributions from IMPUTED_VARIABLES
- Add CPS_RETIREMENT_VARIABLES, RETIREMENT_PREDICTORS constants
- Add _get_retirement_limits() with year-specific 401k/IRA caps
- Add _impute_retirement_contributions() following _impute_weeks_unemployed pattern
- Integrate into puf_clone_dataset() variable routing loop
- Add 34 unit tests covering constraints, routing, and limits

Closes #561
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Expand retirement QRF predictors and add validation script

Add income predictors (interest, dividends, pension, SS) to the retirement contribution QRF, matching issue #561's recommendation. Split RETIREMENT_PREDICTORS into demographic and income sublists so the test side correctly sources income from PUF imputations.

Also add validation/validate_retirement_imputation.py for post-build constraint checking and aggregate comparison against calibration targets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add QRF failure safety net and edge-case tests

Wrap QRF fit/predict in try/except so a crash returns zeros instead of blowing up the entire puf_clone_dataset build. Document the pre_tax_contributions inconsistency (OVERRIDDEN vs CPS-trained sub-components) for future reconciliation.

Tests: add test_qrf_failure_returns_zeros and test_training_data_failure_returns_zeros (50 total).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix conftest mocking microimpute when it is installed

The conftest.py files used `if "microimpute" not in sys.modules` to decide whether to install a MagicMock. On CI (Python 3.13, where microimpute is fully installed), this condition was True at pytest startup because no code had imported microimpute yet. The mock replaced the real package, so tests that triggered CPS generation (add_rent → QRF imputation) silently got a MagicMock whose __len__ returns 0, causing the "0 input values" ValueError.
Fix: use try/import instead of checking sys.modules, so the mock is only installed when microimpute genuinely cannot be imported.

Also restores policyengine-us to 1.587.0 (the revert in 2566ac1 was a misdiagnosis; the conftest mock was the real root cause).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR review: consolidate limits, improve logging, de-duplicate

- Move retirement contribution limits to imputation_parameters.yaml (single source of truth for cps.py and puf_impute.py)
- Add exc_info=True to QRF exception handlers for full tracebacks
- Replace TestLimitsMatchCps with TestLimitsMatchYaml that actually cross-checks against the shared YAML source
- Remove redundant microimpute mock from test_calibration/conftest.py (root conftest already propagates)
- Document QRF training subsample of 5000 rationale
- Import targets/limits in validation script from loss.py and YAML instead of hardcoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Cap SE pension and fix IRA gate for dual-income / SE-only filers

Previously, filers with SE income had ALL of RETCB_VAL allocated to self_employed_pension_contributions, leaving $0 for 401(k)/IRA even when they also had wages. SE-only filers also got $0 in IRA despite being eligible.

Changes:
- Cap SE pension at min(25% of SE income, IRS dollar limit) so dual-income filers retain a remainder for 401(k) and IRA
- Gate IRA pool on has_earned_income (wages OR SE) instead of has_wages only, so SE-only filers can receive IRA contributions
- IRA pool = remaining - dc_pool (absorbs all non-DC remainder)
- Add SE pension rate and dollar limits to imputation_parameters.yaml (2020-2025, sourced from IRS one-participant 401k rules)
- Apply matching SE pension cap in PUF QRF constraints
- Add test_se_pension_capped_at_rate_times_income

No change to calibration targets (independently sourced from BEA/FRED and IRS SOI).
Pre-calibration aggregates should shift closer to targets since dual-income filers now contribute to all account types.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Rebase on main: use shared get_retirement_limits() from PR #566

- Replace _get_retirement_limits() YAML loader with wrapper that merges policyengine-us params (401k/IRA) + YAML SE pension params
- Remove retirement_contribution_limits block from YAML (now sourced from policyengine-us parameter tree via utils/retirement_limits.py)
- Update validation script to use get_retirement_limits() directly
- Update tests for new structure (no more year clamping for 401k/IRA)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update uv.lock for version freshness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>
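
The proportional RETCB_VAL split described in the squashed commit message above can be sketched as follows. This is a simplified illustration, not the code in cps.py: it applies only the three published fractions and leaves out the contribution limits, the SE pension cap, and the earned-income gate that the later commits add. The function name `split_retcb_val` and the `roth_401k_contributions` key are assumptions; the fractions are the ones quoted in the commit message.

```python
DC_SHARE = 0.908               # DC vs IRA: 90.8% / 9.2%
TRADITIONAL_DC_SHARE = 0.85    # within DC: 85% traditional / 15% Roth
TRADITIONAL_IRA_SHARE = 0.392  # within IRA: 39.2% traditional / 60.8% Roth


def split_retcb_val(retcb_val):
    """Split a reported retirement contribution total across four accounts."""
    dc_pool = retcb_val * DC_SHARE
    ira_pool = retcb_val - dc_pool  # remainder, so the pools always sum to the total
    traditional_401k = dc_pool * TRADITIONAL_DC_SHARE
    traditional_ira = ira_pool * TRADITIONAL_IRA_SHARE
    return {
        "traditional_401k_contributions": traditional_401k,
        "roth_401k_contributions": dc_pool - traditional_401k,
        "traditional_ira_contributions": traditional_ira,
        "roth_ira_contributions": ira_pool - traditional_ira,
    }


if __name__ == "__main__":
    for name, amount in split_retcb_val(10_000).items():
        print(f"{name}: {amount:,.2f}")
```

Computing each Roth amount as a remainder, rather than multiplying by a second fraction, keeps the four components summing to the reported total up to floating-point rounding, and every account type receives a nonzero share, unlike the sequential waterfall the commit replaces.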

MaxGhenis and others added 6 commits March 3, 2026 14:35

fixup! Read retirement limits from policyengine-us parameters

1b2d1f3

Add tests for retirement limits utility

b176493

MaxGhenis merged commit cf3c940 into main Mar 4, 2026
6 checks passed

MaxGhenis mentioned this pull request Mar 4, 2026

Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation #554

Merged

4 tasks

MaxGhenis merged 6 commits into main from use-pe-us-retirement-limits
