Add calibration package checkpointing, target config, and hyperparameter CLI #538
Conversation
Minor comments, but generally LGTM. I was also able to run the calibration job in Modal (after removing the ellipsis in unified_calibration.py)!
Small note: if I'm not mistaken, this PR addresses issue #534. It seems #310 was referenced there as something that would be addressed together, but this PR does not save calibration_log.csv among its outputs. Do we want to add it at this point?
A couple questions on recent changes
Commit 49a1f66 ("Remove redundant --puf-dataset flag, add national targets") removed the ability to run PUF cloning inside the calibration pipeline, on the rationale that PUF cloning already happens upstream. However, PR #516 specifically designed the pipeline so that PUF + QRF imputation runs after cloning and geography assignment, so that each clone gets geographically-informed imputations.
With the current flow, are we planning to bring back the post-cloning PUF imputation once the calibration pipeline is stabilized? Or has the approach changed?
In commit 02f8ad0, and then in commit 40fb389 (a "checkpoint"), the behavior changed so that the matrix builder now runs ~1,000-2,000 county-level simulations (one per unique county in the geography assignment) instead of 51 state-level simulations for some variables. Was this an intentional simplification, or a debugging shortcut that could be reverted to the two-tier approach?
@juaristi22 thank you for your thoughtful and excellent comments
That will fit the model on Modal and drop a calibration_log.csv right on your local drive. I know one of the issues was about actually storing it in an archive, and maybe that should be out of scope given this PR's complexity.
…ter CLI

- Add build-only mode to save calibration matrix as pickle package
- Add target config YAML for declarative target exclusion rules
- Add CLI flags for beta, lambda_l2, learning_rate hyperparameters
- Add streaming subprocess output in Modal runner
- Add calibration pipeline documentation
- Add tests for target config filtering and CLI arg parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Modal calibration runner was missing --lambda-l0 passthrough. Also fix KeyError: Ellipsis when load_dataset() returns dicts instead of h5py datasets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Upload a pre-built calibration package to Modal and run only the fitting phase, skipping the HuggingFace download and matrix build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Chunked training with per-target CSV log matching notebook format
- Wire --log-freq through CLI and Modal runner
- Create output directory if missing (fixes Modal container error)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set verbose_freq=chunk so epoch counts don't reset each chunk
- Rename: diagnostics -> unified_diagnostics.csv, epoch log -> calibration_log.csv (matches dashboard expectation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of creating a new Microsimulation per clone (~3 min each, 22 hours for 436 clones), precompute values for all 51 states on one sim object (~3 min total), then assemble per-clone values via numpy fancy indexing (~microseconds per clone).

New methods: _build_state_values, _assemble_clone_values, _evaluate_constraints_from_values, _calculate_target_values_from_values. DEFAULT_N_CLONES raised to 436 for 5.2M record matrix builds. Takeup re-randomization deferred to a future post-processing layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
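The precompute-then-assemble pattern this commit describes can be sketched with plain NumPy. The names and shapes below (`state_values`, `clone_states`) are illustrative stand-ins, not the actual `_build_state_values`/`_assemble_clone_values` API:

```python
import numpy as np

# Illustrative sketch: one expensive pass produces per-state values,
# then each clone's values are a fancy-indexed row gather.
rng = np.random.default_rng(0)

n_states, n_records = 51, 10_000
# One pass over a single sim object yields all state-level values.
state_values = rng.normal(size=(n_states, n_records))

# Each clone maps to a state; assembling its values is an O(n_records)
# array copy instead of a fresh ~3-minute Microsimulation build.
clone_states = rng.integers(0, n_states, size=436)
clone_values = state_values[clone_states]  # shape (436, 10_000)

assert clone_values.shape == (436, n_records)
assert np.array_equal(clone_values[7], state_values[clone_states[7]])
```

The gather is microseconds per clone because it is pure memory movement; all simulation work happens once, up front.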
- Modal runner: add --package-volume flag to read the calibration package from a Modal Volume instead of passing 2+ GB as a function argument
- unified_calibration: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments to prevent CUDA memory fragmentation during the L0 backward pass
- docs/calibration.md: rewrite to lead with the lightweight build-then-fit workflow, document prerequisites, and add volume-based Modal usage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- target_config.yaml: exclude everything except person_count/age (~8,766 targets) to isolate fitting issues from zero-target and zero-row-sum problems in policy variables
- target_config_full.yaml: backup of the previous full config
- unified_calibration.py: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments to fix CUDA memory fragmentation during the backward pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- apply_target_config: support 'include' rules (keep only matching targets) in addition to 'exclude' rules; geo_level is now optional
- target_config.yaml: a 3-line include config replaces the 90-line exclusion list for age demographics (person_count with age domain, ~8,784 targets)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
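The include/exclude semantics can be sketched as follows. This is a minimal illustration; the real `apply_target_config` signature and target schema in the repo may differ:

```python
# Minimal sketch of include/exclude target filtering. Target dicts and the
# helper names here are illustrative, not the repo's actual schema.
targets = [
    {"variable": "person_count", "domain": "age", "geo_level": "district"},
    {"variable": "person_count", "domain": "income", "geo_level": "state"},
    {"variable": "snap", "domain": None, "geo_level": "state"},
]

def _matches(rule, target):
    # A rule constrains only the keys it names, which is why geo_level
    # (or any other field) can be left out of a rule.
    return all(target.get(key) == value for key, value in rule.items())

def apply_target_config(targets, config):
    if "include" in config:
        # 'include': keep only targets matching at least one rule.
        return [t for t in targets
                if any(_matches(r, t) for r in config["include"])]
    # Otherwise 'exclude': drop targets matching any rule.
    return [t for t in targets
            if not any(_matches(r, t) for r in config.get("exclude", []))]

kept = apply_target_config(
    targets, {"include": [{"variable": "person_count", "domain": "age"}]}
)
assert kept == [
    {"variable": "person_count", "domain": "age", "geo_level": "district"}
]
```

A short include rule like the one above replaces a long exclusion list because it only has to name what to keep.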
The roth_ira_contributions target has zero row sum (no CPS records), making it impossible to calibrate. Remove it from target_config.yaml so Modal runs don't waste epochs on an unachievable target.

Also adds a `python -m policyengine_us_data.calibration.validate_package` CLI tool for pre-upload package validation, with automatic validation on --build-only runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
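Why a zero-row-sum target is unachievable is easy to show numerically: if no record contributes to a target's row, the calibrated estimate is zero for any weight vector, so no number of epochs can move it. The matrix below is a made-up toy, not the real calibration matrix:

```python
import numpy as np

# Toy calibration matrix M: rows are targets, columns are records.
M = np.array([
    [1.0, 2.0, 3.0],   # ordinary target: records contribute
    [0.0, 0.0, 0.0],   # roth_ira_contributions-style row: zero row sum
])

# For any nonnegative weights w, the zero row's estimate is exactly 0,
# so the optimizer can never close the gap to a nonzero target value.
for seed in range(3):
    w = np.random.default_rng(seed).random(3) * 100
    estimates = M @ w
    assert estimates[1] == 0.0
```

Dropping such targets up front avoids spending gradient steps on an error term whose gradient with respect to the weights is identically zero.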
- Rename w_district_calibration.npy and unified_weights.npy to calibration_weights.npy everywhere (HF paths, local defaults, docs)
- Add upload_calibration_artifacts() to huggingface.py for atomic multi-file HF uploads (weights + blocks + logs in one commit)
- Add --upload flag (replaces --upload-logs) and --trigger-publish flag to remote_calibration_runner.py
- Add _trigger_repository_dispatch() for GitHub workflow auto-trigger
- Remove dead _upload_logs_to_hf() and _upload_calibration_artifact()
- Add scripts/upload_calibration.py CLI + make upload-calibration target
- Update modal_app/README.md with new flags and artifact table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chains `make data`, `upload-dataset` (API direct to HF), `calibrate-modal` (GPU fit + upload weights), and `stage-h5s` (build + stage H5s). Configurable via GPU, EPOCHS, BRANCH, NUM_WORKERS variables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Delete 3 one-off scripts (diagnose_ky01, generate_test_data, migrate_versioned)
- Move check_staging_sums to the calibration module with CLI args
- Move verify_county_fix to test_xw_consistency.py (pytest, @slow)
- Inline upload_calibration.py into the Makefile target
- Add sanity_checks.py: structural integrity checks for H5 files
- Add --sanity-only flag to validate_staging.py
- Add Makefile targets: validate-staging, check-staging, check-sanity, upload-validation
- Add validation_results.csv to upload_calibration_artifacts() log_files
- Append 4 doc sections: takeup rerandomization, block seeding, X@w invariant, gating workflow
- Add calibration.md to the MyST TOC

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- unified_calibration: emit SOURCE_IMPUTED_PATH for the runner to capture
- remote_calibration_runner: upload the source-imputed dataset to HF after build
- local_area: prefer the source-imputed dataset when building staged H5s
- publish_local_area: same source-imputed preference
- Improve logging in the remote runner (banner format, push plan)
- Add check_volume_package helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H5 files use variable_name/2024 (group → dataset), not flat keys. Use a _get() helper that resolves slash paths via f[path] instead of checking top-level f.keys(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add validate-staging job to local_area_publish.yaml that runs after staging, uploads results to HF, and posts a summary to the step summary
- Add `make promote` target with auto-detected version from pyproject.toml
- Fix validate_staging.py OOM: replace the sim_cache dict with one-at-a-time loading, with explicit del + gc.collect() between states to prevent two sims coexisting in memory (failed on CO after CA)
- Add per-state population logging and a total weighted population check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
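The one-sim-at-a-time pattern that replaced the sim_cache dict looks roughly like this. `load_state_sim` and `validate_state` are illustrative stand-ins for the real loaders; the point is dropping the only reference and collecting before the next load:

```python
import gc

# Stand-in for an expensive per-state Microsimulation load (illustrative).
def load_state_sim(state):
    return {"state": state, "payload": bytearray(10_000)}

# Stand-in for the per-state validation work (illustrative).
def validate_state(sim):
    return len(sim["payload"])

def validate_all_states(states):
    results = {}
    for state in states:
        sim = load_state_sim(state)
        results[state] = validate_state(sim)
        # Drop the only reference *before* the next load so two large sims
        # never coexist in memory, then collect immediately rather than
        # waiting for the interpreter to get around to it.
        del sim
        gc.collect()
    return results

assert validate_all_states(["CA", "CO"]) == {"CA": 10_000, "CO": 10_000}
```

Caching every sim in a dict keeps all of them alive simultaneously, which is exactly what caused the CO-after-CA failure the bullet above describes.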
…dal runner

- Add git provenance (branch, commit, dirty flag, version, dataset/DB SHA checksums) to calibration package metadata and run config output
- Print a provenance banner on package load with staleness/branch warnings
- Write a JSON sidecar on the Modal volume for lightweight provenance checks
- Remote runner: remove the package_bytes param, auto-upload to the Modal volume via --package-path, show provenance on --prebuilt-matrices
- Fix takeup rerandomization: move the override after initial state/county setup to avoid poisoning base calculations; county-level saves/restores original takeup values between counties and clears the cache after the override
- Add domain_variable: age to district person_count in the target config
- Show git provenance fields in the validation report
- Replace hardcoded RECORD_IDX in matrix masking tests with dynamic record selection to avoid brittleness when data/formulas change
- Update docs for the new --package-path upload behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a parallel national calibration pipeline that produces a sparse national US.h5 alongside the existing local-area H5 files. Both calibrations share the pre-built matrix and run in parallel.

- Add prefix parameter to HF upload/download for national artifacts
- Add --national flag to the calibration runner (defaults lambda_l0=1e-4)
- Add build_national_h5() and national worker support
- Add coordinate_national_publish() and main_national() entrypoint
- Add Makefile targets: calibrate-modal-national, calibrate-both, stage-national-h5, stage-all-h5s
- Remove --prebuilt-matrices flag; volume-fit is now the default
- Update the pipeline target to run both calibrations in parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`from policyengine_us_data import __version__` imports the submodule __version__.py rather than the string it defines. Changed to import from the module directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Comment out all targets except district-level age demographics
- Rewrite build_national_h5 to collapse CD weights to household level instead of running 436 per-CD simulations
- Add validate_national_h5.py script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
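Collapsing CD weights to household level amounts to summing each household's calibrated weight across the CDs it was cloned into, which is a single weighted bincount rather than 436 simulations. The arrays below are illustrative, not the repo's actual data layout:

```python
import numpy as np

# Illustrative: 3 households, each cloned into 2 congressional districts.
# The clone order and values here are made up for the example.
household_ids = np.array([0, 1, 2, 0, 1, 2])
cd_weights = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# One vectorized aggregation replaces per-CD simulation runs: each
# household's national weight is the sum of its per-CD clone weights.
household_weights = np.bincount(household_ids, weights=cd_weights)

assert np.allclose(household_weights, [5.0, 7.0, 9.0])
```

Because the pre-built matrix already encodes per-clone values, the national H5 only needs these collapsed weights, not fresh simulations.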
- Uncomment all ~80 targets in target_config.yaml (district, state, national)
- Wire geo_labels.json through the remote calibration runner (parse, save, upload)
- Add staging support for the national H5 (upload_to_staging_hf instead of direct upload_local_area_file)
- Add main_national_promote entrypoint for two-phase publish
- Include prior uncommitted work: geo_labels rename, stacked_dataset_builder, publish_local_area refactor, takeup utils, huggingface upload improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These functions were dropped during merge conflict resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The worker commits to the volume, but the coordinator's view is stale.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the inline takeup draw loops in unified_matrix_builder.py (both the parallel worker path and the sequential clone path) with calls to the shared compute_block_takeup_for_entities() from utils/takeup.py.

Remove deprecated functions from takeup.py that are no longer used: draw_takeup_for_geo, compute_entity_takeup_for_geo, apply_takeup_draws_to_sim, apply_block_takeup_draws_to_sim, and _build_entity_to_hh_index. Also remove the now-unused rerandomize_takeup function from unified_calibration.py.

Simplify the compute_block_takeup_for_entities signature by deriving the state FIPS from the block GEOID prefix instead of requiring a separate entity_state_fips parameter. Update tests to exercise the remaining shared functions directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
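The signature simplification works because a 15-digit census block GEOID is state(2) + county(3) + tract(6) + block(4), so the state FIPS is always the first two characters. The helper name below is illustrative, not the repo's actual function:

```python
# A census block GEOID is 15 digits: SSCCCTTTTTTBBBB
# (state, county, tract, block), so state FIPS is simply the prefix.
# This is why a separate entity_state_fips parameter was redundant.
def state_fips_from_block_geoid(block_geoid: str) -> str:
    return block_geoid[:2]

assert state_fips_from_block_geoid("060750101001000") == "06"  # California
assert state_fips_from_block_geoid("360610001001000") == "36"  # New York
```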
- Remove dead sim-based methods: _evaluate_constraints_entity_aware,
_calculate_target_values, and calculate_spm_thresholds_for_cd
- Delete duplicate class methods _evaluate_constraints_from_values and
_calculate_target_values_from_values; update call sites to use the
existing standalone functions with variable_entity_map
- Fix count-vs-dollar classifier: replace substring heuristic in
_get_uprating_info with endswith("_count"); use exact equality in
validate_staging._classify_variable to prevent false positives
- Add optional precomputed_rates parameter to
apply_block_takeup_to_arrays to skip redundant load_take_up_rate calls
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
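The count-vs-dollar classifier fix above is worth a concrete example: the old substring heuristic fires on any name containing "count", including "county". The functions below are illustrative stand-ins for `_get_uprating_info`/`_classify_variable`:

```python
# Why the substring heuristic misclassified: "county" contains "count".
# A suffix check (or exact equality, as in validate_staging) avoids
# flagging non-count variables as counts.
def is_count_old(name: str) -> bool:
    return "count" in name          # buggy substring heuristic

def is_count_new(name: str) -> bool:
    return name.endswith("_count")  # the fix

assert is_count_old("county_fips")      # false positive under the old rule
assert not is_count_new("county_fips")  # correctly excluded now
assert is_count_new("person_count")     # true counts still detected
assert not is_count_new("discount_rate")
```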
Fixes #533
Fixes #534
Fixes #558
Fixes #559
Fixes #562
Summary
- `--build-only` saves the expensive matrix build as a pickle; `--package-path` loads it for fast re-fitting with different hyperparameters or target sets
- Target configs (`target_config.yaml`) replace hardcoded target filtering; the checked-in config reproduces the junkyard's 22 excluded groups
- `--beta`, `--lambda-l2`, and `--learning-rate` are now tunable from the command line and the Modal runner
- `docs/calibration.md` covers all workflows (single-pass, build-then-fit, package re-filtering, Modal, portable fitting)
- District labels use `XX-01` (conventional 1-based) instead of `XX-00`

Test plan
- `pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py`: CLI arg parsing tests
- `pytest policyengine_us_data/tests/test_calibration/test_target_config.py`: target config filtering + package round-trip tests
- `make calibrate-build` produces a package; `--package-path` loads it and fits

🤖 Generated with Claude Code