Add pyfia.carbon: NSVB-based forest carbon estimation (all 6 IPCC pools + stock-change)#83
Merged
…ase 1
Introduce a new top-level subpackage src/pyfia/carbon/ as a sibling to src/pyfia/estimation/. Phase 1, PR 1 of 3: ships only the NSVB equation library, coefficient loaders, and CarbonEstimator skeleton. The estimator itself (NSVBLiveTreeEstimator) and the FIADB validation gate land in PR 2 and PR 3 respectively.
What's in this PR:
- pyfia.carbon.nsvb.equations: pure-math implementation of NSVB Models 1, 2, 4, and 5 (Westfall et al. 2023, GTR-WO-104), plus the harmonization algorithm and the predict_tree_biomass orchestrator. Reverse-engineered from the GTR-WO-104 worked examples (Douglas-fir SPCD=202 with no cull; red maple SPCD=316 with 3% cull). Verified against both worked examples to 1e-9 relative error.
- pyfia.carbon.nsvb.coefficients: importlib.resources-based loaders for the vendored S1a-S8b CSVs with NSVB lookup precedence (SPCD+DIVISION+STDORGCD, SPCD+DIVISION, SPCD-only, JENKINS_SPGRPCD fallback). lru_cache on the loader. Phase 1 uses the SPCD-only fallback path; DIVISION-specific rows are wired but not exercised until pyFIA gains a PLOT.ECOSUBCD -> Bailey ecoprovince mapping.
- pyfia.carbon.nsvb.carbon_fractions: S10a (live, 2676 species) and S10b (dead, hw/sw x decay class) loaders. Percent-to-decimal normalization at load time with explicit unit tests. Unknown SPCDs fall back to the national mean and warn-once per SPCD.
- src/pyfia/carbon/nsvb/data/: 12 vendored coefficient CSVs from GTR-WO-104 Supp1 plus a README documenting provenance, the column-rename procedure for S10a (division -> hw_sw, angiosperm/gymnosperm -> hardwood/softwood), and the source data quirks.
- CarbonEstimator skeleton in src/pyfia/carbon/__init__.py with all six pool methods raising NotImplementedError pointing to the Phase that will deliver each one. live_tree() will be wired in PR 2.
- 50 unit tests + 9 hypothesis property tests covering the equation forms, the harmonization invariants, the lookup precedence, the percent-to-decimal normalization, and the cull-reduction path.
Findings worth flagging in review:
1. The equations.md sidecar in references_md documents Models 2 and 4 incorrectly. Model 2 is power-power-with-k-constant (k=9 softwood, k=11 hardwood), not power-exponential. Model 4 has a missing exp(-b1*D) factor in the sidecar. Both correct forms are verified against the WO-104 PDF worked examples in this PR's tests.
2. The DIVISION column in S1a-S8b uses Bailey ecoprovince codes (e.g., M240), while the division column in S10a uses angiosperm/gymnosperm. Different concepts, same column name. The vendoring step renames the S10a column to hw_sw to prevent runtime collisions.
3. SPCD 5152 has a carbon fraction of 0.5786, exceeding the conservative [0.40, 0.55] bound the plan assumed. The actual S10a range is [0.365, 0.609] across 2676 species. Tests use the realistic [0.30, 0.65] bound.
Phase 1 scope cuts (acknowledged in plan):
- Coarse roots / Heath et al. (2009): bridged to FIADB CARBON_BG in PR 2
- Model 6 / merchantable subdivision: deferred to a later phase
- DIVISION-specific coefficient lookup: deferred unless validation gate in PR 3 measures > 1% per-tree disagreement with FIADB CARBON_AG
Test plan:
- uv run pytest tests/unit/test_nsvb_equations.py tests/unit/test_nsvb_coefficients.py tests/unit/test_carbon_fractions.py tests/property/test_nsvb_properties.py (59 tests, all passing)
- uv run ruff format && uv run ruff check src/pyfia/carbon/ tests/unit/test_nsvb_*.py tests/property/test_nsvb_properties.py tests/unit/test_carbon_fractions.py (clean)
- uv run mypy src/pyfia/carbon/ (clean)
- Full unit suite: uv run pytest tests/unit/ (770 passed, no regressions)
Refs:
- pyfia.carbon technical specification (external): /Users/cmihiar/Documents/Claude/Projects/schmidt/pyfia_carbon_tech_spec.md
- Plan: /Users/cmihiar/.claude/plans/transient-gathering-kazoo.md
- Westfall, J.A. et al. (2023). GTR-WO-104. DOI: 10.2737/WO-GTR-104
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
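The four-level lookup precedence described above can be sketched with plain dictionaries. This is a minimal illustration, not the code in pyfia.carbon.nsvb.coefficients: the table names, the `lookup` function, the level labels other than "spcd", and every coefficient value are hypothetical placeholders.

```python
# Hypothetical coefficient tables, ordered from most to least specific.
# The numeric values are placeholders, not GTR-WO-104 coefficients.
BY_SPCD_DIV_ORG = {(202, "M240", 1): {"model": 1, "a": 0.11}}
BY_SPCD_DIV = {(202, "M240"): {"model": 1, "a": 0.12}}
BY_SPCD = {(202,): {"model": 1, "a": 0.13}, (316,): {"model": 2, "a": 0.14}}
BY_JENKINS = {(8,): {"model": 5, "a": 0.15}}


def lookup(spcd, division=None, stdorgcd=None, jenkins_spgrpcd=None):
    """Return (row, level) from the most specific table that has a match."""
    candidates = [
        ("spcd+division+stdorgcd", BY_SPCD_DIV_ORG, (spcd, division, stdorgcd)),
        ("spcd+division", BY_SPCD_DIV, (spcd, division)),
        ("spcd", BY_SPCD, (spcd,)),
        ("jenkins", BY_JENKINS, (jenkins_spgrpcd,)),
    ]
    for level, table, key in candidates:
        # A level is skipped when any part of its key is unknown.
        if None in key:
            continue
        if key in table:
            return table[key], level
    raise KeyError(f"no coefficients for SPCD={spcd}")


# With no DIVISION supplied, resolution stops at the species-only level,
# which is the path Phase 1 exercises.
row, level = lookup(202)
print(level)
```

Returning the level alongside the row is what lets a test assert that a species resolved at "spcd" rather than silently dropping to the Jenkins fallback.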
Two blockers from the PR 1 critical review, plus a forward-looking roadmap so PR 2 has a single source of truth for the architectural contract.
Drop the CarbonEstimator skeleton class
- All method bodies raised NotImplementedError; the class added zero functional behavior and diverged from the pyfia function-based convention (biomass(), mortality(), volume()).
- PR 2 will introduce pyfia.carbon.live_tree(db, ...) as a function matching the existing estimator pattern, on a clean slate.
- Avoids shipping a public API surface PR 2 has to remove.
Rewrite carbon/__init__.py as the canonical PR 2 roadmap
- Four sections: currently in PR 1 / PR 2 contract / architectural rules / items deferred from review.
- Architectural rules codify: function-based public API, vectorize via polars joins (the scalar lookup_coefficients and predict_tree_biomass are reference implementations only), inherit from BaseEstimator, match mortality() docstring quality, bridge BG carbon to FIADB CARBON_BG for now.
- Deferred items list captures the should-fix items from review (schema fragility, null coercion, boundary types, hardcoded carbon fraction default, dead lookup precedence levels, SPCD 10 misclassification).
Add CSV→pipeline regression sentinels (TestPipelineViaCSV)
- Two tests walking the real lookup_coefficients → predict_tree_biomass path for Douglas-fir and red maple.
- Asserts AGB within 0.5% of the GTR-WO-104 worked example values (currently measured: Douglas-fir 0.09%, red maple 0.00%).
- Asserts every component of the bundle is resolved at the "spcd" precedence level; this catches mis-routing through the Jenkins fallback if the SPCD-keyed tables get truncated.
- Closes the gap where the existing tests verify the equation form with hand-coded prose coefficients but never exercise the actual CSV data path.
Add three targeted TODO(PR 2) comments
- coefficients.py:load_nsvb_coefficients: explicit dtypes before wiring the ECOSUBCD lookup
- coefficients.py:lookup_coefficients: vectorize via polars join
- equations.py:predict_tree_biomass: vectorize as polars expressions
- Each is one line and points back to the central roadmap section in carbon/__init__.py, so there are no scattered notes.
Verified
- 61/61 NSVB tests pass (was 59 + 2 new)
- 772/772 unit suite passes (no regressions)
- mypy src/pyfia/carbon/ clean
- ruff format and check clean
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second-pass review found that the regression sentinels added in 316f966 only catch corruption to total_biomass_spcd.csv. Mutation testing confirmed: doubling the 'a' coefficient in volib_spcd, volbk_spcd, bark_biomass_spcd, or branch_biomass_spcd leaves the harmonized AGB unchanged because the harmonization step proportionally rescales components to hit the directly-predicted total. A single AGB assertion is structurally blind to four of the five vendored CSVs.
Add per-component lock-in assertions
- New module-level constants DOUGFIR_CSV_EXPECTED and REDMAPLE_CSV_EXPECTED capture the full TreeBiomassResult vector (agb, w_wood, w_bark, w_branch, v_wood_ib, v_bark) for the species-level CSV path.
- Each TestPipelineViaCSV test now makes assertions at two layers:
1. Layer 1 (existing): AGB vs WO-104 worked example, ~0.5% tolerance. Documents the per-PR contract (Phase 1 reproduces the worked example AGB to within 0.5% even with the deferred ECOSUBCD lookup).
2. Layer 2 (new): full result vector vs CSV-expected constants at f64-tight rel_tol=1e-9. Locks every vendored coefficient CSV.
- Failure messages name the suspect CSV file so debugging is direct.
- v_wood_ib and v_bark are the only signals that lock volib_spcd and volbk_spcd respectively, since volbk feeds nothing in the Phase 1 AGB path (its value is stored in TreeBiomassResult for Phase 2+ adjusted-density use).
Mutation-test verification (after fix)
- volib *= 2 → CAUGHT (w_wood, w_bark, w_branch, v_wood_ib fail)
- volbk *= 2 → CAUGHT (v_bark fails)
- bark_bio *= 2 → CAUGHT (w_wood, w_bark, w_branch fail)
- branch_bio *= 2 → CAUGHT (w_wood, w_bark, w_branch fail)
- total_agb *= 2 → CAUGHT (agb, w_wood, w_bark, w_branch fail)
All five mutations also caught at 1 part per million sensitivity (target['a'] *= 1 + 1e-6).
Verified
- 61/61 NSVB tests pass
- 772/772 unit suite passes
- mypy src/pyfia/carbon/ clean
- ruff format and check clean
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
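The blind spot described in this commit can be reproduced with a toy harmonization step. The sketch below assumes a simplified proportional rescaling (each component multiplied by total / sum of components); the `harmonize` name and the biomass numbers are illustrative placeholders, not the pyfia implementation or real tree values.

```python
import math


def harmonize(total_agb, wood, bark, branch):
    """Rescale the three components proportionally so they sum to total_agb."""
    scale = total_agb / (wood + bark + branch)
    return wood * scale, bark * scale, branch * scale


# Baseline component predictions (placeholder numbers).
base = harmonize(1000.0, 600.0, 100.0, 250.0)

# "Mutation": the bark prediction is doubled, as if its CSV were corrupted.
mutated = harmonize(1000.0, 600.0, 200.0, 250.0)

# The harmonized total is identical, so an AGB-only assertion still passes.
print(math.isclose(sum(base), sum(mutated), rel_tol=1e-12))

# The individual components differ, which is what Layer 2 locks in.
print(math.isclose(base[1], mutated[1], rel_tol=1e-9))
```

Because the directly predicted total pins the sum, only per-component assertions can detect a corrupted component table.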
Add NSVB equation library and coefficient loaders for pyfia.carbon Phase 1
Wires the vectorized NSVB (Westfall et al. 2023, GTR-WO-104) biomass pipeline to a pyFIA estimator, implementing the Phase 1 live tree pool of the Schmidt Sciences Synthetic Inventory project.
- compute_nsvb_biomass(trees_lf): polars-expression orchestrator that joins 5 coefficient tables + cull adjustment + harmonization in a single LazyFrame pass, replacing the scalar predict_tree_biomass per-tree path at FIA scale.
- nsvb_biomass_expr(): polars expression dispatching Models 1/2/4/5 from a 'model' column (Jenkins fallback Model 5 included).
- get_vectorized_lookup_tables(): cached bundle of join-ready species-level + Jenkins coefficient tables for all 5 components.
- load_carbon_fractions_live_df(): join-ready S10a DataFrame.
- LiveTreeEstimator(BaseEstimator) + live_tree(db, pool='ag'|'bg'|'total'): public API exported from pyfia.carbon and pyfia. Uses NSVB for AG and bridges to FIADB CARBON_BG for BG per the PR 2 contract.
Resolves all PR 1 deferred review items:
- Schema fragility: load_nsvb_coefficients uses explicit schema_overrides (DIVISION Utf8, STDORGCD Int64).
- Boundary types: predict_tree_biomass normalizes hw_sw casing, types it as Literal["hardwood","softwood"], and validates dia >= 1.0.
- DEFAULT_LIVE_CARBON_FRACTION: now lazily computed from the S10a mean via PEP 562 __getattr__ (was hardcoded 0.4716, drifted from 0.4741).
- SPCD 10 misclassification: resolved by deriving hw_sw from the SPCD<300 rule (consistent with _model_k), not S10a.
- Levels 1-2 of the lookup precedence are documented as skipped in Phase 1; they are re-enabled when the ECOSUBCD → DIVISION mapping lands.
- Scalar predict_tree_biomass stays as the test oracle.
Tests (103 new carbon/NSVB tests; 826 total unit+property tests pass):
- test_nsvb_vectorized.py: vectorized ≡ scalar oracle at rel_tol=1e-9 on 550+ synthetic trees spanning Models 1/2/4, hw/sw, Jenkins fallback, and cull>0/null paths. Plus the Douglas-fir and red maple worked-example sentinels at f64-tight precision.
- test_live_tree_estimator.py: config/column unit tests + end-to-end smoke tests gated on the georgia_db fixture.
- Fixes a pre-existing ULP-level fragility in two model_1 monotonicity property tests (assume |d_hi - d_lo| > 1e-6).
Also removes the empty pools/ subdirectory per the flat-structure convention in CLAUDE.md; Phase 2+ pools will land alongside live_tree.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three defensive fixes that narrow the investigation search space before
running pool='ag' / pool='total' validation against FIADB CARBON_AG and
EVALIDator. Each fix is independently motivated; bundled here to keep the
pre-validation diff small.
Defensive null-filter in build_species_level_lookup
- The vectorized _join_and_eval_component coalesce operates column-by-column,
so a species-level row with a null b1 on a Model 2/4 row would silently
fall back to the Jenkins b1=0.0 synthetic value and corrupt the math
row-wise. The scalar reference path coerces nulls to 0.0 within the
same row, so the two paths would diverge in ways the equivalence suite
might not catch (depends on which SPCDs the test sampler picks).
- Current vendored CSVs have no such rows (verified via direct null count:
every null b1 in a *_spcd table is on a Model 1 row, which does not
consume b1), so this filter is a no-op today. It is a regression guard:
a rogue null on a re-vendored Model 2/4 row will be dropped here and
the SPCD will fall through cleanly to Jenkins, rather than silently
producing wrong per-row arithmetic.
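The hazard this filter guards against can be shown without polars. The sketch below imitates a column-by-column coalesce over two coefficient rows held as plain dicts; the `coalesce` and `is_usable` helpers and all coefficient values are hypothetical, chosen only to make the row mixing visible.

```python
def coalesce(*values):
    """Return the first non-None value, like a SQL/polars coalesce."""
    for value in values:
        if value is not None:
            return value
    return None


# A species-level Model 2 row with a rogue null b1 (placeholder values).
species_row = {"model": 2, "a": 0.20, "b": 1.70, "b1": None, "c": 0.90}
# The Jenkins fallback row carries a synthetic b1 = 0.0.
jenkins_row = {"model": 5, "a": 0.05, "b": 2.40, "b1": 0.0, "c": 0.0}

# Column-by-column coalesce: every column comes from the species row
# except b1, which is silently borrowed from the Jenkins row.
mixed = {key: coalesce(species_row[key], jenkins_row[key]) for key in species_row}
print(mixed)


def is_usable(row):
    """Drop Model 2/4 rows whose b1 is null, since those models consume b1."""
    return not (row["model"] in (2, 4) and row["b1"] is None)


# With the filter applied first, the whole row falls through to Jenkins.
clean = species_row if is_usable(species_row) else jenkins_row
print(clean)
```

Filtering before the join keeps each tree on a single, internally consistent coefficient row.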
Cross-era BG bridge warning for pool='total'
- The Phase 1 BG bridge reads FIADB TREE.CARBON_BG directly. For NSVB-era
inventories (Sep 2023 onward) this is internally consistent with the
NSVB-recomputed AG. For pre-NSVB inventories CARBON_BG was computed via
legacy Jenkins-based allometry, so pool='total' produces a cross-era
methodological sum.
- Logs a best-effort warning when pool='total' and the selected EVALID's
END_INVYR is < 2024 (safe threshold covering the Sep 2023 transition).
Wrapped in try/except so a year-lookup failure never breaks estimation.
Silent for pool='ag' (no mixing) and pool='bg' (no mixing either —
user explicitly asked for the bridge value).
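A best-effort warning of this kind might look like the sketch below. The function name, the logger name, the `get_end_invyr` callable, and the constant name are all hypothetical; only the pool semantics and the 2024 threshold come from the description above.

```python
import logging

logger = logging.getLogger("pyfia.carbon.sketch")  # hypothetical logger name

NSVB_SAFE_END_INVYR = 2024  # threshold covering the Sep 2023 transition


def maybe_warn_cross_era(pool, get_end_invyr):
    """Log a warning for pool='total' on pre-NSVB evaluations.

    Returns True if a warning was logged. Never raises: a failed year
    lookup is treated as "unknown" and the warning is skipped.
    """
    if pool != "total":
        return False  # 'ag' and 'bg' do not mix methodologies
    try:
        year = get_end_invyr()
    except Exception:
        return False  # best effort only; estimation must not break
    if year is None or year >= NSVB_SAFE_END_INVYR:
        return False
    logger.warning(
        "pool='total' sums NSVB AG with legacy CARBON_BG (END_INVYR=%s)", year
    )
    return True


def failing_lookup():
    raise RuntimeError("year lookup failed")


print(maybe_warn_cross_era("total", lambda: 2019))
print(maybe_warn_cross_era("total", failing_lookup))
```

Catching a broad `Exception` is deliberate in this sketch, since the warning is advisory and the estimate itself does not depend on it.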
Delete dead rename no-op in LiveTreeEstimator.aggregate_results
- results.rename({"CARBON_ACRE": "CARBON_ACRE", "CARBON_TOTAL": "CARBON_TOTAL"})
was an identity left over from copying the biomass estimator pattern.
Replaced with a short comment documenting that CARBON_ACRE / CARBON_TOTAL
are already the canonical names produced by _apply_two_stage_aggregation.
Deferred to a post-validation cleanup PR
- by_size_class silently ignored (inherited gap from biomass.py)
- Grouping-column order reversal in format_output (also inherited)
- REF_SPECIES join wasted work for pool='bg'
- Silent sub-inch drop in calculate_values
- See Also precision on pool semantics
Verified
- 826 unit+property tests pass (+0 regressions, identical to pre-fix)
- ruff format / ruff check clean on src/pyfia/carbon/
- mypy src/pyfia/carbon/ clean (no issues in 6 files)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs discovered when wiring up real Georgia FIA data for the PR 2
validation gate. Both blocked the 4 end-to-end smoke tests in
test_live_tree_estimator.py::TestLiveTreeEndToEnd from running, and
both are mandatory for the estimator to work against any real FIADB
data — so they belong in PR 2, not the cleanup PR.
SPCD Float64/Int64 join mismatch (src/pyfia/carbon/live_tree.py)
- Root cause: FIA DataMart CSV → DuckDB conversion infers TREE.SPCD as
Float64 whenever the raw data contains any null SPCD row (DuckDB's
read_csv_auto cannot use Int64 if NaN is present). The NSVB coefficient
tables and REF_SPECIES are both keyed on Int64 SPCD, so the first
polars join raised `polars.exceptions.SchemaError: datatypes of join
keys don't match` with no automatic cast.
- Fix: cast SPCD to Int64 at the top of calculate_values, before any
join. One cast up front handles both the REF_SPECIES join in this
module and the coefficient-table joins in compute_nsvb_biomass.
- This was not caught by the existing unit tests because
test_nsvb_vectorized.py synthesizes the trees frame with an explicit
{"SPCD": pl.Int64} schema, matching the coefficient-table dtype by
construction. Only real-data validation could surface it.
conftest georgia_db / fia_db teardown AttributeError (tests/conftest.py)
- Pre-existing infrastructure bug (not a PR 2 regression): both fixtures
call `db.close()` unconditionally, but the plain FIA class has no
close() method — cleanup is handled by FIADataReader, and FIA.__exit__
is a no-op. Only MotherDuckFIA exposes close(). On the DuckDB file
path (the common case), session teardown raised
`AttributeError: 'FIA' object has no attribute 'close'`.
- Fix: wrap both close() calls in `if hasattr(db, "close"):`. Minimal
diff; preserves the MotherDuck cleanup path.
- Fix is landed here because the broken teardown was blocking clean
test output on the validation smoke run.
Verified
- tests/unit/test_live_tree_estimator.py::TestLiveTreeEndToEnd:
4 passed (previously all 4 failing; 1 teardown error also gone)
- tests/unit tests/property full suite: 830 passed, 77 deselected
(+4 from the previously skipped smoke tests now executing against
the real Georgia DB at data/georgia.duckdb)
- Actual estimates on Georgia EVALID 132401 (END_INVYR 2024, NSVB-era):
pool='ag' → 29.163 tons/ac, 661.7M tons total, 4,632 plots, 130,806 trees
pool='bg' → 5.573 tons/ac, 126.5M tons total (same plot/tree counts)
pool='total' → 34.736 tons/ac, 788.2M tons total
- Additivity: AG + BG == total to f64 precision (ratio-of-means correct)
- BG/AG ratio: 19.1% (within the 15-25% expected range for live-tree
root biomass)
- Cross-era BG bridge warning silent as expected: year=2024 ≥ 2024
threshold
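The additivity and BG/AG figures above can be rechecked from the reported numbers. Because the inputs here are the rounded values printed in this message, the check only holds to rounding tolerance, not to f64 precision.

```python
import math

# (tons/acre, million tons total) as reported for Georgia EVALID 132401.
reported = {
    "ag": (29.163, 661.7),
    "bg": (5.573, 126.5),
    "total": (34.736, 788.2),
}

per_acre_sum = reported["ag"][0] + reported["bg"][0]
total_sum = reported["ag"][1] + reported["bg"][1]
bg_ag_ratio = reported["bg"][0] / reported["ag"][0]

# AG + BG should reproduce the reported totals within rounding.
print(math.isclose(per_acre_sum, reported["total"][0], abs_tol=5e-4))
print(math.isclose(total_sum, reported["total"][1], abs_tol=0.05))
print(f"BG/AG = {bg_ag_ratio:.1%}")
```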
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
First real-data validation test for PR 2 (pyfia.carbon.live_tree). Runs
the vectorized NSVB pipeline on every eligible live tree in the Georgia
FIA database (EVALID 132401, end_invyr 2024, NSVB-era) and diffs against
the pre-computed TREE.CARBON_AG column. Reports a layered diagnostic
(carbon rel_error + biomass ratio + FIADB implied fraction) and asserts
against ratchet thresholds set slightly above the current Phase 1 baseline.
Baseline (1,252,938 trees compared, 0.173% null from SPCD coverage gaps)
-----------------------------------------------------------------------
Carbon rel_error vs FIADB:
mean 10.03%
median 4.87%
p95 35.19%
p99 65.23%
max 124.26x (worst offender)
within 0.1% 29.04%
within 1% 34.30%
within 5% 50.45%
Biomass ratio (pyfia_NSVB_AGB / FIADB_DRYBIO_AG):
median 1.0179 (pyfia 1.8% higher on median — Phase 1 DIVISION gap)
mean 1.0837 (mean skewed by high-cull outliers)
p5 0.9127
p95 1.3870
FIADB implied carbon fraction (CARBON_AG / DRYBIO_AG):
median 0.4770, range [0.4230, 0.5270]
→ Exactly the S10a range. Confirms FIADB uses species-specific NSVB
carbon fractions — the fraction layer is not the source of the gap.
Two distinct disagreement sources identified
---------------------------------------------
1. **Phase 1 DIVISION gap** (~1-3% median, ~30-40% p95): expected per
the PR 2 contract. PR 2 uses species-level (DIVISION-null) coefficient
rows only; FIADB uses the appropriate DIVISION-specific row when one
exists. The ECOSUBCD → Bailey DIVISION mapping needed to close this
gap was explicitly deferred to a future "Phase 1.5" PR.
2. **High-cull tree disagreement** (NEW finding, not documented in the
contract): every single one of the top-10 worst offenders has
CULL ∈ [95, 99]%. pyfia predicts 46×-125× more carbon than FIADB for
these trees. The literal GTR-WO-104 formula in PR 2 is:
W_wood_red = V_wood_ib × [1 - CULL/100 × (1 - DensProp)] × WDSG × 62.4
where DensProp = 0.54 (hardwood) / 0.92 (softwood) from Harmon et al.
(2011) Table 1 DECAYCD=3. For a hardwood with CULL=95%, this gives
a retention factor of 0.563 (56.3% of wood weight retained), whereas
FIADB's retention factor for the same tree implies ~0.05 (5%). The
difference suggests FIADB applies either:
(a) a straight (1 - CULL/100) reduction without DensProp, OR
(b) a TREECLCD-based special-case (rough/rotten cull trees get a
different allometry), OR
(c) DensProp values much lower than Harmon 2011 DECAYCD=3.
This is a methodological question that needs investigation before
we can close the validation gap. Suggested next commits in this PR:
- Add a TREECLCD breakdown to the diagnostics to confirm high-cull
trees are TREECLCD=3/4 (rotten/rough cull)
- Cross-check the Schmidt reference vs the actual FIA CARBON_AG
computation code (FIA Processing Flow documentation)
- Decide whether PR 2 should (a) special-case high-cull trees,
(b) use TREECLCD for dispatch, or (c) document the discrepancy
and filter high-cull from the reference validation scope
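The retention factors quoted in item 2 follow directly from the bracketed term of the formula. The sketch below only reproduces that arithmetic; `cull_retention` is an illustrative helper name, not a pyfia function.

```python
def cull_retention(cull_pct, dens_prop):
    """Fraction of sound wood weight retained: 1 - CULL/100 * (1 - DensProp)."""
    return 1.0 - (cull_pct / 100.0) * (1.0 - dens_prop)


# DensProp values as quoted above (Harmon et al. 2011, DECAYCD=3).
hardwood = cull_retention(95.0, 0.54)
softwood = cull_retention(95.0, 0.92)

# Hypothesis (a): a straight (1 - CULL/100) reduction with no DensProp term.
straight = 1.0 - 95.0 / 100.0

print(f"hardwood retention: {hardwood:.3f}")
print(f"softwood retention: {softwood:.3f}")
print(f"straight reduction: {straight:.3f}")
```

For CULL=95% the hardwood factor is 1 - 0.95 × 0.46 = 0.563, while a straight reduction gives 0.05. This is arithmetic only and does not by itself establish which formula FIADB applies.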
Ratchet thresholds
------------------
The test asserts at thresholds slightly looser than the current baseline
so it passes today:
median rel error < 6% (current 4.87%)
p99 rel error < 70% (current 65.23%)
within 5% frac > 45% (current 50.45%)
biomass ratio med ∈ [1.00, 1.05] (current 1.0179)
SPCD null frac < 1% (current 0.173%)
As the Phase 1.5 DIVISION lookup lands and the high-cull question is
resolved, each improvement should ratchet these constants down. Loosening
any of them is a regression; tightening is a measurable improvement.
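A ratchet check of this shape could be sketched as follows. The threshold values are the ones listed above; `_BASELINE_P99_REL_ERR`, the `check_ratchets` helper, and the simple index-based percentile are assumptions for illustration, and the sample errors are synthetic.

```python
import statistics

_BASELINE_MEDIAN_REL_ERR = 0.06
_BASELINE_P99_REL_ERR = 0.70        # hypothetical constant name
_BASELINE_WITHIN_5PCT_FRAC = 0.45


def check_ratchets(rel_errors):
    """Return a dict of pass/fail flags for each ratchet threshold."""
    errs = sorted(rel_errors)
    n = len(errs)
    median = statistics.median(errs)
    # Simple index-based percentile; a real test might use numpy or polars.
    p99 = errs[min(n - 1, (99 * n) // 100)]
    within_5 = sum(e <= 0.05 for e in errs) / n
    return {
        "median": median < _BASELINE_MEDIAN_REL_ERR,
        "p99": p99 < _BASELINE_P99_REL_ERR,
        "within_5pct": within_5 > _BASELINE_WITHIN_5PCT_FRAC,
    }


# Synthetic per-tree relative errors, for illustration only.
sample = [0.01] * 60 + [0.10] * 39 + [0.65]
print(check_ratchets(sample))
```

Keeping each threshold as a named constant makes a later tightening a one-line diff that reviewers can read as a measured improvement.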
Running
-------
The test is marked validation + network + slow by the tests/validation
conftest, so it's excluded from the default CI pass. Run explicitly:
uv run pytest tests/validation/test_live_tree_nsvb.py -v -s -m validation
Requires data/georgia.duckdb (symlinked to ~/.pyfia/data/ga/ga.duckdb
after running: python -c "from pyfia import download; download('GA')").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-estimator Add NSVB live tree carbon estimator (pyfia.carbon.live_tree)
Wires the Bailey ecoprovince DIVISION lookup (Level 2 of the NSVB precedence) through the vectorized biomass pipeline. This closes the systematic 3.2% growing-stock biomass overestimate found in the first validation gate commit (e1f0254) by activating ~63% of the vendored NSVB coefficient rows that Phase 1 was skipping as dead code.
Measured gap closure on Georgia EVALID 132401 (~1.25M live trees)
------------------------------------------------------------------
| Metric | Phase 1 | Phase 1.5 | Delta |
|--------------------|---------|-----------|------------------------|
| median rel_err | 4.87% | 3.55% | -27% |
| within 0.1% | 28.99% | 38.83% | +9.8pt |
| within 1% | 34.24% | 43.07% | +8.8pt |
| within 5% | 50.36% | 53.90% | +3.5pt |
| biomass ratio med | 1.0179 | 1.0000 | systematic bias closed |
| biomass ratio mean | 1.0837 | 1.0776 | |
| p99 | 65.23% | 65.55% | ~unchanged |
| DIVISION coverage | n/a | 100.00% | all plots resolved |
The biomass-ratio median hits 1.0000 exactly: half of all trees now match FIADB DRYBIO_AG to floating-point precision. The remaining disagreement (mean 9.57%, p99 65.55%) is dominated by the TREECLCD=4 rotten-cull methodology issue identified in the first commit, which is orthogonal to the DIVISION lookup and will be tackled next.
What landed
-----------
**Coefficient infrastructure** (src/pyfia/carbon/nsvb/coefficients.py)
- `build_division_lookup()`: new helper that filters `*_spcd` tables to DIVISION-specific rows (DIVISION non-null + STDORGCD null) and returns a `(SPCD, DIVISION, model, a, b, b1, c)` lookup keyed on the composite `(SPCD, DIVISION)`. Applies the same defensive Model 2/4 null-b1 filter as `build_species_level_lookup`.
- `ecosubcd_to_division()`: Bailey crosswalk utility that walks Subsection → Division (e.g., "231Ae" → "230", "M231Aa" → "M230"). Handles null, empty, whitespace, lowercase, and malformed inputs.
- `VectorizedLookupTables` gains 5 new `*_div` fields (one per component table). `get_vectorized_lookup_tables()` builds them alongside the existing species-level + Jenkins lookups.
- Level 1 of the NSVB precedence (SPCD + DIVISION + STDORGCD) is still deliberately deferred: only ~10 rows total and would require threading COND.STDORGCD through. Revisit once the DIVISION closure impact is understood.
**Vectorized pipeline** (src/pyfia/carbon/nsvb/equations.py)
- `_join_and_eval_component()` now takes a `div_table` and a `has_division` bool. When True, runs the Level 2 join first and does a 3-way coalesce (division → species-level → Jenkins); when False, falls back to the original 2-way coalesce, preserving backward compatibility for callers that haven't wired ECOSUBCD.
- `compute_nsvb_biomass()` probes the trees schema once for a `DIVISION` column and passes `has_division` to all 5 per-component joins. `DIVISION` is explicitly documented as optional on the input LazyFrame so synthetic tests and pre-ECOSUBCD callers still work unchanged.
**Unit tests**
- `TestEcosubcdToDivision`: 7 tests covering SE Mixed Forest, mountain prefix, other domains, null/empty, malformed, lowercase, whitespace.
- `TestBuildDivisionLookup`: 4 tests for column layout, DIVISION filter, STDORGCD exclusion, composite key uniqueness.
- `TestGetVectorizedLookupTables` gains division-lookup assertions and a Georgia-species DIVISION=230 presence check.
- `TestDivisionLookupPath` in test_nsvb_vectorized.py: 4 end-to-end tests covering non-null AGB output, DIVISION=null fallback, a positive control using SPCD=316 DIVISION=230 (material coefficient difference verified against the raw tables), and a bogus-DIVISION fall-through case.
- The existing scalar ≡ vectorized equivalence suite still passes because the synthetic trees have no DIVISION column; the `has_division=False` branch preserves exact prior behavior.
**Validation test** (tests/validation/test_live_tree_nsvb.py)
- LEFT JOINs `PLOTGEOM` on `PLT_CN = pg.CN` to pull ECOSUBCD.
- Maps ECOSUBCD → DIVISION via `ecosubcd_to_division`. 100% of Georgia trees resolve to a DIVISION (all `231*` / `232*` → 230).
- Ratchet thresholds tightened to the new Phase 1.5 baseline:
_BASELINE_MEDIAN_REL_ERR 0.060 → 0.040
_BASELINE_WITHIN_5PCT_FRAC 0.45 → 0.50
_BASELINE_BIOMASS_RATIO window [1.00, 1.05] → [0.99, 1.01]
New ratchets added:
_BASELINE_WITHIN_1PCT_FRAC 0.40 (new)
_BASELINE_WITHIN_0P1PCT_FRAC 0.35 (new)
- Skips gracefully when PLOTGEOM is missing from the test database, with an inline script in the module docstring showing how to pull it (pyfia.downloader.COMMON_TABLES doesn't include PLOTGEOM).
Test results
------------
- tests/unit tests/property: 847 passed, 77 deselected, 0 failures (+17 new DIVISION-related tests)
- tests/validation/test_live_tree_nsvb.py: 1 passed at the Phase 1.5 baseline
- ruff format / check clean on all modified files
- mypy src/pyfia/carbon/ clean (no issues in 6 source files)
Next commits in this PR
-----------------------
1. Investigate TREECLCD=4 rotten-cull methodology (the unchanged p99): pyfia's literal GTR-WO-104 cull formula retains 56% of wood for CULL=95% hardwood; FIADB retains ~5%. Needs Appendix K.
2. Thread ECOSUBCD through LiveTreeEstimator.calculate_values so the production estimator path benefits from the same fix (currently only the validation test uses it; the estimator will still show the 3% growing-stock bias until this lands).
3. Ratchet thresholds further as each improvement lands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
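The ECOSUBCD → DIVISION crosswalk described in this commit can be approximated with a small regex. This is a sketch under stated assumptions, not the pyfia `ecosubcd_to_division` source: it assumes a code shaped like an optional "M", three digits, and up to two letters, that the division is the first two digits followed by "0", and that malformed input maps to None.

```python
import re
from typing import Optional

# Optional mountain prefix, three province digits, up to two section letters.
_ECOSUBCD_RE = re.compile(r"^(M?)(\d{2})\d[A-Z]{0,2}$")


def ecosubcd_to_division_sketch(ecosubcd: Optional[str]) -> Optional[str]:
    """Map an ecological subsection code to its Bailey division code."""
    if ecosubcd is None:
        return None
    code = ecosubcd.strip().upper()  # tolerate whitespace and lowercase
    match = _ECOSUBCD_RE.match(code)
    if match is None:
        return None  # empty or malformed input
    mountain, first_two = match.groups()
    return f"{mountain}{first_two}0"


for raw in ["231Ae", "M231Aa", " 232bj ", "", None, "xyz"]:
    print(repr(raw), "->", ecosubcd_to_division_sketch(raw))
```

Returning None for unresolvable codes lets the downstream join fall through to the species-level row instead of failing.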
…/1.7
Quick-win downloader change plus documentation reconciliation that captures the state for the next session. No code behavior changes in the NSVB pipeline itself; this is pure integration/docs work.
PLOTGEOM in COMMON_TABLES
-------------------------
`pyfia.downloader.tables.COMMON_TABLES` now includes `PLOTGEOM`, so any fresh `pyfia.download(state)` call pulls ECOSUBCD automatically without needing the manual import script that the Phase 1.5 validation test required. Existing databases downloaded before this change still need the one-off import; the script lives in the validation test's module docstring (slightly reworded to note it's only for legacy databases).
`tests/validation/test_live_tree_nsvb.py`:
- Module docstring: notes PLOTGEOM is now in COMMON_TABLES; the manual import script stays as a fallback for legacy databases.
- pytest.skip message updated to point at the same explanation.
No test assertions depend on `COMMON_TABLES` length or contents. Only docs files reference it, and those are code-block examples, not assertions.
Carbon roadmap reconciliation (src/pyfia/carbon/__init__.py)
------------------------------------------------------------
Completely rewrote the module docstring to replace the "PR 2 contract" framing (now merged) with a Phase-based status roadmap:
- **Phase 1 (merged)**: PR 1 + PR 2, the equation library, coefficient loaders, and LiveTreeEstimator.
- **Phase 1.5 (in progress, this PR)**: validation gate + DIVISION lookup. Documents what landed (commits e1f0254, adf3635) and what's still pending.
- **Phase 1.6 (pending, next session)**: TREECLCD=4 rotten-cull methodology investigation. Spells out the literal GTR-WO-104 formula currently in the code, the measured discrepancy (pyfia retains 56% of wood for 95% hardwood cull, FIADB ~5%), and points at the missing piece (FIA User Guide Appendix K, not yet in the Schmidt references library).
- **Phase 1.7 (pending, next session)**: thread ECOSUBCD through `LiveTreeEstimator.calculate_values` so the production estimator path benefits from the Phase 1.5 DIVISION lookup (currently only the validation test path does). Documents the implementation shape and flags the `DataLoader`/`PLOTGEOM` routing caveat.
- **Phase 2+ (deferred)**: standing dead, understory, downed dead wood, litter, SOC, native NSVB BG, each as its own flat-layout estimator.
The "Architectural rules" and "Items resolved from Phase 1 review" sections are preserved and updated to reflect merged state. New "Pointers for the next session" section tells a fresh Claude session where to find the PR, the validation measurement instrument, and the Schmidt references library.
Auto-memory update
------------------
(Outside this commit, in `~/.claude/projects/.../memory/`):
- Removed the stale `project_pr2_carbon_nsvb.md` (PR 2 is now merged).
- Added `project_pr3_carbon_validation.md` with the current state, the two pending investigations, key file pointers, and the caveat about `DataLoader` not knowing how to load `PLOTGEOM` (Phase 1.7 gotcha).
- Updated `MEMORY.md` index accordingly.
Verified
--------
- 847 unit+property tests pass (no regressions from COMMON_TABLES addition or docs changes)
- ruff format / check clean on all modified files
- The Phase 1.5 ratchet thresholds still hold (this commit doesn't touch the NSVB math, but the validation test is sensitive to any pipeline regression and still passes)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LiveTreeEstimator.calculate_values now loads PLOTGEOM via a new _load_plotgeom helper, joins it on PLT_CN, and derives a DIVISION column from ECOSUBCD before calling compute_nsvb_biomass. This activates the Phase 1.5 Level 2 NSVB coefficient lookup on the production estimator path — pyfia.carbon.live_tree() now closes the same ~3% growing-stock biomass bias that the validation test already exercised directly. PLOTGEOM is loaded out-of-band (not via get_required_tables) to match the existing _load_ref_species pattern, since pyfia's DataLoader doesn't have a slot for spatial / reference tables in its TREE/COND/PLOT join graph. Older databases without PLOTGEOM fall back gracefully (one-shot warning, Phase 1 quality) via a try/except around the read_table call. Measured Georgia EVALID 132401 production-path delta: AG carbon 29.1633 → 29.1012 tons/acre (-0.21% at the population sum). Smaller than the per-tree median delta (1.79%) because the sum is mean-dominated and the high-tail TREECLCD=4 outliers still inflate the average — exactly the gap Phase 1.6 targets next. All 11 unit tests + 160 carbon/nsvb tests + the validation gate remain green. Validation thresholds unchanged because the validation test calls compute_nsvb_biomass directly, not through the estimator. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original Phase 1.6 task was framed as a TREECLCD=4 rotten-cull
methodology investigation because the validation test's top-10 worst
disagreements were all CULL ≥ 95% TREECLCD=4 hardwoods with pyfia
predicting 46x-125x more carbon than FIADB.
After fetching FIADB User Guide v9.1 Appendix K (page K-3) and
cross-checking against the Georgia data, the actual root cause turned
out to be the validation test scope, not the cull formula.
Pyfia's cull formula
(1 - (1 - DENSITY_PROP) * CULL / 100) * Stem Wood
matches Appendix K verbatim. There is no TREECLCD-based dispatch in
either implementation.
The bug: the test was loading STATUSCD=1 trees with no EVALID filter,
pulling in 575k pre-1989 periodic-inventory trees from 1972/1982/1989
panels. Those have FIADB CARBON_AG/DRYBIO_AG computed via the legacy
Component Ratio Method (flat 0.5 carbon fraction), and comparing
pyfia's NSVB recompute against legacy-CRM data was producing the
spurious 1,000-12,000% rel_err outliers. 100% of TREECLCD=4 high-CULL
outliers traced back to the periodic panels — confirmed by their
implied carbon fraction being exactly 0.5 (no S10a entry has that).
Fix: add JOIN POP_PLOT_STRATUM_ASSGN ppsa ON t.PLT_CN = ppsa.PLT_CN
WHERE ppsa.EVALID = 132401 to the validation test SQL, scoping the
comparison to the official Georgia 2024 EXPVOL evaluation
(130,952 NSVB-era trees).
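The effect of the EVALID join can be illustrated on a toy schema. The sketch below uses the standard-library sqlite3 module as a stand-in for DuckDB, with three invented tree rows; only the table names, column names, join condition, and EVALID value come from the description above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE TREE (
        CN INTEGER, PLT_CN INTEGER, STATUSCD INTEGER,
        CARBON_AG REAL, DRYBIO_AG REAL
    );
    CREATE TABLE POP_PLOT_STRATUM_ASSGN (PLT_CN INTEGER, EVALID INTEGER);

    -- Invented rows: tree 1 sits on a plot in the evaluation; tree 2 sits on
    -- a plot outside it and has a flat 0.5 implied carbon fraction; tree 3
    -- is not a live tree.
    INSERT INTO TREE VALUES
        (1, 10, 1, 47.7, 100.0),
        (2, 20, 1, 50.0, 100.0),
        (3, 10, 2, 48.0, 100.0);
    INSERT INTO POP_PLOT_STRATUM_ASSGN VALUES (10, 132401);
    """
)

# Original scope: every live tree, regardless of evaluation.
unfiltered = con.execute(
    "SELECT CN FROM TREE WHERE STATUSCD = 1 ORDER BY CN"
).fetchall()

# Fixed scope: live trees on plots assigned to EVALID 132401 only.
filtered = con.execute(
    """
    SELECT t.CN
    FROM TREE t
    JOIN POP_PLOT_STRATUM_ASSGN ppsa ON t.PLT_CN = ppsa.PLT_CN
    WHERE t.STATUSCD = 1 AND ppsa.EVALID = 132401
    ORDER BY t.CN
    """
).fetchall()

# Trees excluded by the filter, with their implied carbon fraction.
legacy = con.execute(
    """
    SELECT CN, CARBON_AG / DRYBIO_AG FROM TREE
    WHERE STATUSCD = 1
      AND PLT_CN NOT IN (SELECT PLT_CN FROM POP_PLOT_STRATUM_ASSGN
                         WHERE EVALID = 132401)
    """
).fetchall()

print(unfiltered, filtered, legacy)
```

An inner join drops any tree whose plot has no stratum assignment in the chosen evaluation, which is how the periodic-inventory panels leave the comparison set.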
Measured improvement on the EVALID-filtered set:
| Metric | No filter | EVALID 132401 | Improvement |
|----------------|-----------|---------------|-------------|
| Median rel_err | 3.55% | 0.0846% | 42x |
| p99 rel_err | 65.55% | 40.37% | 1.6x |
| Max rel_err | 12,425% | 478% | 26x |
| Within 0.1% | 38.83% | 58.20% | +50% rel |
| Within 1% | 43.07% | 62.91% | +46% rel |
| Within 5% | 53.90% | 73.32% | +36% rel |
Validation ratchets tightened to: median < 0.5%, within-1% > 60%,
within-0.1% > 55%, p99 < 50%, max-null < 0.5%. The remaining ~1% tail
in the EVALID-filtered set (p99 40%, max 478%) is scattered across
TREECLCD=3/4 trees with various conditions and does not trace to a
single methodology bug, so it is deferred to Phase 2+ work.
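A ratchet check of this kind could look roughly like the sketch below. The helper names, the nearest-rank percentile, and the dictionary layout are assumptions for illustration; the max-null threshold is omitted because its definition is not given here, and the actual test code may differ.

```python
import statistics


def rel_err_pct(predicted: float, reference: float) -> float:
    """Absolute relative error in percent (reference must be non-zero)."""
    return abs(predicted - reference) / abs(reference) * 100.0


def summarize(errors_pct: list[float]) -> dict[str, float]:
    """Summarize per-tree relative errors (all values in percent)."""
    ordered = sorted(errors_pct)
    n = len(ordered)
    # Nearest-rank 99th percentile, using integer arithmetic for the rank.
    rank = (99 * n + 99) // 100
    return {
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
        "within_1": 100.0 * sum(e <= 1.0 for e in ordered) / n,
        "within_0_1": 100.0 * sum(e <= 0.1 for e in ordered) / n,
    }


def failed_ratchets(summary: dict[str, float]) -> list[str]:
    """Return the names of thresholds that the summary violates."""
    checks = {
        "median < 0.5%": summary["median"] < 0.5,
        "within-1% > 60%": summary["within_1"] > 60.0,
        "within-0.1% > 55%": summary["within_0_1"] > 55.0,
        "p99 < 50%": summary["p99"] < 50.0,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the list of failed thresholds, rather than a single boolean, makes it easier to see which ratchet slipped when a regression appears.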
Phase 1 / 1.5 / 1.6 / 1.7 are all complete. PR 3 ready for review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mechanical format upgrade emitted by a newer uv binary touching the lockfile during a routine `uv run`. No dependency changes — just adds upload-time fields and bumps the schema revision. Standalone commit so it doesn't pollute substantive Phase 1 carbon work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add standing_dead(db, pool='ag'|'bg'|'total') and StandingDeadEstimator mirroring the live-tree estimator. The pipeline runs the same vectorized NSVB component predictions, then applies REF_TREE_DECAY_PROP reductions (DENSITY_PROP x wood, BARK_LOSS_PROP x bark, BRANCH_LOSS_PROP x branch) by hw/sw x DECAYCD and converts to carbon via S10b dead-tree fractions. TREE.CULL is intentionally not applied for dead trees per Appendix K. Validated against FIADB TREE.CARBON_AG on Georgia EVALID 132401 (6,870 standing dead trees): median rel_err 17.89%, biomass ratio median 1.174. The gap traces to broken-top corrections (75% of SDs have ACTUALHT < HT) which are deferred to Phase 2.5. Ratchet thresholds locked at baseline. Also: re-export standing_dead from pyfia top level, add API doc pages for live_tree and standing_dead, add CHANGELOG [Unreleased] entry, update README/CLAUDE.md/DEVELOPMENT.md/getting-started/mkdocs nav. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
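The standing-dead reduction step described above can be sketched per tree as follows. The multipliers are applied exactly as the commit message states them (DENSITY_PROP x wood, BARK_LOSS_PROP x bark, BRANCH_LOSS_PROP x branch); the function name and the sample numbers are hypothetical, and the real pipeline is vectorized rather than scalar.

```python
def standing_dead_ag_carbon(
    wood: float,
    bark: float,
    branch: float,
    density_prop: float,
    bark_loss_prop: float,
    branch_loss_prop: float,
    carbon_fraction: float,
) -> float:
    """Reduce NSVB components by decay proportions, then convert to carbon.

    carbon_fraction is a decimal (e.g. 0.5), standing in for an S10b
    dead-tree fraction looked up by hw/sw x decay class.
    """
    reduced_biomass = (
        wood * density_prop
        + bark * bark_loss_prop
        + branch * branch_loss_prop
    )
    return reduced_biomass * carbon_fraction
```

For example, components of 100, 10, and 20 with proportions 0.8, 0.5, and 0.25 give a reduced biomass of 90, or 45 units of carbon at a fraction of 0.5.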
… nsvb
Two issues caught in pre-merge review:
1. StandingDeadEstimator.apply_filters cast DECAYCD to Utf8 before the
is_not_null() check, which let FIA CSV empty strings ("") pass through
as non-null. They later cast to null Int64 in calculate_values, silently
dropping trees from the estimate. Fix: cast to Int64 first so empty
strings become null and get filtered out.
2. compute_nsvb_dead_biomass, load_carbon_fractions_dead_df, and
load_dead_decay_proportions_df were importable from their direct
modules but missing from pyfia.carbon.nsvb.__init__.py, breaking
the subpackage-level import pattern that the live-tree functions use.
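The ordering bug in the first issue above can be shown with a plain-Python analogy. This is not the Polars code from the estimator: `to_int_or_none` is a hypothetical stand-in for a non-strict integer cast, and the rows are made up.

```python
def to_int_or_none(value):
    """Mimic a non-strict integer cast: unparseable values become None."""
    if value is None:
        return None
    try:
        return int(str(value))
    except ValueError:
        return None


def filter_then_cast(rows):
    """Buggy order: the null check sees the raw string, so "" passes."""
    kept = [r for r in rows if r["DECAYCD"] is not None]
    return [to_int_or_none(r["DECAYCD"]) for r in kept]


def cast_then_filter(rows):
    """Fixed order: "" becomes None during the cast and is filtered out."""
    casted = [to_int_or_none(r["DECAYCD"]) for r in rows]
    return [v for v in casted if v is not None]


rows = [{"DECAYCD": "3"}, {"DECAYCD": ""}, {"DECAYCD": None}]
buggy = filter_then_cast(rows)   # a None survives the filter
fixed = cast_then_filter(rows)   # only real decay classes remain
```

In the buggy order the empty string reaches the downstream calculation as a null decay class; in the fixed order it never gets past the filter.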
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion gates Validation gate: live_tree NSVB vs FIADB parity (Phase 1.5 / 1.6 / 1.7 complete)
…; close all PR 3 review warnings
Phase 2.5 broken-top corrections:
- Implement Appendix K crown-proportion adjustment (Broken_crn_prop) for branch biomass on trees with ACTUALHT < HT
- Implement paraboloid taper volume-ratio approximation ((ACTUALHT/HT)^(2/3)) for wood/bark, replacing the unimplemented Model 6 Schumacher-Hall
- Vendor Table S11 (REF_TREE_STND_DEAD_CR_PROP) as dead_cr_prop.csv with load_dead_cr_prop_df() loader, keyed by Bailey ecoregion province × hw/sw
- Tighten validation ratchets: median 17.89% → 10.89%, p99 442% → 36.5%, within-50% 71.6% → 99.9%
PR 3 review warnings (all 6 closed):
1. Extract CarbonEstimatorBase: deduplicates ~200 lines shared by LiveTreeEstimator and StandingDeadEstimator
2. Add FIA.live_tree() and FIA.standing_dead() convenience methods
3. Vectorize ecosubcd_to_division with ecosubcd_to_division_expr(): pure Polars string expressions, no per-row map_elements dispatch
4. Tighten standing dead validation ratchets (see above)
5. Document CARBON_BG units (pounds) at the BG bridge site
6. Trim carbon/__init__.py docstring from ~260 to ~80 lines
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
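The wood/bark volume-ratio approximation named in the Phase 2.5 commit above can be sketched as a scalar function. Only the expression (ACTUALHT/HT)^(2/3) comes from the commit message; the function name, the guard clauses, and the choice to return 1.0 for unbroken trees are assumptions.

```python
def broken_top_volume_ratio(actualht: float, ht: float) -> float:
    """Fraction of wood/bark retained by a broken-top tree.

    Uses (ACTUALHT / HT) ** (2 / 3) when ACTUALHT < HT, as stated in the
    commit message; trees with an intact top keep their full volume.
    """
    if ht <= 0 or actualht < 0:
        raise ValueError("heights must be positive")
    if actualht >= ht:
        return 1.0
    return (actualht / ht) ** (2.0 / 3.0)
```

A tree with ACTUALHT = 8 and HT = 27 would retain (8/27)^(2/3) = 4/9, or about 44%, of its wood and bark biomass under this approximation.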
Add downed dead wood (Domke 2013), litter (Domke 2016), and soil organic carbon (Domke 2017) estimators — all condition-level, reading pre-computed COND attributes via the same architecture as understory. Pool parameter accepts only 'total' (no AG/BG split). Add total_ecosystem() convenience function that sums all six IPCC/NGHGI pools. Validated against Georgia EVALID 132301: all three pools exact match (0.000000% error) vs manual SQL replication. total_ecosystem sum matches individual pool calls exactly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tem tests
- Fix potential IndexError in total_ecosystem when a pool returns 0 rows (filters out empty results before summing, returns fallback on all-empty)
- Restore the CONDPROP_UNADJ explanatory comment block in aggregate_results for downed_dead, litter, and soil_organic (was present in understory template)
- Add 5 unit tests for total_ecosystem: sum correctness, 7-row output, empty-result handling, year propagation, signature check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4: condition-level carbon pools + total_ecosystem
Add stock_change() estimator that computes ΔC = C(t₂) - C(t₁) between inventory periods for the four condition-level pools (understory, downed dead, litter, soil organic). Follows the AreaChangeEstimator pattern: t₂ data from EVALID-scoped pipeline, t₁ loaded from full COND table via PREV_PLT_CN + CONDID linking, annualized by REMPER, aggregated via two-stage post-stratification using t₂'s stratification. Tree-level stock change (live_tree, standing_dead) deferred to Phase B; requesting these pools raises ValueError with clear message. Validated against Georgia EVALID 132301: all three non-understory pools exact match (0.000000% error) vs manual SQL replication. 20 unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
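The per-condition quantity behind this stock-change estimator is an annualized difference. The sketch below is a scalar illustration under stated assumptions: the function name is hypothetical, a one-sided missing value is treated as zero, and a condition with both values missing is skipped. It does not include the post-stratified aggregation.

```python
from typing import Optional


def annualized_delta(
    c_t1: Optional[float], c_t2: Optional[float], remper: Optional[float]
) -> Optional[float]:
    """Return (C(t2) - C(t1)) / REMPER, or None when it cannot be computed."""
    if remper is None or remper <= 0:
        return None  # no valid remeasurement period
    if c_t1 is None and c_t2 is None:
        return None  # nothing measured at either visit: skip, do not report 0
    # Assumption: a single missing side is treated as a zero stock.
    return ((c_t2 or 0.0) - (c_t1 or 0.0)) / remper
```

For a condition holding 10.0 units at t₁ and 12.5 units at t₂ with REMPER = 5 years, the annualized change is 0.5 units per year.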
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arbon - Fix docstrings at module, class, and function level that incorrectly said PREVCOND (code uses CONDID for condition linking) - Filter out conditions where ALL carbon columns are NULL at both t1 and t2 to prevent spurious zero deltas from missing data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 5: condition-level carbon stock-change accounting
…tocks
Adds a new pyfia.carbon.nghgi module that orchestrates the six pyFIA
pool estimators into the EPA Chapter 6 LULUCF pool decomposition (AGB,
BGB, Dead Wood, Litter, Soil Mineral) and compares to published Table
6-10 targets. Forest-remaining-forest only; pre-NSVB-transition mode
('fiadb') is the default since EPA's 2024 report consumes the
September-2023 FIADB snapshot.
CONUS-48 reproduction of EPA 2022 Forest Ecosystem total: -1.20%
(= 61,381 vs 62,130 MMT C, with EPA Soil Organic constant added back).
Combined Dead Organic Matter (Dead Wood + Litter) matches within +0.76%
— the apparent per-pool +41% / -33% offsets are a FIADB-vs-report pool
boundary mirage that cancels in aggregate.
- src/pyfia/carbon/nghgi.py — compile_state_stocks, compile_conus_stocks,
load_published_targets, compare_to_published. Defaults to FIADB-stored
CARBON_AG/BG via carbon_pool; mode='nsvb' opt-in for methodology
comparison.
- src/pyfia/carbon/data/nghgi_2024_table_6_10.csv — frozen EPA Chapter
6 stocks 1990, 2005, 2019-2023.
- scripts/nghgi_validation_stage_a.py — CLI driver, ~25s for CONUS-48.
- scripts/nghgi_dead_wood_diagnostic.py — isolates NSVB recompute vs
FIADB-stored standing-dead at population level.
- .gitignore — whitelist src/pyfia/carbon/data/ for NGHGI target CSVs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Confirms Stage A accuracy holds across all available EPA Table 6-10 publication years. Forest Ecosystem (CONUS-48 + EPA Soil Organic constant) matches EPA within -0.73% (2019) to -1.27% (2023). Combined Dead Organic Matter (Dead Wood + Litter) within +0.35% to +0.85% across the same years. Per-pool deltas are stable year-over-year, confirming the ~1% headline gap is structural (missing AK + HI + territories) rather than a 2022 sampling artifact. scripts/nghgi_multi_year_validation.py picks each CONUS-48 state's CURRENT AREA, CURRENT VOLUME EVALID with the closest END_INVYR <= target year and rolls up via pyfia.carbon.nghgi.compile_state_stocks. Synthesises a FOREST_ECO_PLUS_SO row that adds EPA's reported Soil Organic constant back so the comparison is apples-to-apples (Soil Organic is not reproducible from FIADB attributes alone). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vergence
Compares pyFIA condition-level stock_change per CONUS-48 state to EPA Annex 3.13 Table A-208 (state-level Net Carbon Flux from all Forest Pools, 2022). Annex 3.13 does not publish state-level stocks tables; A-208 (flux) is the only available state-level published target.
Key finding: pyFIA's direct stock-difference produces 4x EPA's published condition-level component nationally. In 12 sparse-forest western/plains states, pyFIA's condition-level flux alone exceeds EPA's total all-pools state flux. This is not a bug; it reflects a methodological divergence: EPA uses the Woodall (2015a) age-class projection compilation approach (smoothing flux across age classes and forest dynamics) while pyFIA computes direct year-on-year remeasurement delta-C. The compilation approach is described in EPA Chapter 6 "Methodology and Time-Series Consistency".
Architectural implication: reproducing the *flux* side of the EPA report (Tables 6-8, 6-9, A-208) likely requires implementing the Woodall 2015a age-class projection compilation, which is substantial new code that does not belong in pyFIA's pool-estimator surface. Candidate location: a sibling package that consumes pyFIA stocks and adds the projection pipeline.
3 states (AZ, NM, WY) error in stock_change with ComputeError on column type; flagged for investigation, not blocking Phase 1.
- scripts/nghgi_validation_stage_b.py: CLI driver
- src/pyfia/carbon/data/nghgi_2024_table_a_208_state_flux_2022.csv: frozen EPA state-level flux targets
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t branding, clean total_ecosystem
Restructure to keep pyfia.carbon as a pure estimation package and move
EPA Chapter 6 reproduction work into scripts/.
Blockers fixed
--------------
* Hardcoded /Users/cmihiar/Projects/data/fiadb paths in 4 NGHGI scripts
replaced with a --db-dir CLI flag and PYFIA_FIADB_DIR env var (default
./data/fiadb), resolved by a shared scripts/nghgi/_paths.py helper.
* src/pyfia/carbon/data/ removed: NGHGI EPA-target CSVs were not in
pyproject.toml's package-data and would have failed at runtime for any
pip-installed user. CSVs now ship inside scripts/nghgi/data/ and load
via a filesystem path, not importlib.resources.
* Resolved the missing-CSV README mismatch by removing the library
data/ subpackage entirely.
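A path helper along the lines of the first blocker fix might look like the sketch below. The flag name, environment variable, and default come from the commit message; the function name, the precedence order (flag, then environment, then default), and the use of `parse_known_args` are assumptions, and the real `scripts/nghgi/_paths.py` may be organized differently.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

DEFAULT_DB_DIR = Path("./data/fiadb")


def resolve_db_dir(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve the FIADB directory: --db-dir, then PYFIA_FIADB_DIR, then default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--db-dir", type=Path, default=None)
    # parse_known_args lets each script keep its own additional flags.
    args, _unknown = parser.parse_known_args(argv)
    if args.db_dir is not None:
        return args.db_dir
    if env.get("PYFIA_FIADB_DIR"):
        return Path(env["PYFIA_FIADB_DIR"])
    return DEFAULT_DB_DIR
```

Passing `argv` and `env` explicitly keeps the helper testable without touching `sys.argv` or the process environment.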
Warnings addressed
------------------
* src/pyfia/carbon/nghgi.py moved to scripts/nghgi/_compile.py. It was
report-reproduction code (EPA Chapter 6 column semantics, EPA target
CSV loaders), not estimation API. pyfia public surface no longer
exposes compile_state_stocks / compare_to_published / EPA pool labels.
* Stripped "Schmidt Sciences" / "Synthetic Inventory" grant branding from
6 carbon module docstrings (live_tree, standing_dead, understory,
downed_dead, litter, soil_organic). Grant-phase references ("Phase 1",
"Phase 2", "Phase 2.5", "Phase B") rewritten as neutral
"current/deferred" language.
* total_ecosystem now accepts grp_by and forwards it to every pool, so
grouped totals match sibling estimator behavior. SE columns are
intentionally NULL on the TOTAL_ECOSYSTEM row: combining per-pool SEs
requires modelling cross-pool sampling covariance through the shared
post-stratification, which the naive sqrt(sum(SE^2)) does not do. The
docstring explains this so users don't expect a precision figure.
* scripts/nghgi/_compile.py _stock() helper now raises on empty pool
results instead of silently coercing to zero — masking an upstream
filter as a true zero stock was the prior behavior.
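The reason the TOTAL_ECOSYSTEM standard error is left NULL follows from the identity Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B). The two-pool sketch below shows how far the naive formula can drift once pools are correlated; the function names and the numbers are purely illustrative and are not pyfia estimates.

```python
import math


def naive_se(ses: list[float]) -> float:
    """sqrt(sum(SE^2)): only valid when the pool estimates are uncorrelated."""
    return math.sqrt(sum(se * se for se in ses))


def se_of_sum(se_a: float, se_b: float, correlation: float) -> float:
    """SE of A + B, including the covariance term 2 * rho * SE_a * SE_b."""
    variance = se_a**2 + se_b**2 + 2.0 * correlation * se_a * se_b
    return math.sqrt(variance)
```

With SEs of 3 and 4, the naive figure is 5, while perfectly correlated pools would give 7. Because all pools share the same plots and stratification, some positive correlation is plausible, which is why a naive combined SE could understate the uncertainty.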
Nits
----
* Narrowed bare `except Exception: pass` in live_tree.py and
standing_dead.py BG-bridge cross-era warnings to
(ValueError, TypeError, AttributeError, IndexError, KeyError) and
log at debug instead of dropping the exception silently.
* Trimmed src/pyfia/__init__.py module docstring: removed
"FIA Python Ecosystem" marketing block (PyFIA / GridFIA / PyFVS /
AskFIA) so the package docstring focuses on what pyfia does.
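The first nit above, narrowing a bare `except Exception: pass`, follows a common pattern that can be sketched as below. The exception tuple is the one listed in the commit message; the helper name and the idea of wrapping a best-effort callable are assumptions, not the code in live_tree.py.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Exception types listed in the commit message; anything else propagates.
EXPECTED_ERRORS = (ValueError, TypeError, AttributeError, IndexError, KeyError)


def run_best_effort(check: Callable[[], None]) -> bool:
    """Run an optional check; log expected failures at debug level.

    Returns True when the check ran cleanly, False when it was skipped.
    """
    try:
        check()
    except EXPECTED_ERRORS as exc:
        logger.debug("best-effort check skipped: %r", exc)
        return False
    return True
```

Unexpected errors such as `RuntimeError` are no longer swallowed, so genuine bugs surface instead of disappearing into a silent `pass`.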
Verification
------------
* 910 unit tests pass.
* `uv run mypy src/pyfia/carbon` is clean.
* `uv run ruff check src/pyfia/carbon scripts/nghgi src/pyfia/__init__.py`
is clean.
* scripts/nghgi/{stage_a,stage_b,multi_year,dead_wood_diagnostic}.py all
import and respond to --help; _compile.load_published_targets() reads
the EPA Table 6-10 CSV from its new path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover the gap flagged in re-review of PR 83: grp_by was added to total_ecosystem's signature but had no unit coverage.
Three new tests:
* grp_by is forwarded as-is to every one of the six pool estimators
* TOTAL_ECOSYSTEM rows are summed within each group, not collapsed across
* pool rows survive grp_by: output is (6 pools × N groups) + N TOTAL rows
Also asserts grp_by is in the public signature (defaults to None) so the parameter can't be silently dropped.
914 unit tests pass (was 910).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
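The grouped-total behavior these tests pin down can be sketched with plain dictionaries. The input shape, the row layout, and the empty-list fallback are assumptions for illustration; the real function works on data frames and its fallback value is not specified here.

```python
from collections import defaultdict


def total_ecosystem_rows(pool_results: dict[str, dict[str, float]]) -> list[dict]:
    """Combine per-pool grouped stocks and append one TOTAL row per group.

    pool_results maps pool name -> {group key: carbon stock}.
    """
    # Skip pools that returned no rows so they cannot break the sum.
    non_empty = {pool: groups for pool, groups in pool_results.items() if groups}
    if not non_empty:
        return []  # assumed fallback when every pool is empty

    rows = []
    totals: dict[str, float] = defaultdict(float)
    for pool, groups in non_empty.items():
        for group, carbon in groups.items():
            rows.append({"POOL": pool, "GROUP": group, "CARBON": carbon})
            totals[group] += carbon
    for group, carbon in totals.items():
        # SE is deliberately left unset on the total row (see the SE note above).
        rows.append(
            {"POOL": "TOTAL_ECOSYSTEM", "GROUP": group, "CARBON": carbon, "SE": None}
        )
    return rows
```

With six non-empty pools and N groups this yields 6 × N pool rows plus N total rows, matching the shape asserted in the new tests.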
Summary
Adds a new `pyfia.carbon` subpackage implementing design-based forest carbon estimation following the USFS National Scale Volume and Biomass (NSVB) methodology, plus NGHGI validation scaffolding that reproduces EPA Chapter 6 forest-carbon tables.
Carbon estimation library
- `live_tree`, `standing_dead`: coefficient-driven NSVB equations with Jenkins fallback, broken-top corrections for standing dead, ECOSUBCD/DIVISION lookups, separate live/dead carbon fractions.
- `downed_dead`, `litter`, `soil_organic`, `understory`: use stock estimates carried on `COND` rather than recomputing.
- `total_ecosystem` (sum of all 6 pools), `stock_change` (Δ stock between paired CONDs across remeasurements).
- `CarbonEstimatorBase` to reduce duplication; vectorized NSVB equation library; coefficient CSVs locked under sentinel tests; full unit + property + validation test coverage; docs (`docs/api/live_tree.md`, `docs/api/standing_dead.md`, `docs/getting-started.md`).
NGHGI validation (Stage A + B)
- `src/pyfia/carbon/nghgi.py`: helpers for aligning pyfia carbon estimates with EPA's National Greenhouse Gas Inventory tables.
- `scripts/nghgi_validation_stage_a.py`: reproduces EPA 2024 Chapter 6 Table 6-10 (CONUS-48 forest carbon stocks). Ecosystem total within −1.8% of EPA 2022; Dead Wood pool differs ~+38% (under investigation).
- `scripts/nghgi_multi_year_validation.py`: stocks across EVALIDs 2019–2023.
- `scripts/nghgi_validation_stage_b.py`: state-level flux comparison; documents that EPA's flux uses Woodall 2015a age-class projection while pyfia uses direct ΔC, so flux numbers diverge by construction.
- `scripts/nghgi_dead_wood_diagnostic.py`: drill-down for the Dead Wood gap.
- `nghgi_2024_table_6_10.csv`, `nghgi_2024_table_a_208_state_flux_2022.csv`.
Validation
28 commits across 5 internally-reviewed PRs plus 3 follow-up NGHGI commits.
Test plan
- `uv run pytest tests/unit -k carbon` passes
- `uv run pytest tests/property/test_nsvb_properties.py` passes
- `uv run pytest tests/validation/test_live_tree_nsvb.py` matches FIADB for EVALID 132401
- `uv run pytest tests/validation/test_standing_dead_nsvb.py` matches FIADB
- `uv run ruff format --check && uv run ruff check && uv run mypy src/pyfia/carbon/`
- `from pyfia import carbon; carbon.total_ecosystem(db, evalid=...)`
- `uv run python scripts/nghgi_validation_stage_a.py` reproduces EPA Table 6-10 within tolerance
🤖 Generated with Claude Code