fix(dwarf): remap DW_AT_high_pc lengths to the fused layout (#319 inc 1) #320

Open: avrabe wants to merge 2 commits into main from fix/319-dwarf-high-pc-remap
avrabe (Contributor) commented Jul 1, 2026

What

Fixes the dominant cause of #319: `meld fuse` producing DWARF that fails `llvm-dwarfdump --verify` with hundreds of out-of-parent and overlapping DIE ranges.

Root cause

gimli routes only address-form values through convert_address; DW_AT_high_pc encoded as a length (DW_FORM_udata, the common Rust/LLVM form) is a constant it copies verbatim. Fused bodies change length (LEB re-encode of the merged index space), so every subprogram/lexical-block/inlined DIE became [remapped_low, remapped_low + STALE_input_len) — overrunning its parent (grew) or the next function (shrank).
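
The failure mode can be reproduced with plain interval arithmetic. The sketch below is illustrative only: `Range`, `stale_range`, `contained_in`, and `overlaps` are hypothetical helpers, not code from this PR, and the offsets in the usage note are made up.

```rust
/// Half-open address range [low, high) of a DIE, in output (fused) code offsets.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    low: u64,
    high: u64,
}

/// What a verbatim copy of the length yields: the remapped low_pc plus the
/// length measured in the *input* body.
fn stale_range(remapped_low: u64, input_len: u64) -> Range {
    Range { low: remapped_low, high: remapped_low + input_len }
}

/// True when `child` lies entirely inside `parent`.
fn contained_in(child: Range, parent: Range) -> bool {
    parent.low <= child.low && child.high <= parent.high
}

/// True when two half-open ranges share at least one address.
fn overlaps(a: Range, b: Range) -> bool {
    a.low < b.high && b.low < a.high
}
```

For example, if a 100-byte body grows so that a nested block's start moves from 80 to 98, `stale_range(98, 20)` is no longer contained in the equally stale parent `stale_range(0, 100)`. If the body instead shrinks to 90 bytes, `stale_range(0, 100)` overlaps a next function that starts at 90.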

Fix

After the gimli conversion, walk the write DIEs and rewrite each low_pc+high_pc-length pair via a new AddressRemap::corrected_high_pc: invert the remap for low_pc within its output body, then use the span's output_body_end when the range ends at the function end (avoids the aliased-boundary translate() None) or translate for interior ends. Tombstoned (dropped) low_pc is left untouched — llvm-dwarfdump ignores those DIEs.
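
A minimal sketch of the correction described above, assuming a simplified single-function model. The `Span` and `InstrOffset` layouts here are hypothetical stand-ins and the real `AddressRemap` in meld-core may be structured differently; only the steps (invert `low_pc`, pick `output_body_end` at the function end, `translate` for interior ends) come from the description.

```rust
/// One instruction boundary, as absolute code offsets before and after fusing.
#[derive(Clone, Copy)]
struct InstrOffset {
    old: u64,
    new: u64,
}

/// Hypothetical per-function remap record (not the PR's actual type).
struct Span {
    input_end: u64,           // exclusive end of the body in the input
    output_body_end: u64,     // exclusive end of the body in the fused output
    instrs: Vec<InstrOffset>, // `old` and `new` both strictly increasing
}

impl Span {
    /// Input -> output. Returns None for anything that is not a known
    /// instruction boundary, including `input_end`, which aliases the start
    /// of the next function.
    fn translate(&self, old: u64) -> Option<u64> {
        self.instrs.iter().find(|i| i.old == old).map(|i| i.new)
    }

    /// Output -> input, the inverse of `translate`.
    fn reverse(&self, new: u64) -> Option<u64> {
        self.instrs.iter().find(|i| i.new == new).map(|i| i.old)
    }

    /// New length for a (low_pc, high_pc-as-length) pair, or None to leave
    /// the attribute untouched.
    fn corrected_high_pc(&self, out_low: u64, input_len: u64) -> Option<u64> {
        let orig_low = self.reverse(out_low)?;
        let orig_end = orig_low.checked_add(input_len)?;
        let out_end = if orig_end == self.input_end {
            // Range ends at the function end: use the known output end
            // instead of translating the aliased boundary.
            self.output_body_end
        } else {
            // Interior end (lexical block / inlined subroutine).
            self.translate(orig_end)?
        };
        out_end.checked_sub(out_low)
    }
}
```

With a body that maps input `[100, 140)` to output `[500, 548)`, a subprogram at output `500` with stale length `40` gets length `48`, and a nested block that spanned input `[110, 125)` gets the distance between the two translated boundaries.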

Measured (records+variants repro)

| | before | after |
|---|---|---|
| overlapping | 480 | 0 |
| not-contained | 534 | 11 |
| invalid-expr / invalid-loc | 18 / 17 | 18 / 17 |
| total | 1049 | 46 |

Scope

Inc 1 clears ~96% (all overlapping + 98% of out-of-parent). Residual is distinct mechanisms tracked as follow-ups on #319: nested DW_AT_ranges on inlined/lexical DIEs (11/4 not-contained), and the .debug_loc tombstone-vs-base-selection collision (Cause #2, invalid-expr/loc).
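
The percentages follow directly from the measured counts. A small check, with `Counts` and `cleared_percent` as hypothetical helper names introduced only for this arithmetic:

```rust
/// Verifier error counts from one `llvm-dwarfdump --verify` run.
struct Counts {
    overlapping: u32,
    not_contained: u32,
    invalid_expr: u32,
    invalid_loc: u32,
}

impl Counts {
    /// Sum of all four error classes.
    fn total(&self) -> u32 {
        self.overlapping + self.not_contained + self.invalid_expr + self.invalid_loc
    }
}

/// Share of `before` errors that are gone in `after`, in percent.
/// Returns 0.0 when there was nothing to clear.
fn cleared_percent(before: u32, after: u32) -> f64 {
    if before == 0 {
        return 0.0;
    }
    100.0 * (f64::from(before) - f64::from(after)) / f64::from(before)
}
```

Plugging in the measured counts gives totals of 1049 and 46, about 95.6% cleared overall and about 97.9% of the not-contained errors, which round to the figures quoted above.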

Test: corrected_high_pc_rewrites_length_to_fused_layout (grew/shrank/interior/tombstone) — deterministic, no llvm-dwarfdump dependency. 24 dwarf tests pass.

Refs: #319.

🤖 Generated with Claude Code

meld fuse produced DWARF that fails `llvm-dwarfdump --verify` with hundreds
of out-of-parent + overlapping DIE address ranges, though the inputs verify
clean.

Root cause: gimli routes only *address*-form values through convert_address;
`DW_AT_high_pc` encoded as a length (`DW_FORM_udata`, the common Rust/LLVM
form) is a constant it copies verbatim. Fused bodies change length (LEB
re-encode of the merged index space), so every subprogram/block/inlined DIE
became `[remapped_low, remapped_low + STALE_input_len)` — overrunning its
parent (grew) or the next function (shrank).

Fix: after the gimli conversion, walk the write DIEs and rewrite each
`low_pc`+`high_pc`-length pair to the fused output length via a new
`AddressRemap::corrected_high_pc` — invert the remap for `low_pc` within its
output body, then take the span's `output_body_end` for a range ending at the
function end (avoids the aliased-boundary translate() None) or `translate` for
interior ends (lexical blocks / inlined subroutines). Tombstoned (dropped)
low_pc is left untouched — llvm-dwarfdump ignores those DIEs.

Effect on the #319 repro (records+variants): overlapping 480 -> 0,
out-of-parent 534 -> 11 (~96% of errors cleared; 1049 -> 46). Residual 11
(nested DW_AT_ranges on inlined/lexical DIEs) and Cause #2 (.debug_loc
tombstone vs base-address-selection escape, 18+17) are distinct mechanisms
tracked as follow-up increments on #319.

Test: corrected_high_pc_rewrites_length_to_fused_layout (grew/shrank/interior/
tombstone). 24 dwarf tests pass.

Refs: #319.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026

LS-N verification gate

58/58 approved LS entries verified

| | count |
|---|---|
| Passed (≥1 test, all green) | 58 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 0 |

Approved loss-scenarios.yaml entries are expected to have a regression test named ls_<letter>_<num>_* (e.g. LS-A-11 → ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.
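
The naming convention can be expressed as a small mapping from entry ID to test-name prefix. This is a sketch of the convention as stated here, not the gate's actual implementation; `ls_test_prefix` is a hypothetical name.

```rust
/// Map a loss-scenario ID such as "LS-A-11" to the expected test-name
/// prefix "ls_a_11_". Returns None for IDs that do not have the
/// `LS-<letters>-<digits>` shape.
fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    let (tag, letter, num) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || tag != "LS" {
        return None;
    }
    if letter.is_empty() || !letter.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("ls_{}_{}_", letter.to_ascii_lowercase(), num))
}
```

The resulting prefix is what would be passed as the test filter, so `LS-D-1` would select tests whose names start with `ls_d_1_`.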

Failed LS entries

(none)

Missing regression tests

(none)

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

… body

Mythos delta-pass hardening for #320. A DIE whose high_pc length exceeds its
own function body would translate into the NEXT function and emit a
plausible-but-wrong length (the LS-D-1 class). Unreachable from valid
toolchain output (subprograms end at input_end; nested ranges stay interior;
CUs use DW_AT_ranges not high_pc), but guard it explicitly: bail and leave the
attribute unchanged rather than correct across a function boundary. Test:
corrected_high_pc(len past body) -> None.
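
The guard amounts to one bounds check before any translation is attempted. A minimal sketch, with `end_within_body` as a hypothetical helper name rather than the function added in the commit:

```rust
/// Compute the input-space end of a DIE range and refuse to continue when it
/// would cross the end of the function body.
/// `input_end` is the exclusive end of the body in the input.
fn end_within_body(orig_low: u64, len: u64, input_end: u64) -> Option<u64> {
    // Overflow is treated like any other out-of-body end.
    let orig_end = orig_low.checked_add(len)?;
    if orig_end > input_end {
        // Bail: the caller leaves the attribute unchanged.
        None
    } else {
        Some(orig_end)
    }
}
```

Returning `None` here means "do not correct", so a malformed length stays wrong in an obvious way instead of being rewritten into a plausible value that belongs to the next function.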

Refs: #319.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

avrabe (Contributor, Author) commented Jul 1, 2026

Mythos delta-pass — NO FINDINGS (meld-core/src/dwarf.rs)

Ran the discover protocol on this PR's dwarf.rs delta (a fresh adversarial session). Five hypotheses tested, four refuted empirically:

  1. Wrong-span reverse map at a boundary: refuted. Output bodies are non-overlapping, so a concrete out_low at a shared boundary (output_body_end == next.output_body_start) resolves to the next function's start (its own low_pc), not the prior function's exclusive end. The input-space aliasing that translate must tombstone does not exist in output space.
  2. reverse_in_span mis-inversion — refuted: InstrOffset.new is strictly increasing (each operator ≥1 output byte), so the exact new == instr_new match is unambiguous; the region-1/2 split is the exact inverse of translate.
  3. orig_end past the function — the one theoretical wrong-length path (a high_pc exceeding its body would translate into the next function). Unreachable from valid toolchains (subprograms end at input_end; nested blocks/inlines stay interior; CUs use DW_AT_ranges, not high_pc). Hardened anyway (commit 594ba57): bail when orig_end > span.input_end, so it's now provably impossible even on malformed input (LS-D-1).
  4. Silent no-op on real DWARF (the key risk: clang/rustc emit high_pc as DW_FORM_data4, my match is on Udata) — refuted end-to-end: gimli normalizes the constant-class high_pc to Udata in the read→WriteDwarf::from path, so the match fires and the length is corrected. Verified with hand-assembled data4 bytes.
  5. Mutation walk / adapter unit — refuted: correct_high_pc_lengths runs before append_adapter_unit, so the synthetic <meld-adapter> unit is never walked; the DFS collects ids then mutates per id (no cycles, each once).
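
Hypothesis 2 rests on the output offsets being strictly increasing, which makes an exact-match reverse lookup unambiguous. A sketch of that property, assuming a plain slice of `(old, new)` pairs; the names are hypothetical and this is not the PR's `reverse_in_span`.

```rust
/// True when every `new` offset is larger than the one before it, i.e. each
/// operator occupies at least one output byte.
fn strictly_increasing_new(offsets: &[(u64, u64)]) -> bool {
    offsets.windows(2).all(|w| w[0].1 < w[1].1)
}

/// Exact-match lookup from an output offset back to its input offset.
/// Because `new` is strictly increasing, at most one entry can match, so a
/// binary search is enough.
fn reverse_lookup(offsets: &[(u64, u64)], out: u64) -> Option<u64> {
    offsets
        .binary_search_by_key(&out, |&(_, new)| new)
        .ok()
        .map(|i| offsets[i].0)
}
```

An output offset that falls between two instruction boundaries finds no match and yields `None`, rather than being snapped to a neighbouring instruction.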

No failing oracle for any wrong-attribution defect. Adding mythos-pass-done.

avrabe added the mythos-pass-done label (Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR) Jul 1, 2026

github-actions Bot commented Jul 1, 2026

Mythos delta-pass (auto)

NO FINDINGS across 1 Tier-5 file(s)

| File | Verdict | Hypothesis |
|---|---|---|
| `` | ✅ NO FINDINGS | |

Auto-run via anthropics/claude-code-action@v1
(SHA-pinned) on the touched Tier-5 files, using the
maintainer's Max-plan OAuth token. See
.github/workflows/mythos-auto.yml and
scripts/mythos/discover.md.
