feat(diff): FingerprintTree for directory-mode module loading#23

Merged
svczero merged 2 commits into main from loader/module-mode-tree-load
May 24, 2026

@svczero svczero commented May 24, 2026

Summary

  • Adds FingerprintTree / FingerprintTreeAdvanced to pkg/diff/fingerprinter.go
  • Loads an entire Go source tree via golang.org/x/tools/go/packages (multi-file package resolution), eliminating the sibling-symbol-missing failure class
  • Synthesizes a go.mod declaring the stable module path synthetic.local/anonymous via packages.Config.Overlay (zero disk writes) when no real go.mod is found; this fixes asymmetric type-qualifier inflation on pre-module commits
  • Hardens the loader environment (GOPROXY=off, CGO_ENABLED=0, GOFLAGS=-mod=readonly)
  • Adds three test cases covering synthetic-gomod, real-gomod, and sibling-file resolution
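
The loader-environment hardening can be sketched as below. This is a minimal illustration, not the code from this PR: `hardenedEnv`, `overridden`, and `loaderOverrides` are hypothetical names, and only the three `KEY=value` settings come from the summary. The result is the kind of slice one would assign to `packages.Config.Env`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// loaderOverrides holds the three settings listed in the PR summary.
var loaderOverrides = []string{
	"GOPROXY=off",           // no network module fetches
	"CGO_ENABLED=0",         // keep the C toolchain out of the load
	"GOFLAGS=-mod=readonly", // never rewrite go.mod or go.sum
}

// overridden reports whether key is one of the hardened variables.
func overridden(key string) bool {
	for _, kv := range loaderOverrides {
		if k, _, _ := strings.Cut(kv, "="); k == key {
			return true
		}
	}
	return false
}

// hardenedEnv (hypothetical helper) copies base, drops any inherited
// value for a hardened variable, and appends the overrides.
func hardenedEnv(base []string) []string {
	env := make([]string, 0, len(base)+len(loaderOverrides))
	for _, kv := range base {
		if key, _, _ := strings.Cut(kv, "="); !overridden(key) {
			env = append(env, kv)
		}
	}
	return append(env, loaderOverrides...)
}

func main() {
	// Print only the hardened entries of the resulting environment.
	for _, kv := range hardenedEnv(os.Environ()) {
		if key, _, _ := strings.Cut(kv, "="); overridden(key) {
			fmt.Println(kv)
		}
	}
}
```

Dropping inherited values before appending keeps the result free of duplicate keys, so the outcome does not depend on how a consumer resolves duplicates.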

Coverage impact (from bench harness pilot)

| Failure class | Before | After |
| --- | --- | --- |
| sibling-symbol-missing | ~9 commits unscored | resolved by tree load |
| Qualifier deflation | ~19% similarity inflation on pre-module commits | eliminated by stable synthetic path |
| broken-dependency / other-package-load-error | ~19 commits | unchanged (needs vendor corpus) |

Test plan

  • go test ./pkg/diff/... passes
  • TestFingerprintTree_SyntheticGoMod — no real go.mod present, synthetic injected
  • TestFingerprintTree_RealGoMod — real go.mod used, metadata flags correct
  • TestFingerprintTree_SiblingFiles — multi-file package, all functions discovered

🤖 Generated with Claude Code

FingerprintSource loads a single file and cannot resolve symbols
defined in sibling files. Pre-module trees whose functions call
sibling helpers (e.g. envconfig's Process() calling lookupEnv from
env_syscall.go) fail to type-check under FingerprintSource and the
fingerprint is unusable.
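
The failure mode can be reproduced with the standard library type checker alone. The sketch below is an illustration under assumptions: the two sources are simplified stand-ins for the envconfig files, and `typeCheck` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// Simplified stand-ins for the two sibling files (not the real envconfig code).
const processSrc = `package envconfig

func Process(key string) string { return lookupEnv(key) }
`

const siblingSrc = `package envconfig

func lookupEnv(key string) string { return "value-of-" + key }
`

// typeCheck (hypothetical helper) parses the sources as one package
// and returns every parse and type-check error.
func typeCheck(sources map[string]string) []error {
	fset := token.NewFileSet()
	var files []*ast.File
	var errs []error
	for name, src := range sources {
		f, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		files = append(files, f)
	}
	// Collect errors through the callback so checking does not stop at the first one.
	conf := types.Config{Error: func(err error) { errs = append(errs, err) }}
	conf.Check("envconfig", fset, files, nil)
	return errs
}

func main() {
	alone := typeCheck(map[string]string{"envconfig.go": processSrc})
	fmt.Println("single file:", alone)

	both := typeCheck(map[string]string{
		"envconfig.go":   processSrc,
		"env_syscall.go": siblingSrc,
	})
	fmt.Println("with sibling:", len(both), "errors")
}
```

Checked alone, the first file reports an undefined `lookupEnv`; checked together with its sibling, the package is clean.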

FingerprintTree loads the tree via packages.Load with full sibling
resolution. When the tree has no go.mod, the loader synthesizes one
through packages.Config.Overlay so resolution proceeds through a
canonical module path rather than falling back to
"command-line-arguments". LoadMeta exposes HadGoMod,
SynthesizedGoMod, ModulePath, and LoadErrors so callers can
distinguish the three load regimes.
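
A rough sketch of the go.mod synthesis step, using only the standard library. Assumptions are labeled in the comments: the `LoadMeta` field types, the `goModOverlay` helper name, and the exact go.mod contents are illustrative and not taken from this PR. The returned map has the shape `packages.Config.Overlay` expects (absolute file path to file contents).

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

const syntheticModulePath = "synthetic.local/anonymous"

// LoadMeta carries the four fields named above; the field types are assumptions.
type LoadMeta struct {
	HadGoMod         bool
	SynthesizedGoMod bool
	ModulePath       string
	LoadErrors       []string
}

// goModOverlay (hypothetical helper) returns an in-memory go.mod for
// rootDir when none exists on disk. Nothing is written to disk.
func goModOverlay(rootDir string) (map[string][]byte, LoadMeta, error) {
	absRoot, err := filepath.Abs(rootDir) // overlay keys must be absolute paths
	if err != nil {
		return nil, LoadMeta{}, err
	}
	goMod := filepath.Join(absRoot, "go.mod")
	switch _, err := os.Stat(goMod); {
	case err == nil:
		// Real go.mod: no overlay. A real loader would fill ModulePath
		// from the loaded module metadata; that step is omitted here.
		return nil, LoadMeta{HadGoMod: true}, nil
	case errors.Is(err, fs.ErrNotExist):
		// Assumed minimal contents: a module line only.
		content := []byte("module " + syntheticModulePath + "\n")
		meta := LoadMeta{SynthesizedGoMod: true, ModulePath: syntheticModulePath}
		return map[string][]byte{goMod: content}, meta, nil
	default:
		return nil, LoadMeta{}, err
	}
}

func main() {
	root, err := os.MkdirTemp("", "tree-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	overlay, meta, err := goModOverlay(root)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v, overlay entries: %d\n", meta, len(overlay))
}
```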

The synthetic module path is a stable constant ("synthetic.local/
anonymous"), not basename-derived. The first draft of this fix
derived the synthetic path from filepath.Base(rootDir), which
reintroduced qualifier asymmetry in a new form: pairwise
comparisons load each side from its own temp directory, so basename-
derived paths differ across sides, deflating types.Type.String()-
based similarity on any signature containing user-defined types.
exec_v2 head-to-head showed -0.0968 deflation under basename-
derived paths; the identical-basename diagnostic recovered the
baseline bit-identically (0.8046 vs 0.8046); a three-way
comparison against real-go.mod confirmed agreement across all
three load regimes on the synthetic corpus (exec 0.8046, net
0.5946, syscall 0.5978 — nine measurements all bit-identical).
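
The qualifier asymmetry is easy to see in isolation with go/types. In this sketch the source snippet, the two temp-style package paths, and the `signatureUnder` helper are all made up for illustration; the only thing taken from the commit message is the stable constant.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

const stablePath = "synthetic.local/anonymous"

// A tiny package whose function signature mentions a user-defined type.
const src = `package exec

type Cmd struct{ Path string }

func Run(c *Cmd) error { return nil }
`

// signatureUnder (hypothetical helper) type-checks src as a package
// with the given path and returns types.Type.String() for Run.
func signatureUnder(pkgPath string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "exec.go", src, 0)
	if err != nil {
		return "", err
	}
	pkg, err := new(types.Config).Check(pkgPath, fset, []*ast.File{f}, nil)
	if err != nil {
		return "", err
	}
	// String() qualifies Cmd with the full package path.
	return pkg.Scope().Lookup("Run").Type().String(), nil
}

func main() {
	// Basename-derived paths: each side of a comparison gets its own name.
	for _, p := range []string{"tmp-left-123", "tmp-right-456", stablePath, stablePath} {
		sig, err := signatureUnder(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(sig)
	}
}
```

Identical source produces different signature strings under the two temp-style paths and identical strings under the stable constant, which is why any string-based similarity over such signatures drops when the path varies per side.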

Also bump the go directive 1.24.0 -> 1.26.3 to match the installed
toolchain. The directive is a floor, not a target: the toolchain
was already 1.26.3 in practice, so this only raises the self-declared
minimum. FingerprintSource fingerprints remain bit-identical on
the synthetic corpus before and after the bump, verified under
GOWORK=off against v4.0.0.

Add /semantic_firewall to .gitignore so the stray built binary at
repo root cannot ride along into future commits.
…tion

The committed FingerprintTree synthesizes a go.mod with the stable
constant module path "synthetic.local/anonymous" when no real go.mod
is found. That works for self-contained single-package trees (the
synthetic-corpus shape the fix was first validated against) but does
not resolve same-module sub-package imports in real multi-package
modules — the synthetic module identity does not match the real
module path the source code imports.

Real-corpus triage of the 3 genuine same-package-sibling commits in
the pilot (go-cmp 8ebdfab3, x/text c8872a1a, x/text db455d00) showed
each one's failing sub-package directory exists on-disk at the path
the real import declares, so a synthesized go.mod declaring the REAL
module name at the worktree root would resolve the imports. The fix
shape (moduleNameHint parameter + load from tree root) is mechanism-
verified but implementation-deferred — the 3-commit payoff did not
justify the engine-API + bench-runner refactor at this stage.
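
Why the declared module name matters can be shown with a simplified model of same-module import resolution. This is an illustration only: `dirForImport` is a hypothetical helper, `example.com/mod` is a placeholder module, and the go command's real resolution involves more than this prefix check.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// dirForImport (hypothetical helper) maps importPath to a directory
// under rootDir when the import belongs to the module declared as
// modulePath, and reports false otherwise.
func dirForImport(rootDir, modulePath, importPath string) (string, bool) {
	if importPath == modulePath {
		return rootDir, true
	}
	rest, ok := strings.CutPrefix(importPath, modulePath+"/")
	if !ok {
		return "", false // import is outside the declared module
	}
	return filepath.Join(rootDir, filepath.FromSlash(rest)), true
}

func main() {
	const imp = "example.com/mod/internal/helper" // placeholder import path

	// Stable synthetic identity: the import does not match the module.
	_, ok := dirForImport("/work", "synthetic.local/anonymous", imp)
	fmt.Println("synthetic module resolves import:", ok)

	// Real module name declared at the worktree root: the import maps
	// to the sub-package directory that already exists on disk.
	dir, ok := dirForImport("/work", "example.com/mod", imp)
	fmt.Println("hinted module resolves import:", ok, dir)
}
```

Under this model a moduleNameHint only has to supply the module line of the synthesized go.mod; the sub-package directories are found by the ordinary prefix match.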

This commit only documents the limitation in the FingerprintTree and
syntheticModulePath doc comments so the boundary lives in the code,
not just in conversation. No behavior change.
@svczero svczero merged commit c7f4082 into main May 24, 2026
2 checks passed