Skip to content

Fix #751: include the full generator definition in the cache hash#783

Open
phverg wants to merge 1 commit into
olofk:mainfrom
phverg:fix-generator-cache-hash-command
Open

Fix #751: include the full generator definition in the cache hash#783
phverg wants to merge 1 commit into
olofk:mainfrom
phverg:fix-generator-cache-hash-command

Conversation

@phverg
Copy link
Copy Markdown
Contributor

@phverg phverg commented May 17, 2026

Problem

Fixes #751.

Ttptttg._sha256_input_yaml_hexdigest hashes only {parameters, vlnv, gapi} from generator_input. Editing the generator — whether by pointing command:/interpreter: at a different file or by editing the script's bytes in place — leaves the hash identical, so subsequent runs silently serve the previously-cached output even though the program that produced it has changed.

Solution

Fold the generator definition into the hash: command, interpreter, and a SHA-256 of the resolved script bytes. A clean rebuild now happens whenever any of these changes.

Why slightly broader than the literal ask in the issue: "the command being executed" naturally includes the script content, not just the declared path. Hashing only the declared command: string catches renames but not in-place script edits — which is the common "I updated the generator and the cache still served stale output" symptom. Hashing the resolved bytes closes that gap and mirrors the pattern PR #777 uses for provider patch bytes.

Deliberately NOT hashed:

  • The resolved interpreter path from shutil.which(interpreter). That depends on the system PATH at run time and is not part of the generator definition the user signs in their .core file. The interpreter name is what we sign.
  • The generator's root path. It's an absolute path that differs between machines and checkouts, which would make cache directory names environment-specific and the test reference hashes machine-specific. Two generators with the same relative command rooted under different directories are still distinguished by the script bytes hash: if their bytes differ, the cache differs; if their bytes match, sharing the cache is the desired behaviour.

This necessarily invalidates existing on-disk caches for cacheable generators on first run after upgrade (the directory name embeds the hash). Users get one extra regeneration and consistent behaviour afterwards. The hardcoded reference hashes in test_generators are updated accordingly.

Test plan

  • test_generator_input_hash_includes_command — proves changing command flips the hash; stable across repeats.
  • test_generator_input_hash_includes_interpreter — proves swapping python3python2 flips the hash.
  • test_generator_input_hash_includes_script_bytes — missing → v1 → v2 → v1-again sequence, proving the hash depends on bytes not on mtime.
  • Existing test_generators reference hashes updated; full tests/test_edalizer.py green (7 passed).
  • pre-commit run -a clean.

``Ttptttg._sha256_input_yaml_hexdigest`` hashed only
``{parameters, vlnv, gapi}`` from ``generator_input``. Editing
the generator -- whether by pointing ``command:``/``interpreter:``
at a different file or by editing the script's bytes in place --
left the hash identical, so subsequent runs silently served the
previously-cached output even though the program that produced
it had changed.

Fold the generator definition into the hash:

* ``command`` -- declared script path
* ``interpreter`` -- declared interpreter name
* SHA-256 of the resolved script bytes -- closes the in-place
  edit gap (same path, same declarations, new bytes). Same
  pattern branch olofk#777 follow-ups use for provider patch bytes.

Deliberately NOT hashed:

* The resolved interpreter path from ``shutil.which(interpreter)``
  -- depends on the system PATH at run time and is not part of
  the generator definition the user signs in their .core file.
* The generator's ``root`` path -- it's an absolute path that
  differs between machines/checkouts, which would make hashes
  non-portable and cache directory names environment-specific.
  Two generators with the same relative ``command`` rooted under
  different directories are still distinguished by the script
  bytes hash: same bytes means functionally equivalent output,
  so sharing the cache is the desired behaviour.

This necessarily invalidates existing on-disk caches for
cacheable generators on first run after upgrade (the directory
name embeds the hash). That's intentional: users get one extra
regeneration and consistent behaviour afterwards. The hardcoded
reference hashes in ``test_generators`` are updated accordingly.

Fixes olofk#751.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@phverg phverg force-pushed the fix-generator-cache-hash-command branch from ae1d416 to b9d8c4d Compare May 17, 2026 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generator caching logic

1 participant