Fix #751: include the full generator definition in the cache hash#783
Open
phverg wants to merge 1 commit into
Open
Fix #751: include the full generator definition in the cache hash#783phverg wants to merge 1 commit into
phverg wants to merge 1 commit into
Conversation
``Ttptttg._sha256_input_yaml_hexdigest`` hashed only
``{parameters, vlnv, gapi}`` from ``generator_input``. Editing
the generator -- whether by pointing ``command:``/``interpreter:``
at a different file or by editing the script's bytes in place --
left the hash identical, so subsequent runs silently served the
previously-cached output even though the program that produced
it had changed.
Fold the generator definition into the hash:
* ``command`` -- declared script path
* ``interpreter`` -- declared interpreter name
* SHA-256 of the resolved script bytes -- closes the in-place
edit gap (same path, same declarations, new bytes). Same
pattern branch olofk#777 follow-ups use for provider patch bytes.
Deliberately NOT hashed:
* The resolved interpreter path from ``shutil.which(interpreter)``
-- depends on the system PATH at run time and is not part of
the generator definition the user signs in their .core file.
* The generator's ``root`` path -- it's an absolute path that
differs between machines/checkouts, which would make hashes
non-portable and cache directory names environment-specific.
Two generators with the same relative ``command`` rooted under
different directories are still distinguished by the script
bytes hash: same bytes means functionally equivalent output,
so sharing the cache is the desired behaviour.
This necessarily invalidates existing on-disk caches for
cacheable generators on first run after upgrade (the directory
name embeds the hash). That's intentional: users get one extra
regeneration and consistent behaviour afterwards. The hardcoded
reference hashes in ``test_generators`` are updated accordingly.
Fixes olofk#751.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ae1d416 to
b9d8c4d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Fixes #751.
Ttptttg._sha256_input_yaml_hexdigesthashes only{parameters, vlnv, gapi}fromgenerator_input. Editing the generator — whether by pointingcommand:/interpreter:at a different file or by editing the script's bytes in place — leaves the hash identical, so subsequent runs silently serve the previously-cached output even though the program that produced it has changed.Solution
Fold the generator definition into the hash:
command,interpreter, and a SHA-256 of the resolved script bytes. A clean rebuild now happens whenever any of these changes.Why slightly broader than the literal ask in the issue: "the command being executed" naturally includes the script content, not just the declared path. Hashing only the declared
command:string catches renames but not in-place script edits — which is the common "I updated the generator and the cache still served stale output" symptom. Hashing the resolved bytes closes that gap and mirrors the pattern PR #777 uses for provider patch bytes.Deliberately NOT hashed:
shutil.which(interpreter). That depends on the system PATH at run time and is not part of the generator definition the user signs in their.corefile. The interpreter name is what we sign.rootpath. It's an absolute path that differs between machines and checkouts, which would make cache directory names environment-specific and the test reference hashes machine-specific. Two generators with the same relativecommandrooted under different directories are still distinguished by the script bytes hash: if their bytes differ, the cache differs; if their bytes match, sharing the cache is the desired behaviour.This necessarily invalidates existing on-disk caches for cacheable generators on first run after upgrade (the directory name embeds the hash). Users get one extra regeneration and consistent behaviour afterwards. The hardcoded reference hashes in
test_generatorsare updated accordingly.Test plan
test_generator_input_hash_includes_command— proves changingcommandflips the hash; stable across repeats.test_generator_input_hash_includes_interpreter— proves swappingpython3→python2flips the hash.test_generator_input_hash_includes_script_bytes— missing → v1 → v2 → v1-again sequence, proving the hash depends on bytes not on mtime.test_generatorsreference hashes updated; fulltests/test_edalizer.pygreen (7 passed).pre-commit run -aclean.