Skip to content

Cache the hashCode for dataflow expressions#1643

Open
wmdietl wants to merge 1 commit intomasterfrom
hashcodecache
Open

Cache the hashCode for dataflow expressions#1643
wmdietl wants to merge 1 commit intomasterfrom
hashcodecache

Conversation

@wmdietl
Copy link
Copy Markdown
Member

@wmdietl wmdietl commented Apr 21, 2026

The dataflow expressions are in many maps, so the hash is computed frequently.
Objects.hashCode allocates an array for the varargs, which is bad on the hot path.
The additional memory overhead seems low, and outweighed by the array allocations.

Copilot AI review requested due to automatic review settings April 21, 2026 01:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves performance of the dataflow JavaExpression hierarchy by caching hashCode() results for commonly-used expression types, reducing repeated Objects.hash(...) varargs allocations when these expressions are used as map/set keys.

Changes:

  • Add per-instance hashCode caching fields to multiple JavaExpression subclasses.
  • Update hashCode() implementations to memoize the computed value.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
dataflow/src/main/java/org/checkerframework/dataflow/expression/ValueLiteral.java Cache hashCode() result for literal expressions.
dataflow/src/main/java/org/checkerframework/dataflow/expression/UnaryOperation.java Cache hashCode() result for unary operations.
dataflow/src/main/java/org/checkerframework/dataflow/expression/MethodCall.java Cache hashCode() result for method calls (including constructor special-case).
dataflow/src/main/java/org/checkerframework/dataflow/expression/LocalVariable.java Cache hashCode() result for local variables.
dataflow/src/main/java/org/checkerframework/dataflow/expression/FormalParameter.java Cache hashCode() result for formal parameters.
dataflow/src/main/java/org/checkerframework/dataflow/expression/FieldAccess.java Cache hashCode() result for field accesses.
dataflow/src/main/java/org/checkerframework/dataflow/expression/BinaryOperation.java Cache hashCode() result for binary operations.
dataflow/src/main/java/org/checkerframework/dataflow/expression/ArrayCreation.java Cache hashCode() result for array creations.
dataflow/src/main/java/org/checkerframework/dataflow/expression/ArrayAccess.java Cache hashCode() result for array accesses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +108 to +116
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
return Objects.hash(operationKind, operand);
if (hashCodeCache == 0) {
hashCodeCache = Objects.hash(operationKind, operand);
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCodeCache uses 0 as the "not computed" sentinel. If the real hash code computes to 0, this will recompute (and re-allocate via Objects.hash(...)) on every call. Prefer a separate boolean (or nullable Integer) so a computed 0 is still cached.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +81
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
VarSymbol vs = (VarSymbol) element;
return Objects.hash(
index,
vs.name.toString(),
TypeAnnotationUtils.unannotatedType(vs.type).toString(),
vs.owner.toString());
if (hashCodeCache == 0) {
VarSymbol vs = (VarSymbol) element;
hashCodeCache =
Objects.hash(
index,
vs.name.toString(),
TypeAnnotationUtils.unannotatedType(vs.type).toString(),
vs.owner.toString());
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCodeCache uses 0 as the "not computed" sentinel. If the computed hash ends up 0, Objects.hash(...) will run on every call. Use a separate computed flag (or nullable Integer) so 0 can be cached too.

Copilot uses AI. Check for mistakes.
Comment on lines +107 to +115
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
return Objects.hash(array, index);
if (hashCodeCache == 0) {
hashCodeCache = Objects.hash(array, index);
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCodeCache uses 0 as the "not computed" sentinel. If Objects.hash(array, index) computes to 0, this will recompute every call (and re-allocate) rather than caching. Use a separate boolean/nullable cache to distinguish "not computed" from a computed 0.

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +174
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
return Objects.hash(value, type.toString());
if (hashCodeCache == 0) {
hashCodeCache = Objects.hash(value, type.toString());
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCodeCache uses 0 as the "not computed" sentinel. If the real hash code computes to 0, this will recompute (and re-allocate via Objects.hash(...)) on every call, defeating the caching on that path. Use a separate boolean (or an Integer cache) to distinguish "not computed" from a computed 0.

Copilot uses AI. Check for mistakes.
Comment on lines +82 to +91
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
VarSymbol vs = (VarSymbol) element;
return Objects.hash(vs.pos, vs.name, vs.owner);
if (hashCodeCache == 0) {
VarSymbol vs = (VarSymbol) element;
hashCodeCache = Objects.hash(vs.pos, vs.name, vs.owner);
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCodeCache uses 0 as the "not computed" sentinel. If Objects.hash(vs.pos, vs.name, vs.owner) evaluates to 0, the hash code will be recomputed every call. Use a separate computed flag (or nullable cache) to cache 0 as a valid value.

Copilot uses AI. Check for mistakes.
Comment on lines +139 to 143
hashCodeCache = Objects.hash(operationKind, left, right);
}
return hashCodeCache;
}

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equals treats some operations as commutative (it considers (left,right) equal to (right,left)), but hashCode() is order-dependent (Objects.hash(operationKind, left, right)). This violates the equals/hashCode contract and can break use as a map/set key. Make hashCode() commutative for commutative operators (e.g., combine operand hashes in an order-independent way), or make equals order-sensitive.

Suggested change
hashCodeCache = Objects.hash(operationKind, left, right);
}
return hashCodeCache;
}
hashCodeCache = computeHashCode();
}
return hashCodeCache;
}
/**
* Computes a hash code consistent with equals(Object).
*
* <p>For commutative operations, the operand contribution must be order-independent because
* equals(Object) considers (left, right) equal to (right, left). For non-commutative
* operations, preserve the ordered hash.
*/
private int computeHashCode() {
if (isCommutative()) {
int leftHash = left.hashCode();
int rightHash = right.hashCode();
int first = Math.min(leftHash, rightHash);
int second = Math.max(leftHash, rightHash);
return Objects.hash(operationKind, first, second);
}
return Objects.hash(operationKind, left, right);
}

Copilot uses AI. Check for mistakes.
Comment on lines 124 to +128
public int hashCode() {
return Objects.hash(getField(), getReceiver());
if (hashCodeCache == 0) {
hashCodeCache = Objects.hash(getField(), getReceiver());
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equals treats this and super receivers as equivalent in some cases, but hashCode() hashes the concrete receiver via Objects.hash(getField(), getReceiver()). That violates the equals/hashCode contract and can break lookups in hash-based collections. Adjust hashCode() to mirror the equals receiver normalization (e.g., hash a canonical receiver value when it’s ThisReference/SuperReference).

Copilot uses AI. Check for mistakes.
Comment on lines 174 to +183
public int hashCode() {
if (method.getKind() == ElementKind.CONSTRUCTOR) {
// No two constructor instances have the same hashcode.
return System.identityHashCode(this);
if (hashCodeCache == 0) {
if (method.getKind() == ElementKind.CONSTRUCTOR) {
// No two constructor instances have the same hashcode.
hashCodeCache = System.identityHashCode(this);
} else {
hashCodeCache = Objects.hash(method, receiver, arguments);
}
}
return Objects.hash(method, receiver, arguments);
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equals treats calls on this and super as equal (isComparingSuperWithThis), but hashCode() includes the concrete receiver in Objects.hash(method, receiver, arguments). This violates the equals/hashCode contract and can break map/set usage. Update hashCode() to apply the same receiver equivalence as equals (e.g., canonicalize ThisReference/SuperReference before hashing). Also, using 0 as the cache sentinel means a computed 0 (including a possible identityHashCode of 0) won’t actually be cached.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +99
/** Cache the hashCode. Recomputed if zero. */
private int hashCodeCache = 0;

@Override
public int hashCode() {
return Objects.hash(dimensions, initializers, getType().toString());
if (hashCodeCache == 0) {
hashCodeCache = Objects.hash(dimensions, initializers, getType().toString());
}
return hashCodeCache;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caches the hash code, but dimensions/initializers are stored and returned as mutable lists. If a caller mutates either list after hashCode() is first computed, the cached value becomes stale and breaks hash-based collections. Consider defensive copies (and returning unmodifiable views) to make these fields effectively immutable, or avoid caching here if mutation is expected. Also, using 0 as the cache sentinel means a computed 0 won’t be cached.

Copilot uses AI. Check for mistakes.
@wmdietl
Copy link
Copy Markdown
Member Author

wmdietl commented Apr 21, 2026

@copilot Can you measure whether this has any performance benefit compared to CI running on master?

Copy link
Copy Markdown

Copilot AI commented Apr 21, 2026

@copilot Can you measure whether this has any performance benefit compared to CI running on master?

I measured this against CI on master using Actions runs:

For the 13 matching completed jobs so far, the Run test script ... step totals are:

  • master: 9,744s
  • PR: 9,681s
  • delta: -63s (~0.6% faster)

Largest swings are mixed (e.g., cftests-nonjunit faster by 128s, typecheck-part2 slower by 80s), so the signal looks small and somewhat noisy at CI level. Commit: e3008d8. UI screenshot: N/A (no UI changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants