investigation: repeated NewClassTree visits in Nullness checker#1345
investigation: repeated NewClassTree visits in Nullness checker#1345zyf265600 wants to merge 7 commits intoeisop:masterfrom
Conversation
Cache NewClassTree processing in NullnessNoInitAnnotatedTypeFactory to prevent duplicate type annotation calculations.
wmdietl
left a comment
There was a problem hiding this comment.
Thanks for starting to look into performance improvements!
On the highest level, it would be useful to add a better description of your motivation to the PR description. How many invocations of this method did you observe and why does caching it improve performance? Is it sound to cache the result or could a later execution compute a different result?
For this particular case, it is rather odd that computing the type of a new class tree is slow, as they are always non-null.
Your added test case should illustrate the performance gain, ideally using a jtreg test with timeout, on a complicated case, see e.g. these tests.
A few more comments inline.
| private final ExecutableElement mapGet = | ||
| TreeUtils.getMethod("java.util.Map", "get", 1, processingEnv); | ||
|
|
||
| // High-level optimization: cache entire addComputedTypeAnnotations processing flow |
There was a problem hiding this comment.
The Javadocs should document what is being cached, in particular if the key/value is a String. What format is the String in?
For a cache you would also expect to see the value it is mapped to, so why is a Set enough?
Who cleans up this cache? There are many allocations in large programs and we don't want a Set that becomes arbitrarily large.
Maybe it should be cleared every time a new root is set? See here.
…rn if type already has @nonnull.
|
Thanks for the review! Motivation (measured on a tiny example): Observed NewClassTree visits in a single compile: Reason: multiple subcheckers traverse the same AST; AnnotatedTypeFactory intentionally does not cache NewClassTree to avoid phase/context mismatches. Soundness: Global/phase-crossing caching for NewClassTree is not sound (later executions can differ due to dataflow refinement, initialization state, and type-argument inference). I removed the earlier cross-call cache. What I changed (safe optimizations, semantics unchanged): |
wmdietl
left a comment
There was a problem hiding this comment.
Thanks for looking into performance issues. Please also adapt the title of your PRs or open new PRs if they change substantially.
| if (type.hasEffectiveAnnotation(NONNULL)) { | ||
| return null; | ||
| } | ||
| type.addMissingAnnotation(NONNULL); |
There was a problem hiding this comment.
addMissingAnnotation already checks whether the annotation is present, so this change doesn't achieve anything.
Do you actually see any performance differences?
There was a problem hiding this comment.
Thanks for the detailed discussion.
I added a small perf harness to measure the effect of the extra hasEffectiveAnnotation(NONNULL) fast-path.
-
Test file:
checker/jtreg/nullness/perf/NewClassPerf.java
This generates a synthetic source with thousands ofnewexpressions and compares two variants:- A = fast-path enabled (extra
if) - B = fast-path disabled (
-Dcf.skipNonnullFastPath=true)
- A = fast-path enabled (extra
-
How to run:
../jtreg/bin/jtreg \ -workDir:checker/build/jtreg/perf/work \ -reportDir:checker/build/jtreg/perf/report \ -verbose:summary -samevm '-keywords:!ignore' \ "-javacoptions:-classpath checker/dist/checker.jar" \ "-vmoptions:-classpath checker/dist/checker.jar -Xms2g -Xmx2g \ -Dperf.runs=10 -Dperf.groups=400" \ -timeoutFactor:4 \ checker/jtreg/nullness/perf/NewClassPerf.java
-
Results (10×400 runs, fixed 2 GB heap, JDK 17, macOS arm64):
Across both AB and BA interleaved protocols, variant B was consistently as fast or slightly faster (≈2–3% median). In other words, the newifdoes not bring a measurable benefit, as you mentionedaddMissingAnnotationalready checks for existing annotations, so the extra condition only adds overhead.
Final conclusion: After all these investigations, it shows that the repeated visitNewClass calls are expected: multiple sub-checkers walk the same AST. This is not a defect, and at this point we don’t see a straightforward or sound optimization to reduce it. I’ll therefore retitle this PR as an investigation and propose to close it.
d684a53 to
da837a3
Compare
df7497d to
5bf54a6
Compare
wmdietl
left a comment
There was a problem hiding this comment.
Thanks for the thorough experiment!
So even though the particular change isn't working out, would it be worth it to extract the test harness? Did you start the code from scratch? Would be a waste to not remember how you did this.
If you could split it up into a "Driver" and a "CodeGenerator" then we could run experiments in the future by providing different generators and flags for comparison.
@thisisalexandercook What do you think?
|
A "CodeGenerator" and "Driver" based on this could be a really nice abstraction to add to our test suite. I could see it being useful even when implementing a new flagged feature to AB test the impact it has (RE: our discussions around implementing scoped checking that actually removes out of scope checks). Maybe you could even test performance across JDKs with this. Another thought: it might be interesting to use this to compare different tools (e.g., NullAway vs. CF) that perform similar tasks. From that perspective, this type of harness might be better suited to live outside of CF, perhaps in the javac-diagnostics-wrapper. That’s outside the scope of what the wrapper currently does, but maybe something to consider. |
|
#1424 contains the benchmarking suite. This PR can be closed. |
Summary
This PR started as an attempt to optimize repeated
NewClassTreeprocessing in the Nullness checker. During the investigation we instrumented and benchmarked thevisitNewClasspath, including an additionalif (type.hasEffectiveAnnotation(NONNULL))fast-path.Benchmark setup
A jtreg-based performance harness (
NewClassPerf.java) was created.It generates synthetic Java sources with thousands of
newexpressions (objects, generics, arrays, anonymous classes).Two variants were compared:
if).-Dcf.skipNonnullFastPath=true).Runs were repeated with both AB and BA interleaved protocols to cancel order bias, and also with separate warmup runs.
Results
Across multiple runs (10×400 groups, fixed 2 GB heap), Variant B was consistently as fast or slightly faster (≈ 2–3% median). The extra
ifcheck does not bring measurable improvement. In fact,addMissingAnnotationalready checks for existing annotations, so the fast-path only adds overhead.Conclusion
After extensive exploration, we conclude that:
visitNewClasscalls are expected, since multiple sub-checkers traverse the same AST.