Add regex & memoize caches; optimize queries#524
Conversation
Prevent memory leaks and reduce work by introducing a RegExp compile cache and clearing memoization caches on clearCache. Key changes: - Override clearCache to clear regexpCache and any decorator-based memoize caches stored on the instance. - Add a private regexpCache Map and reuse compiled RegExp objects in getMatchedPattern to avoid repeated compilation. - Annotate getProjectPatterns with @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }). - Tighten DB queries used with cache.get to only project _id and pass a 300s TTL to the cache.get calls to reduce payload and caching duration. - Improve log messages to include groupHash/totalCount or indicate none, for clearer diagnostics. - Change getEventCacheKey format to `event:<projectId>:<groupHash>` for clearer keys. These changes reduce memory usage, avoid repeated RegExp compilation, make logs more actionable, and minimize DB transfer sizes.
Introduce REPETITION_CACHE_TTL (300s) and replace magic timeout literals with this constant for repetition cache lookups. Move the regexpCache Map declaration to the class top-level to avoid a duplicate declaration and add JSDoc comments for both the TTL and regexp cache to clarify intent and improve maintainability.
There was a problem hiding this comment.
Pull request overview
This PR improves GrouperWorker’s caching behavior and query efficiency by adding instance-level cache invalidation and reducing repeated work during event grouping.
Changes:
- Override
clearCache()to also clear decorator memoization caches and a new compiled-RegExp cache. - Cache compiled
RegExpobjects infindMatchingPattern()to avoid repeated compilation. - Optimize repetition lookup queries by projecting only
_idand caching results for 300 seconds; update logs and adjust event cache key format.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * @param projectId - id of the project to find related event patterns | ||
| * @returns {ProjectEventGroupingPatternsDBScheme[]} EventPatterns object with projectId and list of patterns | ||
| */ | ||
| @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }) |
There was a problem hiding this comment.
Adding @memoize to getProjectPatterns means grouping-pattern updates in the accounts DB (or in tests where the mock changes) won’t be observed until the memoize cache expires or clearCache runs. This changes behavior for “dynamic pattern addition” scenarios and can lead to incorrect grouping for minutes after a pattern update. Consider removing memoization here, using a much shorter TTL, or incorporating a pattern-version/updatedAt value into the cache key so changes invalidate immediately.
| @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }) |
| * Clear RegExp cache | ||
| */ | ||
| this.regexpCache.clear(); |
There was a problem hiding this comment.
In clearCache(), for (const key in this) iterates enumerable properties including any enumerable properties on the prototype chain. Using Object.keys(this) / Object.getOwnPropertyNames(this) would avoid unexpected prototype enumeration and make the cache-clearing logic more predictable.
Replace a for..in loop over `this` with `Object.keys(this).forEach` to ensure only the object's own enumerable properties are inspected when clearing memoize caches. This avoids iterating inherited/prototype properties that could lead to unexpected behavior when resetting cached entries in workers/grouper/src/index.ts.
| * Clear memoization caches from decorators | ||
| * These are stored as properties on the instance | ||
| */ | ||
| const memoizeCachePrefix = 'memoizeCache:'; |
There was a problem hiding this comment.
seems hardcoded, its better to import it from memoize.ts.
Anyway, I don't see any reason to clear memoization cache manually:
- it has limited length (with LRU algo)
- it has ttl
| * @param projectId - id of the project to find related event patterns | ||
| * @returns {ProjectEventGroupingPatternsDBScheme[]} EventPatterns object with projectId and list of patterns | ||
| */ | ||
| @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }) |
There was a problem hiding this comment.
I don't see any reason to add @memoize to getProjectPatterns() since it is called only inside findSimilarEvent() which is memoized itself.
|
|
||
| if (similarEvent) { | ||
| this.logger.info(`similar event: ${JSON.stringify(similarEvent)}`); | ||
| this.logger.info(`similar event found: groupHash=${similarEvent.groupHash}, totalCount=${similarEvent.totalCount}`); |
There was a problem hiding this comment.
lets also print projectId and event payload.title.
Review required!
Prevent memory leaks and reduce work by introducing a RegExp compile cache and clearing memoization caches on clearCache. Key changes:
event:<projectId>:<groupHash>for clearer keys.These changes reduce memory usage, avoid repeated RegExp compilation, make logs more actionable, and minimize DB transfer sizes.