Skip to content

Add regex & memoize caches; optimize queries#524

Open
talyguryn wants to merge 3 commits intomasterfrom
improvements/fix-memory-usage
Open

Add regex & memoize caches; optimize queries#524
talyguryn wants to merge 3 commits intomasterfrom
improvements/fix-memory-usage

Conversation

@talyguryn
Copy link
Member

Review required!

Prevent memory leaks and reduce work by introducing a RegExp compile cache and clearing memoization caches on clearCache. Key changes:

  • Override clearCache to clear regexpCache and any decorator-based memoize caches stored on the instance.
  • Add a private regexpCache Map and reuse compiled RegExp objects in getMatchedPattern to avoid repeated compilation.
  • Annotate getProjectPatterns with @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }).
  • Tighten DB queries used with cache.get to only project _id and pass a 300s TTL to the cache.get calls to reduce payload and caching duration.
  • Improve log messages to include groupHash/totalCount or indicate none, for clearer diagnostics.
  • Change getEventCacheKey format to event:<projectId>:<groupHash> for clearer keys.

These changes reduce memory usage, avoid repeated RegExp compilation, make logs more actionable, and minimize DB transfer sizes.

Prevent memory leaks and reduce work by introducing a RegExp compile cache and clearing memoization caches on clearCache. Key changes:

- Override clearCache to clear regexpCache and any decorator-based memoize caches stored on the instance.
- Add a private regexpCache Map and reuse compiled RegExp objects in getMatchedPattern to avoid repeated compilation.
- Annotate getProjectPatterns with @memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' }).
- Tighten DB queries used with cache.get to only project _id and pass a 300s TTL to the cache.get calls to reduce payload and caching duration.
- Improve log messages to include groupHash/totalCount or indicate none, for clearer diagnostics.
- Change getEventCacheKey format to `event:<projectId>:<groupHash>` for clearer keys.

These changes reduce memory usage, avoid repeated RegExp compilation, make logs more actionable, and minimize DB transfer sizes.
Copilot AI review requested due to automatic review settings February 11, 2026 06:00
Introduce REPETITION_CACHE_TTL (300s) and replace magic timeout literals with this constant for repetition cache lookups. Move the regexpCache Map declaration to the class top-level to avoid a duplicate declaration and add JSDoc comments for both the TTL and regexp cache to clarify intent and improve maintainability.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves GrouperWorker’s caching behavior and query efficiency by adding instance-level cache invalidation and reducing repeated work during event grouping.

Changes:

  • Override clearCache() to also clear decorator memoization caches and a new compiled-RegExp cache.
  • Cache compiled RegExp objects in findMatchingPattern() to avoid repeated compilation.
  • Optimize repetition lookup queries by projecting only _id and caching results for 300 seconds; update logs and adjust event cache key format.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

* @param projectId - id of the project to find related event patterns
* @returns {ProjectEventGroupingPatternsDBScheme[]} EventPatterns object with projectId and list of patterns
*/
@memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' })
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding @memoize to getProjectPatterns means grouping-pattern updates in the accounts DB (or in tests where the mock changes) won’t be observed until the memoize cache expires or clearCache runs. This changes behavior for “dynamic pattern addition” scenarios and can lead to incorrect grouping for minutes after a pattern update. Consider removing memoization here, using a much shorter TTL, or incorporating a pattern-version/updatedAt value into the cache key so changes invalidate immediately.

Suggested change
@memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' })

Copilot uses AI. Check for mistakes.
Comment on lines +152 to +154
* Clear RegExp cache
*/
this.regexpCache.clear();
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In clearCache(), for (const key in this) iterates enumerable properties including any enumerable properties on the prototype chain. Using Object.keys(this) / Object.getOwnPropertyNames(this) would avoid unexpected prototype enumeration and make the cache-clearing logic more predictable.

Copilot uses AI. Check for mistakes.
Replace a for..in loop over `this` with `Object.keys(this).forEach` to ensure only the object's own enumerable properties are inspected when clearing memoize caches. This avoids iterating inherited/prototype properties that could lead to unexpected behavior when resetting cached entries in workers/grouper/src/index.ts.
* Clear memoization caches from decorators
* These are stored as properties on the instance
*/
const memoizeCachePrefix = 'memoizeCache:';
Copy link
Member

@neSpecc neSpecc Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems hardcoded, its better to import it from memoize.ts.

Anyway, I don't see any reason to clear memoization cache manually:

  • it has limited length (with LRU algo)
  • it has ttl

* @param projectId - id of the project to find related event patterns
* @returns {ProjectEventGroupingPatternsDBScheme[]} EventPatterns object with projectId and list of patterns
*/
@memoize({ max: 100, ttl: MEMOIZATION_TTL, strategy: 'hash' })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason to add @memoize to getProjectPatterns() since it is called only inside findSimilarEvent() which is memoized itself.


if (similarEvent) {
this.logger.info(`similar event: ${JSON.stringify(similarEvent)}`);
this.logger.info(`similar event found: groupHash=${similarEvent.groupHash}, totalCount=${similarEvent.totalCount}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets also print projectId and event payload.title.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants