refactor: _kw_re cache is a mutable module-level dict — consider using functools.lru_cache instead #55

@Codex-Crusader

Summary

pulseengine/core/signals.py manually implements a keyword-pattern cache using a plain module-level dict:

_KW_PATTERN_CACHE: dict[str, re.Pattern] = {}

def _kw_re(kw: str) -> re.Pattern:
    if kw not in _KW_PATTERN_CACHE:
        ...
        _KW_PATTERN_CACHE[kw] = re.compile(prefix + escaped + suffix)
    return _KW_PATTERN_CACHE[kw]

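Python ships functools.lru_cache for exactly this pattern. Using it is idiomatic and removes the need to manage the cache dict manually. Its thread safety is also documented rather than an incidental effect of the GIL: the cache's underlying data structure stays coherent under concurrent updates, although the wrapped function may run more than once for the same key if two threads miss at the same time. That is harmless here, because compiling the same keyword twice yields an equivalent pattern.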
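As a rough illustration of the concurrency behaviour, the sketch below calls a cached function from several threads. The body of _kw_re is a simplified stand-in (the real prefix/suffix logic is elided in the snippet above), and the keyword and thread counts are arbitrary.

```python
import functools
import re
from concurrent.futures import ThreadPoolExecutor


@functools.lru_cache(maxsize=None)
def _kw_re(kw: str) -> re.Pattern:
    # Simplified stand-in body, not the project's actual pattern logic.
    return re.compile(r'\b' + re.escape(kw) + r'\b')


# Hammer the same key from several threads at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    patterns = list(pool.map(_kw_re, ["bitcoin"] * 200))

# Every caller sees an equivalent pattern, and the cache holds one entry.
distinct_sources = {p.pattern for p in patterns}
info = _kw_re.cache_info()
```

The function body may have run more than once if threads raced on the first call, so the example checks cache size and pattern equivalence rather than an exact miss count.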

Proposed refactor

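import functools
import re

@functools.lru_cache(maxsize=None)
def _kw_re(kw: str) -> re.Pattern:
    """Return a compiled regex that matches *kw* as a whole token in lowercase text."""
    # Assumes kw is non-empty; kw[-1] raises IndexError otherwise.
    escaped = re.escape(kw)
    prefix = r'\b'
    # Add the trailing \b only when kw ends in an alphanumeric; after
    # punctuation such as "+", \b would require a word character to follow.
    suffix = r'\b' if kw[-1].isalnum() else ''
    return re.compile(prefix + escaped + suffix)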

This also eliminates the _KW_PATTERN_CACHE module-level variable entirely, so the module no longer carries mutable global state that other code could modify by accident.

Notes

  • The pre-warming loop (for _kw_pairs in ASSET_KEYWORDS.values(): ...) still works unchanged: it just calls _kw_re(), which populates the lru_cache internally.
  • lru_cache with maxsize=None behaves like an unbounded dict cache, with built-in introspection (_kw_re.cache_info()) and reset (_kw_re.cache_clear()). On Python 3.9+, functools.cache is shorthand for the same thing.
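The two notes above can be sketched together as follows. The shape of ASSET_KEYWORDS is an assumption here (a mapping from asset name to a list of (keyword, weight) pairs, with invented entries); the real structure in signals.py may differ, and the loop body would need adjusting to match.

```python
import functools
import re

# Hypothetical stand-in for the project's ASSET_KEYWORDS.
ASSET_KEYWORDS = {
    "gold": [("gold", 1.0), ("bullion", 0.8)],
    "oil": [("crude", 1.0), ("brent", 0.9), ("gold", 0.1)],
}


@functools.lru_cache(maxsize=None)
def _kw_re(kw: str) -> re.Pattern:
    escaped = re.escape(kw)
    prefix = r'\b'
    suffix = r'\b' if kw[-1].isalnum() else ''
    return re.compile(prefix + escaped + suffix)


# Pre-warm: compile every keyword once at import time.
for kw_pairs in ASSET_KEYWORDS.values():
    for kw, _weight in kw_pairs:
        _kw_re(kw)

# Four distinct keywords were compiled; the repeated "gold" was a cache hit.
info = _kw_re.cache_info()

# Tests can reset the cache between cases instead of reaching into a dict.
_kw_re.cache_clear()
cleared = _kw_re.cache_info()
```

With the manual dict, the equivalent checks would mean inspecting or clearing _KW_PATTERN_CACHE directly; cache_info() and cache_clear() give the same control without exposing the storage.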
