chore: dup-count cache is process-local — stale under multi-worker deployment #20

@gregoryfoster

Problem

`_dup_count_cache` in `src/api/admin/orgs.py` is a module-level Python dict. Under a multi-worker gunicorn deployment, each worker process holds its own copy. A merge or dismiss in worker A calls `_invalidate_dup_count_cache()` on A's cache only; workers B–N continue serving the stale count for up to 5 minutes (the TTL).

This is not currently a bug in production, which runs a single uvicorn worker, but the count will silently go stale if the worker count is ever increased.
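The failure mode can be reproduced without starting multiple processes. The following is a minimal sketch, not the code in `src/api/admin/orgs.py`: the `WorkerCache` class, the `db` dict, and the `"dup_count"` key are all hypothetical stand-ins, with each `WorkerCache` instance playing the role of one worker's private copy of the module-level dict.

```python
import time

TTL_SECONDS = 300  # the 5-minute TTL described above


class WorkerCache:
    """Hypothetical stand-in for one worker process's copy of the module-level dict."""

    def __init__(self):
        self._cache = {}  # key -> (value, expiry on the monotonic clock)

    def get_count(self, compute):
        entry = self._cache.get("dup_count")
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit: the stored value is returned as-is
        value = compute()
        self._cache["dup_count"] = (value, time.monotonic() + TTL_SECONDS)
        return value

    def invalidate(self):
        # Only touches this instance, just as a module-level dict
        # is only visible to the process that owns it.
        self._cache.pop("dup_count", None)


db = {"duplicates": 7}  # hypothetical source of truth


def read_db():
    return db["duplicates"]


worker_a, worker_b = WorkerCache(), WorkerCache()

# Both workers serve one request each and warm their caches.
worker_a.get_count(read_db)
worker_b.get_count(read_db)

# A merge is handled by worker A, which invalidates its own cache only.
db["duplicates"] = 6
worker_a.invalidate()

fresh = worker_a.get_count(read_db)  # recomputed after invalidation
stale = worker_b.get_count(read_db)  # still served from B's untouched cache
```

Here `fresh` reflects the merge while `stale` keeps the old value until B's entry expires, which is the behaviour workers B–N would show in a real multi-worker deployment.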

Options

| Option | Effort | Notes |
| --- | --- | --- |
| Redis | Medium | Add `redis-py` dep + `REDIS_URL` env var; replace dict with `GET`/`SETEX`/`DEL` |
| Single-worker ops constraint | Zero | Document `--workers 1` in deployment runbook; simplest |
| Accept TTL lag | Zero | Count is at most 5 min stale per worker; acceptable for low-traffic admin |
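For reference, the Redis option might look roughly like the sketch below. It is an illustration under assumptions rather than a proposed patch: the key name `admin:orgs:dup_count` and the function names `get_dup_count_shared` and `invalidate_dup_count_shared` are hypothetical. The client is passed in as a parameter so the sketch has no import-time dependency; in the real service it would presumably be built once with `redis.Redis.from_url(os.environ["REDIS_URL"])`.

```python
from typing import Callable

DUP_COUNT_KEY = "admin:orgs:dup_count"  # hypothetical key name
TTL_SECONDS = 300  # same 5-minute TTL as the current dict cache


def get_dup_count_shared(client, compute: Callable[[], int]) -> int:
    """Return the cached count from Redis, computing and storing it on a miss.

    `client` is expected to behave like a redis-py `Redis` instance
    (only `get`, `setex`, and `delete` are used).
    """
    cached = client.get(DUP_COUNT_KEY)  # GET: None on a miss, bytes by default on a hit
    if cached is not None:
        return int(cached)  # int() accepts both bytes and str digits
    count = compute()
    client.setex(DUP_COUNT_KEY, TTL_SECONDS, count)  # SETEX: store with expiry
    return count


def invalidate_dup_count_shared(client) -> None:
    """Drop the shared entry so every worker recomputes on its next read."""
    client.delete(DUP_COUNT_KEY)  # DEL
```

Because the entry lives in Redis rather than in each process, a single `DEL` after a merge or dismiss is seen by all workers. The sketch does not handle a Redis outage; whether to fall back to an uncached count in that case is left open.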

Recommended approach

At current scale: document the `--workers 1` constraint explicitly in the deployment runbook. Revisit with Redis if worker count is ever increased.

References

  • `src/api/admin/orgs.py` — `_dup_count_cache`, `_invalidate_dup_count_cache()`, `count_org_duplicates()`
  • AGENTS.md — cache caveat documented
  • CR finding 3, conversation 2026-03-21
