Skip to content

security(mesh): close ACL file TOCTOU window between is_default_acl_in_use and resolve_acl #218

@cagataycali

Description

@cagataycali

Context: Surfaced in the 2026-05-25 06:42 review on PR #195 (PRRT_kwDORUMiZs6EeH_B).

The R18-C identity-tuple cache in _acl_config.py is documented as a TOCTOU close, but it only delivers one-shot consistency on stable inputs. If an attacker rewrites the ACL file between is_default_acl_in_use() and resolve_acl() -- exactly when the TOCTOU matters -- the second _load_acl_cached call computes a fresh identity tuple, misses the cache, and re-reads the (now-malicious) file.

The docstring at _acl_config.py:354-362 says "both functions take the same snapshot; if the file changes mid-flight the next call refreshes" -- the second clause is the bug surface, not the fix.

Fix options

Reviewer offered two paths:

Option A (architectural, preferred): thread a single resolved dict through Mesh.start

# in Mesh.start, before _refuse_under_permissive_default_acl
resolved_acl = _acl_config.resolve_acl(namespace)
if resolve_auth_mode() == "mtls" and _acl_config._is_permissive_acl_shape(resolved_acl):
    ...

One read at the top of Mesh.start, dict travels through the gate and into the acl_block insertion. Closes the TOCTOU window completely (the attacker would need to rewrite within a single _load_acl_file call).

Option B (minimal): soften the docstring claim

Drop the "if the file changes mid-flight the next call refreshes" sentence. The cache delivers one-shot consistency on stable inputs only; an attacker rewriting between calls still wins. State the actual guarantee.

Acceptance criteria

  • One read of STRANDS_MESH_ACL_FILE per Mesh.start() call (option A) OR
  • Docstring on _acl_config.py:354-362 no longer claims TOCTOU close behaviour (option B).
  • Pin test in tests/mesh/test_default_acl_warning.py that asserts the chosen behaviour (a counter on _load_acl_file calls per Mesh.start for option A; a docstring grep for option B).
  • Reviewer's recipe in Mesh.start is preserved verbatim if option A is chosen (it threads the dict cleanly through the existing call sites).

Why a separate PR

PR #195 is at R25 already (round-budget invoked). Folding this in would either:

  • expand R25 to 5 fixes instead of 4 (T2 has architectural choice that warrants its own review round); or
  • cascade into R26 if the docstring softening proves insufficient.

Filing here so the reviewer's recipe survives the PR merge and the next contributor has all the context.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions