Track A: lock-in & cleanup — BLE001, catalog retry, python_repl sandbox, dead-backend removal, ~/.squish centralization#58
Merged
Conversation
Locks in the repo-wide except sweep: any new `except Exception:` in squish/ now fails CI (Wall-2 enforcement of python-conventions.md). The 12 intentional boundaries already carry `# noqa: BLE001`. Fixed two malformed probes in astc_loader.py (`except (ImportError, Exception)` → logged boundary). tests/** are exempted via per-file-ignores — they use broad catches for failure-path assertions and best-effort probes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
_background_fetch did a single urlopen with no retry. Extract a testable _fetch_catalog_bytes() that retries transient errors (OSError/URLError/timeout) up to 3 attempts with exponential backoff (0.5/1.0/2.0s); non-transient ValueError (bad URL) is not retried. Stays non-blocking (daemon thread); bundled catalog remains in effect on total failure. Adds 5 unit tests covering success, retry-then-succeed, exhaustion, non-retryable, and backoff schedule. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
…uish path Two cleanup items from the post-review roadmap (Track A): MLC backend decision — _InferenceBackend / _MLXEagerBackend / _MLCBackend and the _active_backend global were never wired into dispatch (_active_backend is only ever read as None); they were exercised solely by their own test. Removed the ~53 lines of dead abstraction and tests/integration/test_phase_f_backends.py. The --inference-backend CLI flag (a separate string global) is unchanged. Centralize ~/.squish — add squish_home() to config.py honoring $SQUISH_HOME (default ~/.squish), route config_path() through it, and replace 19 inline `Path.home() / ".squish"` computations across cli, catalog, server, daemon (squishd/launchagent), hardware/chip_detector, platform/ane_router, and runtime/auto_profile. With SQUISH_HOME unset the paths are byte-identical, so behaviour is unchanged; setting it relocates the entire on-disk footprint. Adds 5 squish_home() unit tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
…olation The retry unit tests patched the shared globals urllib.request.urlopen and time.sleep, so a background catalog-refresh daemon thread spawned by another test (test_catalog_branches via _try_refresh_catalog) raced on them and perturbed the call counts in the full-suite run. Make opener/sleeper injectable (defaulting to the real functions) and have the tests pass their own mocks — fully deterministic regardless of in-flight background threads. Production behaviour is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
squish_python_repl previously exec'd code in-process with only a SIGALRM timeout, so an allocation bomb could exhaust the host serving process. An in-process RLIMIT_AS cap is unworkable here — the loaded model's mmap'd weights count against the process address space. Instead, run the code in a freshly *spawned* child (no inherited model, ~30 MiB baseline) with hard RLIMIT_AS (default 512 MiB) and RLIMIT_CPU limits, capturing stdout over a pipe and enforcing a wall-clock guard in the parent. Falls back to the original in-process path (logged) when process isolation is unavailable. - Shared restricted namespace (_repl_namespace) across both paths; no DRY drift. - New max_memory_mb parameter (exposed in the tool schema). - Adds 11 behavioural tests (the tool had none): output capture, error capture, blocked imports, enforced memory limit, runaway-loop termination, and the in-process fallback. Full CI-mode suite: 4014 passed, 127 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
…_AS is no-op on macOS) macOS/Darwin accepts setrlimit(RLIMIT_AS) but does not enforce it, so the 320 MB allocation ran fine under the 64 MB cap on the macOS CI runners. Skip test_memory_limit_enforced on non-Linux and document the caveat in the function: the spawn isolation, RLIMIT_CPU, and wall-clock timeout still apply everywhere, but the memory cap is Linux-enforced / best-effort on macOS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01W8bTep4nw7ybFHhx7QjzMv
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First follow-up sprint from the post-review roadmap (PR #57). Track A — lock-in & cleanup: low-risk, fully Linux-validatable items that cement the earlier
exceptsweep and clear small debts. All 5 planned items are included.Local CI-mode run (
CI=1, which unlocks the integration/Metal-gated suite pertests/conftest.py): 4014 passed, 127 skipped,ruff check squish/ tests/clean.Changes
1. Enable ruff
BLE001(pyproject.toml) — any newexcept Exception:insquish/now fails CI (Wall-2 enforcement ofpython-conventions.md). The intentional boundaries carry# noqa: BLE001;tests/**are exempted viaper-file-ignores. Fixed two malformedexcept (ImportError, Exception)probes inastc_loader.py.2. Catalog fetch retry/backoff (
squish/catalog.py) —_background_fetchdid a singleurlopenwith no retry. Extracted a testable_fetch_catalog_bytes()that retries transient errors (3 attempts, exponential backoff 0.5/1.0/2.0s); non-transientValueErroris not retried. Stays non-blocking; bundled catalog remains on total failure.opener/sleeperare injectable so the 5 new unit tests are fully isolated from any in-flight background-refresh thread. +5 tests.3.
python_replresource sandbox (squish/agent/builtin_tools.py) — the REPL previously exec'd in-process with only a SIGALRM timeout, so an allocation bomb could exhaust the host. A naive in-processRLIMIT_AScap is unworkable (the model's mmap'd weights count against the process address space). Instead it now runs in a freshly spawned child (no inherited model, ~30 MiB baseline) with hardRLIMIT_AS(default 512 MiB) +RLIMIT_CPUlimits, stdout captured over a pipe, wall-clock guard in the parent. Falls back to the in-process path (logged) when isolation is unavailable. +11 tests (the tool had none before).4. Remove dead inference-backend abstraction (
squish/server.py) —_InferenceBackend/_MLXEagerBackend/_MLCBackend+_active_backendwere never wired into dispatch (only read asNone) and exercised solely by their own test. Removed ~53 lines +tests/integration/test_phase_f_backends.py. The--inference-backendCLI flag (a separate string global) is untouched.5. Centralize
~/.squish(squish/config.py+ 8 modules) — addedsquish_home()honoring$SQUISH_HOME(default~/.squish), routedconfig_path()through it, replaced 19 inlinePath.home() / ".squish"computations. WithSQUISH_HOMEunset the paths are byte-identical (no behaviour change); setting it relocates the whole on-disk footprint. +5 tests.Notes
openis still exposed — unchanged). The spawn approach was validated against the address-space concern: a fresh child baselines at ~30 MiB, so the cap is meaningful and Linux-testable.🤖 Generated with Claude Code