perf(opencode): isolate per-proposal subprocess state to prevent SQLite WAL contention#35

Merged: KE7 merged 1 commit into main from perf/opencode-concurrent-isolation on May 16, 2026

Conversation

KE7 (Owner) commented May 14, 2026

Summary

  • Fixes the SQLite WAL contention observed in the E2E re-verify run of PR #34 (perf(evolution): atomic-worker architecture for parallel proposals (full GEPA parity)), where g1-s1 failed with "Failed to run the query 'PRAGMA journal_mode = WAL'" while g1-s2 succeeded, wasting one proposal slot
  • Fix: set XDG_DATA_HOME to a per-candidate directory (<worktree>/.helix_opencode_state/) before invoking opencode run, giving each parallel worker its own isolated SQLite database
  • 5 new unit tests; backend-exclusion test covers all 4 other backends (claude/codex/cursor/gemini)

Background

Depends on: #34 (perf/architecture-d-atomic-worker)

When num_parallel_proposals > 1 with backend = "opencode", helix spawns concurrent opencode run --format json subprocesses. All share the same global SQLite database:

  • macOS: ~/Library/Application Support/opencode/opencode.db
  • Linux: $XDG_DATA_HOME/opencode/opencode.db

Each subprocess issues PRAGMA journal_mode = WAL at startup. When several subprocesses issue it against the same file at the same time, one of them can fail with the error seen in PR #34's re-verify.

Root cause archaeology: Confirmed via opencode debug paths that XDG_DATA_HOME changes the data directory (and therefore opencode.db path). Setting it per-candidate gives complete isolation.
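
To illustrate why per-candidate paths remove the shared file, here is a minimal sketch using Python's standard `sqlite3` module. It is not OpenCode's code and does not reproduce the failure. It only shows that each worker, given its own database path, can switch to WAL mode independently. The helper names `isolated_db_paths` and `enable_wal` are hypothetical. The `opencode/opencode.db` layout under the state directory follows the paths reported in this PR.

```python
import sqlite3
import tempfile
from pathlib import Path


def isolated_db_paths(root: Path, workers: list[str]) -> dict[str, Path]:
    """Map each worker id to its own database path (hypothetical helper).

    Mirrors the layout described in this PR:
    <worktree>/.helix_opencode_state/opencode/opencode.db
    """
    return {
        worker: root / worker / ".helix_opencode_state" / "opencode" / "opencode.db"
        for worker in workers
    }


def enable_wal(db_path: Path) -> str:
    """Create or open a SQLite database and switch it to WAL mode."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode that is now in effect.
        (mode,) = conn.execute("PRAGMA journal_mode = WAL").fetchone()
        return mode
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = isolated_db_paths(Path(tmp), ["g1-s1", "g1-s2"])
        for worker, path in paths.items():
            print(worker, enable_wal(path), path)
```

Because no two workers ever open the same file, the startup PRAGMA in one worker cannot race with the PRAGMA in another.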

Reference: PR #34 E2E re-verify report at /Users/ke/helix-arch-d-e2e-reverify-report.md:

g1-s1 status: Failed (SQLite WAL error — handled gracefully) ⚠️

OMAR comparison: OMAR uses OPENCODE_CONFIG_CONTENT env var for MCP config injection per-session, but does NOT set XDG_DATA_HOME. OMAR runs opencode in interactive TUI mode (tmux panes), not headless opencode run, so it doesn't hit the cold-start WAL contention. The XDG_DATA_HOME isolation is helix-specific and implemented directly.

Implementation

src/helix/mutator.py (~28 LOC including explanatory comment):

if backend == "opencode" and (sandbox is None or not sandbox.enabled):
    # Per-candidate SQLite isolation for concurrent opencode subprocesses.
    # [20-line comment explaining the WAL contention root cause]
    opencode_state_dir = Path(worktree_path) / ".helix_opencode_state"
    opencode_state_dir.mkdir(parents=True, exist_ok=True)
    backend_env["XDG_DATA_HOME"] = str(opencode_state_dir)

Also adds .helix_opencode_state/ to _ignore_helix_artifacts gitignore patterns.
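
For readers without the diff at hand, the following is a self-contained sketch of the same idea as a standalone function. The function name `build_backend_env` and its parameters are assumptions made for illustration. The actual change lives inline in src/helix/mutator.py and reads the sandbox state from a `sandbox` object rather than a boolean.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def build_backend_env(
    backend: str,
    worktree_path: str,
    sandbox_enabled: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return the environment for a backend subprocess (hypothetical helper).

    For the opencode backend outside a sandbox, XDG_DATA_HOME points at a
    per-worktree directory so each worker gets its own SQLite database.
    """
    # Start from the parent environment so PATH, HOME, etc. are inherited.
    env = dict(os.environ if base_env is None else base_env)

    if backend == "opencode" and not sandbox_enabled:
        state_dir = Path(worktree_path) / ".helix_opencode_state"
        state_dir.mkdir(parents=True, exist_ok=True)
        env["XDG_DATA_HOME"] = str(state_dir)

    return env
```

The returned mapping could then be passed as the `env` argument of `subprocess.run` when launching the backend. Other backends, and the sandboxed path, receive the inherited environment unchanged.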

E2E Validation Results

Both runs used the add_one off-by-one fixture with backend = "opencode", score_parser = "exitcode", frontier_type = "instance".

| Config | Workers | Acceptance | SQLite WAL errors | Cost |
|---|---|---|---|---|
| n_proposals=2 | 2/2 ✅ | 100% | 0 ✅ | $0.00 |
| n_proposals=3 | 3/3 ✅ | 100% | 0 ✅ | $0.00 |

Prior (PR #34 re-verify without this fix): n_proposals=2 → 1/2 (g1-s1 SQLite WAL failure).

Per-candidate isolation confirmed: each worktree has .helix_opencode_state/opencode/opencode.db.
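
A check like this could be scripted. The sketch below is one possible approach, assuming a list of worktree directories is available. The function name `find_missing_state_dbs` is hypothetical and is not part of helix.

```python
from pathlib import Path
from typing import Iterable


def find_missing_state_dbs(worktrees: Iterable[str]) -> list[str]:
    """Return the worktrees that lack an isolated OpenCode database.

    The expected location follows the layout reported in this PR:
    <worktree>/.helix_opencode_state/opencode/opencode.db
    """
    missing = []
    for worktree in worktrees:
        db_path = Path(worktree) / ".helix_opencode_state" / "opencode" / "opencode.db"
        if not db_path.is_file():
            missing.append(worktree)
    return missing
```

An empty result means every worktree in the list has its own database file.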

Test Plan

  • uv run pytest tests/unit -q → 871 passed (866 pre-existing + 5 new)
  • uv run mypy --strict src/helix/ → Success: no issues found in 29 source files
  • E2E n_proposals=2 → 2/2 success, no SQLite WAL errors
  • E2E n_proposals=3 → 3/3 success, no SQLite WAL errors
  • Reviewer verdict: APPROVE (opencode-harden-reviewer, /Users/ke/helix-opencode-harden-review.md)

🤖 Generated with Claude Code

…te WAL contention

OpenCode stores its session database at a global path (~/Library/Application
Support/opencode/opencode.db on macOS, $XDG_DATA_HOME/opencode/opencode.db on
Linux). When helix runs num_parallel_proposals > 1, all concurrent `opencode run`
subprocesses issue `PRAGMA journal_mode = WAL` against this shared database at
startup, producing:
  "Failed to run the query 'PRAGMA journal_mode = WAL'"

This was observed in PR #34 E2E re-verify: g1-s1 lost to this SQLite WAL error
while g1-s2 succeeded, wasting one proposal slot.

Fix: set XDG_DATA_HOME to a per-candidate directory (<worktree>/.helix_opencode_state/)
before invoking `opencode run`. OpenCode respects XDG_DATA_HOME and creates an
isolated database there, so no two workers ever contend on the same file.

- Only applies to `backend = "opencode"` (claude/codex/cursor/gemini untouched)
- Skips the sandboxed path (container isolation already separates per-candidate state)
- Adds .helix_opencode_state/ to .gitignore patterns (auto-cleanup with worktree)
- 5 new unit tests covering: env set, uniqueness, env inheritance, backend isolation,
  gitignore; backend-exclusion test covers all 4 other backends (claude/codex/cursor/gemini)

E2E validated: n_proposals=2 → 2/2 success, n_proposals=3 → 3/3 success, no SQLite WAL
errors in either run. Cost: $0.00 (OpenCode free tier).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Base automatically changed from perf/architecture-d-atomic-worker to main May 16, 2026 01:57

KE7 (Owner, Author) commented May 16, 2026

Closing briefly to re-trigger CI after PR #34 merged into main.

KE7 closed this May 16, 2026

KE7 (Owner, Author) commented May 16, 2026

Reopening; base auto-retargeted to main.

KE7 reopened this May 16, 2026
KE7 merged commit b3d325a into main May 16, 2026
2 checks passed
KE7 deleted the perf/opencode-concurrent-isolation branch May 16, 2026 02:01