diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md
index a75a8cbb..19394ffb 100644
--- a/.claude/rules/architecture.md
+++ b/.claude/rules/architecture.md
@@ -155,17 +155,26 @@ State files (~/.ccbot/ or $CCBOT_DIR/):
                      bg_notify_needs_action / language / weekly_reset_day /
                      auto_approve / local_terminal*) + bg_status snapshot
 session_map.json   ─ hook-generated window_id→session mapping
+                     (SessionStart + UserPromptSubmit — the latter
+                     self-heals stale entries on every prompt)
 monitor_state.json ─ poll progress (byte offset) per JSONL file
+ccbot.lock         ─ singleton flock held by main.py for the
+                     process lifetime; a second start refuses with
+                     sys.exit(1) to avoid Telegram getUpdates
+                     cross-fire
 ```

 ## Key Design Decisions

 - **DM-centric, not topic-centric** — single 1-1 chat per user; routing key is `active_sessions[user_id] -> session_id -> window_id`. Multiple parallel sessions per user, switcher in the most recent bot message.
 - **Window ID-centric** — All internal state keyed by tmux window ID (e.g. `@0`, `@12`), not window names. Window IDs are guaranteed unique within a tmux server session. Window names are kept as display names via `window_display_names` map. Same directory can have multiple windows.
-- **Hook-based session tracking** — Claude Code `SessionStart` hook writes `session_map.json`; monitor reads it each poll cycle to auto-detect session changes.
+- **Hook-based session tracking** — Claude Code `SessionStart` + `UserPromptSubmit` hooks write `session_map.json`; monitor reads it each poll cycle. SessionStart catches new claude processes; UserPromptSubmit fires per prompt and rewrites the mapping if the existing entry diverges from the current `session_id` (self-heals after `/resume`, `/clear`, or bot-restart races that miss the SessionStart firing). The hook produces zero stdout and always exits 0 — required for safety because UserPromptSubmit would otherwise prepend stdout to the prompt or block on non-zero exits. Fast-path skips the atomic rewrite when nothing changed.
 - **Tool use ↔ tool result pairing** — `tool_use_id` tracked across poll cycles; tool result edits the original tool_use Telegram message in-place.
 - **MarkdownV2 with fallback** — All messages go through `safe_reply`/`safe_edit`/`safe_send` which convert via `telegramify-markdown` and fall back to plain text on parse failure.
 - **No truncation at parse layer** — Full content preserved; splitting at send layer respects Telegram's 4096 char limit with expandable quote atomicity.
 - Only sessions registered in `session_map.json` (via hook) are monitored.
 - Notifications delivered to users via active_sessions reverse-map (claude session_id -> user with matching active session). Background sessions render their own per-session live cards.
 - **Startup re-resolution** — Window IDs reset on tmux server restart. On startup, `resolve_stale_ids()` matches persisted display names against live windows to re-map IDs. Old state.json files keyed by window name are auto-migrated.
+- **Singleton lock** — `main.py` acquires an exclusive `fcntl.flock(LOCK_EX | LOCK_NB)` on `$CCBOT_DIR/ccbot.lock` before any tmux / bot startup. `FD_CLOEXEC` prevents the lock from leaking into subprocess children. A contending instance hits `OSError`, logs the path, and exits with code 1 — the supervisor's restart-backoff then just waits for the existing instance to die.
+- **Orphan-process hygiene** — `archive_session` and `idle_archive_sweep` follow `tmux kill_window` with `tmux_manager.kill_orphan_claude_processes(claude_session_id)`: pgrep + SIGTERM any `claude --resume ` survivors. Catches the rare case where `claude` traps SIGHUP or the bot crashed mid-archive, leaving an orphan writer on the session's JSONL. Self/parent PID guarded.
+- **Orphan-window detection** — At startup, `session_recovery.detect_orphan_windows` lists tmux windows not bound to any Session record (excluding the reserved utility windows `__main__` / `ccbot-usage`) and logs WARNING. Never auto-kills: surfaces the failure mode without destroying user state.
diff --git a/.claude/rules/dm-architecture.md b/.claude/rules/dm-architecture.md
index a181c429..f10a31d2 100644
--- a/.claude/rules/dm-architecture.md
+++ b/.claude/rules/dm-architecture.md
@@ -37,7 +37,7 @@ sessions: dict[str, Session] # short id -> Session record

 ### window_id -> claude session_id

-Unchanged. Still written by the `SessionStart` hook to `session_map.json`. `WindowState.session_id` mirrors that.
+Written by both the `SessionStart` and `UserPromptSubmit` hooks to `session_map.json`. SessionStart catches every new claude process; UserPromptSubmit fires per prompt and self-heals the mapping if it diverges from the pane's current `session_id` (covers `/resume`, `/clear`, and the bot-restart-race window where SessionStart was missed). `WindowState.session_id` mirrors the current map.

 ## Message flows

@@ -214,7 +214,7 @@ fake user turn in JSONL, polluting the live card and eating tokens.
 ## What is unchanged

 - `tmux_manager`, `transcript_parser`, `terminal_parser`, `screenshot`, `hook`, `monitor_state`, `markdown_v2`, `telegram_sender`.
-- `session_map.json` semantics (keyed by `tmux_session:window_id`, written by Claude Code SessionStart hook).
+- `session_map.json` semantics (keyed by `tmux_session:window_id`, written by Claude Code `SessionStart` + `UserPromptSubmit` hooks).
 - `MarkdownV2` formatting pipeline.
 - Per-user message queue and rate limiting (`AIORateLimiter`).
 - Tool-use / tool-result pairing (in-place edit).
diff --git a/.claude/rules/secrets.md b/.claude/rules/secrets.md
index 5a68a9da..4b035e7a 100644
--- a/.claude/rules/secrets.md
+++ b/.claude/rules/secrets.md
@@ -14,6 +14,7 @@ where **not** to put) credentials when working on or with ccbot.
 | Claude Code login token | `claude auth status` — managed by the CLI, not a file in the repo |
 | whisper.cpp model | `~/.ccbot/models/ggml-medium.bin` (path overridable via `WHISPER_MODEL_PATH`) |
 | ccbot persisted state | `~/.ccbot/state.json` — non-secret, but contains user ids / paths |
+| ccbot singleton lock | `~/.ccbot/ccbot.lock` — empty file, flock'd by the running bot. Refuses second-instance starts |

 `~/.ccbot/` itself is overridable via `CCBOT_DIR=…`. Local `./.env` beats the
 global one when both are present (see `config.py`).
diff --git a/CLAUDE.md b/CLAUDE.md
index ad57d49c..b45abccb 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -24,7 +24,8 @@ ccbot hook --install # Auto-install Claude Code SessionStart ho
 - **bypass-only** — claude is launched with `--dangerously-skip-permissions`. No permission relay UI.
 - **No message truncation** at parse layer — splitting only at send layer (`split_message`, 4096 char limit). Tables/code that overflow → file attachment.
 - **MarkdownV2 via telegramify-markdown** — use `safe_reply`/`safe_edit`/`safe_send` helpers (auto fallback to plain text).
-- **Hook-based session tracking** — `SessionStart` hook writes `session_map.json`; monitor polls it to detect session changes.
+- **Hook-based session tracking** — `SessionStart` + `UserPromptSubmit` hooks write `session_map.json`; monitor polls it to detect session changes. UserPromptSubmit self-heals stale entries on every prompt (recovers from missed SessionStart firings, e.g. `/resume`, `/clear`, bot-restart races).
+- **Single-instance lock** — `main.py` holds an exclusive `fcntl.flock` on `$CCBOT_DIR/ccbot.lock` for the process lifetime. A second start refuses with `sys.exit(1)` — guards against silent `Conflict: terminated by other getUpdates request` cross-fire when two bots end up running side by side (e.g. supervisor + manual launch).
 - **Message queue per user** — FIFO ordering, message merging (3800 char limit), tool_use/tool_result pairing.
 - **Rate limiting** — `AIORateLimiter(max_retries=5)` on the Application (30/s global). On restart, the global bucket is pre-filled to avoid burst against Telegram's server-side counter.

@@ -37,11 +38,11 @@ ccbot hook --install # Auto-install Claude Code SessionStart ho

 - Config directory: `~/.ccbot/` by default, override with `CCBOT_DIR` env var.
 - `.env` loading priority: local `.env` > config dir `.env`.
-- State files: `state.json` (thread bindings), `session_map.json` (hook-generated), `monitor_state.json` (byte offsets).
+- State files: `state.json` (sessions / window_states / user settings), `session_map.json` (hook-generated), `monitor_state.json` (byte offsets), `ccbot.lock` (singleton flock).

 ## Hook Configuration

-Auto-install: `ccbot hook --install`
+Auto-install: `ccbot hook --install` (per-event idempotent — re-running on a partial SessionStart-only install adds the missing UserPromptSubmit entry without duplicating).

 Or manually in `~/.claude/settings.json`:
 ```json
@@ -51,11 +52,18 @@ Or manually in `~/.claude/settings.json`:
       {
         "hooks": [{ "type": "command", "command": "ccbot hook", "timeout": 5 }]
       }
+    ],
+    "UserPromptSubmit": [
+      {
+        "hooks": [{ "type": "command", "command": "ccbot hook", "timeout": 5 }]
+      }
     ]
   }
 }
 ```

+The hook never writes to stdout on the normal path (UserPromptSubmit would treat stdout as prompt-prepend text) and always exits 0 (a non-zero exit would block the user's prompt). When the existing `session_map.json` entry already matches what the hook would write, the atomic rewrite is skipped — UserPromptSubmit in a stable window is a pure read.
+
 ## Architecture Details

 See @.claude/rules/architecture.md for full system diagram and module inventory.
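The hook contract described in this file (no stdout, always exit 0, skip the rewrite when the entry already matches) could look roughly like the sketch below. This is only an illustration under assumptions: `update_session_map`, `run_hook_safely`, and the `{"session_id": ...}` entry shape are hypothetical names chosen here, not taken from the ccbot source.

```python
import json
import os
import tempfile
from pathlib import Path


def update_session_map(map_path: Path, key: str, session_id: str) -> bool:
    """Store key -> session_id; return True only if the file was rewritten.

    Hypothetical helper: the real entry schema in session_map.json may differ.
    """
    try:
        current = json.loads(map_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        current = {}
    if not isinstance(current, dict):
        current = {}

    entry = current.get(key)
    if isinstance(entry, dict) and entry.get("session_id") == session_id:
        return False  # fast path: nothing changed, so this call is a pure read

    current[key] = {"session_id": session_id}
    # Atomic rewrite: write a temp file in the same directory, then rename it
    # over the target so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=map_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(current, fh)
    os.replace(tmp_name, map_path)
    return True


def run_hook_safely(action) -> int:
    """Run the hook body, print nothing, and always report exit code 0."""
    try:
        action()
    except Exception:
        pass  # a failing hook must not block or alter the user's prompt
    return 0
```

A caller would typically end with `sys.exit(run_hook_safely(...))`, so that even an unexpected exception cannot surface as a non-zero exit.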
diff --git a/doc/dm-multisession-spec.md b/doc/dm-multisession-spec.md
index 3d127502..0925e52c 100644
--- a/doc/dm-multisession-spec.md
+++ b/doc/dm-multisession-spec.md
@@ -53,6 +53,7 @@ State persistence in `$CCBOT_DIR`:
 - `state.json` — active session per chat, session list with metadata
 - `session_map.json` — session_id ↔ tmux window mapping
 - `monitor_state.json` — JSONL read offsets per session
+- `ccbot.lock` — singleton flock held by `main.py` for the process lifetime; second-instance starts refuse with `sys.exit(1)`

 Deployment target M3:

@@ -334,11 +335,17 @@ Install footprint:

 ### Auto-recover on bot start (F2)

+- Acquire `$CCBOT_DIR/ccbot.lock` (exclusive `fcntl.flock`). On contention, exit with code 1 — there is already a bot running and Telegram's `getUpdates` is exclusive per token. Lock is set with `FD_CLOEXEC` so it never leaks into subprocess children.
 - Read `state.json`.
 - For each session marked active or idle: check if its tmux window still exists.
   - If yes: re-attach. Re-bind monitor offsets.
   - If no: mark as `lost`. Surfaces in the switcher with a `Restore` button.
 - For each archived session: nothing to do at startup.
+- Walk live tmux windows. Windows not bound to any Session record (excluding the reserved utility windows `__main__` / `ccbot-usage`) are logged as `orphan_window` WARNINGs. Never auto-killed — surfaces the failure mode without destroying state. Typical cause: a window that survived `kill_window` during an earlier archive (claude trapped SIGHUP, or the bot crashed mid-archive).
+
+### Archive cleanup
+
+- `kill_window(window_id)` is followed by `tmux_manager.kill_orphan_claude_processes(claude_session_id)`: `pgrep` for any `claude --resume ` survivors and `SIGTERM` them. Self/parent PID guarded. Prevents two processes from later resuming the same session id and corrupting its JSONL.
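The archive cleanup step can be sketched as follows. This is a minimal illustration, not the actual `kill_orphan_claude_processes` implementation: the function names `select_orphans` and `kill_orphans` are hypothetical, and the caller is assumed to supply the full `pgrep -f` pattern for the session being archived.

```python
import os
import signal
import subprocess


def select_orphans(candidate_pids, self_pid, parent_pid):
    """Drop our own PID and our parent's PID from the kill list (the guard)."""
    protected = {self_pid, parent_pid}
    return [pid for pid in candidate_pids if pid not in protected]


def kill_orphans(pattern: str) -> list[int]:
    """SIGTERM every process whose command line matches `pattern`.

    `pattern` is an assumption of this sketch: the caller builds it from the
    session being archived. Returns the PIDs that were targeted.
    """
    # pgrep -f matches against the full command line; it exits non-zero when
    # nothing matches, so check=False keeps that from raising.
    result = subprocess.run(
        ["pgrep", "-f", "--", pattern],
        capture_output=True,
        text=True,
        check=False,
    )
    pids = [int(token) for token in result.stdout.split()]
    targets = select_orphans(pids, os.getpid(), os.getppid())
    for pid in targets:
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to signal
    return targets
```

Keeping the PID filter in its own pure function makes the self/parent guard testable without spawning or killing real processes.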

 ### Manual restore (F3)

diff --git a/src/ccbot/i18n.py b/src/ccbot/i18n.py
index cd5a8fa2..f02b896c 100644
--- a/src/ccbot/i18n.py
+++ b/src/ccbot/i18n.py
@@ -404,7 +404,13 @@
         "less rate-limit pressure).\n"
         "• *Languages.* Settings → Language: en / ru / zh.\n"
         "• *Outbound proxy.* Set `TG_PROXY_URL` if the host can't reach "
-        "api.telegram.org directly."
+        "api.telegram.org directly.\n"
+        "• *Single instance.* Bot holds an exclusive flock on "
+        "`$CCBOT_DIR/ccbot.lock`; a second `uv run ccbot` refuses with "
+        "an error in stderr instead of fighting for Telegram updates.\n"
+        "• *Hook self-heal.* `SessionStart` + `UserPromptSubmit` hooks "
+        "both update `session_map.json` — a missed SessionStart is "
+        "fixed on the next prompt automatically."
     ),
 }

@@ -760,7 +766,13 @@
         "карточки сессии. Меньше = шустрее, больше = меньше rate-limit.\n"
         "• *Языки.* Settings → Language: en / ru / zh.\n"
         "• *Outbound proxy.* `TG_PROXY_URL` если api.telegram.org "
-        "недоступен напрямую."
+        "недоступен напрямую.\n"
+        "• *Один инстанс.* Бот держит exclusive flock на "
+        "`$CCBOT_DIR/ccbot.lock`; второй `uv run ccbot` откажется "
+        "стартовать с ошибкой в stderr, не подерётся за Telegram updates.\n"
+        "• *Self-heal хук.* `SessionStart` + `UserPromptSubmit` оба "
+        "обновляют `session_map.json` — пропущенный SessionStart "
+        "автоматически чинится при следующем prompt'е."
     ),
 }

@@ -1080,7 +1092,12 @@
         "更小 = 更灵敏,更大 = 更省 rate-limit。\n"
         "• *语言。* Settings → Language:en / ru / zh。\n"
         "• *出站代理。* `TG_PROXY_URL` 如果主机无法\n"
-        "直接访问 api.telegram.org。"
+        "直接访问 api.telegram.org。\n"
+        "• *单实例锁。* bot 在 `$CCBOT_DIR/ccbot.lock` 持独占 flock;\n"
+        "第二个 `uv run ccbot` 会拒绝启动并在 stderr 报错,\n"
+        "不会和原实例争抢 Telegram updates。\n"
+        "• *Hook 自愈。* `SessionStart` + `UserPromptSubmit` 都会更新\n"
+        "`session_map.json` — 错过的 SessionStart 在下一个 prompt 自动修复。"
     ),
 }
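The single-instance behaviour that these help strings describe can be sketched with the standard `fcntl` module. The function names `acquire_singleton_lock` and `main_guard` and the error message text are hypothetical; only the `flock(LOCK_EX | LOCK_NB)` call, the `FD_CLOEXEC` flag, and the exit code 1 come from the diff.

```python
import fcntl
import os
import sys
from pathlib import Path


def acquire_singleton_lock(lock_path: Path) -> int:
    """Take an exclusive, non-blocking flock; raise OSError if already held.

    The returned descriptor must stay open for the process lifetime, because
    closing it releases the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Python already opens descriptors as non-inheritable; setting
        # FD_CLOEXEC explicitly just documents the intent.
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def main_guard(lock_path: Path) -> int:
    """Acquire the lock or exit with code 1, reporting the path on stderr."""
    try:
        return acquire_singleton_lock(lock_path)
    except OSError:
        print(f"another instance holds {lock_path}", file=sys.stderr)
        sys.exit(1)
```

Because `flock` locks belong to the open file description, a second `os.open` of the same path contends with the first even inside one process, which makes the refusal path easy to exercise in a test.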