fix(startup): refuse to start when another ccbot already holds the lock #86
Merged
Telegram's ``getUpdates`` long-poll is exclusive per bot token: a second instance silently steals updates and the original starts spamming ``telegram.error.Conflict: terminated by other getUpdates request`` on every poll. Today we hit this with two ccbot processes running side by side: one under ``ccbot-supervisor.sh`` inside ``tmux ccbot:__main__``, the other started ~1d earlier from a NetHunter-terminal interactive shell outside tmux. Both kept retrying for hours; user-visible behaviour was message delays, ghost responses, and Card edits failing.

Fix: ``main.py`` now opens ``$CCBOT_DIR/ccbot.lock`` and holds an exclusive ``fcntl.flock(LOCK_EX | LOCK_NB)`` for the process lifetime BEFORE any tmux / bot startup work runs. The handle lives at module scope so the lock survives until the interpreter exits; ``FD_CLOEXEC`` prevents the lock from leaking into any ``subprocess`` / ``asyncio.subprocess`` child (a stray child outliving the parent would otherwise hold the lock and block future starts).

A contending instance hits ``OSError`` on ``flock``, logs the path and the reason, prints to stderr (for supervisor capture), closes the handle, and calls ``sys.exit(1)``. The supervisor's restart-backoff loop then just waits for the existing instance to die naturally, so there is no more silent ``getUpdates`` cross-fire.

Four new unit tests cover happy-path acquire, contention → ``SystemExit(1)``, re-acquire after release (process death is enough, so no stale-lock sweep is needed), and parent-directory creation for first-launch ``$CCBOT_DIR``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
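The acquire-or-exit flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual ``main.py`` code: the names ``acquire_singleton_lock`` and ``_LOCK_HANDLE`` are hypothetical, the error message wording is invented, and logging is reduced to a single stderr print.

```python
import fcntl
import sys
from pathlib import Path
from typing import IO, Optional

# Module-scope reference (hypothetical name): while this handle stays open,
# the kernel keeps the flock, so the lock lasts for the process lifetime.
_LOCK_HANDLE: Optional[IO[str]] = None


def acquire_singleton_lock(lock_path: Path) -> IO[str]:
    """Take an exclusive, non-blocking flock on lock_path, or exit with 1."""
    global _LOCK_HANDLE
    # First launch: the state directory may not exist yet.
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    handle = open(lock_path, "a+")
    fd = handle.fileno()
    # Python marks new descriptors close-on-exec by default (PEP 446);
    # setting the flag explicitly documents the intent.
    fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.fcntl(fd, fcntl.F_GETFD) | fcntl.FD_CLOEXEC)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        # Another open file description already holds the lock.
        handle.close()
        print(f"another instance holds {lock_path}: {exc}", file=sys.stderr)
        sys.exit(1)
    _LOCK_HANDLE = handle
    return handle
```

Because ``flock`` locks belong to the open file description rather than to the file on disk, the lock file itself can stay in place between runs; only a live holder blocks a new start.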
Time4Mind added a commit that referenced this pull request on May 16, 2026
…87)

Recent PRs (#82–#86) changed bot behaviour in user-facing and operator-facing ways that the docs hadn't caught up with yet:

- CLAUDE.md
  * Core Design Constraints: add the SessionStart + UserPromptSubmit hook story (self-heal) and the singleton flock.
  * Configuration: add ``ccbot.lock`` to the state-files list.
  * Hook Configuration: full block with both events + the safety contract (zero stdout, always exit 0, fast-path skip).
- .claude/rules/architecture.md
  * State-files diagram: ``session_map.json`` now lists both hook events; new ``ccbot.lock`` entry.
  * Key Design Decisions: hook self-heal, singleton lock, archive-time orphan-claude kill, startup orphan-window detection.
- .claude/rules/dm-architecture.md
  * window_id → claude session_id: both hook events update the map.
  * "What is unchanged": session_map.json description matches.
- .claude/rules/secrets.md
  * Add ``ccbot.lock`` to the where-things-are table.
- doc/dm-multisession-spec.md
  * State persistence: list ``ccbot.lock``.
  * Section 9 Recovery: flock acquire step, orphan-window scan, and the archive cleanup path that SIGTERMs orphan claude processes.
- src/ccbot/i18n.py
  * /help → Tips body (en / ru / zh): two new bullets (single-instance lock, hook self-heal) so operators have visibility into the new guarantees without digging through code.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- ``getUpdates`` long-poll is exclusive per bot token: a second instance silently steals updates and the original spams ``telegram.error.Conflict: terminated by other getUpdates request`` on every poll. Today we hit this with two ccbot processes running side by side: one under ``ccbot-supervisor.sh`` in ``tmux ccbot:__main__``, the other started ~1d earlier from a NetHunter-terminal shell outside tmux. User-visible behaviour: message delays, ghost responses, card edits failing.
- ``main.py`` now acquires an exclusive ``fcntl.flock(LOCK_EX | LOCK_NB)`` on ``$CCBOT_DIR/ccbot.lock`` BEFORE any tmux / bot startup. ``FD_CLOEXEC`` is set so the lock doesn't leak into ``subprocess`` / ``asyncio.subprocess`` children. Handle lives at module scope so the OS holds the lock for the whole process lifetime.
- ``OSError`` on ``flock`` → logs the lock path, prints to stderr (supervisor captures it), ``sys.exit(1)``. The supervisor's restart-backoff loop just waits for the existing instance to die. No more silent ``getUpdates`` cross-fire.

Test plan
- ``test_singleton_lock.py``: happy-path acquire, contention → ``SystemExit(1)``, re-acquire after release (process death suffices, so no stale-lock sweep is needed), parent-directory creation.

🤖 Generated with Claude Code
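To illustrate why process death alone is enough to free the lock, here is a small standalone sketch. It is not taken from ``test_singleton_lock.py``: the helper names ``is_locked`` and ``demo`` and the inline ``HOLDER`` script are hypothetical. A child process takes the lock and is then killed; the leftover lock file no longer blocks anyone.

```python
import fcntl
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple

# Child script (illustrative): take the lock, report it, then idle.
HOLDER = """
import fcntl, sys, time
f = open(sys.argv[1], "a+")
fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
time.sleep(60)
"""


def is_locked(lock_path: Path) -> bool:
    """Probe the lock without keeping it; closing the probe releases it."""
    with open(lock_path, "a+") as probe:
        try:
            fcntl.flock(probe.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return True
        return False


def demo() -> Tuple[bool, bool]:
    """Return (locked while holder is alive, locked after holder is killed)."""
    lock_path = Path(tempfile.mkdtemp()) / "ccbot.lock"
    cmd = [sys.executable, "-c", HOLDER, str(lock_path)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as child:
        child.stdout.readline()  # wait until the child reports the lock
        held_while_alive = is_locked(lock_path)
        child.kill()  # simulate a crash; no cleanup code runs in the child
    # The lock file still exists on disk, but the kernel dropped the lock
    # when the dead child's descriptor was closed.
    held_after_death = is_locked(lock_path)
    return held_while_alive, held_after_death
```

The kernel releases the lock when the holder's descriptor is closed, including on an abrupt kill, so a leftover file on disk never counts as a stale lock.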