fix(startup): refuse to start when another ccbot already holds the lock by Time4Mind · Pull Request #86 · Time4Mind/ccbot

Time4Mind · 2026-05-16T21:52:47Z

Summary

Telegram's getUpdates long-poll is exclusive per bot token: a second instance silently steals updates and the original spams telegram.error.Conflict: terminated by other getUpdates request on every poll. Today we hit this with two ccbot processes running side by side — one under ccbot-supervisor.sh in tmux ccbot:__main__, the other started ~1d earlier from a NetHunter-terminal shell outside tmux. User-visible behaviour: message delays, ghost responses, card edits failing.
Fix: main.py now acquires an exclusive fcntl.flock(LOCK_EX | LOCK_NB) on $CCBOT_DIR/ccbot.lock BEFORE any tmux / bot startup. FD_CLOEXEC is set so the lock doesn't leak into subprocess / asyncio.subprocess children. Handle lives at module scope so the OS holds the lock for the whole process lifetime.
A contending instance gets OSError from flock, logs the lock path, prints the error to stderr (where the supervisor captures it), closes the handle, and calls sys.exit(1). The supervisor's restart-backoff loop then keeps retrying until the existing instance exits. No more silent getUpdates cross-fire.
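The acquisition path described above could look roughly like the sketch below. It is not the code from main.py: the function name `acquire_singleton_lock`, the module-level `_lock_handle`, and the message wording are assumptions made for illustration. Only the flock flags, FD_CLOEXEC, the stderr output, and sys.exit(1) come from the description. Python 3.4+ already creates descriptors as non-inheritable, so the explicit FD_CLOEXEC step here is belt-and-braces.

```python
import fcntl
import logging
import sys
from pathlib import Path
from typing import Optional, TextIO

logger = logging.getLogger("ccbot.lock")  # hypothetical logger name

# Module-scope reference: while this handle stays open, the kernel keeps
# the flock; it is released when the descriptor closes or the process dies.
_lock_handle: Optional[TextIO] = None


def acquire_singleton_lock(lock_path: Path) -> TextIO:
    """Take an exclusive, non-blocking flock on lock_path, or exit with 1."""
    global _lock_handle
    # First launch: the state directory may not exist yet.
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    handle = open(lock_path, "a+")
    fd = handle.fileno()

    # Make sure the descriptor is not inherited by exec'd children.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

    try:
        # LOCK_NB makes flock raise OSError instead of blocking on contention.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        logger.error("lock %s is held by another instance: %s", lock_path, exc)
        print(f"ccbot: lock {lock_path} already held ({exc})", file=sys.stderr)
        handle.close()
        sys.exit(1)

    _lock_handle = handle
    return handle
```

Calling this before any tmux or bot startup work means a second instance exits before it ever issues a getUpdates request.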

Test plan

+4 unit tests in test_singleton_lock.py: happy-path acquire, contention → SystemExit(1), re-acquire after release (process death suffices — no stale-lock sweep needed), parent-directory creation.
Full suite 473/473 green; ruff + pyright clean.
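The "process death suffices" claim in the test plan can be checked across real processes with a sketch like the one below. It is a standalone illustration, not a test from test_singleton_lock.py: the `HOLDER` script, `try_lock`, and `demo` are hypothetical names. A child process takes the lock, the parent confirms contention, the child is killed, and the parent then acquires the lock with no cleanup of the lock file.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script (hypothetical): take the lock, announce it, then idle.
HOLDER = """
import fcntl, sys, time
fh = open(sys.argv[1], "a+")
fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Return an open handle holding the lock, or None if it is contended."""
    fh = open(path, "a+")
    try:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fh.close()
        return None
    return fh


def demo():
    path = os.path.join(tempfile.mkdtemp(), "ccbot.lock")
    child = subprocess.Popen(
        [sys.executable, "-c", HOLDER, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Wait until the child reports that it holds the lock.
        assert child.stdout.readline().strip() == "locked"
        contended = try_lock(path) is None
    finally:
        # Simulate a crash: no cleanup code runs in the child.
        child.kill()
        child.wait()
        child.stdout.close()

    # The kernel dropped the lock when the child's descriptors were closed.
    fh = try_lock(path)
    reacquired = fh is not None
    if fh is not None:
        fh.close()
    return contended, reacquired


if __name__ == "__main__":
    print(demo())
```

The lock file itself is left on disk after the child dies. Only the kernel-side lock matters, which is why no stale-lock sweep is needed.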

🤖 Generated with Claude Code

Telegram's ``getUpdates`` long-poll is exclusive per bot token: a second instance silently steals updates and the original starts spamming ``telegram.error.Conflict: terminated by other getUpdates request`` on every poll. Today we hit this with two ccbot processes running side by side: one under ``ccbot-supervisor.sh`` inside ``tmux ccbot:__main__``, the other started ~1d earlier from a NetHunter-terminal interactive shell outside tmux. Both kept retrying for hours; user-visible behaviour was message delays, ghost responses, and Card edits failing.

Fix: ``main.py`` now opens ``$CCBOT_DIR/ccbot.lock`` and holds an exclusive ``fcntl.flock(LOCK_EX | LOCK_NB)`` for the process lifetime BEFORE any tmux / bot startup work runs. The handle lives at module scope so the lock survives until the interpreter exits; ``FD_CLOEXEC`` prevents the lock from leaking into any ``subprocess`` / ``asyncio.subprocess`` child (a stray child outliving the parent would otherwise hold the lock and block future starts).

A contending instance hits ``OSError`` on ``flock``, logs the path and the reason, prints to stderr (for supervisor capture), closes the handle, and ``sys.exit(1)``. The supervisor's restart-backoff loop then just waits for the existing instance to die naturally, so there is no more silent ``getUpdates`` cross-fire.

+4 unit tests cover happy-path acquire, contention → SystemExit(1), re-acquire after release (process death is enough, no stale-lock sweep needed), and parent-directory creation for first-launch ``$CCBOT_DIR``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…87)

Recent PRs (#82–#86) changed bot behaviour in user-facing and operator-facing ways that the docs hadn't caught up with yet:

- CLAUDE.md
  * Core Design Constraints: add the SessionStart + UserPromptSubmit hook story (self-heal) and the singleton flock.
  * Configuration: add ``ccbot.lock`` to the state-files list.
  * Hook Configuration: full block with both events + the safety contract (zero stdout, always exit 0, fast-path skip).
- .claude/rules/architecture.md
  * State-files diagram: ``session_map.json`` now lists both hook events; new ``ccbot.lock`` entry.
  * Key Design Decisions: hook self-heal, singleton lock, archive-time orphan-claude kill, startup orphan-window detection.
- .claude/rules/dm-architecture.md
  * window_id → claude session_id: both hook events update the map.
  * "What is unchanged": session_map.json description matches.
- .claude/rules/secrets.md
  * Add ``ccbot.lock`` to the where-things-are table.
- doc/dm-multisession-spec.md
  * State persistence: list ``ccbot.lock``.
  * Section 9 Recovery: flock acquire step, orphan-window scan, and the archive cleanup path that SIGTERMs orphan claude processes.
- src/ccbot/i18n.py
  * /help → Tips body (en / ru / zh): two new bullets (single-instance lock, hook self-heal) so operators have visibility into the new guarantees without digging through code.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Time4Mind merged commit f22aff9 into main May 16, 2026
4 checks passed

Time4Mind deleted the fix/singleton-flock-mutex branch May 16, 2026 21:53

Time4Mind mentioned this pull request May 16, 2026

docs: cover orphan-claude / UserPromptSubmit / singleton-flock fixes #87

Merged

2 tasks

Time4Mind merged 1 commit into main from fix/singleton-flock-mutex
