Stop SHUTDOWN_MODE middleware from killing the polling loop #23
Merged
In production we observed that /help (and every other message) silently
stopped getting any reply once SHUTDOWN_MODE was on. Root cause is in
node_modules/telegraf/telegraf.js fetchUpdates: when handleUpdates
rejects, the .catch branch flips polling.started = false and never
recovers. Any throw escaping a middleware kills long-polling for the
rest of the process lifetime.
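To make the failure mode concrete, here is a minimal TypeScript model of the loop described above. It is a sketch, not telegraf's code: PollingModel, the hasChat flag and the scripted batches are hypothetical, and the model handles each batch with Promise.all. It keeps only the two properties that matter here: any rejection while handling a batch sets started to false, and the next fetchUpdates() call returns on its first line.

```typescript
// Simplified model of the long-polling loop described above; NOT telegraf's source.
// PollingModel, Update.hasChat and the scripted batches are illustrative assumptions.
type Update = { update_id: number; hasChat: boolean };
type Handler = (update: Update) => Promise<void>;

class PollingModel {
  started = true; // mirrors polling.started
  offset = 0;
  batchesFetched = 0;
  private readonly batches: Update[][];
  private readonly handler: Handler;

  constructor(batches: Update[][], handler: Handler) {
    this.batches = batches;
    this.handler = handler;
  }

  async fetchUpdates(): Promise<void> {
    // First line: once started is false, the loop returns and never polls again.
    if (!this.started) return;
    // Model-only stop condition: the scripted batches have run out.
    if (this.batchesFetched >= this.batches.length) return;
    const updates = this.batches[this.batchesFetched++];
    try {
      await Promise.all(updates.map((update) => this.handler(update)));
      if (updates.length > 0) this.offset = updates[updates.length - 1].update_id + 1;
    } catch {
      // Any rejection flips the flag; nothing in the loop ever sets it back.
      this.started = false;
      this.offset = 0;
    }
    await this.fetchUpdates();
  }
}

const handled: number[] = [];
// Stand-in for the old middleware: rejects on chatless updates, like ctx.reply did.
const fragileHandler: Handler = async (update) => {
  if (!update.hasChat) throw new Error("reply: no chat on this update");
  handled.push(update.update_id);
};

const loop = new PollingModel(
  [
    [{ update_id: 1, hasChat: true }],
    [{ update_id: 2, hasChat: false }, { update_id: 3, hasChat: true }],
    [{ update_id: 4, hasChat: true }], // never fetched: the loop is already dead
  ],
  fragileHandler,
);

const finished = loop.fetchUpdates();
finished.then(() =>
  console.log({ started: loop.started, batchesFetched: loop.batchesFetched, handled }),
);
```

Nothing crashes in this model, yet it ends with started set to false after two batches and update 4 is never handled, which is the same "silent but alive" shape as the production symptom.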
shutdownMode was the culprit. ctx.reply asserts ctx.chat and throws on
updates that have no chat (my_chat_member, chat_join_request, …), which
are exactly the updates that arrive en masse after the farewell goes
out and users start blocking the bot. One such update in the startup
batch was enough to take all four bots off the air.
Two narrow guards:
* skip the update entirely when ctx.chat is missing
* try/catch around ctx.reply so per-send failures (blocked-by-user,
rate-limit, etc.) do not escape either
Both leave the polling loop intact so /help keeps reaching users.
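A minimal sketch of the two guards, assuming a stripped-down context type rather than telegraf's real one. MiniContext, FAREWELL_TEXT, the enabled flag, the injectable logger and fakeContext are all hypothetical; fakeContext's reply() mimics the behaviour described above by throwing when the update has no chat. The sketch also assumes that in shutdown mode the update is not passed on to next().

```typescript
// Hypothetical stand-ins for a Telegraf context; NOT telegraf's real types.
interface Chat {
  id: number;
}
interface MiniContext {
  chat?: Chat;
  reply(text: string): Promise<unknown>;
}
type Next = () => Promise<void>;
type Middleware = (ctx: MiniContext, next: Next) => Promise<void>;

// Illustrative farewell text; the real message is not part of this PR text.
const FAREWELL_TEXT = "This bot has been shut down.";

// `enabled` stands in for SHUTDOWN_MODE; `log` is injectable so failures are observable.
function shutdownMode(enabled: boolean, log: (msg: string) => void = console.error): Middleware {
  return async (ctx, next) => {
    if (!enabled) return next(); // shutdown mode off: pass the update through
    // Guard 1: chatless updates (my_chat_member, chat_join_request, ...) are no-ops.
    if (!ctx.chat) return;
    const chatId = ctx.chat.id;
    // Guard 2: per-send failures (blocked by user, rate limit) are logged, not rethrown.
    try {
      await ctx.reply(FAREWELL_TEXT);
    } catch (err) {
      log(`shutdownMode: farewell to chat ${chatId} failed: ${String(err)}`);
    }
    // Assumption: in shutdown mode the update is not passed on to next().
  };
}

// Test double: reply() throws without a chat, mimicking the assertion described above,
// and can be told to fail like a send to a user who blocked the bot.
function fakeContext(chat: Chat | undefined, sendFails = false): MiniContext & { sent: string[] } {
  const sent: string[] = [];
  return {
    chat,
    sent,
    async reply(text: string) {
      if (!chat) throw new Error("reply: no chat on this update");
      if (sendFails) throw new Error("403: bot was blocked by the user");
      sent.push(text);
      return {};
    },
  };
}

const logs: string[] = [];
const middleware = shutdownMode(true, (msg) => logs.push(msg));
const done: Next = async () => {};

const normal = fakeContext({ id: 1 });
const chatless = fakeContext(undefined);
const blocked = fakeContext({ id: 2 }, true);

// None of these three calls rejects, so nothing escapes to the polling loop.
const results = Promise.all([
  middleware(normal, done),
  middleware(chatless, done),
  middleware(blocked, done),
]);
results.then(() => console.log(normal.sent, chatless.sent, logs));
```

The try/catch wraps only ctx.reply rather than the whole middleware chain, in line with the "two narrow guards" approach: only the send that is expected to fail gets swallowed.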
Problem
Production was silent: /help never got the farewell reply even with SHUTDOWN_MODE=true and the worker HEALTHY at 3% CPU.
Root cause
The failure sits in fetchUpdates() in node_modules/telegraf/telegraf.js: any unhandled throw from a middleware flips polling.started = false, and the next fetchUpdates() call returns on its first line. The bot never calls getUpdates again for the rest of the process lifetime.
shutdownMode was throwing because ctx.reply() asserts ctx.chat and throws on updates that don't have one (my_chat_member, chat_join_request, …). The deploy log shows the kill happening in the very first poll batch after boot. One chatless update in the startup batch took every bot in the process off the air. After the farewell went out, blocked-by-user updates kept arriving precisely because users were blocking the bot.
Fix
Two narrow guards in src/middlewares/shutdownMode.ts:
* if (!ctx.chat) return: chatless updates are no-ops, not throws
* try/catch around ctx.reply: per-send failures (blocked, rate-limit) are logged and swallowed
The polling loop now survives both.
Test plan
* npx jest: 71 tests passing, two new cases pinning the failure modes:
  * skips chatless updates instead of throwing on ctx.reply
  * swallows reply failures (e.g. user blocked the bot)
* npm run lint
* npm run build
* /help replies with the farewell within seconds