PR 2/2: supervised systemd service + @restart#234
Open
spierenburg wants to merge 13 commits into
Open
Conversation
Dependency-free helpers: configure_logging (logfile null/""/"-" → stdout for journald, path → file as before) and resolve_feedmap (Matterbot.feedmap single source of truth, Modules.feedmap deprecated fallback). 5 tests.
… feedmap key) matterbot.py & matterfeed.py __main__ logging now go through runtime_config.configure_logging (logfile sentinel → journald). Both processes resolve the feedmap path via runtime_config.resolve_feedmap; this fixes the pre-existing matterfeed split-brain (it read Modules.feedmap but wrote Matterbot.feedmap). 15 tests green.
isoperator(userid) = botadmins ∪ botoperators (role names or ids), delegating to isadmin for the admin superset. Gates @restart in a later task. 4 tests; 19 suite green.
…_reexec) detect_service_manager (systemd only if INVOCATION_ID present; explicit 'none' wins), state_dir, write/read/clear_restart_marker (for the post-restart "back up" notice), and self_reexec (non-systemd @restart fallback). Pure/importable. 6 tests; 25 suite green. (T5+T6 committed together so the file lands ruff-clean — every import is used.)
__init__ gains _shutdown_requested/_exit_intent. request_shutdown() stops the websocket; run_forever() returns (skipping backoff/reconnect) when the flag is set; __main__ installs a SIGUSR1 handler and exits 0 after run_forever returns so systemd Restart=always relaunches on the on-disk code. 2 tests; 27 suite green.
Restructures the PR1 lifecycle elif into an intent dispatch: @restart → isoperator-gated → write restart marker, post "Restarting now…", then request_shutdown (systemd) or self_reexec (no supervisor). reload branches untouched and still reachable; unknown lifecycle tokens inert. 27 suite green.
T8's @restart dispatch calls lifecycle.detect_service_manager/ write_restart_marker/self_reexec but matterbot.py never imported lifecycle — a latent NameError on every @restart (py_compile/pytest/ import all pass since no test exercises that path; ruff F821 caught it). 27 suite green; the three callsites confirmed resolvable.
report_restart() reads the restart marker on first connect and posts "Back up. ✅" to the originating thread, then clears the marker. Called once per process in run_forever (guarded), after welcome-reconcile and before websocket connect. 27 suite green.
contrib/systemd/{matterbot,matterfeed}.service: Type=simple, one
unprivileged matterbot user, hardened sandbox (NoNewPrivileges,
ProtectSystem=strict, ProtectHome, PrivateTmp, empty CapabilityBoundingSet,
ReadWritePaths=/var/lib/matterbot), Restart=always + StartLimit
crash-loop backstop. README.md documents the single-account install,
config paths (journald via logfile:"-"), and @restart behaviour. No
updater unit / polkit / deploy account (@update was cut).
Replaces the "no daemonization" stance: tmux/screen still supported, but systemd (contrib/systemd/README.md) recommended for production — unprivileged user, hardened sandbox, operator-gated @restart. Notes the non-systemd re-exec fallback and that there is no chat-triggered code path (git pull on the host + @reload for modules).
M1 — isadmin/isoperator now return an explicit bool on the success path and False (never None) in the except handler, so the lifecycle gate is fail-closed *and* unambiguous (no None-is-falsy fall-through from the admin sub-check). Also bind `log` at module scope: it was assigned only inside `if __name__ == '__main__'`, so the except path raised NameError instead of returning False whenever MattermostManager was used outside the script entrypoint (unit tests, imports). getLogger is a singleton — the entrypoint still attaches handlers/level. M2 — restart marker is now written with O_NOFOLLOW + O_CREAT and fchmod 0o600, and read with O_RDONLY|O_NOFOLLOW (refuses a planted symlink, fail-closed to "no marker"). A local host user can no longer steer the post-restart confirmation by pre-creating/symlinking the marker. StateDirectoryMode=0750 added to both systemd units so the state dir is not world-traversable. Tests assert the invariants: plain-user/API-error -> exactly False, marker mode == 0o600, read and write both refuse symlinks. 32 passed.
L3 — split test deps out of the runtime requirements: requirements.txt
no longer pulls pytest into the production venv (the systemd README
installs requirements.txt only). New requirements-dev.txt pins
pytest>=8.0,<9.0.
L4 — the bare `except: # ... create an empty map instead` in
load_bindmap/load_feedmap was a lying comment: the code re-raises (and
should — a corrupt existing map must fail loud, not be silently blanked,
which would drop every binding unnoticed; a missing map is handled by
is_file() and never reaches the handler). Comment now matches behaviour;
`except:` tightened to `except Exception:`.
L5 — load_command_table silently overwrote on a case-insensitive command
dir-name collision (Foo/ vs foo/), so a command the operator believes is
loaded would vanish. The collision is now recorded in report["failed"]
and the first dir walked is kept. Test added (walk simulated so it is
host-FS-independent).
L6 — replaced command.lstrip('!@') (strips any leading run of !/@, so
"@@restart" -> "restart") with a single exact-prefix strip computed once.
Unreachable today (dispatch is gated on exact membership in
lifecyclecmds) but removes the latent fragility if that list changes. No
dedicated test: the branch is provably unreachable, a test would only
exercise mocked-up scaffolding.
33 passed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Runs
matterbot/matterfeedas supervised systemd services with an operator-gated@restart/!restart. The@update/self-update option was deliberately cut — there is no chat-triggered code path; code updates remain a plaingit pullon the host (then@reloadfrom PR 1 for modules).contrib/systemd/):matterbot.service+matterfeed.service, one dedicated unprivilegedmatterbotuser, hardened sandbox (NoNewPrivileges,ProtectSystem=strict,ProtectHome,PrivateTmp, emptyCapabilityBoundingSet,ReadWritePaths=/var/lib/matterbot),Restart=always+StartLimitBurstcrash-loop backstop. No updater unit / polkit / sudoers / second deploy account / read-only-tree (all were@update-only machinery).@restart(isoperator-gated =botadmins ∪ botoperators): writes a marker, posts "Restarting now…", then cleanly exits (request_shutdown→ websocket close →run_foreverreturns →sys.exit(0)→ systemdRestart=always). On a non-systemd host itos.execv-re-execs instead. On first reconnect the bot posts "Back up. ✅" to the originating thread.runtime_config.py:configure_logging(logfileofnull/""/"-"→ stdout/journald; a path → file, the legacy default) andresolve_feedmap(single source of truth) wired into both processes — this also fixes matterfeed's pre-existing feedmap split-brain (it readModules.feedmapbut wroteMatterbot.feedmap).lifecyclecmdsgains!restart/@restart;botoperators: [];service_manager: systemd|none.lifecycle.py(pure/tested) holdsdetect_service_manager, the restart-marker IO, andself_reexec. 27 tests pass (PR 1's 10 + 17 new); independent final review found 0 Critical / 0 Important and confirmed no@update/Path-B leakage.Test Plan
Automated (passing):
pytest -q→ 27 passed;matterbot.py/matterfeed.pycompile; imports clean;ruff F821clean.Manual — needs a systemd host, a Mattermost operator (
Matterbot.botoperatorsorbotadmins) and a plain-user account, withlifecyclecmds(incl.@restart) in the deployedconfig.yaml:contrib/systemd/README.md;systemctl enable --now;journalctl -u matterbotshows startup (validateslogfile: "-")@restart→ "Restarting now…", clean exit, systemd relaunch, "Back up. ✅" in-thread@restart→ permission-denied, no restart@reload(PR 1) still works for an admin; non-admin denied (regression check)systemctl stop matterbot→ stays stopped (Restart=always doesn't fight an explicit stop)failedafterStartLimitBurst(no infinite flap)service_manager: nonefrom a shell →@restartre-execs, bot returns, "Back up. ✅" posts