102 changes: 92 additions & 10 deletions scripts/hapax-post-merge-deploy
@@ -27,6 +27,11 @@
# unless the unit has `# Hapax-Install-Scope: system`, in which case
# it is system/root-scoped and must be installed by its dedicated
# installer rather than copied into the user unit directory.
# A unit carrying `# Hapax-Auto-Enable: true` (+ an [Install] section)
# is `enable --now`'d when newly installed, so a freshly-merged
# service/timer goes live instead of installing-but-sleeping (the
# try-restart-is-a-no-op-for-a-never-enabled-unit gap). Unmarked new
# timers still auto-enable for back-compat; unmarked services do not.
# systemd/units/*.service.d/*.conf → cp drop-in + daemon-reload + restart active unit
# systemd/user-preset.d/*.preset → cp user preset
# systemd/overrides/** → source/rationale docs only; no direct deploy
@@ -48,6 +53,11 @@
# absence-class deploy bug where a new systemd unit lands at a
# path the deploy script doesn't case-match (P-4 of the
# absence-class-bug-prevention-and-remediation epic).
# hapax-post-merge-deploy --verify-auto-enable
# post-deploy assertion mode — exit 1 if any unit under
# systemd/units/ marked `# Hapax-Auto-Enable: true` is not enabled
# (and, for timers, active). Witnesses FM-11 (and future marked
# units) is live, not merely installed. Needs no commit SHA.
#
# Exits 0 if nothing needed deploying or all deploys succeeded.
# Exits non-zero if any deploy step failed (or, in --report-coverage
@@ -57,6 +67,7 @@ set -euo pipefail
DRY_RUN=0
REPORT_COVERAGE=0
COVERAGE_FROM_STDIN=0
VERIFY_AUTO_ENABLE=0
TRACE_READY=0
TRACE_WRITTEN=0
SHA=""
@@ -70,6 +81,8 @@ if [ "${1:-}" = "--report-coverage" ]; then
elif [ "${1:-}" = "--report-coverage-stdin" ]; then
REPORT_COVERAGE=1
COVERAGE_FROM_STDIN=1
elif [ "${1:-}" = "--verify-auto-enable" ]; then
VERIFY_AUTO_ENABLE=1
else
SHA="${1:?commit SHA required}"
fi
@@ -87,6 +100,66 @@ LOCAL_BIN="$HOME/.local/bin"
TRACE_PATH="${HAPAX_POST_MERGE_TRACE_PATH:-$HOME/.cache/hapax/post-merge-traces/post-merge-traces.jsonl}"
TRACE_MAX_RECORDS="${HAPAX_POST_MERGE_TRACE_MAX_RECORDS:-200}"

# --- auto-enable marker (reform-improve-deploy-activation-20260601) ---
# A systemd unit may declare it wants to be enabled+started on deploy by
# carrying a `# Hapax-Auto-Enable: true` comment annotation (mirrors the
# `# Hapax-Install-Scope: system` convention). Without this, `try-restart`
# is a no-op for a never-enabled unit, so a freshly-merged timer/service
# installs but never starts — the systemic "merged but sleeping" gap behind
# the FM-11 lane supervisor. The marker is honoured ONLY when the unit also
# has an [Install] section (`systemctl enable` errors on a unit with none).
unit_content_auto_enable() {
local content="$1"
[ -n "$content" ] || return 1
grep -Eiq '^[#;][[:space:]]*Hapax-Auto-Enable:[[:space:]]*(true|yes|1)[[:space:]]*$' <<< "$content" || return 1
grep -Eq '^\[Install\]' <<< "$content" || return 1
return 0
}
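
To illustrate which unit files the matcher above accepts, here is a small sketch. It repeats `unit_content_auto_enable` from the diff so that it runs on its own; the `report` helper and the three sample unit bodies are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Repeated from the diff so the sketch is self-contained.
unit_content_auto_enable() {
    local content="$1"
    [ -n "$content" ] || return 1
    grep -Eiq '^[#;][[:space:]]*Hapax-Auto-Enable:[[:space:]]*(true|yes|1)[[:space:]]*$' <<< "$content" || return 1
    grep -Eq '^\[Install\]' <<< "$content" || return 1
    return 0
}

# Hypothetical unit bodies (not taken from the repository).
marked_timer=$'# Hapax-Auto-Enable: true\n[Timer]\nOnUnitActiveSec=60s\n[Install]\nWantedBy=timers.target'
no_install=$'# Hapax-Auto-Enable: true\n[Service]\nExecStart=/bin/true'
unmarked=$'[Service]\nExecStart=/bin/true\n[Install]\nWantedBy=default.target'

# Hypothetical helper: print the decision the deploy script would take.
report() {
    if unit_content_auto_enable "$2"; then
        echo "$1: auto-enable"
    else
        echo "$1: skip"
    fi
}

report marked_timer "$marked_timer"   # marker and [Install] both present
report no_install "$no_install"       # marker present, [Install] missing
report unmarked "$unmarked"           # [Install] present, marker missing
```

Only `marked_timer` is reported as `auto-enable`; the other two print `skip`, because the marker is honoured only together with an `[Install]` section.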

# --verify-auto-enable: post-deploy assertion that every unit in the repo's
# systemd/units/ carrying `# Hapax-Auto-Enable: true` is actually `enabled`
# (and, for timers, `active`). Exits 0 if all marked units pass, 1 if any
# marked unit is not live. Lets the operator / smoke runner witness FM-11
# (and any future marked unit) is running, not merely installed. Reads the
# working tree, so it needs no commit SHA and runs before the trace/trap setup.
if [ "$VERIFY_AUTO_ENABLE" = "1" ]; then
UNITS_DIR="$REPO/systemd/units"
if [ ! -d "$UNITS_DIR" ]; then
echo "verify-auto-enable: $UNITS_DIR not found" >&2
exit 2
fi
verify_rc=0
marked_count=0
for uf in "$UNITS_DIR"/*.service "$UNITS_DIR"/*.timer "$UNITS_DIR"/*.path "$UNITS_DIR"/*.target; do
[ -f "$uf" ] || continue
unit_content_auto_enable "$(cat "$uf")" || continue
unit_base="$(basename "$uf")"
marked_count=$((marked_count + 1))
if ! systemctl --user is-enabled "$unit_base" >/dev/null 2>&1; then
echo "FAIL: $unit_base is marked Hapax-Auto-Enable but is not enabled" >&2
verify_rc=1
continue
fi
if [[ "$unit_base" == *.timer ]] \
&& ! systemctl --user is-active "$unit_base" >/dev/null 2>&1; then
Comment on lines +143 to +144
P2: Check the triggered service, not just the timer

For a marked timer such as hapax-lane-supervisor.timer, `systemctl --user is-active` only verifies that the timer unit itself is loaded and waiting. The timer can stay active while its triggered hapax-lane-supervisor.service fails on every elapse, for example if the new %h/.local/bin/hapax-lane-supervisor ExecStart symlink is missing or stale. The added `--verify-auto-enable` mode would then report success while FM-11 is not actually supervising lanes. Please also fail when the associated service is in the failed state, or otherwise verify that the triggered service can run.
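
One possible way to address this, as a minimal sketch rather than the PR's implementation: read the timer's `Triggers` property and fail if any triggered unit is in the failed state. The helper name `timer_triggered_units_ok` and the `SYSTEMCTL` override variable are hypothetical; the override exists only so the function can be exercised without a running systemd user instance.

```bash
#!/usr/bin/env bash
# Assumption: SYSTEMCTL is an override hook for testing; it defaults to the real CLI.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper (not in the PR): succeed only if the timer triggers at
# least one unit and none of the triggered units is in the failed state.
timer_triggered_units_ok() {
    local timer="$1" triggered unit rc=0
    # `Triggers` is the unit property listing what this timer activates.
    triggered="$("$SYSTEMCTL" --user show -p Triggers --value "$timer" 2>/dev/null || true)"
    if [ -z "$triggered" ]; then
        echo "FAIL: $timer does not trigger any unit" >&2
        return 1
    fi
    for unit in $triggered; do
        # is-failed exits 0 when the unit IS failed, so success here is bad news.
        if "$SYSTEMCTL" --user is-failed --quiet "$unit"; then
            echo "FAIL: $unit (triggered by $timer) is in the failed state" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

In the verify loop this could run right after the timer's `is-active` check, setting `verify_rc=1` when it returns non-zero. A oneshot service that has not run yet is inactive rather than failed, so the check passes until the first failed elapse.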


echo "FAIL: timer $unit_base is marked Hapax-Auto-Enable but is not active" >&2
verify_rc=1
continue
fi
if [[ "$unit_base" == *.timer ]]; then
echo "ok: $unit_base enabled + active"
else
echo "ok: $unit_base enabled"
Comment on lines +149 to +152
P2: Verify marked services actually started

For a marked long-running service, `--verify-auto-enable` reports success as soon as `is-enabled` succeeds, because non-timers only reach this `ok: ... enabled` branch. If that service is enabled but failed or inactive after deploy (for example, because of a bad ExecStart or a later crash), the new post-deploy assertion still exits 0, even though the marker's purpose is to prove that marked services went live. It should check the active state, or successful oneshot completion, for services too.
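
A minimal sketch of such a check, under stated assumptions: the helper names `unit_prop` and `service_is_live` and the `SYSTEMCTL` override variable are hypothetical, and the rule "an inactive oneshot with `Result=success` counts as completed" is one reasonable reading of the comment, not something the PR defines.

```bash
#!/usr/bin/env bash
# Assumption: SYSTEMCTL is an override hook for testing; it defaults to the real CLI.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: print one property of a user unit (empty on error).
unit_prop() {
    "$SYSTEMCTL" --user show -p "$2" --value "$1" 2>/dev/null || true
}

# Hypothetical helper (not in the PR): decide whether a marked service is live.
service_is_live() {
    local unit="$1" state type result
    state="$(unit_prop "$unit" ActiveState)"
    type="$(unit_prop "$unit" Type)"
    result="$(unit_prop "$unit" Result)"
    case "$state" in
        active|activating|reloading) return 0 ;;
        failed) return 1 ;;
    esac
    # Not running: acceptable only for a oneshot whose last result was success.
    [ "$type" = "oneshot" ] && [ "$result" = "success" ]
}
```

One caveat with this approach: a oneshot service that has never been started also reports `Result=success` while inactive, so a stricter check would need an additional signal that the service has actually run.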


fi
done
if [ "$marked_count" -eq 0 ]; then
echo "verify-auto-enable: no Hapax-Auto-Enable units in $UNITS_DIR"
elif [ "$verify_rc" -eq 0 ]; then
echo "verify-auto-enable: all $marked_count marked unit(s) enabled/active"
fi
exit "$verify_rc"
fi

_trace_join() {
local item
for item in "$@"; do
@@ -568,27 +641,36 @@ fi
if [ "${#SYSTEMD[@]}" -gt 0 ]; then
echo "systemd units changed (${#SYSTEMD[@]}):"
mkdir -p "$SYSTEMD_USER_DIR"
- UNITS_TO_RESTART=()
+ SYSTEMD_INSTALLED=()
for f in "${SYSTEMD[@]}"; do
base="$(basename "$f")"
dest="$SYSTEMD_USER_DIR/$base"
if git cat-file -e "$SHA:$f" 2>/dev/null; then
git show "$SHA:$f" > "$dest"
echo " '$f' -> '$dest'"
- UNITS_TO_RESTART+=("$base")
+ SYSTEMD_INSTALLED+=("$f")
else
rm -fv "$dest" || true
fi
done
systemctl --user daemon-reload
- for u in "${UNITS_TO_RESTART[@]}"; do
- # Only restart if it's already running; timers we try-restart
- if systemctl --user is-active --quiet "$u"; then
- echo "restarting $u"
- restart_user_unit "$u"
- elif [[ "$u" == *.timer ]]; then
- echo "enabling+starting $u"
- systemctl --user enable --now "$u"
+ for f in "${SYSTEMD_INSTALLED[@]}"; do
+ base="$(basename "$f")"
+ if systemctl --user is-active --quiet "$base"; then
+ # Already running — restart to pick up the new unit definition.
+ echo "restarting $base"
+ restart_user_unit "$base"
+ elif unit_content_auto_enable "$(git show "$SHA:$f" 2>/dev/null || true)"; then
P2: Enable marked units even when already active

When a marked unit is already active because it was started manually but is still disabled, this `elif` is skipped and the deploy only restarts it. Per `systemctl --help`, `is-active` checks the active state while `is-enabled` checks whether the unit file is enabled. An active-but-disabled marked timer or service therefore stays disabled after deploy: it will not survive a reboot, and `--verify-auto-enable` fails immediately after the deploy that processed the marker.
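
One way the branch could be restructured, as a hedged sketch: decide enablement separately from the restart-or-start decision. The function name `activate_unit`, its `marked` argument, and the `SYSTEMCTL` override variable are hypothetical. The PR restarts through `restart_user_unit`, whose definition is not shown in this diff, so the sketch uses a plain `systemctl restart` as a stand-in.

```bash
#!/usr/bin/env bash
# Assumption: SYSTEMCTL is an override hook for testing; it defaults to the real CLI.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical refactor of the per-unit branch.
#   $1 = unit name
#   $2 = "1" if the unit carries the marker (the PR derives this with
#        unit_content_auto_enable), anything else otherwise
activate_unit() {
    local base="$1" marked="$2" wants_start=0
    case "$base" in
        *.timer) wants_start=1 ;;   # back-compat: unmarked new timers still start
    esac
    if [ "$marked" = "1" ]; then
        wants_start=1
        # Enable regardless of active state, so an active-but-disabled
        # marked unit still survives a reboot.
        if ! "$SYSTEMCTL" --user is-enabled "$base" >/dev/null 2>&1; then
            echo "enabling $base (Hapax-Auto-Enable marker)"
            "$SYSTEMCTL" --user enable "$base"
        fi
    fi
    if "$SYSTEMCTL" --user is-active --quiet "$base"; then
        echo "restarting $base"
        "$SYSTEMCTL" --user restart "$base"   # stand-in for restart_user_unit
    elif [ "$wants_start" = "1" ]; then
        echo "enabling+starting $base"
        "$SYSTEMCTL" --user enable --now "$base"
    fi
}
```

With this shape, an unmarked timer that is already active is still only restarted, which matches the existing back-compat behaviour; only marked units gain the unconditional enable step.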


+ # Marked `# Hapax-Auto-Enable: true` (+ [Install]): enable+start so a
+ # freshly-merged service OR timer goes live instead of sleeping.
+ # try-restart is a no-op for a never-enabled unit (the FM-11 gap).
+ echo "auto-enabling $base (Hapax-Auto-Enable marker)"
+ systemctl --user enable --now "$base"
+ elif [[ "$base" == *.timer ]]; then
+ # Back-compat: an unmarked NEW timer still auto-enables — timers are
+ # inert until started, and this has been the behaviour since #3475.
+ echo "enabling+starting $base"
+ systemctl --user enable --now "$base"
fi
done
DID_ANYTHING=1
10 changes: 8 additions & 2 deletions systemd/units/hapax-lane-supervisor.service
@@ -1,11 +1,17 @@
[Unit]
Description=Hapax Lane Supervisor — FM-11 one_for_one re-spawn (dead lanes always auto-restart)
- Documentation=file:///home/hapax/projects/hapax-council/scripts/hapax-lane-supervisor
+ Documentation=file:///home/hapax/.local/bin/hapax-lane-supervisor
After=network.target

[Service]
Type=oneshot
- ExecStart=%h/projects/hapax-council/scripts/hapax-lane-supervisor
+ # Run via the deploy-maintained ~/.local/bin symlink, NOT the canonical
+ # integrator worktree (%h/projects/hapax-council): that worktree floats across
+ # feature branches and frequently lacks this script (it shipped in #3803), so a
+ # canonical-worktree ExecStart would fail to start. The symlink tracks the
+ # active source-activation release — the same convention the supervisor script
+ # itself uses for its sibling launchers (hapax-claude-headless, hapax-codex…).
+ ExecStart=%h/.local/bin/hapax-lane-supervisor
Environment=PATH=%h/.local/bin:/usr/local/bin:/usr/bin:/bin
Environment=HOME=%h
StandardOutput=journal
5 changes: 5 additions & 0 deletions systemd/units/hapax-lane-supervisor.timer
@@ -1,3 +1,8 @@
# Hapax-Auto-Enable: true
# reform-improve-deploy-activation: hapax-post-merge-deploy `enable --now`s
# units carrying this marker, so the FM-11 supervisor goes live on merge
# instead of installing-but-sleeping (try-restart is a no-op for a never-
# enabled timer). The marker requires the [Install] section below.
[Unit]
Description=Hapax Lane Supervisor Timer — guarantee lane liveness every 60s (FM-11)
