cron: catch up jobs that have never recorded a successful run#139
Draft
alex-fedotyev wants to merge 2 commits into
Draft
cron: catch up jobs that have never recorded a successful run#139alex-fedotyev wants to merge 2 commits into
alex-fedotyev wants to merge 2 commits into
Conversation
APScheduler's numeric day_of_week is 0=Mon..6=Sun, while Unix crontab is 0=Sun..6=Sat (7 also means Sun). CronTrigger.from_crontab passes the number straight through without remapping, so every numeric day-of-week schedule fired one weekday late: "0 13 * * 1" (Monday) actually ran on Tuesday, and "0 3 * * 0" (Sunday) ran on Monday. Add _crontab_to_trigger, a from_crontab replacement that translates the day-of-week field to APScheduler's day-name aliases (preserving *, ranges like 1-5, lists like 1,4, and step suffixes) and leaves the other four fields and the timezone handling unchanged. Route the three trigger-construction sites (job scheduling, source scheduling, and the overdue check) through it. Tests cover the day-of-week mapping, parity with from_crontab for non-DOW schedules, the interval-string fallback, and a weekly _is_overdue case that is due after exactly one week. After a restart, every numeric day-of-week cron shifts to the day its schedule already specifies; that is the intended correction.
_catchup_missed_jobs skipped any job whose last successful run was missing, so a weekly job that missed its first fire during a daemon restart was starved: with no success row to measure against, it was never backfilled and waited a full week for the next scheduled fire. Replace the success-only check with _catchup_baseline, which measures a missed fire against the last successful run, then the most recent attempt of any status (so a job that has only ever errored is still retried), then the server start time. Anchoring a job with no history to server start keeps a brand-new, not-yet-due job from firing spuriously, since its first fire is still in the future. Add get_last_cron_run (most recent cron_logs row regardless of status) and record the server start time on startup. Tests cover the baseline selection (success, error-only, no-history) and end-to-end catch-up for a weekly job that only ever errored.
4397e60 to
1153f5a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked on #138 (the day-of-week fix); please review that one first. This diff includes #138's commit until it merges, after which it shows only the catch-up change. It is a draft for that reason.
_catchup_missed_jobsruns at startup and fires any job that should have run while the daemon was down. It skipped any job whose last successful run was missing, so a job that had never recorded a success was never backfilled. Together with #138, a weekly job that missed its first fire during a restart could wait a full week, or longer if it kept missing, before it ran.This replaces the success-only check with
_catchup_baseline, which measures a missed fire against, in order:Anchoring a job with no history at all to the server start time keeps a brand-new, not-yet-due job from firing spuriously: its first fire is still in the future, so it is not overdue.
Also adds
get_last_cron_run(the most recentcron_logsrow regardless of status) and records the server start time on startup.Test plan
pytest tests/test_cron.py: 88 passed locally.TestCatchupBaselinecovers baseline selection: prefers the last success, falls back to the last attempt of any status when the job never succeeded, then to server start, then to now when server start is unset.