SRCH-6443: CodeDeploy hooks for crawler Resque (systemd alignment)#2002

Open
luisgmetzger wants to merge 1 commit into GSA:staging from luisgmetzger:srch-6443-resque-crawler-fix

@luisgmetzger

SRCH-6443

Production crawler Resque workers were not managed by systemd, so CodeDeploy application_stop / application_start skipped resque-worker and resque-scheduler, leaving processes on pruned release paths.
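
One way to spot a worker that is still running from a pruned release is to compare its working directory with the expected release root. The following is a minimal sketch, not code from this PR: it assumes a Linux host with `/proc` mounted, and the function name `pid_cwd_under` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not code from this PR. Assumes Linux with /proc mounted.

# Exit status: 0 if the process cwd is the expected root or below it,
# 1 if it is elsewhere (for example a pruned release), 2 if it cannot be read.
pid_cwd_under() {
  local pid="$1" expected="$2" cwd
  cwd="$(readlink "/proc/$pid/cwd" 2>/dev/null)" || return 2
  case "$cwd" in
    "$expected" | "$expected"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On Linux, a working directory that has been removed shows up with a ` (deleted)` suffix in the link target, so it fails the match and the function returns 1.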

Changes

  • Source optional /home/search/.config/searchgov-codedeploy.env (installed by searchgov-ansible crawl playbook on crawlers) so hooks can enforce stricter checks only on those hosts.
  • When REQUIRE_RESQUE_SERVICES=true: fail ApplicationStop if units are missing; require active resque services in ValidateService.
  • After systemd stop, SIGTERM/SIGKILL leftover resque processes (disable with SKIP_ORPHAN_RESQUE_SIGTERM=true if needed).
  • Add verify_resque_cwd.sh invoked when REQUIRE_RESQUE_CWD_CHECK=true.
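
The first two changes could look roughly like the sketch below. It is an illustration under assumptions, not the hooks from this PR: the function names are hypothetical, only the literal value `true` is treated as enabling strict mode, and `systemctl cat` is used here as one possible way to test whether a unit exists.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function names are illustrative and not taken from the PR.

# Source an env file only if it is readable, so hosts without it keep the defaults.
load_optional_env() {
  local env_file="$1"
  if [ -r "$env_file" ]; then
    # shellcheck disable=SC1090
    . "$env_file"
  fi
}

# Treat only the literal string "true" as enabled (an assumption of this sketch).
is_true() {
  [ "${1:-false}" = "true" ]
}

# Succeed if systemd can show a unit file with this name.
unit_exists() {
  systemctl cat "$1" >/dev/null 2>&1
}

# Fail only when strict mode is on and a unit is missing; otherwise warn and continue.
check_required_units() {
  local unit
  for unit in "$@"; do
    if ! unit_exists "$unit"; then
      if is_true "${REQUIRE_RESQUE_SERVICES:-false}"; then
        echo "ERROR: required unit $unit is missing" >&2
        return 1
      fi
      echo "WARN: unit $unit not found, skipping" >&2
    fi
  done
  return 0
}
```

A stop hook built this way would call `load_optional_env /home/search/.config/searchgov-codedeploy.env` first and then `check_required_units resque-worker resque-scheduler`, so non-crawler hosts without the env file keep the lenient behavior.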

Rollout

  1. Merge this PR.
  2. Run crawl Ansible (resque_systemd role) on production crawlers.
  3. Deploy via CodeDeploy as usual.

Related

  • searchgov-ansible: systemd units + env file (companion PR).
  • searchgov-tf: optional CloudWatch alarm on LoadError in resque.log (companion PR).

@luisgmetzger luisgmetzger changed the base branch from main to staging March 31, 2026 18:53
@luisgmetzger luisgmetzger force-pushed the srch-6443-resque-crawler-fix branch from 2710b83 to 0bc168d Compare March 31, 2026 18:53
- Source optional /home/search/.config/searchgov-codedeploy.env (Ansible-managed on crawlers).
- When REQUIRE_RESQUE_SERVICES=true: fail ApplicationStop if units missing; require active resque-worker/resque-scheduler in ValidateService.
- After stop, terminate orphan resque processes (optional SKIP_ORPHAN_RESQUE_SIGTERM).
- Add verify_resque_cwd.sh for REQUIRE_RESQUE_CWD_CHECK post-deploy validation.
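
The orphan cleanup described in this commit message could be sketched as follows. This is an illustration under assumptions, not the script from the commit: the function names, the 10-second default grace period, and the use of `pgrep -f` to find leftover processes are all hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and the grace period are assumptions, not from the PR.

# Send SIGTERM, wait up to "grace" seconds, then send SIGKILL if still alive.
terminate_pid() {
  local pid="$1" grace="${2:-10}" waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # process is already gone
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Terminate every process whose command line matches the pattern,
# unless the opt-out flag is set to "true".
terminate_orphans() {
  local pattern="$1" grace="${2:-10}" pid
  if [ "${SKIP_ORPHAN_RESQUE_SIGTERM:-false}" = "true" ]; then
    return 0
  fi
  for pid in $(pgrep -f "$pattern" || true); do
    [ "$pid" = "$$" ] && continue   # never signal the hook itself
    terminate_pid "$pid" "$grace"
  done
}
```

Because `pgrep -f` matches against the full command line, the pattern needs to be specific enough that it does not match unrelated processes on the host.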
@luisgmetzger luisgmetzger force-pushed the srch-6443-resque-crawler-fix branch from 0bc168d to 25dff96 Compare March 31, 2026 19:00
@YaritzaGarcia YaritzaGarcia self-requested a review April 1, 2026 18:20

@YaritzaGarcia YaritzaGarcia left a comment

I read everything and reviewed it carefully, and it looks good to me.

@selfdanielj selfdanielj self-requested a review April 1, 2026 19:37
PUMA_SERVICE="${PUMA_SERVICE:-puma}"
RESQUE_WORKER_SERVICE="${RESQUE_WORKER_SERVICE:-resque-worker}"
RESQUE_SCHEDULER_SERVICE="${RESQUE_SCHEDULER_SERVICE:-resque-scheduler}"
APP_HEALTHCHECK_URL="${APP_HEALTHCHECK_URL:-http://127.0.0.1:3000/}"

Will the app even accept an HTTP connection?
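
One way to answer this on a host is to probe the URL and look at the status code. A minimal sketch, assuming `curl` is installed; the function name `probe_status` is hypothetical and this is not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical probe, not code from this PR. Assumes curl is installed.

APP_HEALTHCHECK_URL="${APP_HEALTHCHECK_URL:-http://127.0.0.1:3000/}"

# Print the HTTP status code for the URL, or 000 if no connection could be made.
probe_status() {
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$1" || true
}
```

Running `probe_status "$APP_HEALTHCHECK_URL"` on a deployed host prints `000` when nothing accepts plain HTTP on that address, and an HTTP status code otherwise. A redirect status would suggest the app answers HTTP but forwards the request elsewhere.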

@selfdanielj

This seems like a really complex solution, and one you probably couldn't have tested locally. What makes you think it will solve the problem? Were you able to do any local testing to show that it works?

Don't you have to add these scripts to the appspec for them to run? As is, if this is merged, nothing will happen, right?

Is this meant to run with Capistrano, without Capistrano, or independently of Capistrano?
