Feature/deploy health gate #314

Merged
mikewheeleer merged 5 commits into Talenttrust:main from Abolax123:feature/deploy-health-gate on Jun 2, 2026

@Abolax123

Closes #263

Overview

Add a pre-switch health gate to switch-green: it polls the green instance's readiness endpoint and aborts the switch if green does not become healthy within a configurable timeout.

Problem

src/deploy.ts previously promoted green without verifying readiness; this could route traffic to an unhealthy instance.

Solution

  • Poll green readiness endpoint with bounded retries before performing the switch.
  • Make poll interval and timeout configurable via environment:
    • SWITCH_GREEN_POLL_INTERVAL_MS (default: 500)
    • SWITCH_GREEN_TIMEOUT_MS (default: 5000)
  • On failure, leave blue active and exit non-zero (no partial switch).
  • Update the CLI to exit non-zero on errors so that CI can detect a failed switch.
  • Add unit tests covering:
    • healthy switch after retries
    • abort on timeout
    • existing cases still covered
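
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/deploy.ts: the names `HealthChecker`, `GateOptions`, `waitForGreenHealthy`, and `switchGreen`, the `promote` callback, and the choice to treat a throwing checker as "not ready yet" are all assumptions made for the example.

```typescript
// Hypothetical types: the PR description does not show the real signatures.
type HealthChecker = () => Promise<boolean>;

interface GateOptions {
  pollIntervalMs: number; // corresponds to SWITCH_GREEN_POLL_INTERVAL_MS
  timeoutMs: number; // corresponds to SWITCH_GREEN_TIMEOUT_MS
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the checker reports healthy or the deadline passes.
// Assumption: a checker that throws counts as "not ready yet".
async function waitForGreenHealthy(
  check: HealthChecker,
  { pollIntervalMs, timeoutMs }: GateOptions,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      if (await check()) return true;
    } catch {
      // Ignore the failure and retry until the deadline.
    }
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    // Never sleep past the deadline.
    await sleep(Math.min(pollIntervalMs, remaining));
  }
}

// Promotes green only after the gate passes; otherwise blue stays active
// and the caller sees an error (which a CLI could map to a non-zero exit).
async function switchGreen(
  check: HealthChecker,
  promote: () => void,
  options: GateOptions,
): Promise<void> {
  if (!(await waitForGreenHealthy(check, options))) {
    throw new Error("green did not become healthy; blue remains active");
  }
  promote();
}
```

Because the checker is passed in, a test can supply one that turns healthy after a few calls, or never, without touching the network.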

Files changed

  • src/deploy.ts — add polling health gate, env config, CLI exit handling
  • src/deploy.test.ts — add tests for polling & timeout

Tests

  • Ran jest src/deploy.test.ts — all tests passed.
  • Key scenarios verified:
    • blue → green when green becomes healthy within timeout
    • abort and leave blue when green does not become healthy within timeout
    • idempotency and concurrency preserved

Security notes

  • No secrets added.
  • The health checker is injectable; the injected checker is used for testing only.
  • No internal state leaked in errors; errors are generic and safe.

How to test locally

  1. Run deploy tests:
    npm test -- src/deploy.test.ts --runInBand

  2. To experiment manually:

    Configure env for quick testing

    export SWITCH_GREEN_POLL_INTERVAL_MS=10
    export SWITCH_GREEN_TIMEOUT_MS=200
    node ./dist/deploy.js switch-green
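
For reference, the two variables could be read along these lines. This is a sketch only: the helper names `readPositiveInt` and `readGateConfig` are hypothetical, and falling back to the default when a value is missing or not a positive integer is an assumption, not something the PR states. The defaults (500 and 5000) are the ones listed in the description.

```typescript
// Defaults as listed in the PR description.
const DEFAULT_POLL_INTERVAL_MS = 500;
const DEFAULT_TIMEOUT_MS = 5000;

type Env = Record<string, string | undefined>;

// Returns the variable as a positive integer.
// Assumption: unset, empty, or invalid values fall back to the default.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Hypothetical helper that collects both gate settings in one place.
function readGateConfig(env: Env) {
  return {
    pollIntervalMs: readPositiveInt(
      env,
      "SWITCH_GREEN_POLL_INTERVAL_MS",
      DEFAULT_POLL_INTERVAL_MS,
    ),
    timeoutMs: readPositiveInt(env, "SWITCH_GREEN_TIMEOUT_MS", DEFAULT_TIMEOUT_MS),
  };
}
```

In a Node CLI this would presumably be called as `readGateConfig(process.env)`; taking the environment as an argument keeps the parsing testable.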

Suggested commit message

feat(deploy): gate switch-green on green health readiness

Abolax123 added 2 commits May 31, 2026 13:29
…contract

- Update src/middleware/rateLimiter.ts to return RFC 6585 compliant 429 responses
- All 429 responses now include a Retry-After header (RFC 6585 says a 429 response MAY include it; it is not mandatory)
- Standardize error responses to follow safe-error contract (CWE-209 compliance)
- Error messages are sanitized via sanitizeErrorMessage() to prevent info disclosure
- Consistent safe message: 'Too many requests — please try again later'
- X-RateLimit-* headers continue to reflect rate limit state
- X-RateLimit-Blocked header indicates when client is hard-blocked
- Improve documentation in docs/request-limits-implementation.md with:
  - RFC 6585 compliance details
  - 429 response format specification
  - Client backoff guidance and retry examples
  - Updated test coverage requirements

Security notes:
- No internal limiter state leaks to clients
- Error messages remain consistent regardless of block reason
- requestId enables client-server log correlation
- Aligned with safe-error policy to prevent CWE-209 vulnerabilities
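
As an illustration of the client backoff guidance mentioned in this commit, a client might compute its wait from the Retry-After header roughly like this. The function name `retryDelayMs` and the fallback constants (500 ms base, 30 s cap) are assumptions for the sketch and are not taken from docs/request-limits-implementation.md.

```typescript
// Assumed fallback parameters for when the header is missing or unparseable.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000;

// Returns how long to wait before retrying a request that received a 429.
// Retry-After may be a number of seconds or an HTTP date.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    if (/^\d+$/.test(trimmed)) {
      return Number(trimmed) * 1000; // delta-seconds form
    }
    const dateMs = Date.parse(trimmed); // HTTP-date form
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, dateMs - nowMs);
    }
  }
  // No usable header: capped exponential backoff (attempt starts at 0).
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}
```

A real client would likely add jitter on top of the fallback delay so that many blocked clients do not retry at the same moment.
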
…timeout)

- Poll green readiness endpoint with configurable interval and timeout
- Abort switch if green not healthy within timeout (no partial switch)
- Add tests for healthy-switch-with-retries and timeout-abort
- CLI now exits non-zero on errors

Security: no secrets; uses injected health checker for tests
@drips-wave

drips-wave Bot commented May 31, 2026

@Abolax123 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

Abolax123 added 3 commits June 1, 2026 11:58
…issue Talenttrust#283)

- Implement adminAuthGuard middleware supporting JWT (admin role) and API key (admin scope)
- Protect GET /api/v1/jobs/dlq, POST /api/v1/jobs/dlq/reprocess endpoints
- Protect GET/POST /api/v1/admin/deploy/status, /switch-green, /rollback endpoints
- Add comprehensive test coverage for JWT validation, API key auth, and scope checks
- Add demo token support for test environments
- Redact credentials from logs to prevent accidental exposure
- Return RFC 7231 compliant 401/403 responses with secure error messages
…oy route tests

- Set JWT_SECRET env var at test module level for consistent signing/verification
- Add .send({}) to POST requests to properly set Content-Type header
- Accept 202 response on idempotent switch-green (state not persisted across requests)
- Fix logger.ts Set.flatMap() error by converting Set to Array

All 21 deploy route tests and 17 adminAuthGuard tests now pass.
@mikewheeleer mikewheeleer merged commit bce3d02 into Talenttrust:main Jun 2, 2026
2 of 5 checks passed

Development

Successfully merging this pull request may close these issues.

Add pre-switch health gate to deploy.ts switch-green command

2 participants