Skip to content

feat(email): replace Mailgun with Resend + Cloudflare inbound email routing#24

Merged
dundas merged 10 commits into
mainfrom
feat/email-resend-inbound
Mar 5, 2026
Merged

feat(email): replace Mailgun with Resend + Cloudflare inbound email routing#24
dundas merged 10 commits into
mainfrom
feat/email-resend-inbound

Conversation

@dundas
Copy link
Copy Markdown
Owner

@dundas dundas commented Mar 4, 2026

Summary

  • Storage: Renamed mailgun_idprovider_message_id across memory.js, mech.js, and outbox.service.js for provider-agnostic storage
  • Outbound (Resend): Full Mailgun → Resend adapter rewrite in outbox.service.js — Bearer auth + JSON, Resend domain management, Svix webhook signature verification; renamed /webhooks/mailgun/webhooks/resend
  • Inbound (Cloudflare Worker): New workers/email-ingestion/ Worker parses MIME via postal-mime, resolves {namespace}.{agent_id}@agentdispatch.io addresses, and forwards to ADMP
  • Inbound webhook route: New POST /webhooks/email/inbound endpoint delivers email as ADMP messages (type: email) to agent inboxes
  • Agent email address: GET /api/agents/:id now returns email_address field
  • Security hardening (adversarial review findings):
    • Svix timestamp freshness check (±5 min) to prevent replay attacks
    • Inbound webhook fails closed when INBOUND_EMAIL_SECRET is unconfigured (was fail-open)
    • 500 catch blocks return generic message; full error logged server-side only
  • Docs: .env.example, README.md, and AGENT-GUIDE.md updated; Mailgun references removed

Test plan

  • bun test src/server.test.js — 158 pass, 15 fail (pre-existing mech.js baseline)
  • bun test workers/email-ingestion/ — 5 pass (parseRecipient unit tests)
  • Verify GET /api/agents/:id returns email_address field
  • Verify POST /webhooks/email/inbound returns 500 when INBOUND_EMAIL_SECRET unset
  • Verify /webhooks/resend rejects stale Svix timestamp (>5 min old)
  • Deploy Cloudflare Worker: cd workers/email-ingestion && wrangler deploy
  • Set wrangler secret put ADMP_URL and wrangler secret put INBOUND_EMAIL_SECRET

Related

  • PRD: tasks/0004-prd-email-inbound-outbound.md
  • Task list: tasks/tasks-0004-prd-email-inbound-outbound.md

🤖 Generated with Claude Code

dundas and others added 7 commits March 2, 2026 11:04
- Rename findOutboxMessageByMailgunId → findOutboxMessageByProviderId in memory.js

- Rename mailgun_id field → provider_message_id in storage scan and outbox service

- Update _findOutboxMessageByMailgunId → _findOutboxMessageByProviderId in outbox.service.js

- Update tests 69-71 to use new method and field names

- mech.js renamed separately (gitignored private adapter)

Related to tasks/0004-prd-email-inbound-outbound.md Task 1.0
- Replace _mailgunRequest/FormData with _resendRequest using Bearer auth + JSON
- addDomain: POST /domains JSON, stores resend_domain_id + provider_state
- _getDomainDnsRecords: parses Resend domain response records[]
- verifyDomain: POST /domains/{resend_domain_id}/verify
- _attemptSend: POST /emails, to as array, stores provider_message_id
- handleWebhook: reads event.type + event.data.email_id
- verifyWebhookSignature: Svix format (svixId, svixTimestamp, svixSignature, rawBody)
- Rename /webhooks/mailgun → /webhooks/resend with express.raw() middleware
- Mount webhook router before express.json() to capture raw body for Svix verification
- Update env var warnings: RESEND_API_KEY, RESEND_WEBHOOK_SECRET
- Update all tests to use _resendRequest stubs, RESEND_* env vars, Resend event format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add src/utils/email.js with agentEmailAddress() helper
  Format: {tenantId}.{agentId}@{domain} (with tenant) or {agentId}@{domain}
  Domain defaults to INBOUND_EMAIL_DOMAIN env var or agentdispatch.io
- Add email_address field to GET /api/agents/:agentId response
- Add POST /api/webhooks/email/inbound route
  Verifies X-Webhook-Secret against INBOUND_EMAIL_SECRET (timingSafeEqual)
  Resolves agent by to_agent + optional to_namespace (tenant match)
  Delivers to inbox via inboxService.send() with verify_signature:false
  Encodes from_email as email:{local}.at.{domain} to satisfy ADMP validator
- Mount emailInboundRouter in server.js, add INBOUND_EMAIL_SECRET warning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scaffold workers/email-ingestion/ with package.json, wrangler.toml, index.ts
- parseRecipient(): parse {namespace}.{agentId}@Domain into structured parts
  - acme.alice → namespace=acme, agentId=alice
  - acme.alice.v2 → namespace=acme, agentId=alice.v2 (dots preserved)
  - alice → namespace=null, agentId=alice
- email() handler: parse MIME with postal-mime, POST to ADMP /webhooks/email/inbound
  - Passes X-Webhook-Secret for authentication
  - Calls message.setReject() on 404 (unknown agent)
  - Logs transient errors without bouncing
- 5 unit tests for parseRecipient (all pass)
- README: deployment guide, secrets setup, email routing config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove MAILGUN_API_KEY, MAILGUN_API_URL, MAILGUN_WEBHOOK_SIGNING_KEY from .env.example
- Add RESEND_API_KEY, RESEND_WEBHOOK_SECRET, INBOUND_EMAIL_SECRET, INBOUND_EMAIL_DOMAIN
- Add Email section to README covering inbound (Cloudflare Worker) and outbound (Resend)
- Update Production Checklist with email env vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndpoints

- outbox.service.js: add Svix timestamp freshness check (±5 min window) to
  verifyWebhookSignature() to prevent replay attacks on /webhooks/resend
- email-inbound.js: fail closed when INBOUND_EMAIL_SECRET is not configured
  (returns 500 SERVER_MISCONFIGURATION instead of silently skipping auth)
- email-inbound.js: sanitize 500 error responses (log error, return generic
  message to avoid leaking internal details)
- server.js: update startup warning to reflect new fail-closed behavior
- server.test.js: update 4 tests to provide auth headers; add test for
  SERVER_MISCONFIGURATION when secret is unconfigured

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Agent email address format ({namespace}.{agent_id}@agentdispatch.io)
- How to retrieve email_address from GET /api/agents/:id
- Receiving email: message type, body fields, from-field encoding
- Sending email: three-step domain setup, send endpoint, status values
- Full domain management endpoint reference table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 4, 2026

Greptile Summary

This PR replaces the Mailgun integration with Resend for outbound email and introduces a new Cloudflare Worker + ADMP inbound webhook pipeline for receiving email into agent inboxes. The overall architecture is sound and the security hardening additions (Svix timestamp freshness check, fail-closed INBOUND_EMAIL_SECRET) are meaningful improvements.

Two issues need attention before this goes to production:

  • Svix secret decoding (src/services/outbox.service.js): Resend provides webhook secrets in the whsec_<base64> format. verifyWebhookSignature passes the raw env-var string to crypto.createHmac without stripping the whsec_ prefix and base64-decoding the key bytes. This means signature verification will always fail for users who copy their secret directly from the Resend dashboard, effectively disabling delivery status webhooks in production.
  • Namespace guard on inbound email (src/routes/email-inbound.js): When an email arrives with no namespace (e.g. alice@agentdispatch.io), the tenant check is skipped. Agents registered under a tenant namespace (whose canonical address is acme.alice@agentdispatch.io) will still receive mail addressed to the plain alice@agentdispatch.io address.
  • Minor: The Resend webhook catch block returns error.message in the response body, inconsistent with the PR's stated goal of returning only generic 500 messages.
  • Minor: DEFAULT_DOMAIN in src/utils/email.js is captured at module load time rather than read dynamically, which can cause stale values in test environments that set INBOUND_EMAIL_DOMAIN after import.

Confidence Score: 2/5

  • Not safe to merge — webhook signature verification will silently fail in production and tenanted agents are reachable at incorrect addresses.
  • Two concrete logic bugs: the Svix whsec_ secret is not decoded before HMAC construction (making delivery-status webhooks non-functional for any real Resend secret), and the namespace guard in the inbound route allows email to be delivered to agents at addresses that don't match their registered namespace. Both bugs would manifest on first production deployment.
  • src/services/outbox.service.js (Svix key decoding) and src/routes/email-inbound.js (namespace guard) require fixes before merge.

Important Files Changed

Filename Overview
src/services/outbox.service.js Full rewrite from Mailgun to Resend; introduces a critical Svix secret handling bug where the whsec_ prefix is not stripped and base64-decoded before use as the HMAC key, causing webhook signature verification to always fail in production.
src/routes/email-inbound.js New inbound email webhook route; contains a logic bug where tenanted agents are reachable at their un-namespaced address when to_namespace is null, allowing cross-namespace email delivery.
workers/email-ingestion/index.ts New Cloudflare Worker that parses MIME via postal-mime and forwards to ADMP; logic is correct and handles 404 bounces properly; domain parameter in parseRecipient is accepted but not validated against the actual recipient domain, though domain filtering happens upstream at Cloudflare.
src/routes/outbox.js Updated Mailgun → Resend webhook route with correct express.raw() placement before express.json(); minor inconsistency where the 500 error catch block still leaks error.message contrary to the PR's stated security hardening goal.
src/utils/email.js New helper for generating agent email addresses; DEFAULT_DOMAIN is resolved at module load time rather than dynamically, which can cause stale values in tests that override INBOUND_EMAIL_DOMAIN after import.

Sequence Diagram

sequenceDiagram
    participant Sender as Email Sender
    participant CF as Cloudflare Email Routing
    participant Worker as CF Worker (email-ingestion)
    participant ADMP as ADMP Server
    participant Resend as Resend API

    Note over Sender,ADMP: Inbound Email Flow
    Sender->>CF: Send email to {namespace}.{agentId}@agentdispatch.io
    CF->>Worker: Forward via Email Routing (catch-all)
    Worker->>Worker: parseRecipient() → namespace + agentId
    Worker->>Worker: PostalMime.parse(raw MIME)
    Worker->>ADMP: POST /api/webhooks/email/inbound<br/>(X-Webhook-Secret header)
    ADMP->>ADMP: timingSafeEqual(secret)
    ADMP->>ADMP: storage.getAgent(to_agent)
    ADMP->>ADMP: inboxService.send(type: email)
    ADMP-->>Worker: 200 OK
    Worker-->>CF: (complete)

    Note over ADMP,Resend: Outbound Email Flow
    ADMP->>ADMP: POST /api/agents/:id/outbox/send
    ADMP->>ADMP: Verify domain status = verified
    ADMP->>Resend: POST /emails (Bearer auth)
    Resend-->>ADMP: { id: re_xxx }
    ADMP->>ADMP: store provider_message_id

    Note over Resend,ADMP: Delivery Status Webhook
    Resend->>ADMP: POST /api/webhooks/resend<br/>(svix-id, svix-timestamp, svix-signature)
    ADMP->>ADMP: verifyWebhookSignature (HMAC-SHA256)
    ADMP->>ADMP: updateOutboxMessage(status)
    ADMP-->>Resend: 200 OK
Loading

Last reviewed commit: 8b0b3b0

Comment on lines +86 to +97
// --- Agent resolution ---
let agent = await storage.getAgent(to_agent);
if (to_namespace && agent && agent.tenant_id !== to_namespace) {
agent = null;
}

if (!agent) {
return res.status(404).json({
error: 'AGENT_NOT_FOUND',
message: `Agent ${to_agent} not found`
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tenanted agents reachable at their un-namespaced address

When to_namespace is null (email sent to alice@agentdispatch.io), the namespace-mismatch check on line 88 is short-circuited by the to_namespace && guard. An agent that belongs to tenant acme (whose correct address is acme.alice@agentdispatch.io) will still receive mail addressed to alice@agentdispatch.io because the check is skipped entirely.

The fix is to also reject the delivery when the incoming email has no namespace but the resolved agent does have a tenant_id:

let agent = await storage.getAgent(to_agent);
if (agent) {
  const namespaceMismatch =
    (to_namespace && agent.tenant_id !== to_namespace) ||
    (!to_namespace && agent.tenant_id);
  if (namespaceMismatch) {
    agent = null;
  }
}

Comment thread src/utils/email.js Outdated
* Email address helpers for ADMP agent email addresses
*/

const DEFAULT_DOMAIN = process.env.INBOUND_EMAIL_DOMAIN || 'agentdispatch.io';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level DEFAULT_DOMAIN snapshot prevents runtime override

DEFAULT_DOMAIN is resolved once at import time. Every other sensitive env var in this codebase (RESEND_API_KEY, RESEND_WEBHOOK_SECRET, INBOUND_EMAIL_SECRET) is read dynamically via a getter function so tests can override them with process.env.XXX = … after import. This constant will silently ignore any post-import change, which can cause agentEmailAddress() to return stale values in tests or multi-tenant scenarios.

Suggested change
const DEFAULT_DOMAIN = process.env.INBOUND_EMAIL_DOMAIN || 'agentdispatch.io';
const DEFAULT_DOMAIN = process.env.INBOUND_EMAIL_DOMAIN || 'agentdispatch.io';

Consider wrapping in a getter instead:

function getEmailDomain() {
  return process.env.INBOUND_EMAIL_DOMAIN || 'agentdispatch.io';
}

and using getEmailDomain() inside agentEmailAddress.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 4, 2026

Additional Comments (2)

src/services/outbox.service.js
Svix whsec_ secret not decoded before HMAC

Resend's dashboard provides webhook secrets in Svix format: whsec_<base64-encoded-bytes>. The actual HMAC key must be the base64-decoded bytes — not the raw string with the whsec_ prefix still attached. The current code passes the raw env-var value directly to crypto.createHmac, so any user who copies their secret from Resend's dashboard verbatim will always get a verification failure (or a silently incorrect HMAC comparison if lengths happen to match).

The correct approach:

// strip "whsec_" prefix and base64-decode
const keyBytes = Buffer.from(secret.replace(/^whsec_/, ''), 'base64');
const hmac = crypto.createHmac('sha256', keyBytes);

Without this, verifyWebhookSignature will never return true in production when RESEND_WEBHOOK_SECRET is set to the value from the Resend dashboard.


src/routes/outbox.js
Internal error message leaked in webhook 500 response

The PR description explicitly states a security goal: "500 catch blocks return generic message; full error logged server-side only." The inbound email route (email-inbound.js) follows this correctly — it logs the full error and returns 'Internal server error'. However, this catch block in the Resend webhook handler still returns error.message directly to the caller:

res.status(500).json({ error: 'WEBHOOK_FAILED', message: error.message });

Resend is a trusted caller, so the risk is low, but consistency with the stated security hardening would suggest logging and returning a generic message here too.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 4, 2026

PR Review: feat(email): replace Mailgun with Resend + Cloudflare inbound email routing

Overall this is a solid, well-scoped migration. The provider-agnostic renaming is clean, the fail-closed behavior on missing secrets is exactly right, and the test coverage for the new inbound webhook is thorough. A few items to address before merging:


Critical: Potential bug with Svix HMAC secret encoding

File: src/services/outbox.service.js in verifyWebhookSignature

Resend (via Svix) provides webhook secrets in the format whsec_{base64-encoded-bytes}. The HMAC key should be the base64-decoded bytes, not the raw string including the whsec_ prefix. Using the raw string as the key will produce a different signature than Svix computes, so every webhook verification will return false in production — the endpoint will 403 all valid Resend webhooks when RESEND_WEBHOOK_SECRET is configured.

Suggested fix: strip the prefix and base64-decode before passing to createHmac:

const keyBytes = Buffer.from(secret.replace(/^whsec_/, ''), 'base64');
const hmac = crypto.createHmac('sha256', keyBytes);

Medium: from_email not validated before use

File: src/routes/email-inbound.js around line 392

The from_email field is authenticated only by the shared secret, not validated as a legal email address. If the Worker forwards a malformed address containing ADMP-unsafe characters, the envelope could fail downstream with a confusing 500. A simple format check before the replace call would make the failure explicit with a proper 400.


Medium: Namespace/tenant mismatch branch is untested

File: src/routes/email-inbound.js around lines 377-386

The logic that nulls out agent when to_namespace does not match agent.tenant_id is correct, but there is no test for this branch. Adding a test where the agent exists but the namespace is wrong would confirm the 404 is reliably returned.


Minor: Startup warnings are noisy for non-email deployments

File: src/server.js around lines 564-588

The three new console.warn calls fire unconditionally on every server start. Operators not using email will see warnings about RESEND_API_KEY, RESEND_WEBHOOK_SECRET, and INBOUND_EMAIL_SECRET on every boot. Consider gating the RESEND_WEBHOOK_SECRET warning on RESEND_API_KEY being set, since that webhook endpoint is only meaningful when outbound email is enabled.


Minor: DEFAULT_DOMAIN captured at module load time

File: src/utils/email.js line 13

The module-level DEFAULT_DOMAIN constant captures INBOUND_EMAIL_DOMAIN at import time. The test comment acknowledges this. Reading the env var lazily inside the function body would be more robust and avoids subtle test ordering issues.


Note: Pre-existing failing tests

The test plan notes 15 pre-existing failures in server.test.js. While not introduced by this PR, landing more code on a red baseline increases the debugging surface. Worth tracking in a follow-up issue.


What is well done

  • Fail-closed on missing INBOUND_EMAIL_SECRET returns 500 immediately rather than accepting unauthenticated requests. Exactly right.
  • timingSafeEqual for secret comparison with the length-mismatch catch block is correct.
  • Svix replay protection via the 5-minute timestamp freshness check is a meaningful defense against replay attacks.
  • Generic error messages in catch blocks: full errors logged server-side only, external responses use opaque messages.
  • Provider-agnostic rename (mailgun_id to provider_message_id, etc.) is clean and forward-looking.
  • Worker correctly rejects unknown recipients via SMTP using message.setReject on 404.
  • Test coverage for the new inbound webhook covers the full happy/sad path matrix.

Review generated by Claude Code

dundas added 3 commits March 5, 2026 13:40
- Inbound email enters review_pending; trusted senders auto-queued
- POST /api/webhooks/email/inbound/:messageId/review (approve|reject, reason, model_verdict)
- GET/POST/DELETE /api/agents/:agentId/email/trusted-senders
- Inbox service: initial_status, bypass_trust_check, system_metadata guard
- Docs: README and AGENT-GUIDE updated for policy and endpoints

Made-with: Cursor
… setup docs

- src/lib/cloudflare.js: ported from circleinbox — DNS CRUD, upsertDnsRecord,
  provisionResendDns(), Email Routing enable/catch-all API
- scripts/provision-resend-dns.js: check + upsert all 4 Resend records (DKIM,
  SPF MX, SPF TXT, DMARC) against Cloudflare; safe to re-run
- docs/EMAIL-SETUP.md: step-by-step operator checklist (env, Worker deploy,
  Resend DNS in Cloudflare, validation)
- .env.example: add CLOUDFLARE_API_TOKEN/ZONE_ID and RESEND_DNS_* vars
- workers/email-ingestion/wrangler.toml: remove invalid [[email]] section
- README.md: link to EMAIL-SETUP.md from deployment checklist

Made-with: Cursor
- outbox.service: strip whsec_ prefix and base64-decode Svix secret before
  HMAC construction — without this, signature verification always fails when
  RESEND_WEBHOOK_SECRET is copied directly from the Resend dashboard
- email-inbound: add inverse namespace guard — tenanted agents are no longer
  reachable at their un-namespaced address (alice@ vs acme.alice@)
- email-inbound: add namespace mismatch test covering both wrong-namespace
  and no-namespace-to-tenanted-agent cases
- outbox: Resend webhook 500 catch returns generic message (no error.message leak)
- email.js: read INBOUND_EMAIL_DOMAIN lazily at call time, not at module load
- server: gate RESEND_WEBHOOK_SECRET warning on RESEND_API_KEY being set

Made-with: Cursor
@dundas
Copy link
Copy Markdown
Owner Author

dundas commented Mar 5, 2026

Review Feedback Addressed

All blocking and important issues resolved in 63b5b1a.

Blocking — Svix whsec_ decoding (outbox.service.js): strip prefix and base64-decode before HMAC — signature verification was always failing in production with a real Resend secret.

Blocking — Namespace guard (email-inbound.js): added inverse guard — tenanted agents (acme.alice@) are no longer reachable at bare address (alice@). Added test for wrong-namespace and no-namespace-to-tenanted-agent cases.

Minor — Resend webhook 500 returns generic error (no error.message leak), INBOUND_EMAIL_DOMAIN read lazily in email.js, RESEND_WEBHOOK_SECRET warning gated on RESEND_API_KEY being set.

All 13 email inbound tests pass. Ready for re-review.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 5, 2026

Overall this is a well-structured migration. Provider-agnostic naming, the inbound quarantine design, and test coverage are all solid. That said, I found one critical bug that will break Resend webhook verification in production.


CRITICAL: Svix webhook signature key not base64-decoded

File: src/services/outbox.service.js, method: verifyWebhookSignature

Resend Svix secrets are formatted as whsec_ followed by a base64-encoded key. The actual HMAC key bytes are the base64-decoded payload, not the raw string. The current code passes the raw secret including the prefix directly to crypto.createHmac, which produces the wrong HMAC for every real Resend secret -- verification will always fail in production.

Fix: strip the whsec_ prefix and base64-decode the remainder before passing to createHmac. The existing tests pass because they use the plain string 'test-signing-key' with no prefix, so both sides compute the same wrong HMAC. The bug will not surface until tested against a real Resend account.


Security: no validation of from_email format in the inbound webhook

src/routes/email-inbound.js -- from_email is stored without format validation. A malformed value (no at-sign for example) goes through the replacement unchanged and ends up in the envelope returned to API consumers. A check analogous to SIMPLE_EMAIL_RE in agents.js would harden this boundary.

Security: inbound webhook as an agent-existence oracle

Once authenticated, anyone with the shared secret can probe arbitrary to_agent values and observe 404 vs 200. Likely acceptable given the Worker is the only expected caller, but worth documenting in operator notes so the shared secret is treated accordingly.

Performance: findOutboxMessageByProviderId does a full scan in memory storage. An explicit TODO comment would ensure this is not missed when adding a persistent backend.


Minor nits:

getZoneId apex extraction via split dot slice(-2) breaks for ccTLD second-levels like foo.co.uk. Low risk for operator tooling but worth a comment.

listZones and getDnsRecords have no pagination (50 and 200 record caps respectively). Could silently miss records on large accounts.

email-inbound.js creates a second pino() instance at module level rather than sharing the app-wide logger.

The Resend webhook catch block in src/routes/outbox.js returns error.message to the caller. The PR description states 500 catch blocks should return a generic message with full error logged server-side only. The email-inbound route gets this right and outbox.js should follow suit.


Things done well:

Timing-safe comparison for the inbound secret check.
Fail-closed when INBOUND_EMAIL_SECRET is unset (500 not pass-through).
Svix timestamp freshness check prevents replay attacks.
Mounting the webhook router before express.json() to allow express.raw() body capture -- well-explained in the server.js comment.
bypass_trust_check plus quarantine is a clean design: inbound email enters the ADMP flow but is held until explicitly approved.
Test coverage for quarantine to approve to pull, trusted-sender fast-path, reject flow, and auth failures is thorough.
Provider-agnostic field naming (provider_message_id, provider_state) makes a future provider swap easier.
Adding work_order_result to RETAIN_TYPES is a sensible companion change.


Verdict: The Svix HMAC key decoding bug must be fixed before merging -- it will silently break all delivery-status webhook updates in production. The remaining items are improvements rather than blockers.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 5, 2026

PR Review: feat(email) - Mailgun to Resend migration

Well-structured PR with good security defaults. Several issues to address before merge:

BUGS

  1. Plus-sign emails crash on delivery (email-inbound.js:132)
    from_email.replace('@', '.at.') produces user+tag.at.example.com. The + character fails SAFE_CHARS validation in isValidAgentId() causing a 500. Plus-addressed emails are very common. Strip or replace non-SAFE_CHARS characters from the local part before constructing fromId.

  2. Double delay on retries (outbox.service.js:251-254 and :303)
    _attemptSend has an in-function backoff for attempts > 0, but retries are already pre-delayed by _scheduleRetry. First retry waits 3s instead of 2s, second waits 6s instead of 4s. Remove the in-function delay block.

SECURITY

  1. Review endpoint shares ingestion secret (email-inbound.js:181-183)
    POST /webhooks/email/inbound/:messageId/review uses the same INBOUND_EMAIL_SECRET as the Cloudflare Worker, implicitly giving the Worker permission to approve/reject quarantined messages. Use a separate credential (e.g. MASTER_API_KEY) for the review endpoint to separate ingestion and policy principals.

  2. Raw error.message forwarded in 4xx responses (outbox.js:37-43, :79-84, :150-157)
    The PR hardened 500 blocks to return generic messages, but 4xx handlers in outbox.js still expose internal details (agent IDs, domain names). Apply the same pattern used in email-inbound.js.

NOTABLE GAPS

  1. parseRecipient does not validate the domain suffix (workers/email-ingestion/index.ts:29-43)
    The domain parameter is never used to verify the @Domain part. Cross-domain emails from a misconfigured zone will still be routed. Add a domain-match assertion.

  2. Worker silently drops messages on ADMP 5xx (workers/email-ingestion/index.ts:93-99)
    Non-404 error responses cause silent message loss at the SMTP level. Document this in the Worker README and plan a dead-letter strategy.

  3. email_address always in GET /api/agents/:id (agents.js:130)
    Falls back to agentdispatch.io when INBOUND_EMAIL_DOMAIN is unset, exposing a non-functional address for deployments without email configured. Return null when unset.

  4. getDnsRecords pagination missing (cloudflare.js:92-93)
    per_page=200 with no pagination loop means zones with 200+ records silently miss stale entries in upsertDnsRecord. Add a TODO comment at minimum.

MINOR

  • CLOUDFLARE_API_KEY undocumented fallback in cloudflare.js:14 - document or remove
  • Timestamp drift in email-inbound.js:135,141 - use new Date(now).toISOString() for consistency
  • 15 pre-existing test failures - worth a follow-up tracking issue

WHAT IS DONE WELL

  • Svix timestamp freshness check prevents replay attacks
  • Fail-closed on unset INBOUND_EMAIL_SECRET
  • Trusted-sender allowlist with review_pending quarantine
  • timingSafeEqual for both secrets and Svix signatures
  • encodeURIComponent on Resend API paths
  • Namespace guard preventing cross-tenant delivery

Issues 1 and 2 are concrete bugs to fix before merging. Issue 3 is worth a design discussion on whether the review API is for human operators only or also automated policy engines.

@dundas dundas merged commit a79feaf into main Mar 5, 2026
2 checks passed
@dundas dundas deleted the feat/email-resend-inbound branch March 5, 2026 21:18
dundas added a commit that referenced this pull request Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant