[ENH] Pre-publish CI: run aws-sam-cli durable integration tests against candidate emulator image #225

@yaythomas

Background

The emulator image public.ecr.aws/durable-functions/aws-durable-execution-emulator:latest is consumed automatically by aws-sam-cli. On every sam local invoke of a durable function, sam-cli pulls :latest and refreshes the local cache (see durable_functions_emulator_container.py). Customers can override the tag per invoke with DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG, but the default is :latest. Any image we publish therefore ships immediately to every durable-functions customer running sam-cli, with no version pin in between.
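
For illustration, the per-invoke override described above could be wrapped in a small helper. This is a minimal sketch: with_emulator_tag is a hypothetical helper name, MyDurableFunction is a placeholder function name, and the tag passed in must be one that actually exists for the emulator image.

```shell
# Hypothetical helper: run one command with the emulator tag pinned,
# leaving the caller's environment untouched.
with_emulator_tag() {
  tag="$1"; shift
  env DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG="$tag" "$@"
}

# Usage (placeholders, not real values):
#   with_emulator_tag <known-good-tag> sam local invoke MyDurableFunction
```

Because the variable is set only for that one command, the next invocation without the wrapper falls back to :latest again, which is why the override does not protect customers by default.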

PR #216 in this repo recently demonstrated the blast radius: ~26 sam-cli durable integration tests went red across the local-invoke, local-start-lambda, tier1-finch, and tier1-windows-other jobs (e.g. aws-sam-cli Integration Tests #496, run #8779 / local-start-lambda) the moment v1.2.0 went to :latest. Customer-visible symptom: a fresh samdev local invoke against any durable function returns a 500 on the first checkpoint, or a 404 on the first local execution get|history|stop|callback call. Mitigations are in flight on the sam-cli side (aws/aws-sam-cli#9038 merged, #9040 open), but they do not address the underlying class of problem: this repo's release pipeline gets no signal from sam-cli before publishing :latest.

Why our existing tests didn't catch this

tests/web/e2e/routes_arn_encoding_int_test.py (added in #222) drives a real boto client against this repo's WebServer and would have caught the emulator-side routing bug. It does not — and cannot — exercise sam-cli's LocalLambdaHttpService, which is a separate Flask service that customers' boto clients actually hit when using samdev local invoke. Anything we change in the ARN, callback ID, or function-qualifier shape can break sam-cli's service without touching ours.

Proposal

Add a pre-publish CI step that builds the candidate emulator image and runs sam-cli's durable integration suite against it. Concrete shape:

  1. Build the emulator image from this repo (we already do this in ecr-release.yml).
  2. Tag it locally with a candidate tag, e.g. aws-durable-execution-emulator:pr-${SHA}.
  3. Check out aws/aws-sam-cli at develop, install in SAM_CLI_DEV=1 mode.
  4. Run, with DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG=pr-${SHA}:
    pytest -vv \
      tests/integration/local/invoke/test_invoke_durable.py \
      tests/integration/local/start_lambda/test_start_lambda_durable.py \
      tests/integration/local/execution/test_execution.py \
      tests/integration/local/callback/test_callback.py
    That's the durable subset: ~50 tests, an estimated ~3–5 min in CI, based on the timings of the local-invoke and local-start-lambda jobs referenced above.
  5. Publish to ECR only if step 4 is green.

Also run this step on PRs that touch src/**, so we get the signal pre-merge as well as pre-publish.
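
The five steps above can be sketched as a single script. This is a sketch under stated assumptions, not the final workflow: it assumes the image builds from a Dockerfile at the repo root (the real build should mirror ecr-release.yml), that sam-cli's dev install is pip install -e '.[dev]', that the candidate image must carry the same repository name sam-cli pulls from, and that sam-cli will use a locally present image for the given tag. If sam-cli always pulls, the candidate would have to be pushed to a registry first. The function names and SAM_CLI_DIR are illustrative.

```shell
#!/bin/sh
# Sketch of the proposed pre-publish gate (steps 1-5).
# Prints the plan by default; run with DRY_RUN=0 to execute it.
set -eu

# Assumption: only the tag differs from the published image name.
IMAGE_REPO="public.ecr.aws/durable-functions/aws-durable-execution-emulator"
SAM_CLI_DIR="${SAM_CLI_DIR:-sam-cli}"   # hypothetical checkout location
DRY_RUN="${DRY_RUN:-1}"

# Step 2: candidate tag for a commit SHA, e.g. pr-abc1234.
candidate_tag() { printf 'pr-%s' "$1"; }

# Step 4: the four durable test files named in the proposal.
durable_tests() {
  cat <<'EOF'
tests/integration/local/invoke/test_invoke_durable.py
tests/integration/local/start_lambda/test_start_lambda_durable.py
tests/integration/local/execution/test_execution.py
tests/integration/local/callback/test_callback.py
EOF
}

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Same as run, but inside a working directory.
run_in() {
  dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ (cd %s) %s\n' "$dir" "$*"
  else
    (cd "$dir" && "$@")
  fi
}

gate() {
  tag="$(candidate_tag "$1")"
  # Steps 1-2: build the candidate image and tag it locally.
  run docker build -t "${IMAGE_REPO}:${tag}" .
  # Step 3: check out sam-cli at develop and install it in dev mode.
  run git clone --depth 1 --branch develop \
    https://github.com/aws/aws-sam-cli.git "$SAM_CLI_DIR"
  run_in "$SAM_CLI_DIR" env SAM_CLI_DEV=1 pip install -e '.[dev]'
  # Step 4: run the durable subset against the candidate tag.
  # Word splitting of the file list is intended; the paths contain no spaces.
  run_in "$SAM_CLI_DIR" env DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG="$tag" \
    pytest -vv $(durable_tests)
  # Step 5: reached only if step 4 succeeded (set -e aborts otherwise).
  run echo "durable suite green for ${tag}; safe to publish"
}

# Run only when a SHA is passed as the first argument.
if [ "$#" -gt 0 ]; then gate "$1"; fi
```

Running the script with a commit SHA prints the plan; DRY_RUN=0 executes it, and set -e stops at the first failing step, so the final line is reached only when the suite is green. Publishing itself is left to ecr-release.yml.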

Acceptance criteria

  • A workflow (e.g. .github/workflows/sam-cli-compat.yml) that runs the four sam-cli durable test files against the locally-built emulator image and is required for PRs that change src/.
  • The publish job (ecr-release.yml) gated on the same workflow's success.
  • A README / CONTRIBUTING note explaining that any change affecting the emulator's HTTP contract — ARN shape, callback-token shape, route layout, response codes — must keep this job green.
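
One detail of the first criterion deserves a sketch. In GitHub Actions, a required check whose workflow is skipped by a trigger-level path filter is typically left pending, which can block unrelated PRs. An alternative is to always start the job and decide inside it whether src/ changed. The helper below is a minimal illustration; touches_src is a hypothetical name, and the git diff invocation in the usage comment assumes a pull_request event where GITHUB_BASE_REF is set.

```shell
# Succeeds if any path read from stdin lies under src/.
touches_src() {
  grep -q '^src/'
}

# Usage inside a workflow step (assumes the base branch has been fetched):
#   if git diff --name-only "origin/${GITHUB_BASE_REF}...HEAD" | touches_src; then
#     echo "src/ changed: run the sam-cli durable suite"
#   else
#     echo "src/ untouched: report success without running the suite"
#   fi
```

With this shape the check always reports a result, so it can be marked as required without stalling PRs that only touch docs or tooling.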

Out of scope

  • Pinning sam-cli to a specific emulator tag. That just inverts the dependency: customers stop picking up emulator fixes until sam-cli ships a new release. Roll-forward + this CI gate is the durable answer.
  • Running the full sam-cli integration suite. The four files above cover every code path that talks to the emulator.
