chore(gha): cap missing integration matrix parallelism #239

Merged
so0k merged 1 commit into open-constructs:main from sakul-learning:ci/add-missing-integration-max-parallel on Jun 5, 2026

Conversation

@sakul-learning (Contributor) commented Jun 5, 2026

Summary

  • Add max-parallel to the integration-test matrix jobs that did not have it yet.
  • Leave the existing regular integration-test max-parallel: 30 unchanged.
  • This is intentionally limited to missing matrix throttles; it does not change the existing integration workflow cap pending maintainer agreement on the broader analysis.

Why

GitHub documents jobs.<job_id>.strategy.max-parallel as the way to cap the number of matrix jobs that run simultaneously. GitHub also documents that Actions jobs and workflow runs run concurrently by default, and its REST API best-practices guidance recommends avoiding bursts of concurrent requests, because secondary rate limits can be triggered before the hourly primary limit is exhausted.

Relevant limits/recommendations considered:

  • GitHub Actions matrix size limit: 256 jobs per workflow run.
  • GITHUB_TOKEN primary REST API limit: 1,000 requests/hour/repository.
  • GitHub REST secondary limit guidance: 900 points/minute for REST endpoints, plus concurrency-sensitive abuse protection.
  • GitHub REST best practices recommend avoiding concurrent requests and using queued/serialized request processing where possible.
  • Depot runners remove runner-capacity pressure: Depot documents no runner concurrency limit.
  • Depot runners also automatically use Depot Cache for GitHub Actions cache API operations. Depot documents that actions/cache, setup actions, and any tool using the GitHub Actions cache API automatically use Depot Cache on Depot runners with no workflow changes. Therefore, cache calls on Depot-backed jobs should not be counted as GitHub cache-service pressure, although non-cache GitHub service traffic such as checkout, artifact download, and action download still remains.

Deterministic workflow/matrix breakdown

Computed from the repo's matrix builder scripts on origin/main:

examples:     63 matrix jobs
integration:  152 matrix jobs
provider:     12 matrix jobs

Active PR integration-related runner jobs:

examples workflow:
  prep jobs:    2
  matrix jobs:  63
  total:        65

integration workflow:
  prep jobs:     1
  linux matrix:  152
  windows matrix: disabled with if: false
  total:         153

provider integration workflow:
  prep jobs:       1
  linux matrix:    12
  windows matrix:  12
  total:           25

Combined integration-related runner jobs: 243
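
As a cross-check on the breakdown above, the per-workflow totals and the combined figure can be recomputed from the listed counts. This is a minimal sketch, not the repo's matrix builder script; the dictionary layout and the `runner_jobs` helper are illustrative assumptions, and only the numbers come from this description.

```python
# Counts copied from the breakdown above; names are illustrative only.
MATRIX_LIMIT = 256  # GitHub Actions matrix size limit cited in this PR

workflows = {
    "examples": {"prep": 2, "matrix": [63]},
    "integration": {"prep": 1, "matrix": [152]},  # windows matrix disabled (if: false)
    "provider": {"prep": 1, "matrix": [12, 12]},  # linux + windows
}


def runner_jobs(workflow):
    """Prep jobs plus every enabled matrix job in one workflow."""
    return workflow["prep"] + sum(workflow["matrix"])


totals = {name: runner_jobs(wf) for name, wf in workflows.items()}
combined = sum(totals.values())

# Every workflow's matrix jobs should stay within the cited limit.
within_limit = all(sum(wf["matrix"]) <= MATRIX_LIMIT for wf in workflows.values())

print(totals, combined, within_limit)
```

The sketch reproduces the 65 / 153 / 25 split and the combined total of 243.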

Current initial matrix wave before this PR:

examples:            63  # no max-parallel; mostly GitHub-hosted, one Depot example job
linux integration:   30  # existing max-parallel: 30; Depot runners
linux provider:      12  # no max-parallel, matrix size 12; Depot runners
windows provider:    12  # no max-parallel, matrix size 12; GitHub-hosted runners

total initial matrix wave: 117 jobs
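
The wave sizes above follow one rule: a matrix starts min(matrix size, max-parallel) jobs at once, and a matrix without max-parallel starts all of its jobs. A minimal sketch of that rule, assuming all three workflows are triggered together and that runner availability is not the bottleneck (the job keys and the `initial_wave` helper are hypothetical names):

```python
# Matrix sizes and caps before this PR, as listed above.
matrix_size = {
    "examples": 63,
    "linux_integration": 152,
    "linux_provider": 12,
    "windows_provider": 12,
}
max_parallel = {
    "examples": None,          # no max-parallel set
    "linux_integration": 30,   # existing cap
    "linux_provider": None,
    "windows_provider": None,
}


def initial_wave(size, cap):
    """Jobs that can start simultaneously for one matrix."""
    return size if cap is None else min(size, cap)


wave = {job: initial_wave(matrix_size[job], max_parallel[job]) for job in matrix_size}
total = sum(wave.values())

print(wave, total)
```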

The deterministic script counted GitHub-facing action steps in that initial wave. Raw counts, before accounting for Depot Cache:

initial matrix wave jobs:      117
checkout steps:                117
cache/cache-restore steps:     420
artifact download steps:       147
setup actions:                 129
total `uses:` action steps:    813

After excluding cache/cache-restore steps from jobs running on Depot runners, the cache-service pressure is lower:

initial matrix wave jobs:            117
checkout steps:                      117
GitHub cache/cache-restore steps:    272
artifact download steps:             147
setup actions:                       129
effective non-Depot-cache count:     665

This does not mean exactly 665 REST API calls, but it is a deterministic signal that the first wave still causes a large simultaneous burst against GitHub-managed services, primarily checkout, artifacts, action downloads, and cache calls from GitHub-hosted jobs.
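
The relationship between the raw and effective counts is simple arithmetic: the effective count is the raw total minus the cache steps that run on Depot runners. A short sketch using only the figures quoted above (variable names are illustrative, and step counts are a proxy for traffic, not a count of REST calls):

```python
# Raw action-step counts for the initial wave before this PR.
raw = {
    "checkout": 117,
    "cache": 420,               # cache/cache-restore steps on all runners
    "artifact_download": 147,
    "setup": 129,
}
github_cache = 272              # cache steps that still hit GitHub's cache service

raw_total = sum(raw.values())
depot_cache = raw["cache"] - github_cache   # cache steps served by Depot Cache
effective = raw_total - depot_cache

print(raw_total, depot_cache, effective)
```

Under these figures, 148 of the 420 cache steps are attributed to Depot Cache, which accounts for the drop from 813 to 665.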

Scenarios evaluated

current:
  examples:            63
  linux integration:   30
  linux provider:      12
  windows provider:    12
  initial wave total:  117
  effective non-Depot-cache action-step count: 665

provider max-parallel: 30 only:
  examples:            63
  linux integration:   30
  linux provider:      12
  windows provider:    12
  initial wave total:  117
  note: no practical change today because provider matrices have only 12 jobs each

examples 30 + provider 30:
  examples:            30
  linux integration:   30
  linux provider:      12
  windows provider:    12
  initial wave total:  84

optimum evaluated earlier, including changing existing integration cap:
  examples:            20
  linux integration:   20
  linux provider:      10
  windows provider:    10
  initial wave total:  60
  effective non-Depot-cache action-step count: 306

This PR deliberately does not change linux_integration from 30 to 20; that should wait for maintainer agreement on the broader analysis.
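
The scenario totals can be compared side by side by applying each set of caps to the same matrix sizes. This is an illustrative sketch rather than the analysis script itself; the scenario keys mirror the headings above, and an absent cap is treated as "no max-parallel".

```python
MATRIX_SIZE = {
    "examples": 63,
    "linux_integration": 152,
    "linux_provider": 12,
    "windows_provider": 12,
}

# max-parallel per matrix for each evaluated scenario; a missing key means uncapped.
SCENARIOS = {
    "current": {"linux_integration": 30},
    "provider 30 only": {
        "linux_integration": 30, "linux_provider": 30, "windows_provider": 30,
    },
    "examples 30 + provider 30": {
        "examples": 30, "linux_integration": 30,
        "linux_provider": 30, "windows_provider": 30,
    },
    "optimum evaluated earlier": {
        "examples": 20, "linux_integration": 20,
        "linux_provider": 10, "windows_provider": 10,
    },
}


def wave_total(caps):
    """Sum of min(matrix size, cap) over all matrices."""
    return sum(min(size, caps.get(job, size)) for job, size in MATRIX_SIZE.items())


scenario_totals = {name: wave_total(caps) for name, caps in SCENARIOS.items()}
print(scenario_totals)
```

A provider cap of 30 leaves the total at 117 because each provider matrix has only 12 jobs, which is why that scenario is noted as having no practical effect today.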

What this PR changes

This PR adds only the missing caps:

examples:            max-parallel: 20
linux provider:      max-parallel: 10
windows provider:    max-parallel: 10
linux integration:   unchanged at max-parallel: 30

Resulting initial matrix wave after this PR:

examples:            20
linux integration:   30
linux provider:      10
windows provider:    10

initial wave total:  70 jobs

Raw action-step counts after this PR:

initial matrix wave jobs:      70
checkout steps:                70
cache/cache-restore steps:     240
artifact download steps:       100
setup actions:                 80
total `uses:` action steps:    490

After excluding Depot-runner cache/cache-restore steps:

initial matrix wave jobs:            70
checkout steps:                      70
GitHub cache/cache-restore steps:    96
artifact download steps:             100
setup actions:                       80
effective non-Depot-cache count:     346

That reduces the initial matrix burst from 117 to 70 jobs without changing the existing integration workflow cap.
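
For a sense of scale, the before and after figures can be turned into relative reductions. A small sketch using only the counts reported in this description (the `pct_reduction` helper is a hypothetical name, and percentages are rounded to one decimal place):

```python
# Initial-wave figures before and after this PR, as reported above.
before = {"jobs": 117, "raw_steps": 813, "effective_steps": 665}
after = {"jobs": 70, "raw_steps": 490, "effective_steps": 346}


def pct_reduction(old, new):
    """Relative reduction from old to new, in percent."""
    return round(100 * (old - new) / old, 1)


reduction = {key: pct_reduction(before[key], after[key]) for key in before}
print(reduction)
```

Under these numbers the initial wave shrinks by roughly 40% in jobs and by just under half in effective non-Depot-cache action steps.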

Validation

YAML ok: .github/workflows/examples.yml
YAML ok: .github/workflows/provider-integration.yml
YAML ok: .github/workflows/integration.yml
matrix_counts= {'examples': 63, 'provider': 12, 'integration': 152}
max_parallel= {'examples': 20, 'linux_integration': 30, 'linux_provider': 10, 'windows_provider': 10}
initial_wave_after_change= {'examples': 20, 'linux_integration': 30, 'linux_provider': 10, 'windows_provider': 10} total= 70

Also ran git diff --check successfully.

@sakul-learning sakul-learning requested a review from a team as a code owner June 5, 2026 00:42
@sakul-learning sakul-learning changed the title ci: cap missing integration matrix parallelism chore(gha): cap missing integration matrix parallelism Jun 5, 2026

@so0k (Contributor) left a comment

I think we are going to get blocked/rate limited soon - this is quite urgent and it's conservative so I will merge it

@so0k so0k merged commit 54dcd48 into open-constructs:main Jun 5, 2026
508 of 512 checks passed