⚙️ Disable spot instances on GPU runners (use on-demand) by mmcky · Pull Request #327 · QuantEcon/lecture-jax

mmcky · 2026-06-26T06:34:30Z

What

Forces on-demand GPU capacity instead of spot by adding `spot=false` to the RunsOn runner spec in all four GPU workflows: `cache.yml`, `ci.yml`, `publish.yml`, and `collab.yml`.

Why

AWS spot reclamation is increasingly interrupting our long GPU notebook builds mid-run. In the most recent weekly cache build (run 27929889850) the `g4dn.2xlarge` spot instance received a shutdown signal at 91% of the build (`reading sources... [91%]`), producing `The runner has received a shutdown signal` followed by `The operation was canceled.` The downstream `AssertionError: self.km is not None` in nbclient cleanup is a symptom of the kernel being killed mid-execution, not a real notebook failure.

These builds run 17–22 minutes on a single GPU, so a reclamation near the end wastes the entire run (and any spot savings along with it). On-demand trades a higher hourly rate for build reliability, which is the right call for GPU jobs that can't cheaply checkpoint and resume.
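To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes an interrupted build restarts from scratch and that each attempt is interrupted independently with the same probability. The function names, the interruption probability, and the wasted fraction are illustrative assumptions, not measurements from our runs or AWS figures.

```python
def expected_spot_minutes(build_minutes: float, p_interrupt: float,
                          wasted_fraction: float) -> float:
    """Expected billed minutes on spot when interrupted runs restart from scratch.

    p_interrupt     -- assumed probability that a single attempt is reclaimed
    wasted_fraction -- assumed share of the build completed before reclamation
    """
    if not 0 <= p_interrupt < 1:
        raise ValueError("p_interrupt must be in [0, 1)")
    # Failed attempts before the first success follow a geometric
    # distribution with mean p / (1 - p).
    expected_failures = p_interrupt / (1 - p_interrupt)
    return build_minutes * (1 + expected_failures * wasted_fraction)


def cost(minutes: float, hourly_rate: float) -> float:
    """Instance cost for a number of billed minutes at an hourly rate."""
    return minutes / 60 * hourly_rate


# Hypothetical inputs: a 20-minute build reclaimed half the time at ~90%.
spot_minutes = expected_spot_minutes(20, p_interrupt=0.5, wasted_fraction=0.9)
print(f"expected spot minutes per successful build: {spot_minutes:.1f}")
```

Plugging real hourly rates into `cost` for both the expected spot minutes and the plain on-demand build time gives a dollar comparison. The sketch ignores the cost of a failed scheduled build that nobody retries, which is the reliability concern driving this change.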

Change

Workflow	Trigger	Runner spec
`cache.yml`	weekly schedule	`…/volume=80gb/spot=false`
`ci.yml`	pull_request	`…/volume=80gb/spot=false`
`publish.yml`	`publish*` tag	`…/volume=80gb/spot=false`
`collab.yml`	pull_request	`…/volume=80gb/spot=false`

Notes

Tracking the broader pattern across GPU-using lecture repos in a QuantEcon/meta issue.
If we'd rather keep spot for the cheaper/shorter PR jobs (`ci.yml`, `collab.yml`) and only force on-demand for the long `cache.yml`/`publish.yml` builds, that's an easy narrowing; happy to adjust.
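As a sketch of what that narrowing might look like, the snippet below treats a runner spec as a slash-delimited list of `key=value` labels and appends `spot=false` only for the long builds. `BASE_SPEC` is a placeholder (the real spec prefix is not reproduced here), and the helper name and the policy mapping are hypothetical.

```python
def force_on_demand(spec: str) -> str:
    """Return the spec with spot=false set, replacing any existing spot label."""
    labels = [part for part in spec.split("/")
              if part and not part.startswith("spot=")]
    labels.append("spot=false")
    return "/".join(labels)


# Placeholder spec: only the volume label is taken from this PR.
BASE_SPEC = "example-prefix/volume=80gb"

# Hypothetical narrowing: long builds go on-demand, PR jobs are left unchanged.
ON_DEMAND = {"cache.yml": True, "publish.yml": True,
             "ci.yml": False, "collab.yml": False}

specs = {workflow: force_on_demand(BASE_SPEC) if on_demand else BASE_SPEC
         for workflow, on_demand in ON_DEMAND.items()}

for workflow, spec in specs.items():
    print(f"{workflow}: {spec}")
```

The helper is idempotent, so running it against a spec that already carries `spot=false` leaves the spec unchanged.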

🤖 Generated with Claude Code

AWS spot reclamation is increasingly interrupting long GPU notebook builds mid-run (e.g. the weekly cache build was killed at 91% in https://github.com/QuantEcon/lecture-jax/actions/runs/27929889850 when the spot g4dn.2xlarge received a shutdown signal). Force on-demand capacity by adding `spot=false` to all four GPU runner specs. See QuantEcon/meta for the tracking discussion on GPU spot reclamation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

netlify · 2026-06-26T06:34:34Z

✅ Deploy Preview for incomparable-parfait-2417f8 ready!

Name	Link
🔨 Latest commit	`fba7b7e`
🔍 Latest deploy log	https://app.netlify.com/projects/incomparable-parfait-2417f8/deploys/6a3e1d78baa0a600086ec402
😎 Deploy Preview	https://deploy-preview-327--incomparable-parfait-2417f8.netlify.app

Copilot

Pull request overview

This PR updates the repository’s GPU GitHub Actions workflows to request on-demand GPU runner capacity (instead of spot), improving reliability for long-running notebook builds that are susceptible to spot reclamation.

Changes:

Adds spot=false to the RunsOn GPU runner spec in all GPU workflows.
Applies the same runner-spec adjustment consistently across cache, ci, publish, and collab workflows.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
.github/workflows/cache.yml	Forces on-demand GPU capacity for scheduled/manual cache builds by adding `spot=false` to the runner spec.
.github/workflows/ci.yml	Forces on-demand GPU capacity for PR preview builds by adding `spot=false` to the runner spec.
.github/workflows/publish.yml	Forces on-demand GPU capacity for tag-based publish builds by adding `spot=false` to the runner spec.
.github/workflows/collab.yml	Forces on-demand GPU capacity for PR Colab execution checks by adding `spot=false` to the runner spec.


github-actions · 2026-06-26T06:43:50Z

🚀 Deployed on https://6a3e206e817d0d9b5971b7f5--incomparable-parfait-2417f8.netlify.app

* [linkcheck] Clear residual false positives in weekly lychee report

  The weekly link checker (#933) flags 8 errors out of ~25k links, all false positives or harmless artifacts on non-content pages:

  - IEEE Xplore returns "202 Accepted" (anti-bot) for a valid DOI cited in zreferences.html -> add 202 to --accept.
  - genindex / search / prf-prf are auto-generated utility pages with no source notebook, so the theme's "Download Notebook" button points at a nonexistent _notebooks/<page>.ipynb and renders a second href="None" -> --exclude-path those three pages.
  - A Journal of Derivatives DOI redirects into a login/paywall loop that exceeds max-redirects; the citation itself is valid -> --exclude it.

  Configuration is kept inline in the workflow args (rather than a lychee.toml) because lychee runs against the gh-pages checkout, which does not contain repo-root config files.

  Closes #933

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* [linkcheck] Escape and anchor --exclude-path regexes

  lychee treats --exclude-path values as regular expressions, so the unescaped dots in genindex.html / search.html / prf-prf.html were regex wildcards and the patterns were unanchored. Escape the dot and anchor the end ('<name>\.html$') so each matches only the intended generated page.

  Addresses Copilot review feedback on #934.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Force on-demand GPU runners (spot=false)

  AWS spot reclamation has been interrupting the g4dn.2xlarge GPU notebook builds mid-run, discarding the whole build. Add spot=false to the RunsOn runner spec in all four GPU workflows (cache, ci, collab, publish) so they run on on-demand instances.

  Rolls out the org-wide decision in QuantEcon/meta#330; mirrors QuantEcon/lecture-jax#327.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 26, 2026 06:34

Copilot started reviewing on behalf of mmcky June 26, 2026 06:34

mmcky mentioned this pull request Jun 26, 2026

GPU spot instance reclamation increasingly interrupting lecture builds QuantEcon/meta#330

Open

3 tasks

Copilot AI reviewed Jun 26, 2026


github-actions Bot temporarily deployed to pull request June 26, 2026 06:43 Inactive

github-actions Bot temporarily deployed to pull request June 26, 2026 06:47 Inactive

mmcky merged commit babfc41 into main Jun 26, 2026
8 checks passed

mmcky deleted the infra/disable-spot-gpu-runners branch June 26, 2026 06:50

mmcky mentioned this pull request Jun 26, 2026

Force on-demand GPU runners (spot=false) QuantEcon/lecture-python.myst#936

Merged

mmcky merged 1 commit into main from infra/disable-spot-gpu-runners
