Skip to content

Adds CI Improvements - Stage 2#4877

Merged
myurasov-nv merged 31 commits intoisaac-sim:developfrom
myurasov-nv:my-ci-impr-2
Mar 9, 2026
Merged

Adds CI Improvements - Stage 2#4877
myurasov-nv merged 31 commits intoisaac-sim:developfrom
myurasov-nv:my-ci-impr-2

Conversation

@myurasov-nv
Copy link
Collaborator

@myurasov-nv myurasov-nv commented Mar 7, 2026

Description

Some mo CI stuff.

  • ECR caching for images and layers. We not only stop building docker 4 times, but layer cache makes necessary builds faaaster!
    • Implemented as drop-in replacement for build-docker action.
    • CI task graph is optimized for max cache usage.
    • ECR images are auto-deleted in a day to save costs.
  • Removes redundant test results copying step from build.yaml (done in the run-tests action).
  • Optimizes COPY commands in dockerfiles to minimize layer cache invalidation
  • Synced Dockerfile.base and .curobo so they use layer cache maximally

Type of change

  • New feature-ish (non-breaking change which adds functionality)
  • Documentation update (.github/actions/ecr-build-push-pull/README.md)

Screenshots

image

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • [NA] I have added tests that prove my fix is effective or that my feature works
  • [NA] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Update ECR creds

RUn on pull requests

Update runner type for ECR build job to use GPU

Use aws cli to log in

Get AWS_ACCOUNT_ID

Remove ECR login step from debug workflow

Login to ECR with docker config fix

Check if nvcr login is broken

ECR stuf

Toy task inside container

Add missing package installation step in Dockerfile

Add progress output to Docker build step in debug workflow

If image exists in ecr and if it does, pull it from there instead of building

Add cache layering

Start toy dockerfile from nvcr.io/nvidian/isaac-sim:latest-develop

Test

Skip build if image exists, improve logging messages

Replace entrypoint for toy task

ECR functionality

RUn nvidia-smi

Extract ECR functionality to action

Comments
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 8, 2026
@myurasov-nv myurasov-nv changed the title [WIP] CI Improvements 2 Adds CI Improvements - Stage 2 Mar 8, 2026
@myurasov-nv myurasov-nv marked this pull request as ready for review March 8, 2026 10:37
@myurasov-nv myurasov-nv requested review from nv-apoddubny and removed request for hhansen-bdai and pascal-roth March 8, 2026 10:38
@myurasov-nv myurasov-nv marked this pull request as draft March 8, 2026 10:43
@myurasov-nv myurasov-nv marked this pull request as ready for review March 8, 2026 11:12
@isaac-sim isaac-sim deleted a comment from greptile-apps bot Mar 8, 2026
@myurasov-nv myurasov-nv marked this pull request as draft March 8, 2026 11:29
@myurasov-nv myurasov-nv marked this pull request as ready for review March 8, 2026 11:29
@isaac-sim isaac-sim deleted a comment from greptile-apps bot Mar 8, 2026
@myurasov-nv myurasov-nv marked this pull request as draft March 8, 2026 11:44
@myurasov-nv myurasov-nv marked this pull request as ready for review March 8, 2026 11:45
@myurasov-nv myurasov-nv marked this pull request as draft March 8, 2026 11:57
@isaac-sim isaac-sim deleted a comment from greptile-apps bot Mar 8, 2026
@myurasov-nv myurasov-nv marked this pull request as ready for review March 8, 2026 11:57
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR introduces a new ecr-build-push-pull composite action to replace the previous GHA-cache-backed docker-build action with ECR-backed layer caching and image sharing. The workflow is restructured into explicit build / build-curobo jobs that push images to ECR, allowing downstream test jobs to simply pull the pre-built image rather than rebuild it — eliminating four redundant full builds per CI run.

Key changes:

  • New .github/actions/ecr-build-push-pull/action.yml using docker buildx with --cache-from/to type=registry, per-Dockerfile cache-tag inputs (cache-base, cache-curobo), docker-container buildx driver with unique builder names per run/job, and nvcr.io re-authentication.
  • build.yaml split into build + build-curobo jobs with needs: dependencies ensuring test jobs always get a pre-built image from ECR.
  • Redundant "Copy Test Results from … Container" steps removed (already handled by run-tests action).
  • Both Dockerfiles updated to split COPY instructions into early (install-relevant) and late (docs/scripts) groups, maximizing layer-cache hit rates.

Confidence Score: 4/5

  • Safe to merge. Core ECR caching logic is solid; all major architectural concerns have been addressed.
  • The PR successfully implements ECR-backed layer caching with correct docker buildx usage, per-Dockerfile cache tags, platform flags, builder management, NGC authentication, and proper error handling. The workflow restructuring eliminates redundant builds and the Dockerfile optimizations improve cache efficiency. No active issues found.
  • No files require special attention

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([PR Trigger]) --> B[build\nDockerfile.base → ECR]
    A --> C[build-curobo\nDockerfile.curobo → ECR]

    B --> D[test-isaaclab-tasks\nPull from ECR → run tests]
    B --> E[test-isaaclab-tasks-2\nPull from ECR → run tests]
    B --> F[test-general\nPull from ECR → run tests]
    C --> G[test-curobo\nPull from ECR → run tests]

    D --> H[combine-results\nMerge JUnit XMLs\nPost PR comment]
    E --> H
    F --> H
    G --> H

    subgraph ECR["ECR isaaclab-ci"]
        I[":cache-base\nlayer cache"]
        J[":cache-curobo\nlayer cache"]
        K[":isaac-lab-dev--pr-N--sha\nbuilt image"]
        L[":isaac-lab-dev--pr-N--sha--curobo\nbuilt image"]
    end

    B <-->|cache-from/to| I
    C <-->|cache-from/to| J
    B -->|push image| K
    C -->|push image| L
    D -->|pull image| K
    E -->|pull image| K
    F -->|pull image| K
    G -->|pull image| L
Loading

Last reviewed commit: e09aff9

@ooctipus
Copy link
Collaborator

ooctipus commented Mar 8, 2026

Looks amazing!!

@myurasov-nv myurasov-nv merged commit 6daebe7 into isaac-sim:develop Mar 9, 2026
11 of 13 checks passed
myurasov-nv added a commit to myurasov-nv/IsaacLab that referenced this pull request Mar 9, 2026
# Description

Some mo CI stuff.

- ECR caching for images and layers. We not only stop building docker 4
times, but layer cache makes necessary builds faaaster!
   - Implemented as drop-in replacement for `build-docker` action.
   - CI task graph is optimized for max cache usage.
   - ECR images are auto-deleted in a day to save costs.
- Removes redundant test results copying step from build.yaml (done in
the run-tests action).
- Optimizes COPY commands in dockerfiles to minimize layer cache
invalidation
- Synced Dockerfile.base and .curobo so they use layer cache maximally

## Type of change

- New feature-ish (non-breaking change which adds functionality)
- Documentation update (`.github/actions/ecr-build-push-pull/README.md`)

## Screenshots

<img width="1191" height="371" alt="image"
src="https://github.com/user-attachments/assets/22fef48e-b38e-4f06-b2de-faff0aa89e15"
/>

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [NA] I have added tests that prove my fix is effective or that my
feature works
- [NA] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
matthewtrepte pushed a commit to matthewtrepte/IsaacLab that referenced this pull request Mar 9, 2026
# Description

Some mo CI stuff.

- ECR caching for images and layers. We not only stop building docker 4
times, but layer cache makes necessary builds faaaster!
   - Implemented as drop-in replacement for `build-docker` action.
   - CI task graph is optimized for max cache usage.
   - ECR images are auto-deleted in a day to save costs.
- Removes redundant test results copying step from build.yaml (done in
the run-tests action).
- Optimizes COPY commands in dockerfiles to minimize layer cache
invalidation
- Synced Dockerfile.base and .curobo so they use layer cache maximally

## Type of change

- New feature-ish (non-breaking change which adds functionality)
- Documentation update (`.github/actions/ecr-build-push-pull/README.md`)

## Screenshots

<img width="1191" height="371" alt="image"
src="https://github.com/user-attachments/assets/22fef48e-b38e-4f06-b2de-faff0aa89e15"
/>

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [NA] I have added tests that prove my fix is effective or that my
feature works
- [NA] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants