Skip to content

Kheiss/041426updates#1845

Open
kheiss-uwzoo wants to merge 104 commits intomainfrom
kheiss/041426updates
Open

Kheiss/041426updates#1845
kheiss-uwzoo wants to merge 104 commits intomainfrom
kheiss/041426updates

Conversation

@kheiss-uwzoo
Copy link
Copy Markdown
Collaborator

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

kheiss-uwzoo and others added 30 commits February 19, 2026 10:36
Update all hardcoded version references from 26.1.2 to 26.3.0-RC1
across helm charts, docker-compose, FastAPI, docs, and examples.

Made-with: Cursor
Co-authored-by: Kurt Heiss <kheiss@nvidia.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
…ing long VLM captioning

Large PDFs with VLM captioning enabled can take 2-22+ hours depending on hardware.
The previous defaults (STATE_TTL=7200s, RESULT_DATA_TTL=3600s) caused job state to
expire mid-processing, resulting in 404 "Job ID not found or state has expired" errors
even though the pipeline completed successfully.

Raises both defaults to 172800s (48 hours), providing sufficient headroom for all
observed workloads. Users can still override via RESULT_DATA_TTL_SECONDS and
STATE_TTL_SECONDS environment variables.

Fixes: Customer bug 5914605

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kheiss-uwzoo kheiss-uwzoo marked this pull request as ready for review April 14, 2026 16:48
@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners April 14, 2026 16:48
@kheiss-uwzoo kheiss-uwzoo requested a review from nkmcalli April 14, 2026 16:48
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR improves the docs CI workflows by removing the hardcoded ref: main checkout (enabling workflow_dispatch to build from any branch) and by properly capturing and propagating the docker wait exit code so that documentation build failures are surfaced in CI. It also adds COPY nemo_retriever nemo_retriever to the docs Dockerfile stage so the primary library source is available during documentation generation.

  • The one remaining gap is that the container cleanup step (docker rm) in both workflow files still lacks if: always(), meaning a build failure leaves an orphaned container on the self-hosted linux-large-disk runner — a concern noted in a previous review thread.

Confidence Score: 4/5

Safe to merge after addressing the orphaned-container cleanup gap; the action-pinning finding is a P2 style concern.

No P0 issues. One pre-existing P1 concern (container not cleaned up on failure) was flagged in a prior review thread but is not yet fixed. The only new finding in this review is a P2 violation of the github-actions-security rule (mutable action tags). Score of 4 reflects the unresolved orphaned-container issue rather than the P2 pinning note.

Both .github/workflows/build-docs.yml (line 64) and .github/workflows/docs-deploy.yml (line 62) need if: always() on their docker rm cleanup steps.

Important Files Changed

Filename Overview
.github/workflows/build-docs.yml Removes hardcoded ref: main checkout and improves docker wait to capture and propagate exit codes with container log output; action tags remain unpinned to SHAs.
.github/workflows/docs-deploy.yml Mirrors build-docs.yml changes — removes ref: main, adds exit-code propagation; same unpinned action tags issue applies.
Dockerfile Adds COPY nemo_retriever nemo_retriever to the docs build stage so documentation generation has access to the primary library source.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Trigger: push to main or workflow_dispatch] --> B[Checkout triggered ref]
    B --> C[Setup Pages and Docker Buildx]
    C --> D[Build docs Docker image with --target docs]
    D --> E[Run container in background]
    E --> F[docker wait captures EXIT_CODE]
    F --> G{EXIT_CODE == 0?}
    G -- Yes --> H[Emit docker logs]
    H --> I[Copy docs/site out of container]
    I --> J[docker rm container]
    J --> K[Upload pages artifact]
    K --> L[Deploy to GitHub Pages]
    G -- No --> M[Emit docker logs]
    M --> N[exit EXIT_CODE - CI step fails]
    N --> O[Cleanup step skipped - container orphaned on runner]
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/workflows/build-docs.yml
Line: 30-37

Comment:
**Third-party actions not pinned to full SHAs**

The `github-actions-security` rule requires all third-party actions to be pinned to a commit SHA, not a mutable tag. Tags like `@v4`, `@v5`, and `@v3` can be silently re-pointed to different (potentially malicious) commits. The same applies to `docs-deploy.yml`.

```yaml
      - name: Checkout code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4
      - name: Setup Pages
        uses: actions/configure-pages@983d7736d9b0ae728b81ab479565c72886d7745  # v5
      # ...
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@b5730e3e2d58d65ee93aa9f50ca54a5c84e9f7d9  # v3
      # ...
      - name: Upload Site Artifacts
        uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa  # v3
      # ...
      - name: Deploy to GitHub Pages
        uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e  # v4
```

**Rule Used:** GitHub Actions workflows must: pin third-party act... ([source](https://app.greptile.com/review/custom-context?memory=github-actions-security))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "ci(docs): checkout triggering ref for ma..." | Re-trigger Greptile

echo "--- Container logs (make docs output) ---"
docker logs "$CONTAINER_ID"
echo "--- End container logs ---"
exit "${EXIT_CODE}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Container not cleaned up on build failure

When EXIT_CODE is non-zero, this step fails and GitHub Actions skips the subsequent docker rm $CONTAINER_ID step (line 64), leaving an orphaned container on the self-hosted linux-large-disk runner. Before this change, docker wait always exited 0, so cleanup always ran. The same issue exists in docs-deploy.yml line 56. Add if: always() to the cleanup step to ensure it runs even on failure:

      - name: Stop and remove the container
        if: always()
        run: docker rm $CONTAINER_ID
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/build-docs.yml
Line: 58

Comment:
**Container not cleaned up on build failure**

When `EXIT_CODE` is non-zero, this step fails and GitHub Actions skips the subsequent `docker rm $CONTAINER_ID` step (line 64), leaving an orphaned container on the self-hosted `linux-large-disk` runner. Before this change, `docker wait` always exited 0, so cleanup always ran. The same issue exists in `docs-deploy.yml` line 56. Add `if: always()` to the cleanup step to ensure it runs even on failure:

```yaml
      - name: Stop and remove the container
        if: always()
        run: docker rm $CONTAINER_ID
```

How can I resolve this? If you propose a fix, please make it concise.

…iever/, but make docs runs sphinx-apidoc on that path. Fix is to COPY nemo_retriever into the docs stage; I’ve added that on my branch for review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants