diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index fd6ad5e3..4e9d9cf5 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,81 +1,293 @@ # Contributing to NVIDIA Cosmos -Thank you for your interest in contributing to NVIDIA Cosmos. This document provides guidelines and instructions for contributing. +Thank you for your interest in contributing to NVIDIA Cosmos. This guide covers how to propose changes, add new cookbooks, and maintain the quality bar we hold for community-facing content. ## Code of Conduct This project adheres to the [NVIDIA Open Source Code of Conduct](https://github.com/NVIDIA/cosmos/blob/main/CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior by filing an issue or contacting [cosmos-license@nvidia.com](mailto:cosmos-license@nvidia.com). +--- + ## How to Contribute ### Reporting Issues -If you encounter a bug or have a feature request, please open an issue on the [GitHub Issues](https://github.com/NVIDIA/cosmos/issues) page. When filing an issue, include: +Open an issue on [GitHub Issues](https://github.com/NVIDIA/cosmos/issues) with: -- A clear and descriptive title -- Steps to reproduce the problem (if applicable) -- Expected behavior vs. actual behavior -- Your environment details (OS, CUDA version, GPU model, Python version) +- A clear, descriptive title +- Steps to reproduce (if applicable) +- Expected vs. actual behavior +- Environment details: OS, CUDA version, GPU model, Python version, `uv` version - Relevant logs or error messages -### Submitting Changes +### Contribution Workflow -1. **Fork the repository** and create a new branch from `main`: +1. **Fork** the repository and create a branch from `main`: - ```shell - git checkout -b your-branch-name + ```bash + git checkout -b cookbook/descriptive-name # or docs/, fix/, benchmark/ ``` -2. **Make your changes.** Ensure your changes follow the project conventions and do not introduce regressions. +2. 
**Make your changes** following the guidelines below. -3. **Test your changes.** Verify that existing cookbooks and examples still work correctly with your modifications. +3. **Test your changes.** Run your notebook end-to-end on the target GPU. Verify existing cookbooks are unaffected. -4. **Commit your changes** with a clear, descriptive commit message: +4. **Commit** with a clear message: - ```shell - git commit -m "Brief description of the change" + ```bash + git commit -m "Add worker-safety Reasoner cookbook with vLLM backend" ``` -5. **Push to your fork** and open a Pull Request against the `main` branch of the upstream repository. +5. **Push and open a Pull Request** against `main`. ### Pull Request Guidelines - Provide a clear description of what your PR does and why -- Reference any related issues (e.g., `Fixes #123`) -- Keep PRs focused: one logical change per PR -- Ensure your branch is up to date with `main` before submitting -- Be responsive to review feedback +- Reference related issues (e.g., `Fixes #123`) +- One logical change per PR +- Ensure your branch is up to date with `main` +- Respond to review feedback promptly + +--- + +## Cookbook Structure + +The `cookbooks/` directory is organized by **model generation → tower → capability**. Each cookbook is a self-contained directory with a README, one or more runnable notebooks, and supporting assets. 
+ +``` +cookbooks/ +└── cosmos3/ + ├── README.md # Shared setup (all backends) + ├── cosmos3-model-architecture.png + │ + ├── reasoner/ # Reasoner Tower + │ ├── README.md # Reasoner overview + backend table + │ ├── basic_examples/ # Shipped starter cookbooks + │ │ ├── reasoner_prompt_guide.md + │ │ ├── run_with_vllm.ipynb + │ │ ├── run_with_nim.ipynb + │ │ ├── run_with_cosmos_framework.ipynb + │ │ └── assets/ + │ └── / # ← Community contributions go here + │ ├── README.md + │ ├── run__with_.ipynb + │ └── assets/ + │ + └── generator/ + ├── audiovisual/ # Generator: T2I, T2V, I2V, audio + │ ├── README.md + │ ├── basic_examples/ # Shipped starter cookbooks + │ │ ├── run_with_diffusers.ipynb + │ │ ├── run_with_vllm_omni.ipynb + │ │ ├── run_with_cosmos_framework.ipynb + │ │ └── assets/ + │ └── / # ← Community contributions go here + │ + ├── action/ # Generator: policy, FDM, IDM + │ ├── README.md + │ ├── basic_examples/ # Shipped starter cookbooks + │ │ ├── run_fd_with_cosmos_framework.ipynb + │ │ ├── run_fd_with_vllm.ipynb + │ │ ├── run_id_with_cosmos_framework.ipynb + │ │ ├── run_id_with_vllm.ipynb + │ │ ├── run_policy_with_cosmos_framework.md + │ │ └── assets/ + │ └── / # ← Community contributions go here + │ + └── transfer/ # Generator: video-to-video transfer + ├── README.md + ├── basic_examples/ # Shipped starter cookbooks + │ ├── run_video_transfer_with_cosmos_framework.ipynb + │ ├── preview_helpers.py + │ ├── specs/ + │ └── assets/ + └── / # ← Community contributions go here +``` + +### Where Does My Cookbook Go? + +| Your cookbook does... 
| Place it under | +|----------------------|---------------| +| Image/video understanding, VLM, reasoning, grounding | `cookbooks/cosmos3/reasoner/` | +| Text-to-image, text-to-video, image-to-video, audio | `cookbooks/cosmos3/generator/audiovisual/` | +| Robotics policy, forward/inverse dynamics | `cookbooks/cosmos3/generator/action/` | +| Video-to-video style transfer, edge-guided generation | `cookbooks/cosmos3/generator/transfer/` | + +If your cookbook spans multiple towers (e.g., Reasoner analysis → Generator synthesis), create a new directory under `cookbooks/cosmos3/` with a clear name (e.g., `cookbooks/cosmos3/end2end/`). + +--- + +## Cookbook Quality Requirements + +Every cookbook merged into this repo must meet these requirements. Reviewers will check each item. + +### 1. Open-Access Data Only + +- All datasets must be **publicly downloadable** without NVIDIA-internal credentials +- Acceptable sources: HuggingFace Hub (public or gated with free access), public URLs, synthetic data generated in the notebook +- If working with partners, request a **small public subset** for the cookbook example +- Include the dataset license in your README + +**Not acceptable:** Internal S3 buckets, VPN-only URLs, private NFS mounts, datasets requiring paid partner agreements + +### 2. Results / Expected Output + +Every cookbook must include a **Results** section showing what a successful run looks like: + +- **Inference cookbooks:** Sample generated images/videos, text outputs, or action trajectories saved to `assets/` +- **Post-training cookbooks:** Training loss curves, before/after comparison, evaluation metrics +- **Timing benchmarks:** Wall-clock time on the target GPU (e.g., "Cosmos3-Nano T2V: 45s on 1× A100") + +This lets developers validate their own runs against a known-good baseline. + +### 3. 
Canonical Setup (No Hidden Dependencies) + +- **Do not duplicate setup instructions.** Link to the shared [`cookbooks/cosmos3/README.md`](cookbooks/cosmos3/README.md) for backend installation (Cosmos Framework, Diffusers, vLLM, NIM) +- Your README should only document **cookbook-specific** dependencies beyond the shared setup +- All dependencies must be installable via `uv pip install` or `apt-get` — no manual builds +- Pin specific versions of critical packages when they affect reproducibility + +### 4. One-Click Runnable + +- Each notebook should run **top-to-bottom without manual intervention** +- Use environment variables for configurable paths (`HF_TOKEN`, `COSMOS3_MEDIA_ROOT`, etc.) +- Default to the smallest model size (Cosmos3-Nano) so the widest set of GPUs can run it +- If a cookbook requires a running server (vLLM, NIM), provide the exact launch command in the README and automate the health check in the notebook + +### 5. Naming Convention + +Follow the existing pattern: + +``` +run__with_.ipynb +``` + +Examples: +- `run_with_vllm.ipynb` — generic Reasoner inference via vLLM +- `run_fd_with_cosmos_framework.ipynb` — forward dynamics via Cosmos Framework +- `run_video_transfer_with_cosmos_framework.ipynb` — video transfer via Cosmos Framework + +For markdown-only guides (no notebook): `run__with_.md` + +### 6. Author Attribution + +Every cookbook must credit its authors to increase visibility and recognition: + +- **README:** Include an author block immediately after the title (see [README template](#cookbook-readme-template)) +- **Notebook:** Include an author block in the first markdown cell, right after the SPDX header and title + +Use this format: + +```markdown +> **Authors:** [Full Name](https://linkedin.com/in/handle), [Full Name](https://linkedin.com/in/handle) +> **Organization:** [Your Organization](https://your-org.com/) +``` + +This is required for all new contributions and encouraged for existing cookbooks. 
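Requirement 4 asks cookbooks that need a running vLLM or NIM server to automate the health check in the notebook. Below is a minimal standard-library sketch of such a check. It assumes the server answers the OpenAI-compatible `GET /v1/models` route, which the action README already probes with `curl http://localhost:8001/v1/models`. The helper name `wait_for_server` and its default timeouts are hypothetical and are not taken from any shipped notebook:

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds elapse.

    Hypothetical helper: returns True once the server is ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # OSError covers URLError/HTTPError, connection refused, and socket
            # timeouts; HTTPException covers malformed responses. Either way the
            # server is not ready yet, so wait and retry.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_for_server("http://127.0.0.1:8000/v1/models")` would block until the Cosmos3-Nano `vllm serve` example on port 8000 responds. The notebook can then fail fast and print the exact launch command from the README when the call returns `False`. Polling a lightweight listing route instead of sending a real inference request keeps the check cheap.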
+ +--- + +## Cookbook README Template + +Each cookbook directory needs a `README.md`. Use this structure: + +```markdown +# [Cookbook Title] + +> **Authors:** [Your Name](https://linkedin.com/in/your-handle) +> **Organization:** [Your Organization](https://your-org.com/) + +One-paragraph description of what this cookbook demonstrates and why it matters. + +## What You'll Build + +- Bullet list of concrete outputs (e.g., "Generate a 480p video from a text prompt") + +## Prerequisites + +- Link to [shared setup](../README.md#backend-name) for backend installation +- Any additional cookbook-specific requirements + +## Backends + +| Backend | Notebook | GPU Requirement | +|---------|----------|----------------| +| vLLM | [`run_with_vllm.ipynb`](run_with_vllm.ipynb) | 1× A100 (80 GB) | +| NIM | [`run_with_nim.ipynb`](run_with_nim.ipynb) | 1× A100 (80 GB) | + +## Quick Start + +Minimal steps to go from clone to first result: + + 1. Set up the backend (link) + 2. Run the notebook + 3. Check your outputs in `assets/` + +## Results / Expected Output + +Sample outputs, metrics, and timing benchmarks from a successful run. + +## Dataset + +| Name | Source | License | Size | +|------|--------|---------|------| +| Dataset Name | [HuggingFace link](...) 
| Apache 2.0 | ~2 GB | +``` + +--- + +## Contribution Areas + +We welcome contributions in these areas: + +| Area | Examples | +|------|---------| +| **New cookbooks** | Domain-specific applications (robotics, AV, healthcare, manufacturing) | +| **New backends** | Additional serving/inference backends for existing cookbooks | +| **Documentation** | README improvements, prompt guides, architecture explanations | +| **Bug fixes** | Notebook fixes, broken links, version compatibility issues | +| **Benchmarks** | Inference timing across GPU configurations (A100, H100, L40S, RTX 4090) | +| **Post-training recipes** | SFT, LoRA, domain adaptation examples with open datasets | + +### What We Won't Merge + +- Cookbooks that depend on internal/proprietary datasets +- Notebooks that require manual mid-run intervention +- Changes that break existing cookbook functionality +- Generated binary files (model weights, large media) — use HuggingFace/external links instead + +--- ## Development Setup ### Prerequisites - Python 3.10 or later -- CUDA 12.8 or 13.x (see [Troubleshooting](README.md#troubleshooting) for version matching) +- CUDA 12.8 or 13.x (see [Troubleshooting](README.md#troubleshooting)) - An NVIDIA GPU with sufficient VRAM for your target workflow -- `uv` >= 0.11.3 (install from [astral.sh/uv](https://astral.sh/uv)) +- `uv` >= 0.11.3 ([astral.sh/uv](https://astral.sh/uv)) +- `git-lfs` installed (`apt-get install git-lfs`) ### Getting Started -1. Clone the repository: +```bash +git clone https://github.com/NVIDIA/cosmos.git +cd cosmos +``` - ```shell - git clone https://github.com/NVIDIA/cosmos.git - cd cosmos - ``` - -2. Set up your environment following the instructions in the [README](README.md). +Follow [cookbooks/cosmos3/README.md](cookbooks/cosmos3/README.md) to set up the backend(s) your cookbook uses. -3. Explore the [cookbooks](cookbooks/) for end-to-end examples of Generator and Reasoner workflows. 
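The Testing Your Cookbook checklist caps every committed file at 10 MB. Below is a minimal sketch for scanning a cookbook directory before pushing. It assumes "10 MB" means 10 MiB (10 × 1024 × 1024 bytes), because the guide does not say whether decimal or binary megabytes are meant. `find_oversized_files` is a hypothetical helper name:

```python
from pathlib import Path

# Assumption: "10 MB" is read as 10 MiB; use 10_000_000 for decimal megabytes.
MAX_BYTES = 10 * 1024 * 1024


def find_oversized_files(root: str, max_bytes: int = MAX_BYTES) -> list[tuple[str, int]]:
    """Return (relative POSIX path, size in bytes) for each file under `root` larger than `max_bytes`."""
    root_path = Path(root)
    oversized = []
    for path in sorted(root_path.rglob("*")):
        # Only regular files count; skip directories and anything inside .git.
        if not path.is_file() or ".git" in path.parts:
            continue
        size = path.stat().st_size
        if size > max_bytes:
            oversized.append((path.relative_to(root_path).as_posix(), size))
    return oversized
```

Run it on your cookbook directory. Move anything it reports to git-lfs or replace it with an external link, as the checklist suggests.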
+### Testing Your Cookbook -## Contribution Areas +Before submitting: -We welcome contributions in the following areas: +1. **Clean run:** Restart your kernel and run all cells top-to-bottom +2. **Minimal GPU:** Test on the smallest supported GPU configuration +3. **No secrets:** Verify no API keys, tokens, or internal paths are committed +4. **Output cells:** Clear large output cells but keep the Results section outputs +5. **File sizes:** Ensure no single file exceeds 10 MB (use git-lfs for larger assets or link externally) -- **Cookbooks and examples:** New notebooks demonstrating Cosmos 3 capabilities -- **Documentation:** Improvements to README, cookbook READMEs, or inline documentation -- **Bug fixes:** Fixes for issues in existing code or documentation -- **Benchmarks:** Additional inference benchmark results across different hardware configurations +--- ## License @@ -83,4 +295,4 @@ By contributing to this project, you agree that your contributions will be licen ## Questions? -If you have questions about contributing, feel free to open an issue or reach out at [cosmos-license@nvidia.com](mailto:cosmos-license@nvidia.com). +If you have questions about contributing, open an issue or reach out at [cosmos-license@nvidia.com](mailto:cosmos-license@nvidia.com). diff --git a/README.md b/README.md index d54eaa07..a788f1a0 100644 --- a/README.md +++ b/README.md @@ -522,7 +522,7 @@ docker run -it --rm --name=$CONTAINER_NAME \ The OpenAI-compatible API is then available at `http://127.0.0.1:8000/v1`. 
Query it with `curl`: ```shell -IMAGE_DATA_URI="data:image/jpeg;base64,$(base64 -w 0 cookbooks/cosmos3/reasoner/assets/robot_153.jpg)" +IMAGE_DATA_URI="data:image/jpeg;base64,$(base64 -w 0 cookbooks/cosmos3/reasoner/basic_examples/assets/robot_153.jpg)" curl -X POST 'http://127.0.0.1:8000/v1/chat/completions' \ -H 'Accept: application/json' \ @@ -631,17 +631,17 @@ We are building examples that show Cosmos 3 capabilities end to end, including w | Example | Surface | Workflows demonstrated | Open | nbviewer | | --- | --- | --- | --- | --- | -| Generator (audiovisual) with Diffusers | Generator | Text-to-image, plus text-to-video and image-to-video each with or without synchronized sound, via `Cosmos3OmniPipeline`. | [Notebook](cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb) | -| Generator (audiovisual) with Cosmos Framework | Generator | Text-to-image, plus text-to-video and image-to-video each with sound on or off, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb) | -| Generator (audiovisual) with vLLM-Omni | Generator | Text-to-image, plus text-to-video and image-to-video each with sound on or off, against an OpenAI-compatible vLLM-Omni server. 
| [Notebook](cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb) | -| Forward dynamics with Cosmos Framework | Generator | Forward dynamics: action-conditioned future-observation prediction for AV, DROID, and UMI, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb) | -| Forward dynamics with vLLM-Omni | Generator | Forward dynamics: action-conditioned future-observation prediction for AV, DROID, and UMI, against an OpenAI-compatible vLLM-Omni server. | [Notebook](cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb) | -| Inverse dynamics with Cosmos Framework | Generator | Inverse dynamics: ego-motion trajectory prediction from input AV video, through the `cosmos_framework.scripts.inference` entrypoint. 
| [Notebook](cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb) | -| Inverse dynamics with vLLM-Omni | Generator | Inverse dynamics: ego-motion trajectory prediction from input AV video, against an OpenAI-compatible vLLM-Omni server. | [Notebook](cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb) | -| Transfer with Cosmos Framework | Generator | Video transfer: edge, blur, depth, segmentation, and world-scenario controls with captions, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb) | -| Reasoner with Cosmos Framework | Reasoner | Text and image reasoning: detailed captioning, robot task planning, 2D grounding, describe-anything, and action-trajectory prompts, through the `cosmos_framework.scripts.inference` entrypoint. 
| [Notebook](cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb) | -| Reasoner with vLLM | Reasoner | Image and video reasoning: captioning, temporal localization, embodied reasoning, common-sense reasoning, 2D grounding, describe-anything, action CoT, driving scenes, physical-plausibility, and situation understanding, against an OpenAI-compatible vLLM server (Cosmos3-Super on 4 GPUs by default; switch to Nano per the cookbook README). | [Notebook](cookbooks/cosmos3/reasoner/run_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/run_with_vllm.ipynb) | -| Reasoner with NIM | Reasoner | The same image and video reasoning examples as the vLLM notebook, run against the prebuilt, OpenAI-compatible [Cosmos 3 Reasoner NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/cosmos3-reasoner) container; local media is sent as base64 data URIs. | [Notebook](cookbooks/cosmos3/reasoner/run_with_nim.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/run_with_nim.ipynb) | +| Generator (audiovisual) with Diffusers | Generator | Text-to-image, plus text-to-video and image-to-video each with or without synchronized sound, via `Cosmos3OmniPipeline`. 
| [Notebook](cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_diffusers.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_diffusers.ipynb) | +| Generator (audiovisual) with Cosmos Framework | Generator | Text-to-image, plus text-to-video and image-to-video each with sound on or off, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_cosmos_framework.ipynb) | +| Generator (audiovisual) with vLLM-Omni | Generator | Text-to-image, plus text-to-video and image-to-video each with sound on or off, against an OpenAI-compatible vLLM-Omni server. | [Notebook](cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_vllm_omni.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_vllm_omni.ipynb) | +| Forward dynamics with Cosmos Framework | Generator | Forward dynamics: action-conditioned future-observation prediction for AV, DROID, and UMI, through the `cosmos_framework.scripts.inference` entrypoint. 
| [Notebook](cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_cosmos_framework.ipynb) | +| Forward dynamics with vLLM-Omni | Generator | Forward dynamics: action-conditioned future-observation prediction for AV, DROID, and UMI, against an OpenAI-compatible vLLM-Omni server. | [Notebook](cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_vllm.ipynb) | +| Inverse dynamics with Cosmos Framework | Generator | Inverse dynamics: ego-motion trajectory prediction from input AV video, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/action/basic_examples/run_id_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/basic_examples/run_id_with_cosmos_framework.ipynb) | +| Inverse dynamics with vLLM-Omni | Generator | Inverse dynamics: ego-motion trajectory prediction from input AV video, against an OpenAI-compatible vLLM-Omni server. 
| [Notebook](cookbooks/cosmos3/generator/action/basic_examples/run_id_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/action/basic_examples/run_id_with_vllm.ipynb) | +| Transfer with Cosmos Framework | Generator | Video transfer: edge, blur, depth, segmentation, and world-scenario controls with captions, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/generator/transfer/basic_examples/run_video_transfer_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/generator/transfer/basic_examples/run_video_transfer_with_cosmos_framework.ipynb) | +| Reasoner with Cosmos Framework | Reasoner | Text and image reasoning: detailed captioning, robot task planning, 2D grounding, describe-anything, and action-trajectory prompts, through the `cosmos_framework.scripts.inference` entrypoint. | [Notebook](cookbooks/cosmos3/reasoner/basic_examples/run_with_cosmos_framework.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/basic_examples/run_with_cosmos_framework.ipynb) | +| Reasoner with vLLM | Reasoner | Image and video reasoning: captioning, temporal localization, embodied reasoning, common-sense reasoning, 2D grounding, describe-anything, action CoT, driving scenes, physical-plausibility, and situation understanding, against an OpenAI-compatible vLLM server (Cosmos3-Super on 4 GPUs by default; switch to Nano per the cookbook README). 
| [Notebook](cookbooks/cosmos3/reasoner/basic_examples/run_with_vllm.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/basic_examples/run_with_vllm.ipynb) | +| Reasoner with NIM | Reasoner | The same image and video reasoning examples as the vLLM notebook, run against the prebuilt, OpenAI-compatible [Cosmos 3 Reasoner NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/cosmos3-reasoner) container; local media is sent as base64 data URIs. | [Notebook](cookbooks/cosmos3/reasoner/basic_examples/run_with_nim.ipynb) | [![Render with nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/nvidia/cosmos/blob/main/cookbooks/cosmos3/reasoner/basic_examples/run_with_nim.ipynb) | ### Inference Benchmarks diff --git a/cookbooks/cosmos3/README.md b/cookbooks/cosmos3/README.md index ecf78a06..48f6474c 100644 --- a/cookbooks/cosmos3/README.md +++ b/cookbooks/cosmos3/README.md @@ -203,7 +203,7 @@ export VLLM_USE_DEEP_GEMM=0 All Reasoner cookbooks talk to an OpenAI-compatible chat-completions API. After [installing vLLM](#vllm), run the commands below from `cookbooks/cosmos3/reasoner` (same working directory as -[`run_with_vllm.ipynb`](reasoner/run_with_vllm.ipynb)). That sets +[`run_with_vllm.ipynb`](reasoner/basic_examples/run_with_vllm.ipynb)). That sets `$(dirname "$(pwd)")` to `/cookbooks/cosmos3`, which matches the notebook's `COSMOS3_MEDIA_ROOT`. 
@@ -221,7 +221,7 @@ vllm serve nvidia/Cosmos3-Nano \ --port 8000 ``` -**Cosmos3-Super** (four GPUs; default in [`run_with_vllm.ipynb`](reasoner/run_with_vllm.ipynb), port 8001): +**Cosmos3-Super** (four GPUs; default in [`run_with_vllm.ipynb`](reasoner/basic_examples/run_with_vllm.ipynb), port 8001): ```bash export COSMOS3_MEDIA_ROOT="$(dirname "$(pwd)")" diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md index 6158764f..e289f7e5 100644 --- a/cookbooks/cosmos3/generator/action/README.md +++ b/cookbooks/cosmos3/generator/action/README.md @@ -1,8 +1,24 @@ -# Cosmos3 Generator Action Examples +# Cosmos3 Generator Action Cookbooks -Cosmos3-Nano action-generation examples across two inference backends — native -PyTorch (Cosmos Framework) and vLLM-Omni. Both backends use the sample assets -under [`assets/`](./assets) and cover two tasks: +Cosmos3-Nano action-generation cookbooks across two inference backends — native +PyTorch (Cosmos Framework) and vLLM-Omni. + +## Basic Examples + +The [`basic_examples/`](./basic_examples/) directory contains the shipped starter +cookbooks and sample assets. Community-contributed cookbooks are added as sibling +directories alongside `basic_examples/` — see the +[Contributing Guide](../../../../CONTRIBUTING.md) for the recipe structure. 
+ +| Cookbook | Backend | Notebook | +|---------|---------|----------| +| Forward dynamics (AV, DROID, UMI) | Cosmos Framework | [`basic_examples/run_fd_with_cosmos_framework.ipynb`](./basic_examples/run_fd_with_cosmos_framework.ipynb) | +| Inverse dynamics (AV) | Cosmos Framework | [`basic_examples/run_id_with_cosmos_framework.ipynb`](./basic_examples/run_id_with_cosmos_framework.ipynb) | +| Policy (DROID) | Cosmos Framework | [`basic_examples/run_policy_with_cosmos_framework.md`](./basic_examples/run_policy_with_cosmos_framework.md) | +| Forward dynamics (AV, DROID, UMI) | vLLM-Omni | [`basic_examples/run_fd_with_vllm.ipynb`](./basic_examples/run_fd_with_vllm.ipynb) | +| Inverse dynamics (AV) | vLLM-Omni | [`basic_examples/run_id_with_vllm.ipynb`](./basic_examples/run_id_with_vllm.ipynb) | + +Both backends use the sample assets under [`basic_examples/assets/`](./basic_examples/assets/) and cover two tasks: - **Forward dynamics (`fd`)** — predict future observations from a start image plus an action trajectory (AV, DROID, and UMI robotics examples) using the Cosmos3-Nano. @@ -68,7 +84,7 @@ torchrun --nproc-per-node=1 \ The input spec pairs a start image with an action trajectory. The notebooks assemble ready-to-run specs for AV, DROID, and UMI examples from the checked-in -assets under [`assets/`](./assets). Outputs are written under the framework +assets under [`basic_examples/assets/`](./basic_examples/assets/). Outputs are written under the framework checkout. ### Cosmos Framework Walkthrough @@ -76,11 +92,11 @@ checkout. The Cosmos Framework build their input spec, run inference, and visualize the generated videos: -- [`run_fd_with_cosmos_framework.ipynb`](./run_fd_with_cosmos_framework.ipynb) — +- [`run_fd_with_cosmos_framework.ipynb`](./basic_examples/run_fd_with_cosmos_framework.ipynb) — forward dynamics for AV, DROID, and UMI robotics examples using Cosmos3-Nano. 
-- [`run_id_with_cosmos_framework.ipynb`](./run_id_with_cosmos_framework.ipynb) — +- [`run_id_with_cosmos_framework.ipynb`](./basic_examples/run_id_with_cosmos_framework.ipynb) — inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano. -- [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID. +- [`run_policy_with_cosmos_framework.md`](./basic_examples/run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID. ## Run with vLLM-Omni @@ -100,8 +116,8 @@ curl http://localhost:8001/v1/models Forward-dynamics requests are multipart `POST`s to `/v1/videos` — a start image under `files={"input_reference": ...}` plus an `extra_params` payload carrying the action trajectory. The vLLM notebooks use these diffusion defaults for action -generation (see [`run_fd_with_vllm.ipynb`](./run_fd_with_vllm.ipynb) and -[`run_id_with_vllm.ipynb`](./run_id_with_vllm.ipynb)): +generation (see [`run_fd_with_vllm.ipynb`](./basic_examples/run_fd_with_vllm.ipynb) and +[`run_id_with_vllm.ipynb`](./basic_examples/run_id_with_vllm.ipynb)): | Field | Value | | --- | --- | @@ -117,9 +133,9 @@ including autoregressive chunked generation for the robotics examples. The vLLM-Omni notebooks send requests through the OpenAI-compatible video API and write outputs under `outputs/cosmos3_action_vllm/`: -- [`run_fd_with_vllm.ipynb`](./run_fd_with_vllm.ipynb) — forward dynamics for AV, +- [`run_fd_with_vllm.ipynb`](./basic_examples/run_fd_with_vllm.ipynb) — forward dynamics for AV, DROID, and UMI robotics examples. -- [`run_id_with_vllm.ipynb`](./run_id_with_vllm.ipynb) — inverse dynamics, +- [`run_id_with_vllm.ipynb`](./basic_examples/run_id_with_vllm.ipynb) — inverse dynamics, predicting ego-motion trajectories from input AV videos. 
diff --git a/cookbooks/cosmos3/generator/action/assets/actions/av_traj_forward.json b/cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_forward.json
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/actions/av_traj_forward.json
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_forward.json
diff --git a/cookbooks/cosmos3/generator/action/assets/actions/av_traj_left.json b/cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_left.json
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/actions/av_traj_left.json
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_left.json
diff --git a/cookbooks/cosmos3/generator/action/assets/actions/av_traj_right.json b/cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_right.json
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/actions/av_traj_right.json
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/actions/av_traj_right.json
diff --git a/cookbooks/cosmos3/generator/action/assets/actions/umi.json b/cookbooks/cosmos3/generator/action/basic_examples/assets/actions/umi.json
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/actions/umi.json
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/actions/umi.json
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/data/chunk-000/file-000.parquet b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/data/chunk-000/file-000.parquet
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/data/chunk-000/file-000.parquet
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/data/chunk-000/file-000.parquet
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/episodes/chunk-000/file-000.parquet b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/episodes/chunk-000/file-000.parquet
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/episodes/chunk-000/file-000.parquet
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/episodes/chunk-000/file-000.parquet
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/info.json b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/info.json
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/info.json
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/info.json
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/tasks.parquet b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/tasks.parquet
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/tasks.parquet
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/meta/tasks.parquet
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.exterior_image_1_left/chunk-000/file-000.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.exterior_image_1_left/chunk-000/file-000.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.exterior_image_1_left/chunk-000/file-000.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.exterior_image_1_left/chunk-000/file-000.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.exterior_image_2_left/chunk-000/file-000.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.exterior_image_2_left/chunk-000/file-000.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.exterior_image_2_left/chunk-000/file-000.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.exterior_image_2_left/chunk-000/file-000.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.wrist_image_left/chunk-000/file-000.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.wrist_image_left/chunk-000/file-000.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/videos/observation.image.wrist_image_left/chunk-000/file-000.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/droid_lerobot_example/videos/observation.image.wrist_image_left/chunk-000/file-000.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/images/av_0.jpg b/cookbooks/cosmos3/generator/action/basic_examples/assets/images/av_0.jpg
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/images/av_0.jpg
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/images/av_0.jpg
diff --git a/cookbooks/cosmos3/generator/action/assets/images/av_1.jpg b/cookbooks/cosmos3/generator/action/basic_examples/assets/images/av_1.jpg
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/images/av_1.jpg
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/images/av_1.jpg
diff --git a/cookbooks/cosmos3/generator/action/assets/images/umi.png b/cookbooks/cosmos3/generator/action/basic_examples/assets/images/umi.png
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/images/umi.png
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/images/umi.png
diff --git a/cookbooks/cosmos3/generator/action/assets/videos/av_0.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/videos/av_0.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/videos/av_0.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/videos/av_0.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/videos/av_1.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/videos/av_1.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/videos/av_1.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/videos/av_1.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/videos/robolab_example_rollout.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/videos/robolab_example_rollout.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/videos/robolab_example_rollout.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/videos/robolab_example_rollout.mp4
diff --git a/cookbooks/cosmos3/generator/action/assets/videos/umi.mp4 b/cookbooks/cosmos3/generator/action/basic_examples/assets/videos/umi.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/action/assets/videos/umi.mp4
rename to cookbooks/cosmos3/generator/action/basic_examples/assets/videos/umi.mp4
diff --git a/cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb b/cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_cosmos_framework.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb
rename to cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_cosmos_framework.ipynb
diff --git a/cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb b/cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_vllm.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb
rename to cookbooks/cosmos3/generator/action/basic_examples/run_fd_with_vllm.ipynb
diff --git a/cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb b/cookbooks/cosmos3/generator/action/basic_examples/run_id_with_cosmos_framework.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb
rename to cookbooks/cosmos3/generator/action/basic_examples/run_id_with_cosmos_framework.ipynb
diff --git a/cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb b/cookbooks/cosmos3/generator/action/basic_examples/run_id_with_vllm.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb
rename to cookbooks/cosmos3/generator/action/basic_examples/run_id_with_vllm.ipynb
diff --git a/cookbooks/cosmos3/generator/action/run_policy_with_cosmos_framework.md b/cookbooks/cosmos3/generator/action/basic_examples/run_policy_with_cosmos_framework.md
similarity index 100%
rename from cookbooks/cosmos3/generator/action/run_policy_with_cosmos_framework.md
rename to cookbooks/cosmos3/generator/action/basic_examples/run_policy_with_cosmos_framework.md
diff --git a/cookbooks/cosmos3/generator/audiovisual/README.md b/cookbooks/cosmos3/generator/audiovisual/README.md
index d80adad4..fbe388f4 100644
--- a/cookbooks/cosmos3/generator/audiovisual/README.md
+++ b/cookbooks/cosmos3/generator/audiovisual/README.md
@@ -1,14 +1,26 @@
-# Cosmos3 Generator Audiovisual Examples
+# Cosmos3 Generator Audiovisual Cookbooks

 Generate images and video (with optional audio) from text or image prompts with
-`Cosmos3-Nano` and `Cosmos3-Super`, across three inference backends. Sample
-prompts live under [`assets/`](./assets).
+`Cosmos3-Nano` and `Cosmos3-Super`, across three inference backends.

 Environment setup for every backend is centralized in the shared [Cosmos3 cookbooks environment setup](../../README.md) guide; each backend below links to the section you need. The quickstarts are minimal text-to-video examples to get one generation running per backend — run them from this folder.

+## Basic Examples
+
+The [`basic_examples/`](./basic_examples/) directory contains the shipped starter
+cookbooks and sample prompts. Community-contributed cookbooks are added as sibling
+directories alongside `basic_examples/` — see the
+[Contributing Guide](../../../../CONTRIBUTING.md) for the recipe structure.
+
+| Cookbook | Backend | Notebook |
+|---------|---------|----------|
+| T2I / T2V / I2V + audio | Cosmos Framework | [`basic_examples/run_with_cosmos_framework.ipynb`](./basic_examples/run_with_cosmos_framework.ipynb) |
+| T2I / T2V / I2V + audio | Diffusers | [`basic_examples/run_with_diffusers.ipynb`](./basic_examples/run_with_diffusers.ipynb) |
+| T2I / T2V / I2V + audio | vLLM-Omni | [`basic_examples/run_with_vllm_omni.ipynb`](./basic_examples/run_with_vllm_omni.ipynb) |
+
 Generator requires the Guardrail. Request access to the gated [nvidia/Cosmos-1.0-Guardrail](https://huggingface.co/nvidia/Cosmos-1.0-Guardrail) HF repository before running these examples. To disable the guardrail, set
@@ -31,12 +43,12 @@ import json
 from pathlib import Path

 prompt = json.dumps(
-    json.load(open("assets/prompts/text2video/robot_kitchen.json")),
+    json.load(open("basic_examples/assets/prompts/text2video/robot_kitchen.json")),
     ensure_ascii=True,
     separators=(",", ":"),
 )
 negative = json.dumps(
-    json.load(open("assets/negative_prompts/text2video/neg_prompt.json")),
+    json.load(open("basic_examples/assets/negative_prompts/text2video/neg_prompt.json")),
     ensure_ascii=True,
     separators=(",", ":"),
 )
@@ -72,7 +84,7 @@ more GPUs via `--nproc-per-node`.

 ### Notebook walkthrough

-[`run_with_cosmos_framework.ipynb`](./run_with_cosmos_framework.ipynb) is the full
+[`run_with_cosmos_framework.ipynb`](./basic_examples/run_with_cosmos_framework.ipynb) is the full
 tutorial for the native PyTorch backend: it covers every use case — text-to-image, text-to-video, image-to-video, with audio on or off — and includes the detailed, environment-aware setup and visualization for each generation.
@@ -91,8 +103,8 @@ from diffusers import Cosmos3OmniPipeline
 from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
 from diffusers.utils import export_to_video

-prompt = json.load(open("assets/prompts/text2video/robot_kitchen.json"))
-negative = json.load(open("assets/negative_prompts/text2video/neg_prompt.json"))
+prompt = json.load(open("basic_examples/assets/prompts/text2video/robot_kitchen.json"))
+negative = json.load(open("basic_examples/assets/negative_prompts/text2video/neg_prompt.json"))

 pipe = Cosmos3OmniPipeline.from_pretrained(
     "nvidia/Cosmos3-Nano", torch_dtype=torch.bfloat16, device_map="cuda"
@@ -122,7 +134,7 @@ To run **Cosmos3-Super** instead, load the larger checkpoint:

 ### Notebook walkthrough

-[`run_with_diffusers.ipynb`](./run_with_diffusers.ipynb) is the full tutorial for
+[`run_with_diffusers.ipynb`](./basic_examples/run_with_diffusers.ipynb) is the full tutorial for
 the Diffusers backend: it provisions a dedicated venv, then walks through text-to-image, text-to-video, and image-to-video generation (with and without audio) using `Cosmos3OmniPipeline`, including how to preview the generated media.
@@ -145,8 +157,8 @@ from pathlib import Path

 import requests

-prompt = json.load(open("assets/prompts/text2video/robot_kitchen.json"))
-negative = json.load(open("assets/negative_prompts/text2video/neg_prompt.json"))
+prompt = json.load(open("basic_examples/assets/prompts/text2video/robot_kitchen.json"))
+negative = json.load(open("basic_examples/assets/negative_prompts/text2video/neg_prompt.json"))

 response = requests.post(
     "http://localhost:8000/v1/videos/sync",
@@ -179,7 +191,7 @@ For image-to-video, post to the same endpoint with an image under

 ### Notebook walkthrough

-[`run_with_vllm_omni.ipynb`](./run_with_vllm_omni.ipynb) is the full tutorial for
+[`run_with_vllm_omni.ipynb`](./basic_examples/run_with_vllm_omni.ipynb) is the full tutorial for
 the vLLM-Omni backend: it walks through text-to-image, text-to-video, and image-to-video requests with audio on or off. Server launch options (Nano and Super, tensor parallelism, layerwise offload, and CFG-parallel variants) live in
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/car_driving.jpg b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/car_driving.jpg
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/car_driving.jpg
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/car_driving.jpg
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/coastal_road_audio.jpg b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/coastal_road_audio.jpg
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/coastal_road_audio.jpg
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/coastal_road_audio.jpg
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/humanoid_robot.jpg b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/humanoid_robot.jpg
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/humanoid_robot.jpg
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/images/image2video/humanoid_robot.jpg
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/image2video/neg_prompt.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/negative_prompts/image2video/neg_prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/image2video/neg_prompt.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/negative_prompts/image2video/neg_prompt.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/text2video/neg_prompt.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/negative_prompts/text2video/neg_prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/text2video/neg_prompt.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/negative_prompts/text2video/neg_prompt.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/car_driving.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/car_driving.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/car_driving.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/car_driving.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/coastal_road_audio.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/coastal_road_audio.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/coastal_road_audio.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/coastal_road_audio.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/humanoid_robot.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/humanoid_robot.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/humanoid_robot.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/image2video/humanoid_robot.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2image/robot_draping.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2image/robot_draping.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2image/robot_draping.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2image/robot_draping.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/car_colliding.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/car_colliding.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/car_colliding.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/car_colliding.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_kitchen.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/robot_kitchen.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_kitchen.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/robot_kitchen.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_pouring_water_audio.json b/cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/robot_pouring_water_audio.json
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_pouring_water_audio.json
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/assets/prompts/text2video/robot_pouring_water_audio.json
diff --git a/cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb b/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_cosmos_framework.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_cosmos_framework.ipynb
diff --git a/cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb b/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_diffusers.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_diffusers.ipynb
diff --git a/cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb b/cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_vllm_omni.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb
rename to cookbooks/cosmos3/generator/audiovisual/basic_examples/run_with_vllm_omni.ipynb
diff --git a/cookbooks/cosmos3/generator/transfer/README.md b/cookbooks/cosmos3/generator/transfer/README.md
index 0c477056..8c849185 100644
--- a/cookbooks/cosmos3/generator/transfer/README.md
+++ b/cookbooks/cosmos3/generator/transfer/README.md
@@ -1,7 +1,19 @@
-# Cosmos3 Generator Transfer Examples
+# Cosmos3 Generator Transfer Cookbooks

-Cosmos3-Nano video **transfer** examples on the native PyTorch (Cosmos Framework) path.
-Sample assets under [`assets/`](./assets) cover spatial control signals paired with
+Cosmos3-Nano video **transfer** cookbooks on the native PyTorch (Cosmos Framework) path.
+
+## Basic Examples
+
+The [`basic_examples/`](./basic_examples/) directory contains the shipped starter
+cookbook and sample assets. Community-contributed cookbooks are added as sibling
+directories alongside `basic_examples/` — see the
+[Contributing Guide](../../../../CONTRIBUTING.md) for the recipe structure.
+
+| Cookbook | Backend | Notebook |
+|---------|---------|----------|
+| Video transfer (edge, blur, depth, seg, wsm) | Cosmos Framework | [`basic_examples/run_video_transfer_with_cosmos_framework.ipynb`](./basic_examples/run_video_transfer_with_cosmos_framework.ipynb) |
+
+Sample assets under [`basic_examples/assets/`](./basic_examples/assets/) cover spatial control signals paired with
 `prompt.json` files:

 - **Edge (Canny)** — edge map control plus caption.
@@ -26,11 +38,11 @@ come from the control video; see the spec field reference for how `fps` and

 | Control | Asset folder | Inference input | Generation duration |
 | --- | --- | --- | --- |
-| Edge (Canny) | `assets/edge/` | `control_edge.mp4` + `prompt.json` | 121 frames @ 30 FPS |
-| Blur | `assets/blur/` | `control_blur.mp4` + `prompt.json` | 121 frames @ 30 FPS |
-| Depth | `assets/depth/` | `control_depth.mp4` + `prompt.json` | 121 frames @ 30 FPS |
-| Segmentation | `assets/seg/` | `control_seg.mp4` + `prompt.json` | 121 frames @ 30 FPS |
-| World scenario (WSM) | `assets/wsm/` | `control_wsm.mp4` + `prompt.json` | 101 frames @ 10 FPS |
+| Edge (Canny) | `basic_examples/assets/edge/` | `control_edge.mp4` + `prompt.json` | 121 frames @ 30 FPS |
+| Blur | `basic_examples/assets/blur/` | `control_blur.mp4` + `prompt.json` | 121 frames @ 30 FPS |
+| Depth | `basic_examples/assets/depth/` | `control_depth.mp4` + `prompt.json` | 121 frames @ 30 FPS |
+| Segmentation | `basic_examples/assets/seg/` | `control_seg.mp4` + `prompt.json` | 121 frames @ 30 FPS |
+| World scenario (WSM) | `basic_examples/assets/wsm/` | `control_wsm.mp4` + `prompt.json` | 101 frames @ 10 FPS |

 Transfer inference is selected automatically when any hint key is present in the spec.

@@ -40,7 +52,7 @@ Transfer inference is selected automatically when any hint key is present in the
 Set up the environment: [Cosmos Framework setup](../../README.md#cosmos-framework).

 Activate the framework venv, then run inference (checked-in `specs/*.json` use paths
-relative to `specs/`). Transfer on Nano looks like:
+relative to `basic_examples/specs/`). Transfer on Nano looks like:

 ```bash
 cd cookbooks/cosmos3/generator/transfer
@@ -49,7 +61,7 @@ cd cookbooks/cosmos3/generator/transfer
 torchrun --nproc-per-node=1 \
   -m cosmos_framework.scripts.inference \
   --parallelism-preset=latency \
-  -i specs/edge.json \
+  -i basic_examples/specs/edge.json \
   -o ./output/ \
   --checkpoint-path Cosmos3-Nano \
   --seed 2026
@@ -58,7 +70,7 @@ torchrun --nproc-per-node=1 \
 torchrun --nproc-per-node=1 \
   -m cosmos_framework.scripts.inference \
   --parallelism-preset=latency \
-  -i specs/blur.json \
+  -i basic_examples/specs/blur.json \
   -o ./output/ \
   --checkpoint-path Cosmos3-Nano \
   --seed 2026
@@ -67,7 +79,7 @@ torchrun --nproc-per-node=1 \
 torchrun --nproc-per-node=1 \
   -m cosmos_framework.scripts.inference \
   --parallelism-preset=latency \
-  -i specs/depth.json \
+  -i basic_examples/specs/depth.json \
   -o ./output/ \
   --checkpoint-path Cosmos3-Nano \
   --seed 2026
@@ -76,7 +88,7 @@ torchrun --nproc-per-node=1 \
 torchrun --nproc-per-node=1 \
   -m cosmos_framework.scripts.inference \
   --parallelism-preset=latency \
-  -i specs/seg.json \
+  -i basic_examples/specs/seg.json \
   -o ./output/ \
   --checkpoint-path Cosmos3-Nano \
   --seed 2026
@@ -85,14 +97,14 @@ torchrun --nproc-per-node=1 \
 torchrun --nproc-per-node=1 \
   -m cosmos_framework.scripts.inference \
   --parallelism-preset=latency \
-  -i specs/wsm.json \
+  -i basic_examples/specs/wsm.json \
   -o ./output/ \
   --checkpoint-path Cosmos3-Nano \
   --seed 2026
 ```

 The input spec sets `prompt_path` and a hint block with `control_path` pointing at the
-checked-in assets under [`assets/`](./assets) via paths relative to [`specs/`](./specs).
+checked-in assets under [`basic_examples/assets/`](./basic_examples/assets/) via paths relative to [`basic_examples/specs/`](./basic_examples/specs/).
 Outputs are written under the directory passed to `-o`, with one subdirectory per sample name, for example `output/transfer_edge/vision.mp4`. Batch size must be 1 for transfer.
@@ -137,10 +149,10 @@ Key fields:

 ### Cookbook entrypoints

-- [`run_video_transfer_with_cosmos_framework.ipynb`](./run_video_transfer_with_cosmos_framework.ipynb) —
+- [`run_video_transfer_with_cosmos_framework.ipynb`](./basic_examples/run_video_transfer_with_cosmos_framework.ipynb) —
   full tutorial on a **GPU host**: environment setup, `nvidia-smi` check, then five inference blocks (edge, blur, depth, seg, wsm) with previews. See [Cosmos3 environment setup](../../README.md).
-- [`specs/`](./specs) — checked-in Framework input JSON per control (paths relative to `specs/`).
+- [`basic_examples/specs/`](./basic_examples/specs/) — checked-in Framework input JSON per control (paths relative to `basic_examples/specs/`).

 ### Troubleshooting

diff --git a/cookbooks/cosmos3/generator/transfer/assets/blur/control_blur.mp4 b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/blur/control_blur.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/blur/control_blur.mp4
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/blur/control_blur.mp4
diff --git a/cookbooks/cosmos3/generator/transfer/assets/blur/prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/blur/prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/blur/prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/blur/prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/assets/depth/control_depth.mp4 b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/depth/control_depth.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/depth/control_depth.mp4
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/depth/control_depth.mp4
diff --git a/cookbooks/cosmos3/generator/transfer/assets/depth/prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/depth/prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/depth/prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/depth/prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/assets/edge/control_edge.mp4 b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/edge/control_edge.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/edge/control_edge.mp4
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/edge/control_edge.mp4
diff --git a/cookbooks/cosmos3/generator/transfer/assets/edge/prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/edge/prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/edge/prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/edge/prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/assets/negative_prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/negative_prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/negative_prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/negative_prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/assets/seg/control_seg.mp4 b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/seg/control_seg.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/seg/control_seg.mp4
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/seg/control_seg.mp4
diff --git a/cookbooks/cosmos3/generator/transfer/assets/seg/prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/seg/prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/seg/prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/seg/prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/assets/wsm/control_wsm.mp4 b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/wsm/control_wsm.mp4
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/wsm/control_wsm.mp4
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/wsm/control_wsm.mp4
diff --git a/cookbooks/cosmos3/generator/transfer/assets/wsm/prompt.json b/cookbooks/cosmos3/generator/transfer/basic_examples/assets/wsm/prompt.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/assets/wsm/prompt.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/assets/wsm/prompt.json
diff --git a/cookbooks/cosmos3/generator/transfer/preview_helpers.py b/cookbooks/cosmos3/generator/transfer/basic_examples/preview_helpers.py
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/preview_helpers.py
rename to cookbooks/cosmos3/generator/transfer/basic_examples/preview_helpers.py
diff --git a/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb b/cookbooks/cosmos3/generator/transfer/basic_examples/run_video_transfer_with_cosmos_framework.ipynb
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb
rename to cookbooks/cosmos3/generator/transfer/basic_examples/run_video_transfer_with_cosmos_framework.ipynb
diff --git a/cookbooks/cosmos3/generator/transfer/specs/blur.json b/cookbooks/cosmos3/generator/transfer/basic_examples/specs/blur.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/specs/blur.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/specs/blur.json
diff --git a/cookbooks/cosmos3/generator/transfer/specs/depth.json b/cookbooks/cosmos3/generator/transfer/basic_examples/specs/depth.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/specs/depth.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/specs/depth.json
diff --git a/cookbooks/cosmos3/generator/transfer/specs/edge.json b/cookbooks/cosmos3/generator/transfer/basic_examples/specs/edge.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/specs/edge.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/specs/edge.json
diff --git a/cookbooks/cosmos3/generator/transfer/specs/seg.json b/cookbooks/cosmos3/generator/transfer/basic_examples/specs/seg.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/specs/seg.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/specs/seg.json
diff --git a/cookbooks/cosmos3/generator/transfer/specs/wsm.json b/cookbooks/cosmos3/generator/transfer/basic_examples/specs/wsm.json
similarity index 100%
rename from cookbooks/cosmos3/generator/transfer/specs/wsm.json
rename to cookbooks/cosmos3/generator/transfer/basic_examples/specs/wsm.json
diff --git a/cookbooks/cosmos3/reasoner/README.md b/cookbooks/cosmos3/reasoner/README.md
index b3d3542d..3718e45a 100644
--- a/cookbooks/cosmos3/reasoner/README.md
+++ b/cookbooks/cosmos3/reasoner/README.md
@@ -1,15 +1,28 @@
-# Cosmos3 Reasoner Examples
+# Cosmos3 Reasoner Cookbooks

 Run the Cosmos3 Reasoner (vision-language reasoning over images and video) across
-multiple inference backends. Sample inputs live under [`assets/`](./assets).
+multiple inference backends.

 Environment setup for every backend is centralized in the shared [Cosmos3 cookbooks environment setup](../README.md) guide; each backend below links to the section you need.

+## Basic Examples
+
+The [`basic_examples/`](./basic_examples/) directory contains the shipped starter
+cookbooks and sample inputs. Community-contributed cookbooks are added as sibling
+directories alongside `basic_examples/` — see the
+[Contributing Guide](../../../CONTRIBUTING.md) for the recipe structure.
+
+| Cookbook | Backend | Notebook |
+|---------|---------|----------|
+| Reasoner inference | Cosmos Framework | [`basic_examples/run_with_cosmos_framework.ipynb`](./basic_examples/run_with_cosmos_framework.ipynb) |
+| Reasoner inference | vLLM | [`basic_examples/run_with_vllm.ipynb`](./basic_examples/run_with_vllm.ipynb) |
+| Reasoner inference | NIM | [`basic_examples/run_with_nim.ipynb`](./basic_examples/run_with_nim.ipynb) |
+
 ## Reasoner Prompt Guide

-See the [Reasoner Prompt Guide](./reasoner_prompt_guide.md).
+See the [Reasoner Prompt Guide](./basic_examples/reasoner_prompt_guide.md).

 ## Run with Cosmos Framework

@@ -29,7 +42,7 @@ cat > outputs/cookbooks/cosmos3/reasoner/inputs/robot_image.json <<'JSON'
   "model_mode": "reasoner",
   "name": "robot_image",
   "prompt": "Describe what is happening in this image in one sentence.",
-  "vision_path": "../../cookbooks/cosmos3/reasoner/assets/robot_153.jpg",
+  "vision_path": "../../cookbooks/cosmos3/reasoner/basic_examples/assets/robot_153.jpg",
   "enable_sound": false
 }
 JSON
@@ -54,7 +67,7 @@ The generated text is written to

 ### Notebook walkthrough

-[`run_with_cosmos_framework.ipynb`](./run_with_cosmos_framework.ipynb) is the full
+[`run_with_cosmos_framework.ipynb`](./basic_examples/run_with_cosmos_framework.ipynb) is the full
 tutorial. It writes text and image smoke tests, then walks through image capability sections — detailed captioning, robot task planning, 2D grounding, describe-anything, and action-trajectory prompts — rendering the prompt, media
@@ -73,7 +86,7 @@ Set up the environment and start the server:
 [Start the server](../README.md#start-the-server) (launch commands).

 The quickstart below uses **Cosmos3-Nano** on port 8000. The
-[`run_with_vllm.ipynb`](./run_with_vllm.ipynb) notebook defaults to
+[`run_with_vllm.ipynb`](./basic_examples/run_with_vllm.ipynb) notebook defaults to
 **Cosmos3-Super** on port **8001** — use that launch command from the env setup guide and point the client at `http://localhost:8001/v1`.
@@ -83,7 +96,7 @@ Once the server is ready, query it with the OpenAI client: from pathlib import Path import openai -image_path = Path("assets/robot_153.jpg").resolve() +image_path = Path("basic_examples/assets/robot_153.jpg").resolve() image_url = image_path.as_uri() client = openai.OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1") @@ -108,7 +121,7 @@ print(response.choices[0].message.content) ### Notebook walkthrough -[`run_with_vllm.ipynb`](./run_with_vllm.ipynb) uses the **Cosmos3-Super** launch +[`run_with_vllm.ipynb`](./basic_examples/run_with_vllm.ipynb) uses the **Cosmos3-Super** launch from the [environment setup guide](../README.md#start-the-server) and walks through many more image and video examples: detailed captioning, VQA, temporal localization, embodied reasoning, common-sense reasoning, 2D @@ -138,7 +151,7 @@ import mimetypes from pathlib import Path import openai -image_path = Path("assets/robot_153.jpg").resolve() +image_path = Path("basic_examples/assets/robot_153.jpg").resolve() mime = mimetypes.guess_type(image_path.name)[0] or "application/octet-stream" image_url = f"data:{mime};base64,{base64.b64encode(image_path.read_bytes()).decode('ascii')}" @@ -170,7 +183,7 @@ for the full request reference. 
### Notebook walkthrough -[`run_with_nim.ipynb`](./run_with_nim.ipynb) is the NIM counterpart to the vLLM +[`run_with_nim.ipynb`](./basic_examples/run_with_nim.ipynb) is the NIM counterpart to the vLLM notebook: it launches the NIM container, waits for readiness, and then runs the same image and video examples — detailed captioning, VQA, temporal localization, embodied reasoning, common-sense reasoning, 2D grounding, describe-anything,
diff --git a/cookbooks/cosmos3/reasoner/assets/action_cot_driving_scene.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/action_cot_driving_scene.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/action_cot_driving_scene.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/action_cot_driving_scene.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/action_cot_trajectory.png b/cookbooks/cosmos3/reasoner/basic_examples/assets/action_cot_trajectory.png
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/action_cot_trajectory.png
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/action_cot_trajectory.png
diff --git a/cookbooks/cosmos3/reasoner/assets/assisted_task_next_action.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/assisted_task_next_action.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/assisted_task_next_action.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/assisted_task_next_action.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/common_sense_reasoning.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/common_sense_reasoning.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/common_sense_reasoning.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/common_sense_reasoning.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/describe_anything.png b/cookbooks/cosmos3/reasoner/basic_examples/assets/describe_anything.png
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/describe_anything.png
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/describe_anything.png
diff --git a/cookbooks/cosmos3/reasoner/assets/drive_scene_next_action.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/drive_scene_next_action.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/drive_scene_next_action.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/drive_scene_next_action.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/grounding_2d.png b/cookbooks/cosmos3/reasoner/basic_examples/assets/grounding_2d.png
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/grounding_2d.png
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/grounding_2d.png
diff --git a/cookbooks/cosmos3/reasoner/assets/physical_plausibility.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/physical_plausibility.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/physical_plausibility.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/physical_plausibility.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/robot_153.jpg b/cookbooks/cosmos3/reasoner/basic_examples/assets/robot_153.jpg
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/robot_153.jpg
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/robot_153.jpg
diff --git a/cookbooks/cosmos3/reasoner/assets/robot_planning.png b/cookbooks/cosmos3/reasoner/basic_examples/assets/robot_planning.png
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/robot_planning.png
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/robot_planning.png
diff --git a/cookbooks/cosmos3/reasoner/assets/robotics_next_action.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/robotics_next_action.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/robotics_next_action.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/robotics_next_action.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/situation_understanding.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/situation_understanding.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/situation_understanding.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/situation_understanding.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/temporal_localization_1.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/temporal_localization_1.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/temporal_localization_1.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/temporal_localization_1.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/temporal_localization_2.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/temporal_localization_2.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/temporal_localization_2.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/temporal_localization_2.mp4
diff --git a/cookbooks/cosmos3/reasoner/assets/video_caption.mp4 b/cookbooks/cosmos3/reasoner/basic_examples/assets/video_caption.mp4
similarity index 100%
rename from cookbooks/cosmos3/reasoner/assets/video_caption.mp4
rename to cookbooks/cosmos3/reasoner/basic_examples/assets/video_caption.mp4
diff --git a/cookbooks/cosmos3/reasoner/reasoner_prompt_guide.md b/cookbooks/cosmos3/reasoner/basic_examples/reasoner_prompt_guide.md
similarity index 100%
rename from cookbooks/cosmos3/reasoner/reasoner_prompt_guide.md
rename to cookbooks/cosmos3/reasoner/basic_examples/reasoner_prompt_guide.md
diff --git a/cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb b/cookbooks/cosmos3/reasoner/basic_examples/run_with_cosmos_framework.ipynb
similarity index 100%
rename from cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb
rename to cookbooks/cosmos3/reasoner/basic_examples/run_with_cosmos_framework.ipynb
diff --git a/cookbooks/cosmos3/reasoner/run_with_nim.ipynb b/cookbooks/cosmos3/reasoner/basic_examples/run_with_nim.ipynb
similarity index 100%
rename from cookbooks/cosmos3/reasoner/run_with_nim.ipynb
rename to cookbooks/cosmos3/reasoner/basic_examples/run_with_nim.ipynb
diff --git a/cookbooks/cosmos3/reasoner/run_with_vllm.ipynb b/cookbooks/cosmos3/reasoner/basic_examples/run_with_vllm.ipynb
similarity index 100%
rename from cookbooks/cosmos3/reasoner/run_with_vllm.ipynb
rename to cookbooks/cosmos3/reasoner/basic_examples/run_with_vllm.ipynb
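Scripts or notebooks outside this change that still point at the old Reasoner locations will break after these renames. The mapping implied by the rename list above can be written as a prefix rewrite. Under `cookbooks/cosmos3/reasoner/`, any path whose first component is `assets`, `reasoner_prompt_guide.md`, or one of the three `run_with_*.ipynb` notebooks gains a `basic_examples/` segment, while `README.md` stays where it is. This is a minimal sketch: the helper `migrate_reasoner_path` and the `MOVED_ENTRIES` set are illustrative names, not part of the repository. It also does not cover the separate `generator/transfer/specs/` move.

```python
from pathlib import PurePosixPath

# Reasoner cookbook root, as it appears in the renames above.
REASONER_ROOT = PurePosixPath("cookbooks/cosmos3/reasoner")

# Hypothetical set: top-level entries that this diff moves under basic_examples/,
# read off the rename list. README.md is intentionally absent because it is not moved.
MOVED_ENTRIES = {
    "assets",
    "reasoner_prompt_guide.md",
    "run_with_cosmos_framework.ipynb",
    "run_with_nim.ipynb",
    "run_with_vllm.ipynb",
}


def migrate_reasoner_path(path: str) -> str:
    """Map a pre-move Reasoner path to its basic_examples/ location.

    Paths outside the Reasoner root, and paths whose first component is not
    a moved entry (including ones already under basic_examples/), are
    returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        rel = p.relative_to(REASONER_ROOT)
    except ValueError:
        return path  # not under the Reasoner cookbook root
    if not rel.parts or rel.parts[0] not in MOVED_ENTRIES:
        return path  # e.g. README.md, or already migrated
    return str(REASONER_ROOT / "basic_examples" / rel)


if __name__ == "__main__":
    print(migrate_reasoner_path("cookbooks/cosmos3/reasoner/assets/robot_153.jpg"))
```

Paths already under `basic_examples/` fall through unchanged, so the helper can safely be run more than once over the same file list.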
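The README edits in this diff rewrite the notebook, prompt-guide, and asset links to include `basic_examples/`. One way to catch a missed link is to scan the Markdown for relative targets and check that each one resolves on disk. This sketch is an illustration, not a tool the repository provides. `find_missing_links` is a hypothetical name, and the regex only handles simple inline `[text](target)` links. It ignores reference-style links and paths embedded in code blocks, such as the `vision_path` JSON field or the `Path(...)` strings in the Python snippets.

```python
import re
from pathlib import Path

# Captures the target of inline Markdown links, e.g. "./basic_examples/run_with_vllm.ipynb"
# from "[`run_with_vllm.ipynb`](./basic_examples/run_with_vllm.ipynb)".
LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def find_missing_links(markdown: str, base_dir: Path) -> list[str]:
    """Return relative link targets in `markdown` that do not exist under `base_dir`."""
    missing = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL, mail link, or in-page anchor
        file_part = target.split("#", 1)[0]  # drop "#section" fragments
        if not (base_dir / file_part).exists():
            missing.append(target)
    return missing


if __name__ == "__main__":
    readme = Path("cookbooks/cosmos3/reasoner/README.md")
    if readme.exists():
        print(find_missing_links(readme.read_text(encoding="utf-8"), readme.parent))
```

Run against `cookbooks/cosmos3/reasoner/README.md` with `base_dir` set to that file's directory, a fully updated README would be expected to produce an empty list. A target like `../README.md#start-the-server` is checked as `../README.md` because the fragment is stripped.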