33 changes: 21 additions & 12 deletions docs/docs/extraction/support-matrix.md
@@ -5,32 +5,41 @@ Before you begin using [NeMo Retriever Library](overview.md), ensure that you ha

## Core and Advanced Pipeline Features

The Nemo Retriever Library extraction core pipeline features run on a single A10G or better GPU.
The NeMo Retriever Library extraction core pipeline features run on a single A10G or better GPU.
The core pipeline models (for document type inputs) include the following:

**ToDo: also link NIM doc pages for each model**

- [llama-nemotron-embed-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-vl-1b-v2) — Embedding model for converting text chunks into vectors.
- [nemotron-page-elements-v3](https://huggingface.co/nvidia/nemotron-page-elements-v3) — Detects and classifies images on a page as a table, chart or infographic.
- [nemotron-table-structure-v1](https://huggingface.co/nvidia/nemotron-table-structure-v1) — Detects rows, columns, and cells within a table to preserve table structure and convert to Markdown format.
- [nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) — Image OCR model to detect and extract text from images.
- [llama-nemotron-embed-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-vl-1b-v2) — Embedding model for converting text chunks into vectors. NVIDIA NIM: [NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html).
- [nemotron-page-elements-v3](https://huggingface.co/nvidia/nemotron-page-elements-v3) — Detects and classifies images on a page as a table, chart or infographic. NVIDIA NIM: [NVIDIA NIM for Object Detection — NeMo Retriever Page Elements v3](https://docs.nvidia.com/nim/ingestion/object-detection/latest/support-matrix.html#nemo-retriever-page-elements-v3).
- [nemotron-table-structure-v1](https://huggingface.co/nvidia/nemotron-table-structure-v1) — Detects rows, columns, and cells within a table to preserve table structure and convert to Markdown format. NVIDIA NIM: [NVIDIA NIM for Object Detection — NeMo Retriever Table Structure v1](https://docs.nvidia.com/nim/ingestion/object-detection/latest/support-matrix.html#nemo-retriever-table-structure-v1).
- [nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) — Image OCR model to detect and extract text from images. NVIDIA NIM: [NVIDIA NIM for Image OCR (NeMo Retriever OCR)](https://docs.nvidia.com/nim/ingestion/image-ocr/latest/overview.html) (see the [Image OCR support matrix](https://docs.nvidia.com/nim/ingestion/image-ocr/latest/support-matrix.html) for the currently published NIM model IDs).

Advanced features (for example, for audio/video) require additional GPU support and disk space.
This includes the following:

- [parakeet-1-1b-ctc-en-us](https://huggingface.co/nvidia/parakeet-ctc-1.1b) for transcript extraction from [audio and video](audio.md).
- [nemotron-parse](https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2) - for [maximally accurate table extraction](nemoretriever-parse.md).
- [nemotron-nano-12b-v2-vl](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2) for image captioning of unstructured (not charts, tables, infographics) images.
- [parakeet-1-1b-ctc-en-us](https://huggingface.co/nvidia/parakeet-ctc-1.1b) for transcript extraction from [audio and video](audio.md). NVIDIA NIM: [Parakeet CTC (en-US) ASR](https://docs.nvidia.com/nim/speech/latest/asr/deploy-asr-models/parakeet-ctc-en-us.html).
- [nemotron-parse](https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2) for [maximally accurate table extraction](nemoretriever-parse.md). NVIDIA NIM: [Query the Nemotron-Parse-v1.2 API](https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-parse/api.html).
- [nemotron-nano-12b-v2-vl](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2) for image captioning of unstructured (not charts, tables, infographics) images. NVIDIA NIM: [Query the Nemotron Nano 12B v2 VL API](https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-nano-12b-v2-vl/api.html).

!!! note

While nemotron-nano-12b-v2-vl is the default VLM, you can configure and use other vision language models for image captioning based on your specific use case requirements. For more information, refer to [Extract Captions from Images](python-api-reference.md#extract-captions-from-images).

- [llama-nemotron-rerank-vl-1b-v2](https://huggingface.co/nvidia/llama-nemotron-rerank-vl-1b-v2) for improved retrieval accuracy.
- [llama-nemotron-rerank-vl-1b-v2](https://huggingface.co/nvidia/llama-nemotron-rerank-vl-1b-v2) for improved retrieval accuracy. NVIDIA NIM: [NeMo Retriever Text Reranking NIM](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html).
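As a rough planning aid, the per-model disk figures quoted in the storage table on this page can be summed to budget space before pulling checkpoints. The sketch below is illustrative only: the GiB values are the approximate numbers from that table, and the 20% headroom factor for caches and temporary files is an assumption, not an NVIDIA recommendation.

```python
# Hedged sketch: rough disk budgeting for the extraction pipeline models.
# The GiB figures are the approximate Hugging Face checkpoint sizes quoted
# in the storage table on this page; they are estimates, not guarantees.

CORE_MODELS_GIB = {
    "llama-nemotron-embed-1b-v2": 3.1,
    "nemotron-page-elements-v3": 0.41,
    "nemotron-table-structure-v1": 0.81,
    "nemotron-ocr-v2": 0.51,
}

ADVANCED_MODELS_GIB = {
    "parakeet-1-1b-ctc-en-us": 4.0,
    "nemotron-parse": 3.5,
    "nemotron-nano-12b-v2-vl": 22.9,
    "llama-nemotron-rerank-vl-1b-v2": 3.1,
}

def required_gib(models, headroom=1.2):
    """Sum checkpoint sizes and add headroom for caches/temp files.

    The 1.2 headroom multiplier is an illustrative assumption.
    """
    return round(sum(models.values()) * headroom, 1)

core = required_gib(CORE_MODELS_GIB)
full = required_gib({**CORE_MODELS_GIB, **ADVANCED_MODELS_GIB})
print(f"core pipeline: ~{core} GiB, core + advanced: ~{full} GiB")
# → core pipeline: ~5.8 GiB, core + advanced: ~46.0 GiB
```

Note that these are raw Hugging Face checkpoint sizes; NIM container deployments may use different (often smaller, optimized) artifacts.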

## HuggingFace Model Storage Requirements:
**Review comment (Contributor, P2):** Inconsistent "HuggingFace" vs "Hugging Face" spelling

The section heading uses `HuggingFace` (one word) while the body text on the very next line uses "Hugging Face" (two words, NVIDIA's preferred branding). Aligning the heading to the two-word form keeps the page consistent.

```suggestion
## Hugging Face Model Storage Requirements:
```

**ToDo: add model weight sizes on disk for each HF model**
The table below lists the approximate **Hugging Face checkpoint/weight footprint** for each model (files such as `model*.safetensors`, `weights.pth`, or other published weight bundles in the model repository). Values are rounded from the current public file listings and can change when a repository is updated.

| Model (as linked above) | Hugging Face repo | Approximate weights on disk |
|-------------------------|-------------------|----------------------------|
| llama-nemotron-embed-1b-v2 (VL) | [`nvidia/llama-nemotron-embed-vl-1b-v2`](https://huggingface.co/nvidia/llama-nemotron-embed-vl-1b-v2) | ~3.1 GiB |
| nemotron-page-elements-v3 | [`nvidia/nemotron-page-elements-v3`](https://huggingface.co/nvidia/nemotron-page-elements-v3) | ~0.41 GiB |
| nemotron-table-structure-v1 | [`nvidia/nemotron-table-structure-v1`](https://huggingface.co/nvidia/nemotron-table-structure-v1) | ~0.81 GiB |
| nemotron-ocr-v2 | [`nvidia/nemotron-ocr-v2`](https://huggingface.co/nvidia/nemotron-ocr-v2) | ~0.51 GiB |
| parakeet-1-1b-ctc-en-us | [`nvidia/parakeet-ctc-1.1b`](https://huggingface.co/nvidia/parakeet-ctc-1.1b) | ~4.0 GiB (`model.safetensors`; the repo also ships a separate `parakeet-ctc-1.1b.nemo` export of similar size—use one format if you want to avoid roughly doubling disk use) |
**Review comment (Contributor, P2):** Long inline prose in table cell may render poorly

The parakeet row's "Approximate weights on disk" cell contains a multi-sentence parenthetical (`model.safetensors`; the repo also ships a separate `.nemo` export…). Most Markdown renderers do not wrap table cells gracefully, so this produces a very wide column. Consider moving the note to a numbered footnote at the bottom of the table (matching the ¹ ² ³ style already used later in the document), or a short `!!! note` admonition below the table.
| nemotron-parse | [`nvidia/NVIDIA-Nemotron-Parse-v1.2`](https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2) | ~3.5 GiB |
| nemotron-nano-12b-v2-vl | [`nvidia/NVIDIA-Nemotron-Nano-12B-v2`](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2) | ~22.9 GiB |
**Review comment (Contributor, P2):** HF weight size exceeds the NIM disk figure — clarification recommended

The new table lists `nemotron-nano-12b-v2-vl` HF weights at ~22.9 GiB, but the NIM Hardware Requirements table below shows "VLM | Additional Disk Space | ~16 GB." A reader will naturally compare the two and wonder how a 23 GiB checkpoint fits into 16 GB of disk. The NIM table presumably reflects quantized/optimised NIM container artifacts, not raw HF weights; a short parenthetical or footnote here (e.g., "NIM uses a quantized deployment artifact; see the NIM Hardware Requirements section for deployment disk figures") would prevent confusion.
| llama-nemotron-rerank-vl-1b-v2 | [`nvidia/llama-nemotron-rerank-vl-1b-v2`](https://huggingface.co/nvidia/llama-nemotron-rerank-vl-1b-v2) | ~3.1 GiB |

## NIM Hardware Requirements:
