122 commits
cd3c368
Update PDF blueprint architecture diagram
kheiss-uwzoo Feb 19, 2026
70b5a80
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 24, 2026
7f0248c
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 25, 2026
0dd5f1b
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 26, 2026
dea2770
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 27, 2026
3ff2f1f
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Feb 27, 2026
a886244
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
b44f7ad
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
addf637
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 2, 2026
5900322
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 4, 2026
d12df70
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 4, 2026
67e674b
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 10, 2026
83c3c42
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 10, 2026
371d883
Introduce release branch 26.03 with version 26.3.0-RC1
jdye64 Mar 11, 2026
4af706f
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
a5812fa
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
6ecb070
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
72173fc
Release prep: Update version to 26.03.0-RC1 (#1574)
jdye64 Mar 11, 2026
852910c
(retriever) Add .split() for text chunking by token count (#1547) (#1…
edknv Mar 11, 2026
64c694b
(retriever) add documentation for image file support (#1571) (#1577)
edknv Mar 11, 2026
d38abb2
[26.03] Refactor get_*_model_name to avoid caching fallback model nam…
charlesbluca Mar 11, 2026
fbd2e28
[26.03] (helm) More nemotron rebranding (#1581)
charlesbluca Mar 11, 2026
ba92f69
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 11, 2026
1835ba7
Add source_id column back to lancedb
jdye64 Mar 12, 2026
db03ed7
upmerge
jperez999 Mar 11, 2026
5cbf38e
fix reranker in inproc (#1588)
jperez999 Mar 12, 2026
6459e60
Add source_id to output columns
jdye64 Mar 12, 2026
ed95c44
fix in process extract to handle txt (#1589)
jperez999 Mar 12, 2026
9568b50
Release prep: 26.03.0-RC2 (#1591)
jdye64 Mar 12, 2026
4a8301e
Increase default Redis TTL from 1-2h to 48h to prevent job expiry dur…
jioffe502 Mar 11, 2026
4f4e512
Add Helm RTX PRO 4500 override, extend obj-det warmup batch size over…
charlesbluca Mar 12, 2026
41d2b07
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 12, 2026
be53306
(retriever) update nemotron_parse extraction method (#1599) (#1604)
edknv Mar 12, 2026
491aed0
(retriever) auto-route image files in .extract() for both inprocess a…
edknv Mar 12, 2026
82088d7
Dump libfreetype source in release container (#1600) (#1606)
charlesbluca Mar 12, 2026
10c7435
Unit test failure fixes (#1607)
jdye64 Mar 12, 2026
11662db
Fix markdown outputs for batch and inprocess. (#1601)
jioffe502 Mar 12, 2026
02c2dcd
(retriever) update pre/post-processing for improved recall (#1596) (#…
edknv Mar 12, 2026
f55a733
Remove get_hf_revision logic from code not inside the nemo_retriever …
jdye64 Mar 13, 2026
c00b6bf
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Mar 13, 2026
83a936c
Added air gap instructions to helm file (#1616)
kheiss-uwzoo Mar 13, 2026
4d9ce5f
fix for network call reranking (#1619)
jperez999 Mar 13, 2026
0a60c1a
Release prep: Update versions to 26.3.0-RC4 (#1620)
jdye64 Mar 13, 2026
86cda76
Updated RNs to show forthcoming changes (#1623)
kheiss-uwzoo Mar 13, 2026
e5e3b36
update rns (#1624)
kheiss-uwzoo Mar 13, 2026
ce8133d
Fix score (#1627)
jperez999 Mar 14, 2026
8908e21
rm assert on rerank and readme (#1628)
jperez999 Mar 14, 2026
7d112c3
cherry-pick 15b2bc05681599329276e46e83edfa0f15bb4318 from main
randerzander Mar 16, 2026
823775d
Release prep: update version references to 26.3.0 (#1638)
jdye64 Mar 16, 2026
7b54385
26.03 RNs (#1641)
kheiss-uwzoo Mar 17, 2026
b7be9ba
update quickstart library mode (#1642)
kheiss-uwzoo Mar 18, 2026
1c6ec79
update release version from 26.1.3 to 26.3.0 on Release Notes (#1643)
kheiss-uwzoo Mar 18, 2026
cfd0b72
Kheiss/bullets (#1644)
kheiss-uwzoo Mar 18, 2026
818de0a
Update README.md
kheiss-uwzoo Mar 18, 2026
671d78a
Updating & simplifying main README (#1647) (#1650)
jperez999 Mar 18, 2026
85168e2
updates to release notes to fix bullets and doc link (#1651)
kheiss-uwzoo Mar 18, 2026
4075ae9
Kheiss/5970976 (#1652)
kheiss-uwzoo Mar 18, 2026
ebb1253
Kheiss/5966534 (#1653)
kheiss-uwzoo Mar 18, 2026
924a18e
Kheiss/5970976 - change location of air gap documentation (#1656)
kheiss-uwzoo Mar 18, 2026
4129d5b
Revert doc naming changes
jdye64 Mar 19, 2026
22d58bf
Confirmed product naming of NeMo Retriever Library in files and code …
kheiss-uwzoo Mar 19, 2026
17e0148
update helm file (#1679)
kheiss-uwzoo Mar 20, 2026
3d4fdae
updated quickstart to current version following reversion (#1683)
kheiss-uwzoo Mar 23, 2026
b1f56bb
Kheiss/quickstart lib mode update (#1682)
kheiss-uwzoo Mar 23, 2026
19e77e1
Update RNs to current version (#1687)
kheiss-uwzoo Mar 23, 2026
0e0bebc
Kheiss/update quickstart (#1688)
kheiss-uwzoo Mar 23, 2026
77cb39a
update reference diagram for overview (#1689)
kheiss-uwzoo Mar 23, 2026
56c2c51
fixed reference information about name change from nv-ingest to NeMo …
kheiss-uwzoo Mar 23, 2026
6758c17
changed opening note to NVIDIA Ingest (nv-ingest) has been renamed N…
kheiss-uwzoo Mar 23, 2026
3db9a49
remove duplicate caption() section with wrong parameters (NVBug 60006…
kheiss-uwzoo Mar 23, 2026
f0f9e97
Kheiss/6000618 (#1694)
kheiss-uwzoo Mar 23, 2026
cf22e8c
fix syntax (#1696)
kheiss-uwzoo Mar 23, 2026
cc33bea
Kheiss/6000353 - update links to Helm chart (#1697)
kheiss-uwzoo Mar 23, 2026
fa30ff8
Document RTX PRO 4500 Blackwell (GB203) in hardware support matrix 59…
kheiss-uwzoo Mar 23, 2026
726340c
fixed the contributing.md (#1706)
sosahi Mar 24, 2026
ad96fc9
add contributing.md back to repository (#1709)
kheiss-uwzoo Mar 24, 2026
bcaf8f3
Kheiss/6000353 - update links to older RNs (#1712)
kheiss-uwzoo Mar 24, 2026
486a0de
Kheiss/5966538 - document Python 3.12+ as a prerequisite for NeMo Ret…
kheiss-uwzoo Mar 24, 2026
f07e881
Aligns NeMo Retriever Library extraction docs with the current defaul…
kheiss-uwzoo Mar 25, 2026
f6e5869
Align nemotron-parse overview with three methods (NVBug 5965574); (#1…
kheiss-uwzoo Mar 25, 2026
998f26b
Kheiss/updates0325 (#1734)
kheiss-uwzoo Mar 25, 2026
a6ef79a
removed duplication of the word NVIDIA (#1736)
kheiss-uwzoo Mar 26, 2026
a07ac1d
removed reference to zipking (#1737)
kheiss-uwzoo Mar 26, 2026
fd1353a
Fixed bug 5966370 (#1744)
kheiss-uwzoo Mar 27, 2026
c63daab
Align production GPU examples with support matrix (NVBug 5965601) (#…
kheiss-uwzoo Mar 30, 2026
9dc88b5
Kheiss/5966722 (#1743)
kheiss-uwzoo Mar 30, 2026
6c3c2a6
Updated files per bugs 5970369, 5966307, and 5966925 (#1740)
kheiss-uwzoo Mar 30, 2026
53262b4
Align VLM caption model and MinIO defaults with runtime (#1739)
kheiss-uwzoo Mar 30, 2026
1a91164
added licensing info to documentation (#1750)
kheiss-uwzoo Mar 30, 2026
b5d7b96
updated quickstart guide file per 5966239 (#1751)
kheiss-uwzoo Mar 30, 2026
4744677
update support matrix to add footnotes
kheiss-uwzoo Mar 30, 2026
e8759e2
update support matrix to add footnotes (#1752)
kheiss-uwzoo Mar 30, 2026
f39912f
Merge remote-tracking branch 'upstream/26.03' into 26.03
kheiss-uwzoo Mar 30, 2026
29f787b
Kheiss/5966297update (#1758)
kheiss-uwzoo Mar 31, 2026
c5e1c22
Align VLM caption model, fix V2 ingest() example, document run_pipel…
kheiss-uwzoo Mar 31, 2026
7461ce4
Merge remote-tracking branch 'upstream/26.03' into 26.03
kheiss-uwzoo Mar 31, 2026
d56a8cb
Merge branch '26.03' into main
kheiss-uwzoo Apr 2, 2026
3e80634
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Apr 2, 2026
7f73df3
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Apr 14, 2026
4ce21b5
Merge remote-tracking branch 'upstream/main'
kheiss-uwzoo Apr 15, 2026
21a756b
Creating NRL only posting for GitHub
kheiss-uwzoo Apr 15, 2026
e7cb523
Merge branch 'main' into kheiss/NRLonly
kheiss-uwzoo Apr 16, 2026
31117de
NRL centric GitHub pages
kheiss-uwzoo Apr 16, 2026
ebd9fd5
ci(docs): add NRL GitHub Pages workflow, mkdocs config, and helper sc…
kheiss-uwzoo Apr 16, 2026
17f2504
docs: add NVIDIA logo icon to NRL staging overrides for MkDocs build
kheiss-uwzoo Apr 16, 2026
648b597
docs(nrl): emit site root index.html via redirect to Library overview
kheiss-uwzoo Apr 16, 2026
3eac3ca
Apply suggestion from @greptile-apps[bot]
kheiss-uwzoo Apr 17, 2026
1482710
docs: NRL workflows, navigation, and rename note across extraction pages
kheiss-uwzoo Apr 17, 2026
286921b
docs: fix broken internal links and anchors for MkDocs NRL build
kheiss-uwzoo Apr 17, 2026
97f4e49
docs: update internal links for clarity in reranking documentation
kheiss-uwzoo Apr 17, 2026
5718b99
docs: replace instructional 'see' with 'refer to' in extraction topics
kheiss-uwzoo Apr 17, 2026
a43263c
NRL only doc updates
kheiss-uwzoo Apr 20, 2026
8a3b8be
Merge branch 'main' into kheiss/NRLonly
kheiss-uwzoo Apr 20, 2026
22476ee
Update docs/mkdocs.nrl-github-pages.yml
kheiss-uwzoo Apr 20, 2026
4e3001c
Update docs/docs/extraction/overview.md
kheiss-uwzoo Apr 20, 2026
5284d29
Merge branch 'main' into kheiss/NRLonly
kheiss-uwzoo Apr 20, 2026
a494070
Updating file for helm removal and reintroduction of docker compose
kheiss-uwzoo Apr 21, 2026
34f873c
update file per Greptile
kheiss-uwzoo Apr 21, 2026
daed3bc
update file per Greptile review
kheiss-uwzoo Apr 21, 2026
5177ce5
Merge branch 'main' into kheiss/NRLonly
kheiss-uwzoo Apr 21, 2026
4393480
Removed superfluous note from all touched files
kheiss-uwzoo Apr 21, 2026
d4b50db
update extraction/images/overview-extraction.png
kheiss-uwzoo Apr 21, 2026
5 changes: 0 additions & 5 deletions docs/docs/extraction/agentic-retrieval-concept.md
@@ -1,10 +1,5 @@
# Agentic retrieval (concept)

!!! note

This documentation describes NeMo Retriever Library.


Agentic retrieval means **iterative, tool-driven** retrieval: an agent plans steps, issues searches, may refine filters, and optionally reranks until it has enough context to answer.

NeMo Retriever Library focuses on document ingestion, embeddings, vector stores, hybrid search, and reranking. Orchestration frameworks call these building blocks from your application.
5 changes: 0 additions & 5 deletions docs/docs/extraction/benchmarking.md
@@ -1,10 +1,5 @@
# NeMo Retriever Library integration testing framework

!!! note

This documentation describes NeMo Retriever Library.


A configurable, dataset-agnostic testing framework for end-to-end validation of NeMo Retriever Library pipelines. This framework uses structured YAML configuration for type safety, validation, and parameter management.

## Dataset Prerequisites
5 changes: 0 additions & 5 deletions docs/docs/extraction/choose-your-path.md
@@ -1,10 +1,5 @@
# Choose your path

!!! note

This documentation describes NeMo Retriever Library.


Use this page to pick documentation and deployment options that match your goal.

## I want to run locally or embed the library
5 changes: 0 additions & 5 deletions docs/docs/extraction/chunking.md
@@ -1,10 +1,5 @@
# Split Documents

!!! note

This documentation describes NeMo Retriever Library.


Splitting, also known as chunking, breaks large documents or text into smaller, manageable sections to improve retrieval efficiency.
After chunking, only the most relevant pieces of information are retrieved for a given query.
Chunking also prevents text from exceeding the context window of the embedding model.
5 changes: 0 additions & 5 deletions docs/docs/extraction/cli-reference.md
@@ -1,10 +1,5 @@
# CLI Reference

!!! note

This documentation describes NeMo Retriever Library.


After you install the Python dependencies, you can use the [NeMo Retriever Library](overview.md) command line interface (CLI).
To use the CLI, use the `nemo-retriever` command.

6 changes: 1 addition & 5 deletions docs/docs/extraction/concepts.md
@@ -1,10 +1,5 @@
# Concepts

!!! note

This documentation describes NeMo Retriever Library.


These terms appear throughout NeMo Retriever Library documentation.

## Job
@@ -26,6 +21,7 @@ Optionally, the library can compute **embeddings** for extracted content and sto
## Deployment modes

- **Library mode** — Run without the full container stack where appropriate ([quickstart](quickstart-library-mode.md)).
- **Docker Compose (self-hosted)** — [Container stack quickstart](quickstart-guide.md) for running the full microservices pipeline locally.
- **Helm / Kubernetes** — [Helm-based deployment](https://github.com/NVIDIA/NeMo-Retriever/blob/main/helm/README.md) for cluster operations.
- **Notebooks** — [Jupyter examples](notebooks.md) for experimentation and RAG demos.

5 changes: 0 additions & 5 deletions docs/docs/extraction/content-metadata.md
@@ -8,11 +8,6 @@ The definitions used in this documentation are the following:

Metadata can be extracted from a source or content, or generated by using models, heuristics, or other methods.

!!! note

This documentation describes NeMo Retriever Library.



## Source File Metadata

147 changes: 143 additions & 4 deletions docs/docs/extraction/contributing.md
@@ -1,9 +1,148 @@
# Contributing to NeMo Retriever Library

!!! note
External contributions will be welcome soon, and they are greatly appreciated. For repository policy, coding standards, and the contribution process, refer to **[Contributing to NeMo Retriever](https://github.com/NVIDIA/NeMo-Retriever/blob/main/CONTRIBUTING.md)** on GitHub.

This documentation describes NeMo Retriever Library.
The sections below describe how to configure your machine and Git remotes so you can work on documentation (or code) against **[NVIDIA/NeMo-Retriever](https://github.com/NVIDIA/NeMo-Retriever)** using a fork and a separate publishing clone.

---

External contributions will be welcome soon, and they are greatly appreciated!
For more information, refer to [Contributing to NeMo Retriever](https://github.com/NVIDIA/NeMo-Retriever/blob/main/CONTRIBUTING.md).
## Set up your writing and development environment

### SSH authentication (one time for each computer)

1. **Create an SSH key** on your computer. Follow steps 1–3 in [Generating a new SSH key and adding it to the ssh-agent](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent). (You only need the key-generation steps; you can skip configuring ssh-agent if your organization prefers not to use it.)

2. **Add the public key to GitHub** using [Adding a new SSH key to your GitHub account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account).

### Commit signing for GitHub (one time for each computer)

1. **Create a GPG key** following [Generating a new GPG key](https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key).

2. **Tell Git which key to use:**

```bash
git config --global user.signingkey YOUR_KEY_ID
```

3. **Sign every commit by default (recommended if your org requires signed commits):**

```bash
git config --global commit.gpgsign true
```

4. **Optional — sign a single commit:**

```bash
git commit -S -m "your message"
```

or

```bash
git commit --gpg-sign -m "your message"
```

5. **Optional — skip signing for one commit:**

```bash
git commit --no-gpg-sign -m "Unsigned commit"
```
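The signing settings above can be checked before you commit. The following is a minimal sketch, not part of the repository: `signing_status` is a hypothetical helper name, and it only reads the two global Git settings configured in the steps above.

```shell
# Hypothetical helper: report whether global commit signing is configured.
# Reads only user.signingkey and commit.gpgsign from the global Git config.
signing_status() {
  key="$(git config --global --get user.signingkey || true)"
  sign="$(git config --global --get commit.gpgsign || true)"
  if [ -n "$key" ] && [ "$sign" = "true" ]; then
    echo "signing enabled with key $key"
  else
    echo "signing not fully configured"
  fi
}
```

Run `signing_status` after completing steps 2 and 3. It does not confirm that the key exists in your GPG keyring or that GitHub knows about it.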

---

## Set up your writing and development clone (fork)

You do day-to-day work in a clone of **your fork**, with `upstream` pointing at NVIDIA’s repo.

1. **Get access** to **[https://github.com/NVIDIA/NeMo-Retriever](https://github.com/NVIDIA/NeMo-Retriever)** (and permission to fork it, per your organization).

2. **Create a fork**

- Open the **Fork** menu, then choose **Create a new fork**.
- Accept the default repository name (`NeMo-Retriever`) unless your org requires another name.
- **Deselect** “Copy the main branch only” if you need other branches in your fork. If you leave it selected, you can still fetch the other branches later with `git fetch upstream --tags` (described below).
- Click **Create fork**.

3. **Clone the fork** onto your machine:

- Pick a parent folder, for example `C:\_work\NeMo-Retriever-fork` or `C:\_repositories\NeMo-Retriever-fork`.
- Open a terminal in that folder, then clone:

```bash
git clone git@github.com:<your-github-username>/NeMo-Retriever.git
```

- Enter the repository directory (default folder name is usually `NeMo-Retriever`):

```bash
cd NeMo-Retriever
```

- **Add NVIDIA’s repo as `upstream`:**

```bash
git remote add upstream https://github.com/NVIDIA/NeMo-Retriever.git
```

- If the fork was created with **only** the default branch, fetch the rest from upstream when needed:

```bash
git fetch upstream --tags
```

Confirm remotes:

```bash
git remote -v
```

You should see `origin` pointing at your fork and `upstream` at `NVIDIA/NeMo-Retriever`.
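If you prefer a scripted check over reading the `git remote -v` output, a sketch along these lines may help. `check_remotes` is a hypothetical helper name; it assumes you run it inside the fork clone and relies only on `git remote get-url`.

```shell
# Hypothetical helper: confirm that both remotes exist in the current clone.
# Prints each remote with its URL; returns non-zero if either is missing.
check_remotes() {
  for remote in origin upstream; do
    if url="$(git remote get-url "$remote" 2>/dev/null)"; then
      echo "$remote -> $url"
    else
      echo "missing remote: $remote" >&2
      return 1
    fi
  done
}
```

Call `check_remotes` from the repository directory. It checks only that the remotes are defined, not that their URLs are reachable.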

---

## Set up your publishing clone (canonical repo)

Some workflows use a **second clone** of the **official** repository (not your fork) for publishing or internal automation.

1. Choose a **different** directory from your fork clone. On Windows, your team may require this clone inside **WSL**; follow internal guidance.

2. Clone NVIDIA’s repository:

```bash
git clone git@github.com:NVIDIA/NeMo-Retriever.git
```

After setup you typically have **two** working copies: one from your fork (with `upstream` configured) and one straight from `NVIDIA/NeMo-Retriever`.

---

## Make a documentation change

### Target branches

Decide where the change lands:

- **`main` only**
- A **release** branch only (for example `release/25.9.0`)
- **Both** `main` and a release branch — commit to `main` first, then [cherry-pick](https://git-scm.com/docs/git-cherry-pick) the commits onto the release branch.
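For the "both" case, the cherry-pick step might look like the sketch below. `backport_commit` is a hypothetical helper name, and the `-x` flag (which records the original commit hash in the new commit message) is a suggestion rather than a stated project requirement.

```shell
# Hypothetical helper: apply commits that already landed on main to a release branch.
# Usage: backport_commit <release-branch> <commit-sha>...
backport_commit() {
  release_branch="$1"
  shift
  # -x appends "(cherry picked from commit <sha>)" to each new commit message.
  git checkout "$release_branch" && git cherry-pick -x "$@"
}
```

For example, `backport_commit release/25.9.0 <sha>` switches to the release branch and applies the commit there. Resolve any conflicts and push the branch as you normally would.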

### Keep your fork and local clone in sync with NVIDIA

From your **fork** clone, on each branch you care about (example uses `main`; substitute `develop` or a release branch as needed):

```bash
git checkout main
git fetch upstream
git merge upstream/main
git push origin main
```

Use a **space** between the remote name and the branch: `git push origin main`. (`git push origin/main` is invalid: Git treats `origin/main` as a repository name and reports an error.)

Repeat `checkout` / `fetch` / `merge` / `push` for every branch you maintain (`main`, `develop`, release branches, and so on).
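The repetition can be wrapped in a small loop. This is a sketch under the assumptions above: `sync_branches` is a hypothetical helper name, the remotes are called `origin` and `upstream`, and each branch exists under the same name locally and on `upstream`.

```shell
# Hypothetical helper: sync each named branch from upstream and push it to the fork.
# Usage: sync_branches main develop release/25.9.0
sync_branches() {
  git fetch upstream || return 1
  for branch in "$@"; do
    # Stop at the first failure so a merge conflict is not pushed over.
    git checkout "$branch" &&
      git merge "upstream/$branch" &&
      git push origin "$branch" || return 1
  done
}
```

If a merge stops on a conflict, the function returns early and leaves the repository on that branch so you can resolve it by hand.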

---

## Related

- [Contributing to NeMo Retriever](https://github.com/NVIDIA/NeMo-Retriever/blob/main/CONTRIBUTING.md) — authoritative contribution guidelines in the repository
5 changes: 0 additions & 5 deletions docs/docs/extraction/custom-metadata.md
@@ -1,10 +1,5 @@
# Use Custom Metadata to Filter Search Results

!!! note

This documentation describes NeMo Retriever Library.


You can upload custom metadata for documents during ingestion.
By uploading custom metadata you can attach additional information to documents,
and use it for filtering results during retrieval operations.
4 changes: 0 additions & 4 deletions docs/docs/extraction/data-store.md
@@ -2,10 +2,6 @@

Use this documentation to learn how [NeMo Retriever Library](overview.md) handles and uploads data.

!!! note

This documentation describes NeMo Retriever Library.


## Overview

5 changes: 0 additions & 5 deletions docs/docs/extraction/embedding-nims-models.md
@@ -1,10 +1,5 @@
# Embedding NIMs and models

!!! note

This documentation describes NeMo Retriever Library.


Embeddings turn extracted text and multimodal content into vectors for semantic search. NeMo Retriever Library integrates with NVIDIA NIM microservices for embedding. Model names and compatibility vary by release; refer to the [Support matrix](support-matrix.md) and the [NVIDIA NIM catalog](https://build.nvidia.com/).

For multimodal or VLM embeddings, refer to [Multimodal embeddings (VLM)](vlm-embed.md).
4 changes: 0 additions & 4 deletions docs/docs/extraction/environment-config.md
@@ -3,10 +3,6 @@
The following are the environment variables that you can use to configure [NeMo Retriever Library](overview.md).
You can specify these in your .env file or directly in your environment.

!!! note

This documentation describes NeMo Retriever Library.


## General Environment Variables

5 changes: 0 additions & 5 deletions docs/docs/extraction/evaluate-on-your-data.md
@@ -1,10 +1,5 @@
# Evaluate on your data

!!! note

This documentation describes NeMo Retriever Library.


Retrieval and ingestion performance **depend on your documents**, hardware, and pipeline settings. Use the following when measuring quality and throughput on **your** datasets.

## Benchmarking and baselines
5 changes: 0 additions & 5 deletions docs/docs/extraction/extraction-charts-infographics.md
@@ -1,10 +1,5 @@
# Charts and infographics

!!! note

This documentation describes NeMo Retriever Library.


Charts and infographic regions are classified as graphic elements and processed with the corresponding NVIDIA NIM workflows (for example, **yolox-graphic-elements** in current releases). Outputs use the same metadata schema as other extracted objects.

**Related**
5 changes: 0 additions & 5 deletions docs/docs/extraction/extraction-ocr-scanned.md
@@ -1,10 +1,5 @@
# OCR and scanned documents

!!! note

This documentation describes NeMo Retriever Library.


Scanned PDFs and image-only pages rely on OCR and hybrid paths that combine native text extraction with OCR when needed. For extract methods such as `ocr` and `pdfium_hybrid`, refer to the [Python API reference](python-api-reference.md).

**Related**
5 changes: 0 additions & 5 deletions docs/docs/extraction/extraction-tables.md
@@ -1,10 +1,5 @@
# Tables

!!! note

This documentation describes NeMo Retriever Library.


NeMo Retriever Library detects tables as structured page elements, processes them through the appropriate NIMs, and exports formats suitable for downstream RAG (including Markdown-oriented representations where configured). Availability depends on pipeline and model configuration; refer to the [Support matrix](support-matrix.md).

**Related**
5 changes: 0 additions & 5 deletions docs/docs/extraction/faq.md
@@ -2,11 +2,6 @@

This documentation contains the Frequently Asked Questions (FAQ) for [NeMo Retriever Library](overview.md).

!!! note

This documentation describes NeMo Retriever Library.



## What if I already have a retrieval pipeline? Can I just use NeMo Retriever Library?

6 changes: 1 addition & 5 deletions docs/docs/extraction/getting-started-about.md
@@ -1,10 +1,5 @@
# About getting started

!!! note

This documentation describes NeMo Retriever Library.


This section walks you from **access and prerequisites** through **first deployment** and **hands-on notebooks**.

Typical order:
@@ -13,6 +8,7 @@ Typical order:
2. Confirm [Prerequisites](prerequisites.md) and the [Support matrix](support-matrix.md) for your OS, GPU, and software stack.
3. Deploy using one of:
- [Library mode](quickstart-library-mode.md) (without full stack containers where appropriate)
- [Docker Compose (self-hosted)](quickstart-guide.md) for the reference microservices stack in containers
- [Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/main/helm/README.md) for Kubernetes environments
4. Explore [Jupyter Notebooks](notebooks.md) for end-to-end examples.

5 changes: 0 additions & 5 deletions docs/docs/extraction/hosted-nims-when-to-use.md
@@ -1,10 +1,5 @@
# When to use NVIDIA-hosted NIMs

!!! note

This documentation describes NeMo Retriever Library.


[NVIDIA-hosted NIMs](https://build.nvidia.com/) run inference on NVIDIA-managed infrastructure. You call models with API keys (refer to [Get your API key](ngc-api-key.md)) without operating GPU nodes yourself.

Consider hosted NIMs when:
5 changes: 0 additions & 5 deletions docs/docs/extraction/how-to-use-this-documentation.md
@@ -1,10 +1,5 @@
# How to use this documentation

!!! note

This documentation describes NeMo Retriever Library.


Use the sections below as a reading order that matches how you run NeMo Retriever Library.

## NeMo Retriever Library (local or embedded)
5 changes: 0 additions & 5 deletions docs/docs/extraction/image-captioning.md
@@ -1,10 +1,5 @@
# Image captioning

!!! note

This documentation describes NeMo Retriever Library.


Image captioning generates natural-language descriptions for unstructured image content. Retrieval can then use text embeddings over captions and visual embeddings where you configure them.

**Related**
Binary file modified docs/docs/extraction/images/overview-extraction.png