Removed "This documentation describes NeMo Retriever Library" from files #1895

Open
kheiss-uwzoo wants to merge 122 commits into main from kheiss/NRLonly

@kheiss-uwzoo
Collaborator

Change

Removed this MkDocs admonition block everywhere it appeared:

!!! note
This documentation describes NeMo Retriever Library.

Scope

61 files under docs/docs/extraction/*.md were updated (including agentic-retrieval-concept.md).
prerequisites.md did not contain that block, so it was unchanged.
A quick follow-up pass removed the extra blank line after the first # title wherever the removal left three consecutive newlines (\n\n\n), so pages such as agentic-retrieval-concept.md and concepts.md keep a single blank line under the H1.
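
The removal and the blank-line follow-up described above can be sketched as one small text transformation. This is an illustrative sketch only, not the script used for this PR: the function name `strip_admonition` is hypothetical, and the pattern assumes the block is a `!!! note` line followed by a single (optionally indented) body line.

```python
import re

# Assumed shape of the block: "!!! note" followed by one body line.
# The body indentation in the real files is an assumption, so it is optional here.
ADMONITION = re.compile(
    r"^!!! note[ \t]*\n"
    r"[ \t]*This documentation describes NeMo Retriever Library\.[ \t]*\n",
    re.MULTILINE,
)


def strip_admonition(text: str) -> str:
    """Remove the admonition, then collapse runs of blank lines to a single one."""
    without_block = ADMONITION.sub("", text)
    # Three or more newlines in a row mean two or more consecutive blank lines.
    return re.sub(r"\n{3,}", "\n\n", without_block)


page = (
    "# Concepts\n"
    "\n"
    "!!! note\n"
    "    This documentation describes NeMo Retriever Library.\n"
    "\n"
    "Body text.\n"
)
cleaned = strip_admonition(page)
```

Note that the blank-line collapse in this sketch applies to the whole page, including mid-page occurrences and the contents of fenced code blocks, so a real pass would likely need to skip fenced regions.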

kheiss-uwzoo and others added 30 commits February 19, 2026 10:36
Update all hardcoded version references from 26.1.2 to 26.3.0-RC1
across helm charts, docker-compose, FastAPI, docs, and examples.

Made-with: Cursor
Co-authored-by: Kurt Heiss <kheiss@nvidia.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
…ing long VLM captioning

Large PDFs with VLM captioning enabled can take 2-22+ hours depending on hardware.
The previous defaults (STATE_TTL=7200s, RESULT_DATA_TTL=3600s) caused job state to
expire mid-processing, resulting in 404 "Job ID not found or state has expired" errors
even though the pipeline completed successfully.

Raises both defaults to 172800s (48 hours), providing sufficient headroom for all
observed workloads. Users can still override via RESULT_DATA_TTL_SECONDS and
STATE_TTL_SECONDS environment variables.
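
As a rough illustration of the override behavior described in this commit message, the following sketch reads the two environment variables and falls back to the 48-hour default. The helper `ttl_from_env` and the constant name are hypothetical; the actual service code may read these values differently.

```python
import os

# 48 hours, the new default stated in the commit message.
DEFAULT_TTL_SECONDS = 172800


def ttl_from_env(name: str, default: int = DEFAULT_TTL_SECONDS) -> int:
    """Return a positive TTL in seconds from the environment, or the default."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of seconds, got {value}")
    return value


# Variable names taken from the commit message.
state_ttl = ttl_from_env("STATE_TTL_SECONDS")
result_data_ttl = ttl_from_env("RESULT_DATA_TTL_SECONDS")
```

For example, exporting `STATE_TTL_SECONDS=7200` before start-up would restore the previous two-hour state lifetime under this sketch.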

Fixes: Customer bug 5914605

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kheiss-uwzoo and others added 21 commits April 15, 2026 11:44
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Fix mkdocs.nrl-github-pages.yml so Workflow: Document ingestion targets workflow-document-ingestion.md (not the V2 API guide); add workflow and metadata schema pages.

- Add workflow-document-ingestion.md, workflow pages, and multimodal-metadata-schema.md; update workflow cross-links.

- Align quick start links and GitHub URLs with NeMo-Retriever; use Python and CLI Quick Start Guide labels; refresh quickstart-guide examples and MIG references.

- Insert the standard NVIDIA Ingest (nv-ingest) rename note after the H1 on every extraction topic page for consistent messaging.

Made-with: Cursor
- Point environment and troubleshooting links to environment-config.md and troubleshoot.md.

- In user-defined-functions, link to content-metadata, multimodal-metadata-schema, nimclient.md, and GitHub default_pipeline.yaml.

- Add explicit HTML anchors in content-metadata.md so schema table fragment links resolve without macro/attr_list issues.

Made-with: Cursor
- Replace "see" with "refer to" for consistency in linking to the Support matrix and Benchmarking documentation.
- Update cross-references and table notes to use refer to / Refer to for consistency.

- Rename 'See also' to 'Related topics' in key-features.md.

- Remove temporary docs/scripts/replace_see_with_refer_to.py helper.

Made-with: Cursor
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners April 21, 2026 19:52
@kheiss-uwzoo kheiss-uwzoo added the doc Improvements or additions to documentation label Apr 21, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR removes a redundant !!! note admonition block ("This documentation describes NeMo Retriever Library.") from 62 Markdown files under docs/docs/extraction/. The mechanical removal is correct across all files, and the follow-up blank-line cleanup was applied to most pages. A handful of files (content-metadata.md, data-store.md, environment-config.md, faq.md) still have a leftover double blank line where the block sat mid-page rather than immediately under the H1.

Confidence Score: 5/5

Safe to merge — all changes are documentation-only with no code impact.

Every finding is P2 (cosmetic blank lines). No code, schema, or logic is touched. The only substantive content change beyond the admonition removal is the expanded contributing.md, which was already flagged in a prior review thread.

content-metadata.md, data-store.md, environment-config.md, and faq.md have residual double blank lines; contributing.md contains a large out-of-scope addition already noted by a prior reviewer.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/agentic-retrieval-concept.md | Admonition block removed cleanly; single blank line under H1 preserved correctly. |
| docs/docs/extraction/contributing.md | Admonition removed but 139 lines of new SSH/GPG setup and fork workflow content added, far beyond stated PR scope (already flagged in a prior review thread). |
| docs/docs/extraction/content-metadata.md | Admonition removed; one trailing blank line remains between the preceding paragraph and the next section heading, resulting in a double blank line. |
| docs/docs/extraction/faq.md | Admonition removed; double blank line left before first section heading. |
| docs/docs/extraction/data-store.md | Admonition removed; one blank line remains before the next section, leaving a double blank line. |
| docs/docs/extraction/environment-config.md | Admonition removed; one blank line remains before the next section, leaving a double blank line. |
| docs/docs/extraction/prerequisites.md | Admonition removed correctly (contrary to PR description which stated this file was unchanged). |
| docs/docs/extraction/concepts.md | Admonition removed and a new Docker Compose deployment mode bullet added to the Deployment modes section. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[62 Markdown files\nunder docs/docs/extraction/] --> B{Admonition block\npresent?}
    B -- Yes --> C[Remove\n'!!! note' block]
    B -- No --> D[No change\nprerequisites.md\nnote: PR desc incorrect]
    C --> E{Block was\ndirectly under H1?}
    E -- Yes --> F[Remove extra blank line\n✅ Applied correctly\nmost files]
    E -- No / mid-page --> G[One blank line remains\n⚠ Double blank line\ncontent-metadata.md\ndata-store.md\nenvironment-config.md\nfaq.md]
    F --> H[Clean page]
    G --> H
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/docs/extraction/content-metadata.md
Line: 10-12

Comment:
**Leftover double blank line after admonition removal**

Removing the 5-line admonition block left an extra blank line here, so there are now two consecutive blank lines before the `## Source File Metadata` heading. The same pattern appears in `data-store.md`, `environment-config.md`, and `faq.md`. A single blank line between a paragraph and a heading is the standard MkDocs/Markdown convention.

How can I resolve this? If you propose a fix, please make it concise.
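
One way to locate the leftover double blank lines mentioned in this comment is a small line scan. This is a minimal sketch under stated assumptions: the function name `find_double_blank_lines` is hypothetical, and the scan does not distinguish fenced code blocks, where consecutive blank lines may be intentional.

```python
def find_double_blank_lines(text: str) -> list[int]:
    """Return 1-based line numbers of blank lines that directly follow a blank line."""
    lines = text.splitlines()
    hits = []
    for index in range(1, len(lines)):
        if not lines[index].strip() and not lines[index - 1].strip():
            hits.append(index + 1)  # convert 0-based index to 1-based line number
    return hits


sample = "Intro paragraph.\n\n\n## Source File Metadata\n"
extra_blank_lines = find_double_blank_lines(sample)
```

Running such a check over `content-metadata.md`, `data-store.md`, `environment-config.md`, and `faq.md` would point at the lines to delete.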

Reviews (2): Last reviewed commit: "update extraction/images/overview-extrac..."

Comment on lines 1 to +10
# Contributing to NeMo Retriever Library

!!! note
External contributions will be welcome soon, and they are greatly appreciated. For repository policy, coding standards, and the contribution process, refer to **[Contributing to NeMo Retriever](https://github.com/NVIDIA/NeMo-Retriever/blob/main/CONTRIBUTING.md)** on GitHub.

This documentation describes NeMo Retriever Library.
The sections below describe how to configure your machine and Git remotes so you can work on documentation (or code) against **[NVIDIA/NeMo-Retriever](https://github.com/NVIDIA/NeMo-Retriever)** using a fork and a separate publishing clone.

---

External contributions will be welcome soon, and they are greatly appreciated!
For more information, refer to [Contributing to NeMo Retriever](https://github.com/NVIDIA/NeMo-Retriever/blob/main/CONTRIBUTING.md).
## Set up your writing and development environment

Contributor

P2 Scope far exceeds stated PR intent

This file went from 8 lines to 148 lines, gaining a full SSH/GPG setup guide, fork workflow, and publishing-clone instructions. The PR description only mentions removing the !!! note admonition block. Reviewers scanning for a simple admonition-removal PR could easily miss this new content — it warrants a dedicated review pass to verify accuracy and alignment with internal onboarding docs.

8 participants