Kheiss/up ovr#1861

Open
kheiss-uwzoo wants to merge 105 commits into main from kheiss/up-ovr

Conversation

@kheiss-uwzoo
Collaborator

NVIDIA NeMo Retriever Library is a scalable, performance-oriented framework for document content and metadata extraction. It supports both NVIDIA NIM microservices and a wide range of models to find, contextualize, and extract text, tables, charts, and infographics for use in downstream generative and retrieval-augmented applications.

kheiss-uwzoo and others added 30 commits February 19, 2026 10:36
Update all hardcoded version references from 26.1.2 to 26.3.0-RC1
across helm charts, docker-compose, FastAPI, docs, and examples.

Made-with: Cursor
Co-authored-by: Kurt Heiss <kheiss@nvidia.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
…ing long VLM captioning

Large PDFs with VLM captioning enabled can take 2-22+ hours depending on hardware.
The previous defaults (STATE_TTL=7200s, RESULT_DATA_TTL=3600s) caused job state to
expire mid-processing, resulting in 404 "Job ID not found or state has expired" errors
even though the pipeline completed successfully.

Raises both defaults to 172800s (48 hours), providing sufficient headroom for all
observed workloads. Users can still override via RESULT_DATA_TTL_SECONDS and
STATE_TTL_SECONDS environment variables.
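
The override behavior described above could look roughly like the following. This is a minimal sketch, not the project's actual code: only the two environment variable names and the 172800s default come from the commit message. The helper `ttl_from_env` and its fallback on empty, non-numeric, or non-positive values are assumptions made for illustration.

```python
import os

# 48 hours, the new default stated in the commit message.
DEFAULT_TTL_SECONDS = 172800


def ttl_from_env(name, default=DEFAULT_TTL_SECONDS, env=None):
    """Hypothetical helper: read a TTL in seconds from an environment variable.

    Falls back to `default` when the variable is unset, empty, not an
    integer, or not positive (this fallback policy is an assumption).
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# Variable names taken from the commit message.
state_ttl = ttl_from_env("STATE_TTL_SECONDS")
result_data_ttl = ttl_from_env("RESULT_DATA_TTL_SECONDS")
```

For jobs expected to run longer than 48 hours, the commit message implies that both variables can be set to a larger value in the service's environment before it starts.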

Fixes: Customer bug 5914605

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners April 15, 2026 20:54
@kheiss-uwzoo kheiss-uwzoo requested a review from jioffe502 April 15, 2026 20:54
@greptile-apps
Contributor

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

The introductory paragraph of docs/docs/extraction/overview.md is updated to describe NeMo Retriever as a "framework" supporting both NIM microservices and a broader range of models, replacing the narrower "microservice" framing.

  • Line 25 still reads "NeMo Retriever Library is a microservice service that does the following:" — both redundant and now inconsistent with the "framework" framing introduced here.

Confidence Score: 5/5

Documentation-only change; safe to merge after addressing the minor terminology inconsistency on line 25.

The only finding is a P2 style inconsistency ("microservice service" vs "framework") on a line this PR leaves unchanged. No code is touched, no logic is affected, and all remaining feedback is non-blocking.

docs/docs/extraction/overview.md — line 25 terminology should be updated to match the new "framework" framing.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/overview.md | Intro paragraph rewritten from "microservice" to "framework" framing with broader model support mentioned; line 25 still uses the old "microservice service" terminology, creating a minor internal inconsistency. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Document Input\npdf, docx, pptx, images, video, audio...] --> B[NeMo Retriever Library]
    B --> C[Page Splitting & Classification\ntext, tables, charts, infographics]
    C --> D[Extraction via\nNIM microservices or local models]
    D --> E[OCR & Contextualization]
    E --> F{Optional}
    F --> G[Embedding Computation]
    F --> H[Vector DB Storage\nLanceDB / Milvus]
    G --> H
    H --> I[JSON Metadata Output\nfor RAG / Generative Apps]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/docs/extraction/overview.md
Line: 25

Comment:
**Terminology inconsistency: "microservice service" vs "framework"**

The intro paragraph was updated from "microservice" to "framework", but line 25 still reads "NeMo Retriever Library is a microservice service that does the following:" — contradicting the new framing and also containing the redundant phrase "microservice service."

```suggestion
NeMo Retriever Library is a framework that does the following:
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "Merge branch 'main' into kheiss/up-ovr"

Comment thread docs/docs/extraction/overview.md Outdated

8 participants