Skip to content

docs: remove docs code reference#674

Open
andreatgretel wants to merge 2 commits into
mainfrom
andreatgretel/docs/remove-code-reference-docs
Open

docs: remove docs code reference#674
andreatgretel wants to merge 2 commits into
mainfrom
andreatgretel/docs/remove-code-reference-docs

Conversation

@andreatgretel
Copy link
Copy Markdown
Contributor

@andreatgretel andreatgretel commented May 18, 2026

📋 Summary

Removes the generated code reference docs from both MkDocs and Fern so the docs no longer publish or link to the retired API reference surface. This also removes the generation plumbing and adds publish-branch cleanup for archived Fern versions so stale reference pages do not survive in docs-website archives.

🔗 Related Issue

N/A

🔄 Changes

🗑️ Removed

  • Deleted the MkDocs docs/code_reference/** pages, Fern fern/versions/latest/pages/code_reference/** pages, mkdocstrings CSS, and py2fern normalization script.
  • Removed code reference nav/config, dependency entries, Make targets, workflow env, and ignored Fern artifacts.
  • Removed stale reference links from MkDocs/Fern concept and plugin docs, plus contributor and agent docs.

🔧 Changed

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • fern/scripts/fern-published-branch.py - Archived Fern versions copy cleaned current versions of the affected concept/plugin pages during publish sync so stale reference links are removed from historical docs.

🧪 Testing

  • .venv/bin/ruff check --fix .
  • .venv/bin/ruff format .
  • make check-fern-docs passes with 0 errors and 2 existing warnings
  • .venv/bin/mkdocs build passes with existing docs warnings
  • git diff --check
  • Source keyword sweep for retired reference strings
  • docs-website dry-run sync plus make check-fern-docs
  • Claude review and follow-up found no actionable findings
  • make test passes (N/A - docs-only; not run)
  • Unit tests added/updated (N/A - no testable logic)
  • E2E tests added/updated (N/A - docs-only)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A - no architecture changes)

@andreatgretel andreatgretel marked this pull request as ready for review May 18, 2026 21:37
@andreatgretel andreatgretel requested a review from a team as a code owner May 18, 2026 21:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

MkDocs preview: https://49f2a817.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-674.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@github-actions
Copy link
Copy Markdown
Contributor

PR #674 Review — docs: remove docs code reference

Summary

This is a docs-only PR (79 additions / 1690 deletions) that removes the
generated API reference surface from both publishing pipelines:

  • MkDocs: deletes docs/code_reference/**, drops mkdocstrings /
    mkdocstrings-python from pyproject.toml and uv.lock, removes
    the mkdocstrings CSS, and trims mkdocs.yml nav.
  • Fern: deletes fern/versions/latest/pages/code_reference/**, removes
    the Code Reference nav section from fern/versions/latest.yml, drops
    the libraries: block and all /code-reference/* redirect rules from
    fern/docs.yml, removes py2fern from deps, and deletes
    fern/scripts/normalize-py2fern-indexes.py.
  • Plumbing: removes the generate-fern-api-reference[-native] Make
    targets, the DOCS_PY2FERN workflow env, and the fern/code-reference/
    gitignore entry.
  • Concept/plugin pages: rewrites stale /code-reference/... links into
    short prose mentions (columns, custom_columns, model-configs,
    person_sampling, security, tool_use_and_mcp, validators,
    plugins/example, plugins/overview).
  • Agent docs: updates .agents/, CONTRIBUTING.md, fern/AGENTS.md,
    and fern/README.md so they no longer reference the retired surface.

The only behavioral change is in fern/scripts/fern-published-branch.py,
which now strips the Code Reference nav and code_reference page tree
from archived Fern versions during publish sync, and refreshes the
affected concept/plugin pages in those archives so the inline links
stripped on latest also disappear from historical docs.

Findings

Correctness

  • remove_retired_reference_archive flow looks sound. It runs after
    clear_published_tree + source copy + merge_preserved_versions, so
    the published tree at this point is: source latest (no
    code_reference) + preserved v* versions (which may still have
    code_reference). The script (a) strips the nav block from every
    v*.yml, (b) deletes any */pages/code_reference directory under
    versions/, and (c) overlays the cleaned-on-latest versions of the
    9 affected concept/plugin pages into each v*/pages/. That's a
    consistent end state.
  • remove_navigation_section shares the same end-of-block heuristic
    as extract_/replace_navigation_section
    (next line that
    startswith(" - ") and is non-empty). For a section that is last
    in the file, end falls through to len(lines), which is the desired
    behavior. ✅
  • glob("v*/pages") is intentionally narrow — it only matches
    version directories whose names start with v, matching the
    REDIRECT_VERSION_RE convention elsewhere in this file. If a future
    archive uses an older-versions/... shape, page refreshes there would
    be skipped silently. Not a regression for this PR; worth noting if
    archive naming ever broadens.
  • glob(f"*/pages/{RETIRED_REFERENCE_DIR}") would also match
    latest/pages/code_reference, but latest no longer contains that
    directory after the source copy, so the broader glob is harmless and
    keeps the cleanup robust against a stray re-add.
  • Redirect removal is a deliberate trade-off. All the
    /nemo/datadesigner/code_reference/*/code-reference/* redirects
    in fern/docs.yml are deleted. Users following indexed search
    results to /code_reference/... will now get 404s instead of being
    redirected to the (nonexistent) new code-reference pages. Since the
    destination is also gone, redirecting wouldn't help — but you may want
    to consider a single redirect of the code_reference root to
    /concepts/columns or the API overview page. Not blocking; a product
    call.
  • fern-published-branch.py lines 18-19 use split string literals
    ("Code " + "Reference", "code" + "_reference").
    This is a
    workaround for the "source keyword sweep for retired reference
    strings" check listed in the PR's testing checklist. It works, but
    it's the kind of cleverness that future maintainers will revert
    without realizing why. A one-line # noqa-style comment explaining
    the sweep would prevent that — e.g. # Split to satisfy the retired- reference keyword sweep; do not collapse. Optional.

Conventions

  • Matches the surrounding style of fern-published-branch.py:
    module-level constants, from __future__ import annotations, modern
    type annotations (list[str], re.Pattern[str]), no relative
    imports, PublishedBranchError for failures. ✅
  • Concept-page rewrites preserve voice and Markdown link conventions
    used elsewhere in fern/versions/latest/pages/concepts/.
  • pyproject.toml and uv.lock are kept in sync; transitive removal
    of astroid (mkdocstrings → griffe → astroid) is correctly reflected
    in the lockfile.
  • Makefile .PHONY list is kept in sync with the deleted targets.
  • Typo in the existing source line at
    fern/versions/latest/pages/concepts/person_sampling.mdx:43
    ("For mor details") is replaced rather than corrected — fine for this
    PR, but a free fix you could land alongside.

Performance

  • No runtime/library performance impact. Loss of the docs build step
    for the API reference will modestly speed up make check-fern-docs
    and the docs-preview workflow.

Test coverage

  • Docs-only; no logic tests are required for the deletions.
  • fern-published-branch.py has no unit tests in this repo (pre-existing
    state). The new remove_retired_reference_archive is therefore
    exercised only by the publish dry-run noted in the PR checklist
    ("docs-website dry-run sync plus make check-fern-docs"). Adding a
    small pytest around the YAML mutation helpers would be a low-cost
    follow-up but is out of scope here.
  • The PR explicitly verifies make check-fern-docs (0 errors, 2
    pre-existing warnings) and mkdocs build. Adequate for a docs PR.

Security

  • No secrets, no new network calls, no executable changes outside the
    publish-sync script. shutil.rmtree is constrained to paths inside
    published_root / "fern" / "versions" (the script's own temp
    workspace), so no risk of overreach.
  • No prompt-injection / external-content concerns.

Risks / things to double-check after merge

  1. Inbound links from external sources (Google, blog posts,
    internal NVIDIA wiki) pointing at /code_reference/... or
    /code-reference/... will 404. If telemetry shows non-trivial hits,
    consider a single catch-all redirect to a relevant concept page.
  2. docs-website archive cleanup runs only at next publish. Until
    then, archived versions on the live site still surface broken
    /code-reference/... links from their concept pages. The PR's
    approach (refresh from latest on publish) handles this on the
    next run; just be aware the gap is one publish cycle.
  3. fern/AGENTS.md still references code-reference/ in some
    commentary (worth a final grep before merging).

Verdict

Approve / non-blocking comments only. This is a clean, well-scoped
removal of a retired surface. The sole logic change in
fern-published-branch.py is straightforward and consistent with the
existing nav-mutation helpers. Two optional follow-ups: (a) add a
brief comment explaining the split string literals, and (b) consider a
single catch-all redirect to soften the 404 cliff for external
inbound links. Neither blocks merge.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR removes the generated code-reference docs surface from both MkDocs and Fern, strips the generation tooling (py2fern, mkdocstrings), and updates the publish-sync script to clean stale reference pages from archived Fern versions.

  • Deletes all docs/code_reference/** and fern/versions/latest/pages/code_reference/** pages, removes the mkdocstrings plugin and CSS, and removes the py2fern and mkdocstrings-python dependencies from pyproject.toml.
  • Updates fern/scripts/fern-published-branch.py to replace sync_code_reference_archive (which kept archived versions updated) with remove_retired_reference_archive, which strips the retired Code Reference nav section, removes the code_reference directory tree, and copies the now-cleaned concept/plugin pages into each archived version.
  • Cleans up cross-links in concept and plugin docs that pointed to the retired reference surface, and removes the Code Reference section from fern/docs.yml, fern/versions/latest.yml, and mkdocs.yml.

Confidence Score: 5/5

This is a docs-only cleanup with no runtime code changes; safe to merge.

All changes are documentation removals and publish-script refactoring. The new remove_retired_reference_archive logic in fern-published-branch.py mirrors the section-boundary detection used by the pre-existing extract_navigation_section and replace_navigation_section helpers, so the approach is consistent and well-tested by the existing workflow. No application logic, APIs, or data paths are touched.

No files require special attention.

Important Files Changed

Filename Overview
fern/scripts/fern-published-branch.py Replaces sync_code_reference_archive with remove_retired_reference_archive; adds remove_navigation_section helper using the same section-boundary logic as the existing extract/replace helpers. Logic is correct.
fern/versions/latest.yml Removes the Code Reference section (70 lines) from the latest version nav; no remaining references to deleted pages.
fern/docs.yml Removes the libraries config block and all code_reference redirects; also updates the comment referencing mkdocstrings. Clean removal.
mkdocs.yml Removes Code Reference nav section, mkdocstrings plugin config, watch paths for source packages, and mkdocstrings.css from extra_css. Consistent cleanup.
Makefile Removes generate-fern-api-reference targets, py2fern variables, and updates prepare-fern-docs to no longer depend on the removed targets.
pyproject.toml Removes mkdocstrings-python, mkdocstrings, and py2fern from the docs dependency group.
.github/workflows/docs-preview.yml Removes DOCS_PY2FERN env var from the check-fern-docs workflow step.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[sync_source called] --> B[Preserve archived versions to tmpdir]
    B --> C[clear_published_tree]
    C --> D[copytree source → published root]
    D --> E[merge_preserved_versions\nrestore archived v*.yml + pages/]
    E --> F[remove_retired_reference_archive]
    F --> G[For each archived v*.yml\nremove_navigation_section\nCode Reference]
    G --> H[glob versions_dir/*/pages/code_reference\nshutil.rmtree each]
    H --> I[For each v*/pages dir\ncopy cleaned concept/plugin pages\nfrom latest source]
    I --> J[materialize_version_nav_pages]
    J --> K[restore_versions_block]
    K --> L[validate_redirect_targets]
    L --> M[write_publish_metadata]
Loading

Reviews (2): Last reviewed commit: "docs: address generated reference review" | Re-trigger Greptile

@johnnygreco
Copy link
Copy Markdown
Contributor

Thanks for putting this together, @andreatgretel!

Summary

This removes the MkDocs/Fern generated code reference surface, its generation plumbing, dependency entries, nav, redirects, and published-archive cleanup path. The implementation matches the PR description: source-tree sweeps are clean for the old paths/tools, and a dry-run of fern-published-branch.py sync-source against the current docs-website archive removed the retired archive nav/pages cleanly.

Findings

Warnings — Worth addressing

.agents/agents/docs-searcher.md:66 — Last generated-reference breadcrumb remains

  • What: The docs search agent now says to "Prioritize user guides and examples over generated reference material when both exist." Since this PR removes the generated reference material entirely, this leaves a conceptual reference to the retired surface.
  • Why: It is not a broken public docs link, but it weakens the clean-removal story for agent-facing docs and can send future agents looking for a docs category that no longer exists.
  • Suggestion: Remove this bullet, or rephrase it around the docs that actually remain, e.g. Prioritize user guides, concepts, tutorials, and recipes according to the user's task.

What Looks Good

  • The main removal is broad and tidy: deleted pages, nav entries, Make targets, workflow env, generated-artifact ignore rules, docs dependencies, and lockfile entries are all covered.
  • The Fern published-branch cleanup is doing the important archive work: in a local dry-run against docs-website, the stale versioned code_reference directories and nav sections were removed, and the affected concept/plugin pages were refreshed.
  • The public docs link cleanup is consistent across MkDocs and Fern mirrors; the remaining code reference hits I found are generic code-symbol audit wording, not links to the removed docs section.

Verdict

Needs changes: please remove or reword the remaining generated-reference breadcrumb in .agents/agents/docs-searcher.md.


This review was generated by an AI assistant.

@andreatgretel
Copy link
Copy Markdown
Contributor Author

thanks for the careful review! fixed that last breadcrumb by rewording it around guides, concepts, tutorials, and recipes. pushed in 20555a7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants