BUG: Relative URLs broken by <base> tag in multi-page HTML output #834

@NicolasRouquette

Summary

Two classes of URLs are incorrectly resolved in multi-page HTML output due to interactions with the <base> tag:

  1. Relative image URLs (from Markdown ![alt](path)) resolve from the site root instead of the page directory.
  2. Absolute internal asset URLs (/-verso-data/katex/...) are converted to page-relative paths like ../../-verso-data/..., which overshoot the root when combined with the <base> tag.

Reproduction

Setup

  • Verso v4.29.0, multi-page HTML output
  • A page at path Verification/Key-Theorems/ with <base href="./../../">
  • Markdown content with an image reference: ![diagram](graphs/foo.svg)
  • KaTeX feature enabled

Expected behavior

  • <img src="..."> resolves to {site-root}/Verification/Key-Theorems/graphs/foo.svg
  • <script src="..."> for KaTeX resolves to {site-root}/-verso-data/katex/katex.js

Actual behavior

  • <img src="graphs/foo.svg"> → browser resolves via <base> to {site-root}/graphs/foo.svg (404)
  • <script src="../../-verso-data/katex/katex.js"> → browser resolves via <base> to {two-levels-above-root}/-verso-data/katex/katex.js (404)

Browser console errors:

GET https://example.com/pages/-verso-data/katex/katex.css net::ERR_ABORTED 404
GET https://example.com/pages/-verso-data/katex/katex.js net::ERR_ABORTED 404
GET https://example.com/pages/-verso-data/katex/math.js net::ERR_ABORTED 404
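
The two failures can be reproduced outside a browser with a small sketch. It assumes a hypothetical deployment under https://example.com/org/site/ (not the reporter's actual URL) and uses Python's urllib.parse.urljoin, which follows RFC 3986 reference resolution and only approximates what a browser does.

```python
from urllib.parse import urljoin

# Hypothetical deployment: the site lives in a subdirectory of the host.
SITE_ROOT = "https://example.com/org/site/"
PAGE_URL = SITE_ROOT + "Verification/Key-Theorems/"

# <base href="./../../"> is itself resolved against the page URL,
# so the effective base is the site root.
effective_base = urljoin(PAGE_URL, "./../../")

# Bug 1: the page-relative image path is resolved from the site root.
img = urljoin(effective_base, "graphs/foo.svg")

# Bug 2: the "../../" prefix climbs two levels above the site root.
script = urljoin(effective_base, "../../-verso-data/katex/katex.js")

print(effective_base)  # https://example.com/org/site/
print(img)             # https://example.com/org/site/graphs/foo.svg
print(script)          # https://example.com/-verso-data/katex/katex.js
```

Neither result matches the expected locations listed above: the image is missing its Verification/Key-Theorems/ segment, and the script has left the site directory entirely.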

Root cause

Bug 1: Relative image URLs

The page function in Html.lean generates <base href="./../../"> for nested pages. This makes the browser resolve all relative URLs from the site root, not from the page's directory. User-authored image references (from Markdown) are page-relative and need to be prefixed with the page path.

relativizeLinks calls Html.relativize with an empty path (#[]), so Html.relativize has no page context to work with. Even if the path were provided, rwTag does not adjust relative content URLs.

Bug 2: Absolute internal asset URLs

rwAttr rewrites absolute URLs such as /-verso-data/katex/katex.js with path.relativize, which produces ../../-verso-data/katex/katex.js. Because <base href="./../../"> already points at the site root, the extra ../../ prefix makes the browser overshoot and resolve the URL two levels above the site root.

The correct approach is simply stripping the leading / to produce -verso-data/katex/katex.js, which the <base> tag then resolves correctly from the site root. Note that the -verso-search/ assets in the page template already use this pattern (they are hardcoded without a leading /), so they work correctly.
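
As a rough illustration of why stripping the slash is enough, the sketch below compares the two rewriting strategies. Both functions are simplified models written for this example, not Verso's path.relativize or rwAttr, and SITE_ROOT is a hypothetical URL standing in for whatever <base> resolves to.

```python
from urllib.parse import urljoin

def relativize_with_dots(page_path, url):
    """Model of the pre-fix rewrite: one '../' per page path segment."""
    return "../" * len(page_path) + url[1:]

def strip_leading_slash(url):
    """Model of the post-fix rewrite: drop only the leading '/'."""
    return url[1:] if url.startswith("/") else url

SITE_ROOT = "https://example.com/org/site/"  # hypothetical; what <base> points at
page_path = ["Verification", "Key-Theorems"]
asset = "/-verso-data/katex/katex.js"

old = relativize_with_dots(page_path, asset)
new = strip_leading_slash(asset)

print(old, "->", urljoin(SITE_ROOT, old))
print(new, "->", urljoin(SITE_ROOT, new))
```

Under these assumptions only the slash-stripped form stays inside the site directory; the ../-prefixed form resolves to https://example.com/-verso-data/katex/katex.js.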

Fix

The fix involves two files:

VersoManual/Html.lean, relativize function

  1. rwAttr: Replace path.relativize attr.snd with (attr.snd.drop 1).toString. This strips only the leading / instead of generating ../ sequences, since <base> already handles navigation to the site root.

  2. rwTag: Before calling rwAttr, check for <img> tags with relative src paths and prefix them with the page path. This must happen before rwAttr so that originally-relative paths (which need prefixing) can be told apart from originally-absolute paths (which only need the leading / stripped). Add an isRelativeContentUrl helper to identify truly relative content URLs.

VersoManual.lean, relativizeLinks function

  1. relativizeLinks: Accept a Path parameter and pass it through to Html.relativize, so the page path is available for image URL adjustment. Update all three call sites (emitFindHtml, single-page emitter, emitPart) to pass the actual page path.
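
The ordering constraint in the rwTag step can be sketched as follows. This is a Python model of the described behavior, not the Lean implementation: is_relative_content_url is a guess at what isRelativeContentUrl checks, URL_ATTRS is an assumed stand-in for urlAttr, and the "//" guard is an extra precaution that does not come from the issue.

```python
URL_ATTRS = ("src", "href")  # assumption: stand-in for Verso's urlAttr

def is_relative_content_url(url):
    """Hypothetical stand-in for the isRelativeContentUrl helper."""
    if not url or url.startswith(("/", "#", "?")):
        return False
    # Anything with a scheme (https:, data:, mailto:) is not page-relative.
    head = url.split("/", 1)[0]
    return ":" not in head

def rewrite_attr(tag, name, value, page_path):
    """Rewrite one attribute value for a page located at page_path."""
    # Step 1 (rwTag): prefix page-relative <img src> values with the page path.
    if tag == "img" and name == "src" and page_path and is_relative_content_url(value):
        return "".join(seg + "/" for seg in page_path) + value
    # Step 2 (rwAttr): strip the leading "/" from root-absolute URLs.
    if name in URL_ATTRS and value.startswith("/") and not value.startswith("//"):
        return value[1:]
    return value

page = ["Verification", "Key-Theorems"]
print(rewrite_attr("img", "src", "graphs/foo.svg", page))
print(rewrite_attr("script", "src", "/-verso-data/katex/katex.js", page))
print(rewrite_attr("img", "src", "/static/logo.png", page))
```

If the slash were stripped first, /static/logo.png would become static/logo.png, look page-relative, and pick up the page prefix by mistake, which is why the relative check has to see the original value.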

Before (v4.29.0)

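-- Html.lean
-- Rewrites root-absolute URL attributes into ../-prefixed, page-relative ones.
rwAttr (attr : String × String) : ReaderT Path Id (String × String) := do
    if urlAttr attr.fst && "/".isPrefixOf attr.snd then
      let path := (← read)
      pure { attr with
        snd := path.relativize attr.snd
      }
    else
      pure attr

-- Passes attributes straight to rwAttr; relative <img> sources are left untouched.
rwTag ... := do
    if tag == "base" then return none
    if attrs.any (·.1 == "data-verso-remote") then return none
    return some <| .tag tag (← attrs.mapM rwAttr) content

-- VersoManual.lean
-- Always passes an empty path, so no page context reaches Html.relativize.
def relativizeLinks (html : Html) : Html :=
    Html.relativize #[] html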

After (fix)

-- Html.lean
-- Root-absolute URLs: drop only the leading "/"; <base> resolves the rest from the site root.
rwAttr (attr : String × String) : ReaderT Path Id (String × String) := do
    if urlAttr attr.fst && "/".isPrefixOf attr.snd then
      pure { attr with snd := (attr.snd.drop 1).toString }
    else
      pure attr

rwTag ... := do
    if tag == "base" then return none
    if attrs.any (·.1 == "data-verso-remote") then return none
    let path := (← read)
    -- For <img>, prefix page-relative src values with the page path before rwAttr runs.
    let attrs := if tag == "img" && path.size > 0 then
      attrs.map fun attr =>
        if attr.fst == "src" && isRelativeContentUrl attr.snd then
          let pathPrefix := String.join (path.toList.map (· ++ "/"))
          { attr with snd := pathPrefix ++ attr.snd }
        else attr
    else attrs
    let attrs ← attrs.mapM rwAttr
    return some <| .tag tag attrs content

-- VersoManual.lean
-- The page path is now threaded through to Html.relativize.
def relativizeLinks (path : Path) (html : Html) : Html :=
    Html.relativize path html

Verification

After the fix:

  • Image: <img src="Verification/Key-Theorems/graphs/foo.svg"> → browser resolves via <base> to {site-root}/Verification/Key-Theorems/graphs/foo.svg
  • KaTeX: <script src="-verso-data/katex/katex.js"> → browser resolves via <base> to {site-root}/-verso-data/katex/katex.js
  • Internal links (e.g., href="../../Overview/#overview") continue to work correctly ✓
  • CSS/JS assets (book.css, -verso-search/*.js) are unaffected (already root-relative) ✓

Branch with fix: fix/relative-img-urls
