
fix: generation pipeline bugs — broken related-pages links, collapsed callouts, unrendered markdown#56

Merged
myakove merged 1 commit into main from fix/issue-55-generation-pipeline-bugs
Apr 18, 2026

Conversation

@myakove
Contributor

@myakove myakove commented Apr 17, 2026

Fixes #55

Changes

Bug 1: Related Pages links all href="#" (High)

The HTML sanitizer in renderer.py blocked relative URLs like page-slug.html because they didn't start with any allowed prefix (http://, https://, #, /, mailto:).

Fix: Added an _is_safe_url() helper that allows scheme-less relative URLs while blocking dangerous schemes (javascript:, data:), protocol-relative URLs (//evil.com), and HTML entity-encoded colons that decode to something like javascript:alert(1).
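A minimal sketch of an allowlist-style check along these lines is below. The name `is_safe_url` and the exact order of the checks are assumptions modeled on this description and the later review discussion; it is not the code merged in `renderer.py`.

```python
import html
from urllib.parse import urlsplit

# Prefixes that are always allowed (compared case-insensitively).
_ALLOWED_PREFIXES = ("http://", "https://", "mailto:", "#")


def is_safe_url(raw_url: str) -> bool:
    """Return True if raw_url may be kept in an href/src attribute."""
    # Decode entities first (so "&#58;" cannot hide a colon), then trim whitespace.
    url = html.unescape(raw_url).strip()
    # Protocol-relative URLs such as "//evil.com" point at another host.
    if url.startswith("//"):
        return False
    # Same-origin absolute paths.
    if url.startswith("/"):
        return True
    if url.lower().startswith(_ALLOWED_PREFIXES):
        return True
    # Anything else is accepted only if it has no scheme at all,
    # i.e. a relative URL like "page-slug.html" or "../guide/intro.html".
    return not urlsplit(url).scheme
```

Checking `//` before `/` matters because every protocol-relative URL also starts with a single slash.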

Bug 2: Adjacent blockquote callouts collapse (Medium)

Consecutive > callouts (Note/Warning/Tip) were merged into a single blockquote, losing their per-severity styling.

Fix: Added separate_adjacent_callouts() in postprocess.py that detects adjacent callouts with different prefixes and inserts blank line separators. Handles both backtick and tilde code fences.
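A simplified sketch of such a pass is below. The callout syntax it recognizes (`> **Note:**`, `> **Warning:**`, `> **Tip:**`) and its fence handling are assumptions, and it relies, as the PR does, on a blank line being enough for the renderer to start a new blockquote.

```python
import re

# Assumed callout syntax: a blockquote line opening with a bold Note/Warning/Tip label.
_CALLOUT_RE = re.compile(r"^\s*>\s*\*\*(Note|Warning|Tip)\b", re.IGNORECASE)
_FENCE_PREFIXES = ("```", "~~~")


def separate_adjacent_callouts(md_text: str) -> str:
    """Insert a blank line where a callout of one kind directly follows another kind."""
    result = []
    in_fence = False
    current_kind = None  # kind of the callout whose lines we are inside, if any
    for line in md_text.split("\n"):
        stripped = line.strip()
        if stripped.startswith(_FENCE_PREFIXES):
            # Simplification: any backtick or tilde fence line toggles fence state.
            in_fence = not in_fence
            current_kind = None
        elif not in_fence:
            match = _CALLOUT_RE.match(line)
            if match:
                kind = match.group(1).lower()
                if current_kind is not None and kind != current_kind:
                    result.append("")  # blank line ends the previous blockquote
                current_kind = kind
            elif not stripped.startswith(">"):
                current_kind = None  # blank line or plain text ends the callout
        result.append(line)
    return "\n".join(result)
```

For example, `separate_adjacent_callouts("> **Note:** first\n> **Warning:** second")` returns `"> **Note:** first\n\n> **Warning:** second"`, while continuation lines of a multi-line callout and anything inside a fence are left alone.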

Bug 3: Markdown inside <details> not rendered (Medium)

By default, the Python markdown library does not parse Markdown inside raw HTML blocks, so **bold** appeared literally.

Fix:

  • Updated all AI writing rules to forbid <details>/<summary> tags (shared _NO_HTML_DETAILS constant)
  • Added convert_details_to_headings() post-processor to convert any remaining <details> blocks to ## headings
  • Fence-aware: skips code blocks using _CODE_BLOCK_RE.split()
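A sketch of a fence-aware conversion is below. The regexes are assumptions: the `<summary>` and `</details>` patterns follow the relaxed forms suggested later in review, and stripping inline tags from the summary text is an illustrative choice, not necessarily what `postprocess.py` does.

```python
import re

# Assumed patterns; one capturing group so split() keeps the code blocks.
_CODE_BLOCK_RE = re.compile(r"(```[\s\S]*?```|~~~[\s\S]*?~~~)")
_DETAILS_OPEN_RE = re.compile(
    r"<details[^>]*>\s*<summary\b[^>]*?>([\s\S]*?)</summary>", re.IGNORECASE
)
_DETAILS_CLOSE_RE = re.compile(r"</details>", re.IGNORECASE)
_TAG_RE = re.compile(r"<[^>]+>")


def _summary_to_heading(match: re.Match) -> str:
    # Strip inline tags such as <strong> and collapse whitespace in the title.
    title = " ".join(_TAG_RE.sub("", match.group(1)).split())
    return f"## {title}\n"


def convert_details_to_headings(md_text: str) -> str:
    """Turn <details><summary>T</summary>...</details> into '## T' followed by the body."""
    # split() with a capturing group puts fenced code blocks at odd indices;
    # only the even-indexed prose parts are rewritten.
    parts = _CODE_BLOCK_RE.split(md_text)
    for i in range(0, len(parts), 2):
        text = _DETAILS_OPEN_RE.sub(_summary_to_heading, parts[i])
        parts[i] = _DETAILS_CLOSE_RE.sub("", text)
    return "".join(parts)
```

After the conversion the former body is ordinary Markdown, so `**bold**` inside it renders normally.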

Wiring

Both post-processors are applied in api/projects.py before render_site().
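One way to express this wiring is a small helper that runs the steps in a fixed order over the slug-to-content mapping and hands a new dict to the renderer. `postprocess_pages` is a hypothetical name; the PR does not show how `api/projects.py` structures the loop, only that `separate_adjacent_callouts()` runs before `convert_details_to_headings()`.

```python
from typing import Callable, Dict, Sequence

PostProcessor = Callable[[str], str]


def postprocess_pages(
    pages: Dict[str, str], steps: Sequence[PostProcessor]
) -> Dict[str, str]:
    """Apply each Markdown post-processor, in order, to every page (slug -> content)."""
    processed = {}
    for slug, content in pages.items():
        for step in steps:
            content = step(content)
        processed[slug] = content
    return processed


# Hypothetical wiring mirroring the description:
# pages = postprocess_pages(pages, [separate_adjacent_callouts, convert_details_to_headings])
# render_site(...)  # now receives the transformed pages
```

Returning a new dict rather than mutating `pages` keeps the raw generated Markdown available if it is needed elsewhere.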

Files Changed

  • src/docsfy/renderer.py — URL sanitizer fix
  • src/docsfy/postprocess.py — Two new post-processing functions
  • src/docsfy/prompts.py — Updated AI writing rules
  • src/docsfy/api/projects.py — Pipeline wiring

Testing

  • All 376 tests pass
  • Reviewed by 3 internal reviewers + Cursor peer review

Summary by CodeRabbit

  • Bug Fixes

    • Prevented adjacent callout blockquotes from collapsing by inserting blank lines between them; this separation is applied during final site rendering.
    • Improved URL sanitization to reject unsafe or protocol‑relative URLs for safer links.
  • New Features

    • Support for both backtick (```) and tilde (~~~) fenced code blocks when processing content.
    • Automatic conversion of HTML details/summary into Markdown headings.
  • Documentation

    • Writing prompts updated to forbid HTML details so content uses regular headings.

@coderabbitai

coderabbitai bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5777990a-ed31-443e-a89b-8566783fdae0

📥 Commits

Reviewing files that changed from the base of the PR and between c7ef2ae and 4d1c643.

📒 Files selected for processing (4)
  • src/docsfy/api/projects.py
  • src/docsfy/postprocess.py
  • src/docsfy/prompts.py
  • src/docsfy/renderer.py

Walkthrough

Adds two Markdown postprocessing steps (separate adjacent callouts; convert HTML <details> into headings) into the page-generation pipeline, expands code-fence recognition to include ~~~ fences, updates prompts to forbid HTML details, and tightens URL sanitization used during rendering.

Changes

Cohort / File(s) Summary
Pipeline Integration
src/docsfy/api/projects.py
Apply separate_adjacent_callouts() then convert_details_to_headings() to each page's Markdown immediately before render_site(...), replacing the previous pages input to the renderer.
Post-processing Utilities
src/docsfy/postprocess.py
Added separate_adjacent_callouts(md_text: str) to insert blank lines between adjacent callout blockquotes and convert_details_to_headings(md_text: str) to convert <details><summary> into ## headings; extended _CODE_BLOCK_RE to recognize ~~~ fences and added regexes for callouts/details.
Prompt Rules
src/docsfy/prompts.py
Added _NO_HTML_DETAILS prompt fragment and appended it to guide/recipe/reference/concept writing rules and incremental-update templates to forbid HTML <details>/<summary>.
Renderer URL Sanitization
src/docsfy/renderer.py
Introduced _is_safe_url() helper and unified quoted/unquoted `href

Sequence Diagram(s)

sequenceDiagram
    participant Generator as Page Generator
    participant PostProc as Post-processors
    participant Renderer as Site Renderer

    rect rgba(100, 150, 200, 0.5)
    Note over Generator,Renderer: Generation → Postprocess → Render pipeline
    end

    Generator->>Generator: generate & validate pages (slug→content)
    Generator->>PostProc: pass pages dict
    PostProc->>PostProc: for each page: separate_adjacent_callouts(content)
    PostProc->>PostProc: then convert_details_to_headings(content)
    PostProc->>Renderer: return transformed pages dict
    Renderer->>Renderer: sanitize URLs via _is_safe_url() and render HTML

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested labels

size/XL

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically summarizes the main changes: fixing three critical generation pipeline bugs (broken related-pages links, collapsed callouts, unrendered markdown) across the codebase.


@myakove-bot
Collaborator

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: All label categories are enabled (default configuration)

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest python-module-install - Test Python package installation
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. Status Checks: All required status checks must pass
  3. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  4. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • myakove

Reviewers:

  • myakove
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge
AI Features
  • Conventional Title: Mode: fix (claude/claude-opus-4-6[1m])
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/docsfy/renderer.py`:
- Around line 63-86: Normalize and validate the URL by decoding HTML entities
first, then stripping whitespace, and run all checks against this normalized
value in _is_safe_url; specifically call _html_mod.unescape on the raw url, then
.strip() the result, then (1) reject protocol-relative URLs by checking
normalized.startswith("//"), (2) allow absolute paths with
normalized.startswith("/") but not "//", (3) check allowed schemes using
case-insensitive comparison (lowercase the prefix or parsed.scheme) for
("http://","https://","mailto:","#"), and (4) use urllib.parse.urlsplit on the
normalized value to determine if a scheme exists and treat no-scheme (relative)
as safe; ensure all earlier startswith checks use the normalized string so
whitespace/entity-encoded bypasses are prevented.
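The point about ordering can be seen in a few lines. This is an illustration only: `looks_blocked` is a deliberately simple, hypothetical blocklist check used to make the difference visible, not the allowlist logic the comment asks for.

```python
import html

# Demo-only blocklist; a real sanitizer should allowlist, not blocklist.
_BLOCKED = ("javascript:", "data:")


def looks_blocked(url: str) -> bool:
    return url.lower().startswith(_BLOCKED)


def strip_then_unescape(url: str) -> str:
    # Wrong order: an encoded space (&#32;) only becomes a real space after stripping.
    return html.unescape(url.strip())


def unescape_then_strip(url: str) -> str:
    # Order from the review: decode entities first, then trim whitespace.
    return html.unescape(url).strip()


payload = "&#32;javascript&#58;alert(1)"
print(looks_blocked(strip_then_unescape(payload)))  # False: leading space hides the scheme
print(looks_blocked(unescape_then_strip(payload)))  # True
```

Browsers strip leading whitespace when parsing an `href`, so the value that slips past the first check would still run as a `javascript:` URL.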

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f89eef42-1d1d-4318-be1f-e70d557292e4

📥 Commits

Reviewing files that changed from the base of the PR and between 7c695c4 and 5810d3d.

📒 Files selected for processing (4)
  • src/docsfy/api/projects.py
  • src/docsfy/postprocess.py
  • src/docsfy/prompts.py
  • src/docsfy/renderer.py

@myakove
Contributor Author

myakove commented Apr 18, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Apr 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/docsfy/postprocess.py`:
- Around line 56-91: The adjacent-callout logic incorrectly treats any prior '>'
line as the start of a callout, causing inserts between a multi-line callout
body and its continuation; update the block in the loop that currently checks
lines[prev_idx].strip().startswith(">") so it only treats the previous non-empty
blockquote as an adjacent callout if that prior line actually matches
_CALLOUT_RE (i.e., use _CALLOUT_RE.match on the trimmed previous line instead of
a plain startswith(">")); keep the existing behaviour of removing blank lines
via result.pop() and appending separators when a true callout adjacency is
detected, and also replace the two startswith checks for backticks/tildes with
a single startswith(("```", "~~~")) call.
- Around line 28-38: The current regexes are too strict: change _DETAILS_OPEN_RE
to allow any content (including HTML) inside <summary> by matching non-greedily
up to the literal </summary> (e.g. use a pattern like
r"<details[^>]*>\s*<summary\b[^>]*?>([\s\S]*?)</summary>" with re.IGNORECASE) so
summaries containing tags like <strong> match, and relax _DETAILS_CLOSE_RE to
match a closing </details> anywhere (e.g. r"</details>" with re.IGNORECASE)
instead of requiring it on its own line; keep the existing substitution order
(run the open-summary replacement first and then the close replacement, as the
current .sub("\n", ...) call site expects) so you don’t accidentally consume
unrelated whitespace.

In `@src/docsfy/renderer.py`:
- Around line 63-81: The _is_safe_url function is already handling HTML-decoding
before stripping and performing a case-insensitive scheme check so no functional
change is required; leave _is_safe_url as implemented (including the checks for
"http://", "https://", "#", "mailto:", leading "/" vs "//", and the final
urllib.parse.urlsplit-based scheme detection). If you prefer a tiny style tweak
(optional), you can collapse the last conditional by returning not parsed.scheme
instead of the explicit if/return, but this is purely stylistic and not
required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10ffc358-d0a3-4952-96bb-7ccce1dfa217

📥 Commits

Reviewing files that changed from the base of the PR and between 5810d3d and dce6a4e.

📒 Files selected for processing (4)
  • src/docsfy/api/projects.py
  • src/docsfy/postprocess.py
  • src/docsfy/prompts.py
  • src/docsfy/renderer.py

@myakove
Contributor Author

myakove commented Apr 18, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Apr 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/docsfy/postprocess.py`:
- Around line 32-35: The regex _DETAILS_CLOSE_RE currently consumes surrounding
whitespace around the closing </details> tag and replacement code replaces the
whole match with a newline, collapsing blank lines; change the regex to only
match the tag (e.g. re.compile(r"</details>", re.IGNORECASE)) so surrounding
newlines/spaces are preserved, and ensure the code that removes the tag (the
re.sub call that currently uses _DETAILS_CLOSE_RE) replaces the match with an
empty string (not a newline); apply the same change to the other similar
match/replacement pair referenced in the comment.
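The effect on blank lines is easy to reproduce. The `\s*</details>\s*` pattern below is a hypothetical stand-in for a whitespace-consuming regex, since the original pattern is not shown in this thread.

```python
import re

text = "Intro paragraph.\n\n</details>\n\nNext paragraph."

# Hypothetical whitespace-consuming variant: the tag and the blank lines around it
# are replaced by a single newline.
greedy = re.sub(r"\s*</details>\s*", "\n", text, flags=re.IGNORECASE)

# Variant suggested in the review: match only the tag and replace it with "".
tag_only = re.sub(r"</details>", "", text, flags=re.IGNORECASE)

print(repr(greedy))    # 'Intro paragraph.\nNext paragraph.'
print(repr(tag_only))  # 'Intro paragraph.\n\n\n\nNext paragraph.'
```

In Markdown the single newline left by the first variant joins the two paragraphs into one, while the extra blank lines left by the second are harmless.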

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3fba3344-0347-4387-b06c-886d7691a32a

📥 Commits

Reviewing files that changed from the base of the PR and between dce6a4e and c7ef2ae.

📒 Files selected for processing (4)
  • src/docsfy/api/projects.py
  • src/docsfy/postprocess.py
  • src/docsfy/prompts.py
  • src/docsfy/renderer.py

… callouts, unrendered markdown in HTML blocks

Fixes #55

- Allow relative URLs in HTML sanitizer while blocking dangerous schemes
  (javascript:, data:, protocol-relative //evil.com, entity-encoded colons)
- Add separate_adjacent_callouts() to split merged Note/Warning/Tip callouts
- Add convert_details_to_headings() to convert <details> blocks to ## headings
- Update AI prompts to forbid <details>/<summary> tags in all page types
- Apply post-processing before render_site in generation pipeline
@myakove myakove force-pushed the fix/issue-55-generation-pipeline-bugs branch from c7ef2ae to 4d1c643 Compare April 18, 2026 01:28
@myakove
Contributor Author

myakove commented Apr 18, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Apr 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@myakove myakove merged commit 824837e into main Apr 18, 2026
5 of 7 checks passed
@myakove myakove deleted the fix/issue-55-generation-pipeline-bugs branch April 18, 2026 09:56
@myakove-bot
Collaborator

New container for ghcr.io/myk-org/docsfy:latest published


Development

Successfully merging this pull request may close these issues.

fix: three generation pipeline bugs — broken related-pages links, collapsed callouts, unrendered markdown in HTML blocks

2 participants