feat(markdown-processor)!: 3.0 redesign#37

Merged: sor4chi merged 9 commits into main from wip/processor-redesign-v3 on May 14, 2026

@sor4chi sor4chi commented May 12, 2026

3.0 redesign. Addresses together the three points raised in #36 (splitting imports along responsibility boundaries / shiki bundle / default sanitize).

Breaking

  • /server is removed and reorganised into /processor/full (equivalent to the old /server) + /processor/web
  • No v2 compatibility shim. Every entry is a factory (createMarkdownProcessor*)
  • For migration, see the README + .changeset/processor-redesign-v3.md

Design highlights

  • sanitize default-on (rehype-sanitize + defaultSchema, extended by a single line for div.className)
  • Extensions (shiki / katex / embed) run after sanitize, so features can be added without bloating the schema
  • ::youtube[id] emits a text placeholder at the handler stage and stashes the embed metadata in vfile.data; reattachEmbeds, running after sanitize, rebuilds the <iframe>. As a result, "no path in the AST other than ::youtube can produce an iframe" holds as a pipeline invariant

refs: #36
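
The placeholder flow described above can be illustrated with a small self-contained sketch. It does not use unified or hast; `SketchNode`, `sanitizeSketch`, `reattachEmbedsSketch`, and the embed URL format are stand-ins invented for illustration, and the toy sanitizer only filters tag names (the real one also filters attributes).

```typescript
// Minimal stand-ins for hast nodes; the real pipeline uses unified/hast types.
type SketchNode =
  | { type: "text"; value: string }
  | { type: "element"; tagName: string; properties: Record<string, string>; children: SketchNode[] };

type EmbedMeta = { kind: "youtube"; id: string };

// Handler stage: emit a text placeholder and stash the metadata (vfile.data in the PR).
function emitYoutubePlaceholder(store: Map<string, EmbedMeta>, id: string): SketchNode {
  const placeholder = `__MAXIMUM_EMBED_${store.size}__`;
  store.set(placeholder, { kind: "youtube", id });
  return { type: "text", value: placeholder };
}

// Sanitize stage (toy): drop every element whose tag is not on the allow list.
function sanitizeSketch(nodes: SketchNode[], allowedTags: string[]): SketchNode[] {
  const out: SketchNode[] = [];
  nodes.forEach((node) => {
    if (node.type === "text") {
      out.push(node);
    } else if (allowedTags.indexOf(node.tagName) !== -1) {
      out.push({ ...node, children: sanitizeSketch(node.children, allowedTags) });
    }
  });
  return out;
}

// Post-sanitize stage: swap placeholders known to the store for the real iframe.
function reattachEmbedsSketch(nodes: SketchNode[], store: Map<string, EmbedMeta>): SketchNode[] {
  return nodes.map((node): SketchNode => {
    if (node.type === "element") {
      return { ...node, children: reattachEmbedsSketch(node.children, store) };
    }
    const embed = store.get(node.value);
    if (!embed) return node; // not emitted by the handler: leave as text
    return {
      type: "element",
      tagName: "iframe",
      // Assumed URL shape, for illustration only.
      properties: { src: `https://www.youtube.com/embed/${embed.id}` },
      children: [],
    };
  });
}

const embedStore = new Map<string, EmbedMeta>();
const tree: SketchNode[] = [
  { type: "element", tagName: "p", properties: {}, children: [emitYoutubePlaceholder(embedStore, "abc123")] },
  // Raw iframe written directly in the markdown source.
  { type: "element", tagName: "iframe", properties: { src: "https://evil.example/" }, children: [] },
];

// iframe is not on the allow list, so only the handler-issued embed survives.
const rendered = reattachEmbedsSketch(sanitizeSketch(tree, ["p", "div"]), embedStore);
```

Because the iframe is rebuilt only after the sanitizer has run, the allow list never needs an `iframe` entry.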

@sor4chi sor4chi force-pushed the wip/processor-redesign-v3 branch 5 times, most recently from 7daf36c to 4348e17 Compare May 12, 2026 16:20
…s mode, sanitize default-on with handler-side placeholder

Refs #36. Breaking change with no compatibility shim — see README and the
3.0 changeset for migration.

- Drop `/server`; expose `/processor` (factory) + `/processor/full` (shiki
  full bundle, replaces `/server`) + `/processor/web` (shiki web bundle).
- Drop the v2 `parseMarkdownToHTML(md, option)` shim entirely. All entries
  now expose a factory (`createMarkdownProcessor*`) that returns a
  `{ parse, getStylesheet }` object. Consumers build it once and reuse it
  so shiki bundle init is amortised.
- Apply `@shikijs/transformers`'s `transformerStyleToClass` by default
  with multi-theme `github-dark` / `github-light` (+ `defaultColor: false`)
  so token colours land in CSS variables. The full / web presets share
  this default. Highlight stylesheet reachable via
  `processor.getStylesheet()`.
- Untrust-first sanitize: `rehype-sanitize` runs by default in the middle
  of the pipeline (after remarkRehype, before the package's other rehype
  steps) against `hast-util-sanitize`'s defaultSchema. Only one schema
  tweak: `div.className` allow, needed so
  `remark-flexible-code-titles`'s `<div class="remark-code-container">`
  wrapper survives.
- `::youtube[id]` embed uses a handler-side text placeholder pattern:
  `remarkEmbedHandlers.youtube` emits a text node and stashes the embed
  metadata on `vfile.data.maximumEmbeds`. `reattachEmbeds` plugin (after
  sanitize) substitutes the placeholder with the real `<iframe>`. With
  iframe absent from the sanitize schema, `::youtube[id]` is the only
  structural path that can produce an `<iframe>` in the output. Spoofing
  is handled in `reattachEmbeds` by checking `store.get(value)`;
  attacker-written placeholders fall through as text.
- README + 3.0 changeset shipped together for migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
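
As a rough illustration of why the commit message asks consumers to build the processor once and reuse it, the sketch below counts how often a costly initialisation runs. `createProcessorSketch` and `loadHighlighter` are hypothetical names; the real factories are `createMarkdownProcessor*`, whose exact signatures are not shown in this PR.

```typescript
// Hypothetical factory: `loadHighlighter` stands in for the costly shiki bundle init.
function createProcessorSketch(loadHighlighter: () => (code: string) => string) {
  const highlight = loadHighlighter(); // paid once, at construction time
  return {
    parse: (markdown: string): string => highlight(markdown),
    getStylesheet: (): string => ".tok{color:var(--tok)}",
  };
}

let initCount = 0;
const sharedProcessor = createProcessorSketch(() => {
  initCount += 1;
  return (code) => `<pre>${code}</pre>`;
});

// One module-level instance reused for every document: init ran once,
// however many times parse is called.
const outputs = ["# a", "# b", "# c"].map((md) => sharedProcessor.parse(md));
```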
@sor4chi sor4chi force-pushed the wip/processor-redesign-v3 branch from 4348e17 to b526d5f Compare May 12, 2026 16:29
@sor4chi sor4chi changed the title feat(markdown-processor): 3.0 redesign (WIP) feat(markdown-processor)!: 3.0 redesign May 13, 2026
@sor4chi sor4chi marked this pull request as ready for review May 13, 2026 19:34
@sor4chi sor4chi requested a review from a01sa01to May 13, 2026 19:35
Member

@a01sa01to a01sa01to left a comment

The line-break positions in the multi-line comments feel off to me, but setting that aside, comments on the rest:

Comment thread README.md
Comment on lines +53 to +54
const embed = store.get(value);
if (!embed) return; // not in the store = did not come through the handler = attacker; leave untouched
Member

Writing `__MAXIMUM_EMBED_0__` lets you tamper with this (though it only changes the embed's position, and probably nobody would bother).

PoC: screenshots of the source and the preview (images not preserved)
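
The displacement described in this thread can be reproduced with a few lines. This is a sketch under assumptions: text nodes are reduced to a flat list of values in document order, and the delete-after-use step follows the Copilot review note rather than the actual source. `reattachInOrder` and the `<iframe data-youtube=...>` string are illustrative only.

```typescript
type EmbedInfo = { kind: "youtube"; id: string };

// Replace each text value found in the store, deleting the entry once used.
function reattachInOrder(textValues: string[], store: Map<string, EmbedInfo>): string[] {
  return textValues.map((value) => {
    const embed = store.get(value);
    if (!embed) return value; // unknown text stays as text
    store.delete(value);
    return `<iframe data-youtube="${embed.id}">`;
  });
}

const pocStore = new Map<string, EmbedInfo>();
pocStore.set("__MAXIMUM_EMBED_0__", { kind: "youtube", id: "abc123" });

// First entry: typed by the markdown author. Second entry: emitted by the handler.
const pocResult = reattachInOrder(["__MAXIMUM_EMBED_0__", "__MAXIMUM_EMBED_0__"], pocStore);
// The author-written text receives the iframe; the legitimate placeholder is left as text.
```

No new iframe is created, which matches the reviewer's remark that only the position changes.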

Member Author

@sor4chi sor4chi May 14, 2026

Yeah, this one...
rehype-sanitize is so strict that, because the embed is expressed as an iframe, it gets wiped out entirely.

Member

There would be no end to listing cases like this, so I think it's fine to leave it as is.

Member Author

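It can technically be configured in various ways, such as excluding iframes or acting only when a specific regular expression matches, but a library like this should be usable with as few options as possible.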

Contributor

Copilot AI left a comment

Pull request overview

This PR redesigns @saitamau-maximum/markdown-processor for v3 by replacing the old /server API with factory-based processor entrypoints, adding default sanitization, and moving Shiki output to class-based highlighting.

Changes:

  • Replaces /server with /processor, /processor/full, and /processor/web factory APIs.
  • Adds a sanitized markdown pipeline with post-sanitize embed reattachment and class-based Shiki styling.
  • Updates examples, docs, package exports, dependencies, and migration notes for the v3 breaking change.

Reviewed changes

Copilot reviewed 22 out of 27 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| README.md | Documents new subpaths, factory usage, stylesheet handling, sanitization, and migration. |
| pnpm-lock.yaml | Updates lockfile for new sanitizer, Shiki transformer, and runtime dependency changes. |
| packages/markdown-processor/src/server/shiki.ts | Removes old server-side Shiki highlighter helper. |
| packages/markdown-processor/src/server/shiki.test.ts | Removes tests for deleted Shiki helper. |
| packages/markdown-processor/src/server/remark-embed.ts | Removes old direct iframe embed plugin. |
| packages/markdown-processor/src/server/remark-embed.test.ts | Removes tests for old embed implementation. |
| packages/markdown-processor/src/server/index.ts | Removes old /server parse API. |
| packages/markdown-processor/src/server/index.test.ts | Removes old /server integration tests. |
| packages/markdown-processor/src/processor/web.ts | Adds web-bundle processor preset. |
| packages/markdown-processor/src/processor/sanitize-schema.test.ts | Adds tests pinning sanitizer schema behavior. |
| packages/markdown-processor/src/processor/plugins/remark-fallback-directives.ts | Adds fallback handling for unsupported directives. |
| packages/markdown-processor/src/processor/plugins/remark-fallback-directives.test.ts | Adds tests for directive fallback behavior. |
| packages/markdown-processor/src/processor/plugins/remark-embed.ts | Adds placeholder-based YouTube embed handling. |
| packages/markdown-processor/src/processor/plugins/remark-embed.test.ts | Adds tests for embed placeholder storage and reattachment. |
| packages/markdown-processor/src/processor/plugins/rehype-extract-toc.ts | Reintroduces TOC extraction under the new processor path. |
| packages/markdown-processor/src/processor/plugins/rehype-extract-toc.test.ts | Adds TOC extraction tests. |
| packages/markdown-processor/src/processor/plugins/reattach-embeds.ts | Adds post-sanitize iframe reconstruction from stored embed metadata. |
| packages/markdown-processor/src/processor/pipeline.ts | Adds the new unified markdown processing pipeline. |
| packages/markdown-processor/src/processor/index.ts | Adds the core factory API and stylesheet access. |
| packages/markdown-processor/src/processor/index.test.ts | Adds tests for factory behavior, class highlighting, and sanitization. |
| packages/markdown-processor/src/processor/full.ts | Adds full-bundle processor preset. |
| packages/markdown-processor/src/processor/full.test.ts | Adds end-to-end tests for the full preset. |
| packages/markdown-processor/src/index.ts | Converts root entry to type-only exports. |
| packages/markdown-processor/package.json | Updates dependencies and export maps for new entrypoints. |
| examples/react-router-blog-on-cloudflare/app/routes/article.tsx | Migrates React Router example to the new factory API. |
| examples/next-js-blog/src/app/blog/[slug]/page.tsx | Migrates Next.js example to the new factory API. |
| .changeset/processor-redesign-v3.md | Adds major-version migration and breaking-change notes. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)

packages/markdown-processor/src/processor/plugins/remark-embed.ts:118

  • The placeholder key is predictable (__MAXIMUM_EMBED_0__, etc.), so user-authored text that preserves this exact value (for example inside inline/code blocks or escaped markdown) can be visited before the real directive placeholder and be replaced with the stored iframe, while the legitimate embed is left as text after store.delete. This breaks the stated spoofing protection and can move embeds to attacker-chosen locations. Use unguessable per-embed tokens or otherwise bind replacement to the handler-emitted node rather than a deterministic counter.
    const placeholder = `${PLACEHOLDER_PREFIX}${store.size}${PLACEHOLDER_SUFFIX}`;
    store.set(placeholder, { kind: "youtube", id: node.id, width, height });
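
One way to follow the "unguessable per-embed tokens" suggestion is to mix a per-document random nonce into the placeholder key. The sketch below is an assumption about how that could look, not the change that was merged; `createPlaceholderFactory` is a hypothetical helper, and the nonce is passed in so the example stays runtime-agnostic (in real use it should come from a CSPRNG such as `crypto.randomUUID()`).

```typescript
type EmbedRecord = { kind: "youtube"; id: string };

// `nonce` must be unpredictable to the markdown author and fresh per document.
function createPlaceholderFactory(nonce: string, store: Map<string, EmbedRecord>) {
  return (id: string): string => {
    const placeholder = `__MAXIMUM_EMBED_${nonce}_${store.size}__`;
    store.set(placeholder, { kind: "youtube", id });
    return placeholder;
  };
}

const nonceStore = new Map<string, EmbedRecord>();
// Fixed value for the demo only; generate a random one in practice.
const nextPlaceholder = createPlaceholderFactory("3f9c1e7a-demo-nonce", nonceStore);
const issued = nextPlaceholder("abc123");

// A value guessed without the nonce is not a key in the store, so it would stay plain text.
const guessIsKnown = nonceStore.has("__MAXIMUM_EMBED_0__");
```

The lookup in `reattachEmbeds` can stay an exact-match `store.get(value)`; only the key space changes.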

Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread examples/next-js-blog/src/app/blog/[slug]/page.tsx
Comment thread examples/react-router-blog-on-cloudflare/app/routes/article.tsx Outdated
Comment thread packages/markdown-processor/src/processor/plugins/remark-embed.ts Outdated
Comment thread packages/markdown-processor/src/processor/web.ts
Comment thread pnpm-lock.yaml Outdated
sor4chi added 4 commits May 14, 2026 18:00
…ading ids get clobber-prefixed

Fixes an issue where ids derived from untrusted heading text bypass sanitize's `clobberPrefix` protection (from the Copilot review on PR #37):

- A heading such as `# constructor` generates `id="constructor"`, which can overwrite
  `document.constructor` and the like through DOM clobbering
- Move `rehype-slug`, previously in the later part of the pipeline, to before sanitize so that
  `hast-util-sanitize`'s defaultSchema prefixes `id` with `user-content-`
  (same behaviour as GitHub READMEs)

Side effect: for headings containing katex, such as `# Hello $x^2$`, the slug used to be
generated from the DOM text after katex expansion and contained duplication like `hello-x2x2x2`.
With slug moved earlier, it is now generated from the heading value at the markdown stage,
giving a clean `user-content-hello-x2`.

The TOC's `data.id` carries the same prefix, so consumers that hard-code existing anchor
links need to update them (added to the changeset / migration-v3.md).
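
The effect of the clobber prefix on heading ids can be sketched as follows. `slugSketch` is a deliberately simplified approximation (the real slugs come from `rehype-slug`), and `clobberSafeId` is a hypothetical helper that only mirrors what the sanitizer's `clobberPrefix` does to `id` values.

```typescript
// Simplified slug: lowercase, strip punctuation, turn spaces into hyphens.
// This is only an approximation of what rehype-slug produces.
function slugSketch(headingText: string): string {
  return headingText
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9 -]/g, "")
    .replace(/ +/g, "-");
}

// Mirrors the effect of the sanitizer's clobberPrefix on `id` values.
function clobberSafeId(headingText: string, clobberPrefix = "user-content-"): string {
  return clobberPrefix + slugSketch(headingText);
}

// `# constructor` no longer yields id="constructor".
const constructorId = clobberSafeId("constructor");

// Hard-coded anchor links need the prefix as well.
const tocHref = "#" + clobberSafeId("Getting Started");
```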

Review feedback from @a01sa01to: the migration guide is long, and putting it directly into the
README makes it hard to read as a purpose-specific doc, so it is split out. Only a one-line
link remains in the README.

In v3, token colours moved from inline styles to classes (`__maximum_md_*`), so unless the
stylesheet is included in the SSR output, syntax highlighting is lost (tokens render
colourless). Both the next-js and react-router examples now take getStylesheet() and
serve it in a <style> tag.
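
A minimal sketch of shipping the highlight stylesheet with server-rendered markup is shown below. The PR only states that the factory returns `{ parse, getStylesheet }`; the `content` field, the synchronous `getStylesheet()`, the `ProcessorLike` interface, and the class name used by the stand-in processor are all assumptions for illustration.

```typescript
// Assumed shape of the processor; not taken from the package's type definitions.
interface ProcessorLike {
  parse(markdown: string): Promise<{ content: string }>;
  getStylesheet(): string;
}

async function renderArticleHtml(processor: ProcessorLike, markdown: string): Promise<string> {
  const { content } = await processor.parse(markdown);
  // Ship the class-based highlight rules with the markup; without them tokens render uncoloured.
  return `<style>${processor.getStylesheet()}</style>\n<article>${content}</article>`;
}

// Stand-in processor so the sketch runs without the real package.
const fakeProcessor: ProcessorLike = {
  parse: async (markdown) => ({
    content: `<pre><code class="__maximum_md_0">${markdown}</code></pre>`,
  }),
  getStylesheet: () => ".__maximum_md_0{color:var(--shiki-light)}",
};
```

In a framework route, the same string would typically be placed in a `<style>` element in the document head rather than concatenated by hand.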
…rocessor/web

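- Align comments that still used the old name `rehype-embed-youtube` with
  `reattachEmbeds` (the implementation name).
- Add a smoke test for `/processor/web` that checks only subpath resolution and the
  factory start-up path (behaviour such as sanitize / TOC / directives is already pinned in
  `full.test.ts`, so it is not duplicated).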
@sor4chi sor4chi force-pushed the wip/processor-redesign-v3 branch from 20ef475 to 78d1268 Compare May 14, 2026 09:06
sor4chi added 3 commits May 14, 2026 18:10
1.3.0 is deprecated (CWE-502 prototype pollution); the fix shipped in
1.3.1. The library is transitive via hast-util-sanitize and
mdast-util-to-hast, so use a pnpm `overrides` entry to bump it
graph-wide. Both parents accept ^1.3.0, so 1.3.1 is api-compat.
…e wrapping

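Addresses @a01sa01to's remark that "the line-break positions in the multi-line comments feel off". Condenses the overly verbose parts and unifies on a one-line style that does not break mid-sentence. No logic changes; net -126 lines.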
@sor4chi sor4chi force-pushed the wip/processor-redesign-v3 branch from 04c97d8 to 67f0054 Compare May 14, 2026 09:14
@sor4chi sor4chi requested a review from a01sa01to May 14, 2026 09:16
Member

@a01sa01to a01sa01to left a comment

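Sure, I did say "the line-break positions in the multi-line comments feel off", but a single line is hard to read in its own way, isn't it?
Please break lines where appropriate, for example at full stops~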

Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread examples/next-js-blog/src/app/blog/[slug]/page.tsx Outdated
Comment thread packages/markdown-processor/src/processor/plugins/reattach-embeds.ts Outdated
Comment thread packages/markdown-processor/src/processor/plugins/remark-embed.test.ts Outdated
Comment thread packages/markdown-processor/src/processor/plugins/remark-embed.ts Outdated
Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread packages/markdown-processor/src/processor/pipeline.ts Outdated
Comment thread packages/markdown-processor/src/processor/sanitize-schema.test.ts Outdated
…-break comments

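Where multi-line comments had been squashed into one line or broken mid-way, re-split them at clause / sentence boundaries as in Asa's review suggestions.
Also restore the explanatory comment on the sanitize boundary (pipeline.ts), which the style sweep had deleted, as Asa pointed out.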
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 30 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

README.md:81

  • Use "untrusted" here; "untrust" is not the standard adjective for content/input.
- `[xss](javascript:...)` coming from standard markdown, raw `<script>` / `<iframe>` / `<style>` / `on*` attributes, and other untrust markdown content are reliably dropped

.changeset/processor-redesign-v3.md:31

  • Use "untrusted" here; "untrust" is not the standard adjective for markdown/input.
The consumer's `rehypePlugins` run after sanitize. If a consumer handling untrust markdown inserts raw HTML through its own plugin, that is the consumer's responsibility.

import remarkRehype from "remark-rehype";
import { type Plugin, type Pluggable, unified } from "unified";

import type { Schema } from "hast-util-sanitize";
Comment thread README.md
Comment on lines +77 to +81
### Sanitize (built-in, assumes untrust input)

`rehype-sanitize` is inserted into the pipeline by default. The schema uses `hast-util-sanitize`'s `defaultSchema` almost as is; the only extension is `div.className` (to keep the `remark-flexible-code-titles` wrapper).

- `[xss](javascript:...)` coming from standard markdown, raw `<script>` / `<iframe>` / `<style>` / `on*` attributes, and other untrust markdown content are reliably dropped
Comment on lines +24 to +31
- `rehype-sanitize` is inserted into the pipeline by default (assumes untrust input). The schema is `hast-util-sanitize`'s `defaultSchema` extended only with `div.className`. Markdown-syntax XSS such as `[xss](javascript:...)` and raw HTML are reliably dropped.
- Heading `id`s now carry the `user-content-` prefix (`#heading` → `#user-content-heading`). This results from placing `rehype-slug` before sanitize to take advantage of sanitize's clobber-prefix mechanism, and matches GitHub README behaviour. Existing TOC anchor links change to include the prefix, so consumers need to update them.

**Implementation notes**

An `<iframe>` originating from `::youtube[id]` is not emitted directly by the handler; instead the handler places a text placeholder in the hast and stashes the embed metadata in `vfile.data`. Because the `reattachEmbeds` plugin, running after sanitize, replaces the placeholder with the real `<iframe>`, the schema does not need to be extended to allow iframes. This also yields the structural guarantee that "no path exists at the AST level for an `<iframe>` to reach the output other than via `::youtube[id]`".

The consumer's `rehypePlugins` run after sanitize. If a consumer handling untrust markdown inserts raw HTML through its own plugin, that is the consumer's responsibility.
.use(remarkFallbackDirectives)
.use(remarkRehype, { handlers: { ...remarkEmbedHandlers } })
// Place slug before sanitize.
// This ensures that defaultSchema's `clobberPrefix: "user-content-"` always prefixes ids derived from untrust heading text (for example `# constructor`).
@sor4chi sor4chi requested a review from a01sa01to May 14, 2026 14:10
Member

@a01sa01to a01sa01to left a comment

The Copilot comments can be left as they are, I guess. It's probably fine.

@sor4chi sor4chi merged commit 0391178 into main May 14, 2026
13 checks passed
@sor4chi sor4chi deleted the wip/processor-redesign-v3 branch May 14, 2026 15:08
@github-actions github-actions Bot mentioned this pull request May 14, 2026

Development

Successfully merging this pull request may close these issues.

Split heavy entries along responsibility boundaries + sanitize by default and class-only highlight (major redesign)

3 participants