feat(markdown-processor)!: 3.0 redesign #37
7daf36c to
4348e17
…s mode, sanitize default-on with handler-side placeholder
Refs #36. Breaking change with no compatibility shim; see README and the 3.0 changeset for migration.
- Drop `/server`; expose `/processor` (factory) + `/processor/full` (shiki full bundle, replaces `/server`) + `/processor/web` (shiki web bundle).
- Drop the v2 `parseMarkdownToHTML(md, option)` shim entirely. All entries now expose a factory (`createMarkdownProcessor*`) that returns a `{ parse, getStylesheet }` object. Consumers build it once and reuse it so shiki bundle init is amortised.
- Apply `@shikijs/transformers`'s `transformerStyleToClass` by default with multi-theme `github-dark` / `github-light` (+ `defaultColor: false`) so token colours land in CSS variables. The full / web presets share this default. Highlight stylesheet reachable via `processor.getStylesheet()`.
- Untrust-first sanitize: `rehype-sanitize` runs by default in the middle of the pipeline (after remarkRehype, before the package's other rehype steps) against `hast-util-sanitize`'s defaultSchema. Only one schema tweak: `div.className` allow, needed so `remark-flexible-code-titles`'s `<div class="remark-code-container">` wrapper survives.
- `::youtube[id]` embed uses a handler-side text placeholder pattern: `remarkEmbedHandlers.youtube` emits a text node and stashes the embed metadata on `vfile.data.maximumEmbeds`. `reattachEmbeds` plugin (after sanitize) substitutes the placeholder with the real `<iframe>`. With iframe absent from the sanitize schema, `::youtube[id]` is the only structural path that can produce an `<iframe>` in the output. Spoofing is handled in `reattachEmbeds` by checking `store.get(value)`; attacker-written placeholders fall through as text.
- README + 3.0 changeset shipped together for migration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
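The "build it once and reuse it" point can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: `createSketchProcessor`, `loadHighlighter`, the stub return values, and the assumption that both methods are async are all hypothetical stand-ins for the real `createMarkdownProcessor*` entries, whose signatures are not shown in this thread.

```typescript
// Hypothetical stand-in for an expensive highlighter bundle init (assumption).
let initCount = 0;
async function loadHighlighter(): Promise<(code: string) => string> {
  initCount += 1;
  return (code) => `highlighted(${code.length} chars)`;
}

// Assumed shape: the source only says the factory returns { parse, getStylesheet }.
interface SketchProcessor {
  parse(markdown: string): Promise<string>;
  getStylesheet(): Promise<string>;
}

// Factory: the highlighter promise is created lazily and cached in the closure,
// so every call on the same processor shares a single initialisation.
function createSketchProcessor(): SketchProcessor {
  let highlighter: Promise<(code: string) => string> | undefined;
  const ready = (): Promise<(code: string) => string> => {
    if (!highlighter) highlighter = loadHighlighter();
    return highlighter;
  };
  return {
    async parse(markdown) {
      const highlight = await ready();
      return highlight(markdown);
    },
    async getStylesheet() {
      await ready();
      return ".token { color: var(--token-color); }"; // placeholder CSS
    },
  };
}

// Three calls on one processor trigger the expensive init only once.
async function demoFactoryReuse(): Promise<number> {
  const before = initCount;
  const processor = createSketchProcessor();
  await processor.parse("a");
  await processor.parse("b");
  await processor.getStylesheet();
  return initCount - before;
}
```

Creating a new processor per request would repeat the initialisation each time, which is the cost the factory design is meant to avoid.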
4348e17 to
b526d5f
| const embed = store.get(value); | ||
| if (!embed) return; // not in the store = not emitted by the handler = attacker-written; leave it untouched |
About this...
`rehype-sanitize` is so strict that, since embeds are rendered as iframes, they all get wiped out.
You can configure things like exempting iframes or allowing only values that match a particular regex, but with a library like this you want to be able to use it with as few options as possible.
Pull request overview
This PR redesigns @saitamau-maximum/markdown-processor for v3 by replacing the old /server API with factory-based processor entrypoints, adding default sanitization, and moving Shiki output to class-based highlighting.
Changes:
- Replaces `/server` with `/processor`, `/processor/full`, and `/processor/web` factory APIs.
- Adds a sanitized markdown pipeline with post-sanitize embed reattachment and class-based Shiki styling.
- Updates examples, docs, package exports, dependencies, and migration notes for the v3 breaking change.
Reviewed changes
Copilot reviewed 22 out of 27 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| README.md | Documents new subpaths, factory usage, stylesheet handling, sanitization, and migration. |
| pnpm-lock.yaml | Updates lockfile for new sanitizer, Shiki transformer, and runtime dependency changes. |
| packages/markdown-processor/src/server/shiki.ts | Removes old server-side Shiki highlighter helper. |
| packages/markdown-processor/src/server/shiki.test.ts | Removes tests for deleted Shiki helper. |
| packages/markdown-processor/src/server/remark-embed.ts | Removes old direct iframe embed plugin. |
| packages/markdown-processor/src/server/remark-embed.test.ts | Removes tests for old embed implementation. |
| packages/markdown-processor/src/server/index.ts | Removes old /server parse API. |
| packages/markdown-processor/src/server/index.test.ts | Removes old /server integration tests. |
| packages/markdown-processor/src/processor/web.ts | Adds web-bundle processor preset. |
| packages/markdown-processor/src/processor/sanitize-schema.test.ts | Adds tests pinning sanitizer schema behavior. |
| packages/markdown-processor/src/processor/plugins/remark-fallback-directives.ts | Adds fallback handling for unsupported directives. |
| packages/markdown-processor/src/processor/plugins/remark-fallback-directives.test.ts | Adds tests for directive fallback behavior. |
| packages/markdown-processor/src/processor/plugins/remark-embed.ts | Adds placeholder-based YouTube embed handling. |
| packages/markdown-processor/src/processor/plugins/remark-embed.test.ts | Adds tests for embed placeholder storage and reattachment. |
| packages/markdown-processor/src/processor/plugins/rehype-extract-toc.ts | Reintroduces TOC extraction under the new processor path. |
| packages/markdown-processor/src/processor/plugins/rehype-extract-toc.test.ts | Adds TOC extraction tests. |
| packages/markdown-processor/src/processor/plugins/reattach-embeds.ts | Adds post-sanitize iframe reconstruction from stored embed metadata. |
| packages/markdown-processor/src/processor/pipeline.ts | Adds the new unified markdown processing pipeline. |
| packages/markdown-processor/src/processor/index.ts | Adds the core factory API and stylesheet access. |
| packages/markdown-processor/src/processor/index.test.ts | Adds tests for factory behavior, class highlighting, and sanitization. |
| packages/markdown-processor/src/processor/full.ts | Adds full-bundle processor preset. |
| packages/markdown-processor/src/processor/full.test.ts | Adds end-to-end tests for the full preset. |
| packages/markdown-processor/src/index.ts | Converts root entry to type-only exports. |
| packages/markdown-processor/package.json | Updates dependencies and export maps for new entrypoints. |
| examples/react-router-blog-on-cloudflare/app/routes/article.tsx | Migrates React Router example to the new factory API. |
| examples/next-js-blog/src/app/blog/[slug]/page.tsx | Migrates Next.js example to the new factory API. |
| .changeset/processor-redesign-v3.md | Adds major-version migration and breaking-change notes. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
packages/markdown-processor/src/processor/plugins/remark-embed.ts:118
- The placeholder key is predictable (`__MAXIMUM_EMBED_0__`, etc.), so user-authored text that preserves this exact value (for example inside inline/code blocks or escaped markdown) can be visited before the real directive placeholder and be replaced with the stored iframe, while the legitimate embed is left as text after `store.delete`. This breaks the stated spoofing protection and can move embeds to attacker-chosen locations. Use unguessable per-embed tokens or otherwise bind replacement to the handler-emitted node rather than a deterministic counter.
const placeholder = `${PLACEHOLDER_PREFIX}${store.size}${PLACEHOLDER_SUFFIX}`;
store.set(placeholder, { kind: "youtube", id: node.id, width, height });
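One way to follow this suggestion is to derive each placeholder from a random value instead of `store.size`. The sketch below assumes Node's `randomUUID` from `node:crypto`. `EmbedMeta`, `createPlaceholder`, `resolvePlaceholder`, and the `Map`-based store are illustrative stand-ins rather than the package's actual code, and the prefix / suffix values are inferred from the `__MAXIMUM_EMBED_0__` example above.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of the stored embed metadata (assumption).
interface EmbedMeta {
  kind: "youtube";
  id: string;
  width: number;
  height: number;
}

// Values inferred from the `__MAXIMUM_EMBED_0__` example (assumption).
const PLACEHOLDER_PREFIX = "__MAXIMUM_EMBED_";
const PLACEHOLDER_SUFFIX = "__";

// Registers an embed under a key the markdown author cannot predict,
// unlike a key derived from a counter.
function createPlaceholder(store: Map<string, EmbedMeta>, meta: EmbedMeta): string {
  const placeholder = `${PLACEHOLDER_PREFIX}${randomUUID()}${PLACEHOLDER_SUFFIX}`;
  store.set(placeholder, meta);
  return placeholder;
}

// Looks up a text node's value. Author-written text such as
// "__MAXIMUM_EMBED_0__" is not in the store, so it stays plain text.
function resolvePlaceholder(store: Map<string, EmbedMeta>, value: string): EmbedMeta | undefined {
  return store.get(value);
}
```

With random keys, text that an author writes ahead of time cannot match a handler-emitted placeholder, so the `store.get(value)` check regains the property it was meant to provide.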
…ading ids get clobber-prefixed
Fixes ids derived from untrusted heading text bypassing sanitize's `clobberPrefix` protection (from the Copilot review on PR #37):
- A heading such as `# constructor` generates `id="constructor"`, which can overwrite `document.constructor` and similar properties through DOM clobbering.
- Move `rehype-slug` from the later part of the pipeline to before sanitize, so that `hast-util-sanitize`'s defaultSchema prefixes each `id` with `user-content-` (the same behaviour as GitHub READMEs).
Side effect: the slug of a heading containing katex, such as `# Hello $x^2$`, used to be generated from the DOM text after katex expansion and contained duplication like `hello-x2x2x2`. With slug moved earlier, the slug is generated from the heading value at the markdown stage and comes out clean, as `user-content-hello-x2`. The TOC's `data.id` carries the same prefix, so consumers that hard-code existing anchor links need to update them (noted in the changeset / migration-v3.md).
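The effect of the reordering can be illustrated without the real plugins. In this sketch `slugify` is a deliberately simplified stand-in for `rehype-slug` (it is not `github-slugger`), and `clobberPrefixId` imitates only the id-prefixing part of the sanitize step; all function names here are hypothetical.

```typescript
const CLOBBER_PREFIX = "user-content-";

// Simplified slug (assumption, not github-slugger): lowercase, drop everything
// except ASCII letters, digits, spaces and hyphens, then turn spaces into hyphens.
function slugify(headingText: string): string {
  return headingText
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9 -]/g, "")
    .replace(/ /g, "-");
}

// Imitates the sanitize step: every id that passes through it gets the prefix.
function clobberPrefixId(id: string): string {
  return `${CLOBBER_PREFIX}${id}`;
}

// Slug first, sanitize second: the id exists when sanitize runs, so it is prefixed.
function slugThenSanitize(headingText: string): string {
  return clobberPrefixId(slugify(headingText));
}

// Sanitize first, slug second: the id is added after the prefixing step
// and reaches the output unprefixed.
function sanitizeThenSlug(headingText: string): string {
  return slugify(headingText);
}
```

For the heading text `constructor`, `sanitizeThenSlug` yields `constructor` (the clobbering case described above), while `slugThenSanitize` yields `user-content-constructor`.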
Review feedback from @a01sa01to: the migration guide is long, and putting it straight into the README makes it hard to read as a purpose-specific doc, so it is split out. The README keeps only a one-line link.
In v3 token colours moved from inline styles to classes (`__maximum_md_*`), so syntax highlighting is lost (code renders uncoloured) unless the stylesheet is included in the SSR output. Both the next-js and react-router examples now call `getStylesheet()` and serve the result in a `<style>` tag.
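A framework-free sketch of that wiring is shown below. The `StylesheetSource` interface and `renderArticle` are hypothetical, and whether the real `parse` / `getStylesheet` are synchronous or asynchronous is an assumption here; the actual examples use Next.js and React Router components rather than string templates.

```typescript
// Hypothetical minimal view of the processor (assumption: both methods are async).
interface StylesheetSource {
  parse(markdown: string): Promise<string>;
  getStylesheet(): Promise<string>;
}

// Builds the SSR fragment: the stylesheet is emitted in a <style> tag ahead of
// the article so the class-based token colours resolve when the page renders.
async function renderArticle(processor: StylesheetSource, markdown: string): Promise<string> {
  const [css, html] = await Promise.all([
    processor.getStylesheet(),
    processor.parse(markdown),
  ]);
  return `<style>${css}</style>\n<article>${html}</article>`;
}
```

Omitting the `<style>` part leaves the highlight classes in the markup with no rules attached, which is the uncoloured output described above.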
…rocessor/web
- Update comments that still used the old name `rehype-embed-youtube` to `reattachEmbeds` (the implementation name).
- Add a smoke test for `/processor/web` that checks only subpath resolution and the factory start-up path (sanitize / TOC / directive behaviour is already pinned in `full.test.ts`, so it is not duplicated).
20ef475 to
78d1268
1.3.0 is deprecated (CWE-502 prototype pollution); the fix shipped in 1.3.1. The library is transitive via hast-util-sanitize and mdast-util-to-hast, so use a pnpm `overrides` entry to bump it graph-wide. Both parents accept ^1.3.0, so 1.3.1 is api-compat.
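…e wrapping
Addresses @a01sa01to's feedback that "the line-break positions in multi-line comments feel off". Condenses the overly wordy comments and unifies them to a one-line style that does not break mid-sentence. No logic changes; net -126 lines.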
04c97d8 to
67f0054
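…-break comments
Comments that had been collapsed onto one line or broken mid-sentence are now split at clause / sentence boundaries, following Asa's review suggestions. Also restores the comment explaining the sanitize boundary (pipeline.ts), which the style sweep had deleted, as Asa pointed out.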
Pull request overview
Copilot reviewed 25 out of 30 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
README.md:81
- Use "untrusted" here; "untrust" is not the standard adjective for content/input.
- untrust markdown content, such as `[xss](javascript:...)` from standard markdown, raw `<script>` / `<iframe>` / `<style>`, and `on*` attributes, is reliably dropped
.changeset/processor-redesign-v3.md:31
- Use "untrusted" here; "untrust" is not the standard adjective for markdown/input.
The consumer's `rehypePlugins` run after sanitize. If a consumer that handles untrust markdown inserts raw HTML through its own plugin, that is the consumer's responsibility.
| import remarkRehype from "remark-rehype"; | ||
| import { type Plugin, type Pluggable, unified } from "unified"; | ||
|
|
||
| import type { Schema } from "hast-util-sanitize"; |
| ### Sanitize (built-in, untrust input assumed) | ||
|
|
||
| `rehype-sanitize` is inserted into the pipeline by default. The schema is `hast-util-sanitize`'s `defaultSchema` used almost as-is; the only extension is `div.className` (to keep the `remark-flexible-code-titles` wrapper). | ||
|
|
||
| - untrust markdown content, such as `[xss](javascript:...)` from standard markdown, raw `<script>` / `<iframe>` / `<style>`, and `on*` attributes, is reliably dropped |
| - `rehype-sanitize` is inserted into the pipeline by default (untrust input assumed). The schema is `hast-util-sanitize`'s `defaultSchema` with only `div.className` added. Markdown-syntax XSS such as `[xss](javascript:...)` and raw HTML are reliably dropped. | ||
| - Heading `id`s now carry the `user-content-` prefix (`#heading` → `#user-content-heading`). This follows from placing `rehype-slug` before sanitize so that sanitize's clobber-prefix mechanism applies, and matches GitHub README behaviour. Existing TOC anchor links now include the prefix, so consumers need to update them. | ||
|
|
||
| **Implementation notes** | ||
|
|
||
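| The `<iframe>` for `::youtube[id]` is not emitted directly by the handler. Instead, the handler places a text placeholder in the hast and stashes the embed metadata in `vfile.data`. The `reattachEmbeds` plugin, which runs after sanitize, replaces the placeholder with the real `<iframe>`, so the schema does not need to be extended to allow iframes. This also gives the structural guarantee that "no path other than `::youtube[id]` can put an `<iframe>` in the output at the AST level". | ||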
|
|
||
| The consumer's `rehypePlugins` run after sanitize. If a consumer that handles untrust markdown inserts raw HTML through its own plugin, that is the consumer's responsibility. |
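The placeholder flow described in the implementation notes can be sketched on a tiny hast-like tree. This is an illustration under stated assumptions, not the package's code: `SketchNode`, `Embed`, `sanitizeSketch`, and `reattachSketch` are hypothetical, the sanitize stand-in only drops `<iframe>` elements, and the `https://www.youtube.com/embed/` URL shape is assumed for the rebuilt iframe.

```typescript
// Minimal hast-like node types for the sketch (assumption, not the real hast types).
type SketchNode =
  | { type: "text"; value: string }
  | { type: "element"; tagName: string; properties: Record<string, string>; children: SketchNode[] };

interface Embed {
  kind: "youtube";
  id: string;
}

// Stand-in for the sanitize step: removes <iframe> elements, keeps text nodes.
function sanitizeSketch(nodes: SketchNode[]): SketchNode[] {
  return nodes.flatMap((node): SketchNode[] => {
    if (node.type === "text") return [node];
    if (node.tagName === "iframe") return [];
    return [{ ...node, children: sanitizeSketch(node.children) }];
  });
}

// Stand-in for reattachEmbeds: runs after sanitize and swaps placeholders found
// in the store for an iframe element; text that is not in the store is untouched.
function reattachSketch(nodes: SketchNode[], store: Map<string, Embed>): SketchNode[] {
  return nodes.map((node): SketchNode => {
    if (node.type === "element") {
      return { ...node, children: reattachSketch(node.children, store) };
    }
    const embed = store.get(node.value);
    if (!embed) return node;
    return {
      type: "element",
      tagName: "iframe",
      properties: { src: `https://www.youtube.com/embed/${encodeURIComponent(embed.id)}` },
      children: [],
    };
  });
}

// A raw iframe is dropped, a handler-registered placeholder becomes an iframe,
// and an unregistered look-alike stays as text.
function demoReattach(): SketchNode[] {
  const store = new Map<string, Embed>([["__EMBED_a1__", { kind: "youtube", id: "abc123" }]]);
  const tree: SketchNode[] = [
    { type: "element", tagName: "iframe", properties: { src: "https://evil.example" }, children: [] },
    { type: "text", value: "__EMBED_a1__" },
    { type: "text", value: "__EMBED_zz__" },
  ];
  return reattachSketch(sanitizeSketch(tree), store);
}
```

Because the only code that creates an `iframe` element runs after sanitize and reads exclusively from the store, the schema itself never has to allow iframes.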
| .use(remarkFallbackDirectives) | ||
| .use(remarkRehype, { handlers: { ...remarkEmbedHandlers } }) | ||
| // Place slug before sanitize, | ||
| // so that defaultSchema's `clobberPrefix: "user-content-"` always prefixes ids derived from untrusted heading text (for example `# constructor`). |




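3.0 redesign. Resolves together the three points raised in #36 (import separation at responsibility boundaries / shiki bundle / default sanitize).
Breaking
- `/server` removed → reorganised into `/processor/full` (equivalent to the old `/server`) + `/processor/web`
- … (`createMarkdownProcessor*`) form
- see `.changeset/processor-redesign-v3.md`
Key design points
- … (`rehype-sanitize` + `defaultSchema`, extended only by the single `div.className` line)
- `::youtube[id]` emits a text placeholder at the handler stage and stashes metadata in `vfile.data`; `reattachEmbeds`, after sanitize, rebuilds the `<iframe>`. As a result, "no path other than `::youtube` can emit an iframe in the AST" holds as a pipeline invariant
refs: #36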