BACK-475 - Add-Word-(docx)-upload-to-enable-image-extraction-for-pasted-Word-content by kuwork · Pull Request #648 · MrLesk/Backlog.md

kuwork · 2026-05-11T17:11:46Z

What

Allow users to upload Word documents (.docx) directly into the Web UI editor. The backend extracts text and images, converting them to Markdown with proper image references.

Why

The existing paste-as-markdown feature (BACK-208) cannot extract images from pasted Word content because browser clipboard APIs don't expose embedded images as extractable blobs. With direct .docx file upload, the backend can use mammoth to read the docx archive and extract embedded images to the temp assets directory.

Changes

Backend (src/core/docx-converter.ts): New module using mammoth to convert .docx → HTML. Embedded images are extracted via mammoth's convertImage callback and uploaded to backlog/assets/.temp/ via AssetManager.
Backend (src/server/index.ts): New POST /api/docx/convert endpoint. Accepts multipart/form-data, validates .docx extension, returns { html, images, messages }.
Frontend (src/web/components/PasteAwareMDEditor.tsx): Added Word upload button to editor toolbar (extraCommands), drag-and-drop support, and a hidden file picker. Uploads file to backend, then runs cleanHtml + Turndown in the browser to produce Markdown.
Frontend (src/web/utils/paste-as-markdown.ts): Extracted cleanHtml as an exported async function with a new keepMedia option. This allows the docx upload path to preserve server-side extracted images while the paste path continues to filter invalid local images.
Frontend (src/web/lib/api.ts): Added convertDocx() API client method.
Tests (src/test/server-docx-convert.test.ts): Integration tests for the conversion endpoint (validation, conversion, image extraction to temp directory).
Dependencies: Added mammoth for docx parsing.
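
As a rough illustration of the endpoint contract listed above, the sketch below shows extension validation, the { html, images, messages } response shape, and a temp path built from a unique id. It is not the code from this PR: the names isDocxFilename, tempImagePath, and handleConvert, the 400 status, the error text, the element types of images and messages, and the content-type table are all assumptions, and the mammoth call itself is replaced by an injected convert function.

```typescript
// Response shape from the PR description; the element types are assumed.
interface DocxConvertResult {
  html: string;
  images: string[];   // assumed: paths of images written to the temp directory
  messages: string[]; // assumed: converter warnings as plain strings
}

// Assumed mapping from image content type to file extension.
const EXTENSIONS: Record<string, string> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/gif": "gif",
};

// Accept only file names ending in ".docx" (case-insensitive).
function isDocxFilename(name: string): boolean {
  return /\.docx$/i.test(name.trim());
}

// Build a temp asset path from a caller-supplied unique id (for example a UUID).
function tempImagePath(id: string, contentType: string): string {
  const ext = EXTENSIONS[contentType.toLowerCase()] ?? "bin";
  return `backlog/assets/.temp/${id}.${ext}`;
}

type ConvertOutcome =
  | { status: 200; body: DocxConvertResult }
  | { status: 400; body: { error: string } };

// Hypothetical handler core: reject non-.docx uploads, otherwise delegate
// to the injected converter (which would wrap mammoth in a real server).
function handleConvert(
  filename: string,
  convert: () => DocxConvertResult,
): ConvertOutcome {
  if (!isDocxFilename(filename)) {
    return { status: 400, body: { error: "Only .docx files are supported" } };
  }
  return { status: 200, body: convert() };
}
```

Injecting the converter keeps the validation logic testable without a real .docx fixture. The actual endpoint may structure this differently.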

How it works

User clicks the Word icon in the editor toolbar or drags a .docx file onto the editor.
Frontend uploads the file to POST /api/docx/convert.
Backend uses mammoth to convert docx → HTML. During conversion, embedded images are extracted and saved to .temp/ with UUID filenames.
Backend returns { html, images, messages }.
Frontend calls cleanHtml(html, { keepMedia: true }) to normalize Word HTML (flatten table cells, convert mso-lists, strip classes, etc.) while preserving <img> tags.
Frontend runs Turndown + post-processing to get clean Markdown.
Markdown is inserted at the current cursor position in the editor.
When the user saves, the existing POST /api/assets/promote flow promotes temp images to the permanent paste directory.
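
The keepMedia step and the cursor insertion in the flow above can be sketched as follows. This is a simplified stand-in, not the PR's implementation: filterImages is a hypothetical synchronous helper (the real cleanHtml is described as async and does much more), the rule that "invalid local images" means file: URLs or Windows drive paths is an assumption, and insertAtCursor is an illustrative name.

```typescript
interface CleanOptions {
  keepMedia?: boolean;
}

// Hypothetical stand-in for the image handling inside cleanHtml.
// With keepMedia, every <img> tag survives (docx upload path).
// Without it, images whose src looks like a local file are dropped (paste path).
function filterImages(html: string, options: CleanOptions = {}): string {
  if (options.keepMedia) return html;
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    const src = /\bsrc\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1] ?? "";
    // Assumed definition of a "local" image: file: URL or Windows drive path.
    return /^(file:|[a-z]:\\)/i.test(src) ? "" : tag;
  });
}

// Insert generated Markdown at the cursor offset and return the new cursor,
// clamping the offset to the bounds of the existing text.
function insertAtCursor(
  text: string,
  cursor: number,
  markdown: string,
): { text: string; cursor: number } {
  const at = Math.min(Math.max(cursor, 0), text.length);
  return {
    text: text.slice(0, at) + markdown + text.slice(at),
    cursor: at + markdown.length,
  };
}
```

In this sketch the paste path would call filterImages(html) and the docx path filterImages(html, { keepMedia: true }), so server-extracted images in the temp directory are never filtered out.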

Testing

bun test src/test/server-docx-convert.test.ts — backend endpoint tests (4 pass)
bun test src/test/build.test.ts — CLI compile still works (no jsdom in bundle)
bunx tsc --noEmit — type check passes

closes BACK-475


kuwork added 2 commits May 9, 2026 23:38

back-467 - Add-local-file-preview-with-syntax-highlighting-and-line-numbers

6ea4ea4

BACK-208 - Add-paste-as-markdown-support-in-Web-UI

b26797e

(cherry picked from commit e1ea342)

kuwork force-pushed the docx-upload branch from f57e089 to c92e3b3 on May 12, 2026 13:56

back-475 - Add-Word-(docx)-upload-to-enable-image-extraction-for-pasted-Word-content

a8531fc

(cherry picked from commit c92e3b3)

kuwork force-pushed the docx-upload branch from c92e3b3 to a8531fc on May 13, 2026 16:12

kuwork wants to merge 3 commits into MrLesk:main from kuwork:docx-upload
