feat: implement pdf-to-md-swift converter #48

@PsychQuantClaw

Summary

Restore and finish pdf-to-md-swift as a proper Layer 3 converter in macdoc.

The main branch currently carries only the package manifest, README, and CLI wiring from the earlier merge; the converter source and its tests are missing. This follow-up issue closes that gap by landing the actual implementation.

Conversion Requirements

  • Input: .pdf
  • Output: Markdown stream / .md
  • Architecture: direct PDF → Markdown path, avoiding the information loss of routing through LaTeX as a hub format
  • Protocol: implement DocumentConverter with StreamingOutput
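
To make the streaming requirement concrete, here is a minimal sketch of what a converter conforming to a streaming protocol could look like. The signatures of the real DocumentConverter and StreamingOutput in common-converter-swift are not shown in this issue, so the protocols below are hypothetical stand-ins declared locally (SketchDocumentConverter, SketchStreamingOutput), and PlaceholderPDFConverter is illustrative only.

```swift
import Foundation

// Hypothetical stand-ins: the real DocumentConverter and StreamingOutput
// live in common-converter-swift and may have different signatures.
protocol SketchStreamingOutput {
    mutating func write(_ text: String)
}

protocol SketchDocumentConverter {
    static var sourceExtension: String { get }
    func convert<Output: SketchStreamingOutput>(input: URL, output: inout Output) throws
}

// Collects streamed chunks in memory, which is convenient in unit tests.
struct StringOutput: SketchStreamingOutput {
    var text = ""
    mutating func write(_ chunk: String) { text += chunk }
}

// Placeholder converter that only demonstrates the streaming shape.
struct PlaceholderPDFConverter: SketchDocumentConverter {
    static let sourceExtension = "pdf"

    func convert<Output: SketchStreamingOutput>(input: URL, output: inout Output) throws {
        // A real implementation would emit Markdown page by page here.
        output.write("<!-- source: \(input.lastPathComponent) -->\n")
    }
}
```

Writing through an output sink rather than returning one large string lets the CLI start emitting Markdown before the whole PDF has been processed.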

Layer 1 / Core Dependencies

  • PDFKit for native macOS PDF parsing
  • common-converter-swift (DocumentConverter, StreamingOutput, ConversionOptions)
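
As a rough illustration of the PDFKit side, the sketch below reads the plain text of each page. PDFDocument(url:), pageCount, page(at:), and PDFPage.string are PDFKit APIs; the helper name extractPageTexts is hypothetical, and the block is guarded so it compiles only where PDFKit is available.

```swift
#if canImport(PDFKit)
import Foundation
import PDFKit

/// Returns the plain text of each page, in page order, or nil when the
/// file cannot be opened as a PDF. Pages without extractable text yield "".
func extractPageTexts(from url: URL) -> [String]? {
    guard let document = PDFDocument(url: url) else { return nil }
    return (0..<document.pageCount).map { index in
        document.page(at: index)?.string ?? ""
    }
}
#endif
```

Plain page strings carry no font information, so any heading detection built on top of this would have to rely on textual cues alone.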

Implementation Notes

  • Extract PDF content page-by-page
  • Heuristically map headings, paragraphs, ordered lists, unordered lists
  • Preserve page boundaries with Markdown thematic breaks
  • Support the frontmatter and hard-line-break options already exposed in the CLI
  • Keep logic in packages/pdf-to-md-swift/ as an independent Swift package
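
The notes above could be sketched roughly as follows, assuming the input is one plain-text string per page. Everything here is illustrative: SketchOptions stands in for the real ConversionOptions, the title frontmatter field is an assumption, and the heading rule (short line, leading capital, no closing punctuation) is only one possible heuristic and will misfire on some inputs.

```swift
import Foundation

// Hypothetical options type; the real ConversionOptions may look different.
struct SketchOptions {
    var includeFrontmatter = false
    var hardLineBreaks = false
}

enum Block: Equatable {
    case heading(String)
    case unorderedItem(String)
    case orderedItem(number: Int, text: String)
    case paragraph(String)
}

/// Classifies one line of extracted text using simple textual cues.
func classify(_ rawLine: String) -> Block {
    let line = rawLine.trimmingCharacters(in: .whitespaces)

    // Unordered list: a common bullet glyph followed by a space.
    for bullet in ["• ", "- ", "* "] where line.hasPrefix(bullet) {
        return .unorderedItem(String(line.dropFirst(bullet.count)))
    }

    // Ordered list: ASCII digits followed by ". " or ") ".
    let digits = line.prefix(while: { $0.isASCII && $0.isNumber })
    let rest = line.dropFirst(digits.count)
    if let number = Int(digits), rest.hasPrefix(". ") || rest.hasPrefix(") ") {
        return .orderedItem(number: number, text: String(rest.dropFirst(2)))
    }

    // Heading guess: short, starts uppercase, no closing punctuation.
    if let first = line.first, let last = line.last,
       line.count <= 60, first.isUppercase, !".,;:!?".contains(last) {
        return .heading(line)
    }
    return .paragraph(line)
}

/// Renders per-page text as Markdown, with a thematic break between pages.
func markdown(fromPages pages: [String], title: String, options: SketchOptions) -> String {
    let paragraphJoiner = options.hardLineBreaks ? "  \n" : " "
    let renderedPages = pages.map { page -> String in
        var chunks: [String] = []
        var previousWasParagraph = false
        for rawLine in page.split(separator: "\n") {
            // Skip whitespace-only lines; they also end the running paragraph.
            guard !rawLine.trimmingCharacters(in: .whitespaces).isEmpty else {
                previousWasParagraph = false
                continue
            }
            let block = classify(String(rawLine))
            switch block {
            case .paragraph(let text) where previousWasParagraph:
                // Continue the running paragraph instead of starting a new one.
                chunks[chunks.count - 1] += paragraphJoiner + text
            case .paragraph(let text):
                chunks.append(text)
            case .heading(let text):
                chunks.append("## " + text)
            case .unorderedItem(let text):
                chunks.append("- " + text)
            case .orderedItem(let number, let text):
                chunks.append("\(number). " + text)
            }
            if case .paragraph = block {
                previousWasParagraph = true
            } else {
                previousWasParagraph = false
            }
        }
        return chunks.joined(separator: "\n\n")
    }
    let frontmatter = options.includeFrontmatter ? "---\ntitle: \(title)\n---\n\n" : ""
    // The blank line before "---" keeps it from being read as a setext heading underline.
    return frontmatter + renderedPages.joined(separator: "\n\n---\n\n") + "\n"
}
```

In this sketch the hard-break option only changes how wrapped paragraph lines are joined: with it off they merge with a space, with it on each source line ends in a Markdown hard break (two trailing spaces).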

Test Strategy

  • Package-level unit tests for headings / paragraphs / list detection
  • Page-break handling across multi-page PDFs
  • Hyphenated line join behavior
  • Frontmatter and hard-break options
  • Run swift test inside packages/pdf-to-md-swift
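
The hyphenated line join case is the easiest of these to get subtly wrong, so here is a sketch of one possible rule that the unit tests could pin down. The function name joinWrappedLines and the rule itself are assumptions, not the converter's confirmed behavior: a trailing hyphen is dropped only when it follows a letter and the next line starts with a lowercase letter.

```swift
import Foundation

/// Joins wrapped lines into a single string. A line ending in a hyphen that
/// follows a letter, when the next line starts with a lowercase letter, is
/// treated as a word split across lines and joined without the hyphen.
func joinWrappedLines(_ lines: [String]) -> String {
    var result = ""
    for rawLine in lines {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        guard !line.isEmpty else { continue }
        if result.isEmpty {
            result = line
            continue
        }
        let beforeHyphen = result.dropLast().last
        let isSplitWord = result.hasSuffix("-")
            && (beforeHyphen?.isLetter ?? false)
            && (line.first?.isLowercase ?? false)
        if isSplitWord {
            result.removeLast() // drop the line-break hyphen
            result += line
        } else {
            result += " " + line
        }
    }
    return result
}
```

Under this rule ["The implemen-", "tation works."] becomes "The implementation works.", while ["pages 10-", "12 are blank"] keeps its hyphen because the character before it is a digit. A genuine compound that happens to wrap at its hyphen (such as "well-" followed by "known") would lose the hyphen, which is the kind of edge case worth covering with an explicit test.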

Metadata

Labels: enhancement (New feature or request)
