fix: Update regex dependency for security fix and add uvx documentation #989
ambicuity wants to merge 2 commits into datalab-to:master
…dd uvx documentation
- Update the regex dependency from `^2024.4.28` to `>=2024.4.28` to allow installation of versions that include security fixes (Fixes datalab-to#975)
- Add uvx installation instructions to README.md for a fast CLI workflow without a global pip install (Fixes datalab-to#971)
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
Pull request overview
This PR addresses Issue #975 by relaxing the regex dependency constraint to allow installing versions that include security fixes, and addresses Issue #971 by documenting uvx as an alternative CLI workflow.
Changes:
- Updated the `regex` version constraint in `pyproject.toml` to permit newer releases.
- Regenerated `poetry.lock` to reflect the updated dependency resolution.
- Added `uvx` installation/run instructions to the README.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Relaxes the regex constraint so users can install patched versions. |
| poetry.lock | Updates the locked dependency set after the constraint change. |
| README.md | Adds uvx usage instructions for running the CLI without global installs. |
pyproject.toml (outdated diff)
     rapidfuzz = "^3.8.1"
     surya-ocr = "^0.17.1"
    -regex = "^2024.4.28"
    +regex = ">=2024.4.28"
`regex` is now unbounded above (`>=2024.4.28`), unlike the other dependencies here, which all use constraints with an implicit upper bound (e.g., `^` or `~`). Consider a bounded range that still permits the patched releases (e.g., allows 2025.x) but avoids accidentally pulling in a future breaking release (with date-based versions, each new year acts like a major bump).
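The reviewer's point can be made concrete. Under Poetry's caret rule, `^2024.4.28` means `>=2024.4.28,<2025.0.0`, so the 2025.x releases the reviewer wants to allow are rejected. `>=2024.4.28` accepts everything, including future years, and `>=2024.4.28,<2026` sits in between. The sketch below models the three constraints with plain tuple comparison. It assumes simple `YYYY.M.D` version strings, and the helper names are hypothetical. It is not Poetry's or pip's resolver, which also handle pre-releases and other PEP 440 details.

```python
from typing import Callable

Version = tuple[int, ...]
Check = Callable[[str], bool]

def parse(version: str) -> Version:
    """Turn '2024.4.28' into (2024, 4, 28) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def caret(lower: str) -> Check:
    """Poetry-style ^lower for a nonzero major: >=lower, <(major + 1).0.0."""
    lo = parse(lower)
    hi = (lo[0] + 1, 0, 0)
    return lambda v: lo <= parse(v) < hi

def at_least(lower: str) -> Check:
    """Unbounded >=lower: accepts every later release, including future years."""
    lo = parse(lower)
    return lambda v: parse(v) >= lo

def bounded(lower: str, upper: str) -> Check:
    """>=lower,<upper: accepts releases in the half-open range [lower, upper)."""
    lo, hi = parse(lower), parse(upper)
    return lambda v: lo <= parse(v) < hi

CONSTRAINTS: dict[str, Check] = {
    "^2024.4.28": caret("2024.4.28"),
    ">=2024.4.28": at_least("2024.4.28"),
    ">=2024.4.28,<2026": bounded("2024.4.28", "2026"),
}

# Illustrative version numbers only, not a list of actual regex releases
for candidate in ["2024.11.6", "2025.9.1", "2026.1.15"]:
    verdicts = {name: check(candidate) for name, check in CONSTRAINTS.items()}
    print(candidate, verdicts)
```

With these inputs, all three constraints accept 2024.11.6. Only the unbounded and bounded ranges accept 2025.9.1, and only the unbounded range accepts 2026.1.15.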
Alternatively, you can use [uvx](https://docs.astral.sh/uv/guides/tools/) for a fast CLI workflow without installing packages globally:

```shell
uvx --from marker-pdf marker_single /path/to/file.pdf
```

For the full installation with additional document format support:

```shell
uvx --from "marker-pdf[full]" marker_single /path/to/file.pdf
```
The uvx examples omit `--output_dir`. By default, `marker_single` writes to `settings.OUTPUT_DIR`, which resolves under the installed package directory (site-packages) and may be non-writable or unexpected when run via uvx. Update the examples to pass an explicit output directory (e.g., the current working directory) so the command reliably produces output where users expect.
I have read the CLA Document and I hereby sign the CLA
- Use a bounded `regex` constraint (`>=2024.4.28,<2026`) to avoid accidentally pulling in future breaking releases while still allowing the security-patched 2025.x versions
- Add the `--output_dir` flag to the uvx examples so output lands in a predictable location
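Following the reviewer's suggestion, the README commands would pass an explicit output directory. Below is a minimal Python sketch that builds such an invocation. It assumes the current working directory is the desired target. `marker_uvx_command` is a hypothetical helper, while `--from` (uvx) and `--output_dir` (`marker_single`) are the flags discussed in this PR.

```python
import shlex
from pathlib import Path

def marker_uvx_command(pdf_path: str, output_dir: str, full: bool = False) -> list[str]:
    """Build a uvx invocation of marker_single with an explicit --output_dir.

    Hypothetical helper for illustration; the flags mirror the README examples.
    """
    # "[full]" selects the extra with additional document format support
    package = "marker-pdf[full]" if full else "marker-pdf"
    return ["uvx", "--from", package, "marker_single", pdf_path, "--output_dir", output_dir]

if __name__ == "__main__":
    cmd = marker_uvx_command("/path/to/file.pdf", output_dir=str(Path.cwd()))
    # Print the equivalent shell command; shlex.join quotes the brackets in "[full]"
    print(shlex.join(cmd))
    # To actually run it (requires uv): import subprocess; subprocess.run(cmd, check=True)
```

Passing the output directory explicitly keeps results out of the package install location, which the reviewer notes may be non-writable when run via uvx.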
Summary
This PR addresses two open issues:
- Issue #975: Security vulnerability in `regex` (constraint changed from `^2024.4.28` to `>=2024.4.28`)
- Issue #971: Add uvx documentation
Fixes #975
Fixes #971