Strip metadata and links from PDFs locally — no upload, no tracking.
Removes /Link annotations and wipes the Info dictionary plus any XMP metadata stream attached to the catalog. Ships as both a programmatic library and an npx CLI. One runtime dependency: pdf-lib.
- Requirements
- Install
- Usage
- Why this exists
- CLI
- API
- Limitations
- Compared to alternatives
- Contributing
- License
Requirements
- Node.js >= 22 LTS. Use fnm for fast Rust-based version switching.
- Any modern package manager: pnpm, npm, yarn, bun.
Install
As a library

```shell
# Pick one package manager
pnpm add @coroboros/pdf-cleaner
npm install @coroboros/pdf-cleaner
yarn add @coroboros/pdf-cleaner
bun add @coroboros/pdf-cleaner
```

As a CLI

```shell
# Run without installing
npx @coroboros/pdf-cleaner cv.pdf

# Install globally for repeated use
pnpm add -g @coroboros/pdf-cleaner
pdf-cleaner --help
```

Usage
Programmatic

```typescript
import { readFile, writeFile } from 'node:fs/promises';
import { clean } from '@coroboros/pdf-cleaner';

const cleaned = await clean(await readFile('cv.pdf'));
await writeFile('cv_clean.pdf', cleaned);
```

CLI

```shell
npx @coroboros/pdf-cleaner cv.pdf
```

Why this exists
PDFs carry hidden authorship. The Info dictionary embeds /Title, /Author, /Producer, and creation and modification dates, and the catalog can carry an XMP metadata stream alongside it. Hyperlinks travel via /Link annotations on each page. Hosted cleaners strip both, but only after you upload the bytes. @coroboros/pdf-cleaner runs the same strips in-process on a single dependency (pdf-lib). No network calls, no telemetry. See bench/baseline.md for the round-trip numbers and the regression budget.
CLI
`pdf-cleaner <input> [options]`
Strip metadata and links from a PDF or a directory of PDFs. Writes the cleaned bytes alongside the input with a _clean.pdf suffix unless --out or --in-place is set.
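
The default naming rule above can be sketched as a small path helper. `defaultOutputPath` is a hypothetical name introduced for illustration, not part of the package, and it covers only the default case (neither `--out` nor `--in-place` set).

```typescript
import { basename, dirname, extname, join } from 'node:path';

// Hypothetical helper: mirrors the documented default of writing
// "<name>_clean.pdf" next to the input. Not the CLI's actual code.
function defaultOutputPath(inputPath: string): string {
  const ext = extname(inputPath);        // e.g. ".pdf"
  const stem = basename(inputPath, ext); // e.g. "cv"
  return join(dirname(inputPath), `${stem}_clean.pdf`);
}

console.log(defaultOutputPath('cv.pdf')); // cv_clean.pdf
```

How the file is named when `--out` is set is not stated here, so the sketch leaves that case out rather than guess.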
Arguments
| Arg | Type | Description |
|---|---|---|
| `<input>` | `string` (required) | A .pdf file, or a directory of .pdf files. Directory mode is top-level only; subdirectories are not traversed. |
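
Top-level-only directory handling can be pictured with a short sketch. The names `selectPdfNames` and `listTopLevelPdfs` are hypothetical, and the case-insensitive `.pdf` match is an assumption; the CLI may match extensions differently.

```typescript
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Minimal shape shared by fs.Dirent and test doubles.
interface EntryLike {
  name: string;
  isFile(): boolean;
}

// Keep regular files ending in .pdf; directories are skipped, never descended into.
// Assumption: extension matching is case-insensitive.
function selectPdfNames(entries: EntryLike[]): string[] {
  return entries
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith('.pdf'))
    .map((entry) => entry.name)
    .sort();
}

// Hypothetical wrapper: one non-recursive directory read.
function listTopLevelPdfs(dir: string): string[] {
  const entries = readdirSync(dir, { withFileTypes: true });
  return selectPdfNames(entries).map((name) => join(dir, name));
}
```

Splitting the filter from the directory read keeps the selection rule testable without touching the filesystem.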
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `--out <dir>` | `string` | alongside input | Output directory for cleaned files. Created if missing. |
| `--in-place` | `boolean` | `false` | Overwrite the input(s) in place. TTY prompts for confirmation; non-TTY contexts require --yes. |
| `--yes`, `-y` | `boolean` | `false` | Skip the --in-place confirmation prompt. Required to run --in-place in CI, scripts, or any non-TTY context. |
| `--keep-links` | `boolean` | `false` | Preserve /Link annotations. Other annotation subtypes are preserved regardless. |
| `--keep-metadata` | `boolean` | `false` | Preserve the Info dictionary (/Title, /Author, /Subject, /Keywords, /Creator, /Producer, /CreationDate, /ModDate) and any XMP metadata stream. |
| `--help`, `-h` | `boolean` | n/a | Print the usage block and exit 0. |
| `--version`, `-v` | `boolean` | n/a | Print the package version and exit 0. |
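
The confirmation rules for `--in-place` reduce to a small decision function. This is a minimal sketch with hypothetical names (`decideInPlace`, `InPlaceFlags`); the CLI's real code may be organized differently. It assumes `--yes` also skips the prompt inside a TTY, as the flag description suggests.

```typescript
type InPlaceDecision = 'not-in-place' | 'proceed' | 'prompt' | 'reject';

interface InPlaceFlags {
  inPlace: boolean; // --in-place
  yes: boolean;     // --yes / -y
  isTTY: boolean;   // e.g. Boolean(process.stdin.isTTY) in a real CLI
}

// Decide what an --in-place run should do before touching any file.
function decideInPlace({ inPlace, yes, isTTY }: InPlaceFlags): InPlaceDecision {
  if (!inPlace) return 'not-in-place'; // normal output path, nothing to confirm
  if (yes) return 'proceed';           // confirmation explicitly skipped
  return isTTY ? 'prompt' : 'reject';  // 'reject' corresponds to exit code 1
}
```

Keeping the decision pure makes the non-TTY rule easy to cover in unit tests without simulating a terminal.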
Exit codes
| Code | Meaning |
|---|---|
| `0` | Success. Every input file produced a cleaned output. |
| `1` | User error. Bad input path, unknown flag, or --in-place in a non-TTY context without --yes. |
| `2` | Per-file cleaning error. At least one file failed. Other files in directory mode still complete. |
| `3` | Unexpected error not classified above. |
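
A script that shells out to the CLI can branch on these codes. The sketch below is illustrative only: `describeExit` and `runCleaner` are hypothetical names, and `runCleaner` assumes `npx` is available on `PATH`.

```typescript
import { spawnSync } from 'node:child_process';

// Meanings copied from the exit-code table.
const EXIT_MEANINGS: Record<number, string> = {
  0: 'success',
  1: 'user error',
  2: 'per-file cleaning error',
  3: 'unexpected error',
};

function describeExit(status: number | null): string {
  // spawnSync reports null when the process did not exit normally.
  if (status === null) return 'did not exit normally';
  return EXIT_MEANINGS[status] ?? `undocumented exit code ${status}`;
}

// Hypothetical wrapper; defined for illustration and not invoked here.
function runCleaner(args: string[]): string {
  const result = spawnSync('npx', ['@coroboros/pdf-cleaner', ...args], { stdio: 'inherit' });
  return describeExit(result.status);
}
```

In directory mode, code 2 means some outputs were still written, so a caller may want to inspect the output directory rather than discard the whole run.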
Examples

```shell
# Single file → cv_clean.pdf alongside the input
pdf-cleaner cv.pdf

# Single file → custom output directory
pdf-cleaner cv.pdf --out ./out

# Directory of PDFs (top-level only, non-recursive)
pdf-cleaner ./input --out ./output

# Overwrite the originals: prompts in a TTY, requires --yes otherwise
pdf-cleaner cv.pdf --in-place
pdf-cleaner cv.pdf --in-place --yes

# Granular opt-out
pdf-cleaner cv.pdf --keep-links
pdf-cleaner cv.pdf --keep-metadata
```

API
CleanInput
The bytes that clean accepts.

```typescript
type CleanInput = Uint8Array | ArrayBuffer;
```

Node Buffer is accepted via structural compatibility: Buffer extends Uint8Array.
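
Other views over binary data, such as a `DataView` or a `Uint16Array`, are not part of `CleanInput` and would need wrapping first. A minimal sketch of a caller-side adapter follows; `asCleanInput` is a hypothetical helper, not something the package exports.

```typescript
type CleanInput = Uint8Array | ArrayBuffer;

// Wrap any ArrayBufferView as a Uint8Array over the same bytes (no copy).
// byteOffset and byteLength are honoured so a view over part of a larger
// buffer does not leak the surrounding bytes into the result.
function asCleanInput(view: ArrayBufferView): Uint8Array {
  return new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
}

const backing = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7]);
const partial = new DataView(backing.buffer, 2, 3); // bytes 2, 3, 4
const input: CleanInput = asCleanInput(partial);
```

Passing `view.buffer` directly would hand over the whole backing buffer, which is rarely what a caller wants when the view covers only a slice.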
CleanOptions
Per-call overrides for clean. Every field is optional; the two boolean flags default to false so the defaults strip aggressively.
| Option | Type | Default | Description |
|---|---|---|---|
| `keepLinks` | `boolean` | `false` | Preserve /Link annotations on every page. Other annotation subtypes (text notes, highlights, form widgets) are preserved regardless. |
| `keepMetadata` | `boolean` | `false` | Preserve the Info dictionary (/Title, /Author, /Subject, /Keywords, /Creator, /Producer, /CreationDate, /ModDate) and any XMP metadata stream attached to the catalog. |
| `signal` | `AbortSignal` | (none) | Cancel the operation cooperatively. Checked before pdf-lib load, after load, and after the strip phase. Aborting throws CleanError with code: 'ABORTED' and cause = signal.reason. pdf-lib itself does not observe the signal: once load or save is entered, it runs to completion before the next check fires. |
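
The checkpoint behaviour described for `signal` can be sketched as follows. This is an illustration of the pattern, not the library's source: `SketchCleanError`, `checkpoint`, `runWithCheckpoints`, and the `Phases` shape are all hypothetical stand-ins.

```typescript
type CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';

// Stand-in for the documented CleanError shape.
class SketchCleanError extends Error {
  readonly code: CleanErrorCode;
  readonly cause?: unknown;
  constructor(code: CleanErrorCode, message: string, options?: { cause?: unknown }) {
    super(message);
    this.name = 'CleanError';
    this.code = code;
    this.cause = options?.cause;
  }
}

// Throw ABORTED if the signal has already fired; otherwise do nothing.
function checkpoint(signal?: AbortSignal): void {
  if (signal?.aborted) {
    throw new SketchCleanError('ABORTED', 'Operation aborted', { cause: signal.reason });
  }
}

interface Phases<T> {
  load(): Promise<T>;
  strip(doc: T): void;
  save(doc: T): Promise<Uint8Array>;
}

// The signal is only consulted between phases; a phase that has started
// always runs to completion, matching the behaviour described in the table.
async function runWithCheckpoints<T>(phases: Phases<T>, signal?: AbortSignal): Promise<Uint8Array> {
  checkpoint(signal); // before load
  const doc = await phases.load();
  checkpoint(signal); // after load
  phases.strip(doc);
  checkpoint(signal); // after strip
  return phases.save(doc);
}
```

One consequence of this design is that a timeout bounds when the work is abandoned, not how long a single load or save can take.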
CleanError
Thrown by clean for inputs it cannot process. Inherits from Error, supports Error.cause for wrapping.

```typescript
declare class CleanError extends Error {
  readonly name: 'CleanError';
  readonly code: CleanErrorCode;
  constructor(code: CleanErrorCode, message: string, options?: { cause?: unknown });
}
```

The code field is a stable string discriminant safe for runtime branching. See Errors for the code list.
CleanErrorCode

```typescript
type CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';
```

clean(input, options?)
Strip metadata and links from a PDF and return the cleaned bytes.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | `CleanInput` | (required) | The PDF bytes. Must be non-empty. |
| `options?` | `CleanOptions` | `{}` | Per-call overrides. See the type for each field. |
Returns: `Promise<Uint8Array>`. The cleaned PDF bytes. clean() is idempotent on the observable surface: calling it on its own output removes nothing further.
Throws: `CleanError`. INVALID_INPUT when the input is not bytes, is null, or is empty. PARSE_FAILED when the bytes do not parse as a valid PDF; the underlying parser error is preserved on Error.cause. ENCRYPTED when the PDF carries an /Encrypt entry; decrypt before cleaning. ABORTED when options.signal fires; signal.reason is preserved on Error.cause.
Notes: see bench/baseline.md for the round-trip numbers and the regression budget.
Examples

```typescript
// Default: strip both links and metadata
const cleaned = await clean(bytes);
```

```typescript
// Wipe metadata, keep working hyperlinks
const cleaned = await clean(bytes, { keepLinks: true });
```

```typescript
// Pre-publish CV: strip metadata, keep links so the LinkedIn URL still clicks
import { readFile, writeFile } from 'node:fs/promises';
import { clean } from '@coroboros/pdf-cleaner';

const original = await readFile('cv.pdf');
const cleaned = await clean(original, { keepLinks: true });
await writeFile('cv_public.pdf', cleaned);
```

```typescript
// Server-side use: bound the work with an AbortSignal
const cleaned = await clean(bytes, { signal: AbortSignal.timeout(5000) });
```

Errors

| Code | Description |
|---|---|
| `INVALID_INPUT` | input is missing, null, not a CleanInput, or empty. |
| `PARSE_FAILED` | The bytes do not parse as a valid PDF. The original parser error is available via Error.cause. |
| `ENCRYPTED` | The PDF carries an /Encrypt trailer entry. Decrypt before cleaning. |
| `ABORTED` | options.signal fired during the operation. signal.reason is preserved on Error.cause. |
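
Branching on these codes at a call site might look like the sketch below. It uses a structural check on `name` and `code` so it stays self-contained; `cleanErrorCode` and `userMessage` are hypothetical helpers, and the message wording is illustrative.

```typescript
type CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';

const KNOWN_CODES: readonly CleanErrorCode[] = [
  'INVALID_INPUT',
  'PARSE_FAILED',
  'ENCRYPTED',
  'ABORTED',
];

// Return the code if the value looks like a CleanError, otherwise undefined.
function cleanErrorCode(err: unknown): CleanErrorCode | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const { name, code } = err as { name?: unknown; code?: unknown };
  if (name !== 'CleanError') return undefined;
  return KNOWN_CODES.find((known) => known === code);
}

// Map each documented code to a user-facing message.
function userMessage(err: unknown): string {
  switch (cleanErrorCode(err)) {
    case 'INVALID_INPUT':
      return 'No PDF bytes were provided.';
    case 'PARSE_FAILED':
      return 'The file is not a valid PDF.';
    case 'ENCRYPTED':
      return 'The PDF is encrypted. Decrypt it first.';
    case 'ABORTED':
      return 'Cleaning was cancelled.';
    default:
      return 'Unexpected failure.';
  }
}
```

A typical call site would wrap `await clean(bytes)` in `try`/`catch` and pass the caught value to `userMessage`.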
Limitations
- Stripping is limited to `/Link` annotations and the standard metadata surfaces (Info dictionary plus any XMP metadata stream). Other annotation subtypes are preserved.
- Encrypted PDFs are rejected with `ENCRYPTED`. Decrypt them first.
- Directory mode walks the top level only; subdirectories are not traversed.
- Text content, embedded images, page geometry, fonts, bookmarks, and form fields are preserved untouched.
- Out of scope: text redaction, watermark removal, compression, OCR, JavaScript action stripping, attachment removal.
Compared to alternatives

| Feature | `pdf-lib` (raw) | `qpdf` / `node-qpdf2` | `exiftool-vendored` | `muhammara` | `@coroboros/pdf-cleaner` |
|---|---|---|---|---|---|
| Strip Info dictionary | DIY | DIY (binary flags) | yes (`-all=`) | DIY | yes |
| Strip XMP metadata stream | DIY | DIY (binary flags) | yes (`-all=`) | DIY | yes |
| Strip `/Link` annotations | DIY | DIY | no | DIY | yes |
| Pure JS, no native binary | yes | no (qpdf binary) | no (Perl binary) | no (C++ bindings) | yes |
| In-process, no network upload | yes | yes | yes | yes | yes |
| CLI included | no | no (lib only) | no (lib only) | no | yes |
| `AbortSignal` cancellation | no | no | no | no | yes |
| Coded `ENCRYPTED` rejection | throws (no code) | no | n/a | unknown | yes |

The market gap is in-process strip plus a bundled CLI. pdf-lib ships the engine but no strip helper; every byte you remove, you write the code for. qpdf and muhammara carry native binaries, and the npm wrappers focus on encryption rather than metadata. exiftool clears the Info dict and XMP cleanly but never touches the annotation array, so /Link rectangles stay clickable in the output. Hosted cleaners cover everything except the one rule that mattered first: the file leaves your machine. @coroboros/pdf-cleaner runs the three strips in-process on pdf-lib. The same install ships a coded CleanError, AbortSignal cancellation at every phase, an npx CLI, and an ENCRYPTED rejection code for password-protected PDFs.
Contributing
Bug reports and PRs welcome.
- Open an issue before submitting non-trivial PRs.
- Commits follow Conventional Commits.
- Run `pnpm lint && pnpm typecheck && pnpm test` before pushing.
- Run `pnpm bench` against `bench/baseline.md` when touching `src/clean.ts`; no regression > 10 % at fixed feature set.
- Target the `main` branch.
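
The 10 % regression budget amounts to a simple relative comparison. A minimal sketch, assuming a lower-is-better metric such as milliseconds per round trip; `exceedsBudget` is a hypothetical helper and is not the project's bench script.

```typescript
// True when `current` is worse than `baseline` by more than `budget`
// (a fraction, 0.10 = 10 %). Assumes lower values are better.
function exceedsBudget(baseline: number, current: number, budget = 0.10): boolean {
  if (baseline <= 0) throw new RangeError('baseline must be positive');
  return (current - baseline) / baseline > budget;
}

console.log(exceedsBudget(100, 111)); // true: 11 % slower than baseline
console.log(exceedsBudget(100, 95));  // false: faster than baseline
```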