@coroboros/pdf-cleaner

Strip metadata and links from PDFs locally — no upload, no tracking.

Removes /Link annotations and wipes the Info dictionary plus any XMP metadata stream attached to the catalog. Ships as both a programmatic library and an npx CLI. One runtime dependency: pdf-lib.

Requirements

Node.js >= 22 LTS. Use fnm for fast Rust-based version switching.
Any modern package manager: pnpm, npm, yarn, bun.

Install

As a library

pnpm add @coroboros/pdf-cleaner

npm install @coroboros/pdf-cleaner

yarn add @coroboros/pdf-cleaner

bun add @coroboros/pdf-cleaner

As a CLI

# Run without installing
npx @coroboros/pdf-cleaner cv.pdf

# Install globally for repeated use
pnpm add -g @coroboros/pdf-cleaner
pdf-cleaner --help

Usage

Programmatic

import { readFile, writeFile } from 'node:fs/promises';
import { clean } from '@coroboros/pdf-cleaner';

const cleaned = await clean(await readFile('cv.pdf'));
await writeFile('cv_clean.pdf', cleaned);

CLI

npx @coroboros/pdf-cleaner cv.pdf

Why this exists

PDFs carry hidden authorship. The Info dictionary embeds /Title, /Author, /Producer, and creation and modification dates; a separate XMP metadata stream may also be attached to the catalog. Hyperlinks travel via /Link annotations on each page. Hosted cleaners strip both, but only after you upload the bytes. @coroboros/pdf-cleaner runs the same strips in-process on a single dependency (pdf-lib). No network calls, no telemetry. See bench/baseline.md for the round-trip numbers and the regression budget.

CLI

pdf-cleaner <input> [options]

Strip metadata and links from a PDF or a directory of PDFs. Writes the cleaned bytes alongside the input with a _clean.pdf suffix unless --out or --in-place is set.
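The naming rule above can be sketched as a small path helper. This is an illustration only, not the CLI's actual code: `outputPathFor` and `OutputOptions` are hypothetical names, and the sketch assumes the `_clean.pdf` suffix is also applied under `--out`, which this README does not state.

```typescript
import path from 'node:path';

// Hypothetical option bag mirroring the documented --out and --in-place flags.
interface OutputOptions {
  out?: string; // value of --out <dir>, if given
  inPlace?: boolean; // true when --in-place is set
}

// Derive where the cleaned file would be written for one input path.
function outputPathFor(input: string, options: OutputOptions = {}): string {
  // --in-place overwrites the original, so the output path is the input path.
  if (options.inPlace) return input;

  const { dir, name } = path.parse(input);
  // Assumption: the suffix is kept even when --out redirects the file.
  const fileName = `${name}_clean.pdf`;

  // Without --out, the cleaned file stays alongside the input.
  return path.join(options.out ?? dir, fileName);
}
```

Under these assumptions, `outputPathFor('cv.pdf')` yields `cv_clean.pdf`, matching the default described above.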

Arguments

Arg	Type	Description
`<input>`	`string` (required)	A `.pdf` file, or a directory of `.pdf` files. Directory mode is top-level only — subdirectories are not traversed.

Options

Flag	Type	Default	Description
`--out <dir>`	`string`	alongside input	Output directory for cleaned files. Created if missing.
`--in-place`	`boolean`	`false`	Overwrite the input(s) in place. TTY prompts for confirmation; non-TTY contexts require `--yes`.
`--yes`, `-y`	`boolean`	`false`	Skip the `--in-place` confirmation prompt. Required to run `--in-place` in CI, scripts, or any non-TTY context.
`--keep-links`	`boolean`	`false`	Preserve `/Link` annotations. Other annotation subtypes are preserved regardless.
`--keep-metadata`	`boolean`	`false`	Preserve the Info dictionary (`/Title`, `/Author`, `/Subject`, `/Keywords`, `/Creator`, `/Producer`, `/CreationDate`, `/ModDate`) and any XMP metadata stream.
`--help`, `-h`	`boolean`	—	Print the usage block and exit `0`.
`--version`, `-v`	`boolean`	—	Print the package version and exit `0`.
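The interaction between `--in-place`, `--yes`, and the terminal type can be read as a small decision table. The sketch below is one way to express it; `decideInPlace`, `InPlaceFlags`, and the decision labels are hypothetical names, not part of the package.

```typescript
// Hypothetical shape for the parsed flags relevant to --in-place.
interface InPlaceFlags {
  inPlace: boolean;
  yes: boolean;
}

type InPlaceDecision = 'not-requested' | 'proceed' | 'prompt' | 'reject';

// Decide how to handle --in-place given the flags and whether the session is a TTY.
function decideInPlace(flags: InPlaceFlags, isTTY: boolean): InPlaceDecision {
  if (!flags.inPlace) return 'not-requested';
  // --yes skips the confirmation in every context.
  if (flags.yes) return 'proceed';
  // A TTY can ask the user; a non-TTY context without --yes is a user error (exit code 1).
  return isTTY ? 'prompt' : 'reject';
}
```

In a CI job, where no TTY is attached, only the `--in-place --yes` combination reaches `'proceed'`.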

Exit codes

Code	Meaning
`0`	Success. Every input file produced a cleaned output.
`1`	User error. Bad input path, unknown flag, or `--in-place` in a non-TTY context without `--yes`.
`2`	Per-file cleaning error. At least one file failed. Other files in directory mode still complete.
`3`	Unexpected error not classified above.
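For directory mode, codes `0` and `2` amount to an aggregation over per-file outcomes: every file is attempted, and one failure is enough to change the exit code. A minimal sketch of that rule follows, with `EXIT`, `FileResult`, and `exitCodeFor` as hypothetical names rather than the CLI's internals.

```typescript
// The documented exit codes, named for readability. The constant names are illustrative.
const EXIT = { OK: 0, USER_ERROR: 1, FILE_ERROR: 2, UNEXPECTED: 3 } as const;

// Hypothetical per-file outcome collected while processing a directory.
interface FileResult {
  file: string;
  error?: Error; // set when cleaning this file failed
}

// All files are attempted first; the exit code then reflects whether any of them failed.
function exitCodeFor(results: FileResult[]): number {
  const anyFailed = results.some((r) => r.error !== undefined);
  return anyFailed ? EXIT.FILE_ERROR : EXIT.OK;
}
```

A calling script can therefore treat `2` as "partial output exists, inspect the failures" and `1` as "nothing ran, fix the invocation".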

Examples

# Single file → cv_clean.pdf alongside the input
pdf-cleaner cv.pdf

# Single file → custom output directory
pdf-cleaner cv.pdf --out ./out

# Directory of PDFs (top-level only, non-recursive)
pdf-cleaner ./input --out ./output

# Overwrite the originals — prompts in a TTY, requires --yes otherwise
pdf-cleaner cv.pdf --in-place
pdf-cleaner cv.pdf --in-place --yes

# Granular opt-out
pdf-cleaner cv.pdf --keep-links
pdf-cleaner cv.pdf --keep-metadata

API

Types

CleanInput

The bytes that clean accepts.

type CleanInput = Uint8Array | ArrayBuffer;

Node Buffer is accepted as well, since Buffer is a subclass of Uint8Array.
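Accepting two input types usually means normalizing to one before any work starts. The sketch below shows one way to do that; `toBytes` is a hypothetical helper, and it throws a plain `TypeError` as a stand-in where the library itself reports `INVALID_INPUT` through `CleanError`.

```typescript
type CleanInput = Uint8Array | ArrayBuffer;

// Normalize either accepted input type to a Uint8Array, rejecting non-byte or empty input.
function toBytes(input: unknown): Uint8Array {
  let bytes: Uint8Array;
  if (input instanceof Uint8Array) {
    bytes = input; // a Node Buffer lands here too
  } else if (input instanceof ArrayBuffer) {
    bytes = new Uint8Array(input); // a view over the same memory, no copy
  } else {
    throw new TypeError('input must be a Uint8Array or ArrayBuffer');
  }
  if (bytes.byteLength === 0) throw new TypeError('input is empty');
  return bytes;
}
```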

CleanOptions

Per-call overrides for clean. Every field is optional; the two boolean flags default to false so the defaults strip aggressively.

Option	Type	Default	Description
`keepLinks`	`boolean`	`false`	Preserve `/Link` annotations on every page. Other annotation subtypes (text notes, highlights, form widgets) are preserved regardless.
`keepMetadata`	`boolean`	`false`	Preserve the Info dictionary (`/Title`, `/Author`, `/Subject`, `/Keywords`, `/Creator`, `/Producer`, `/CreationDate`, `/ModDate`) and any XMP metadata stream attached to the catalog.
`signal`	`AbortSignal`	(none)	Cancel the operation at phase boundaries. Checked before pdf-lib `load`, after `load`, and after the strip phase. Aborting throws `CleanError` with `code: 'ABORTED'` and `cause` set to `signal.reason`. pdf-lib itself does not observe the signal: once `load` or `save` is entered, it runs to completion before the next check fires.
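The checkpoint pattern described for `signal` can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `AbortedError` stands in for `CleanError`, and the `phases` callbacks stand in for pdf-lib's `load`, the strip step, and `save`.

```typescript
// Stand-in error type; the real library throws CleanError with code 'ABORTED'.
class AbortedError extends Error {
  readonly code = 'ABORTED';
  readonly cause: unknown;
  constructor(reason: unknown) {
    super('operation aborted');
    this.name = 'AbortedError';
    this.cause = reason; // mirrors "signal.reason is preserved on Error.cause"
  }
}

// Throw if the signal has already fired; otherwise do nothing.
function checkpoint(signal?: AbortSignal): void {
  if (signal?.aborted) throw new AbortedError(signal.reason);
}

// Hypothetical phase callbacks; T is whatever document object `load` produces.
interface Phases<T> {
  load: () => Promise<T>;
  strip: (doc: T) => void;
  save: (doc: T) => Promise<Uint8Array>;
}

async function runWithCheckpoints<T>(phases: Phases<T>, signal?: AbortSignal): Promise<Uint8Array> {
  checkpoint(signal); // before load
  const doc = await phases.load();
  checkpoint(signal); // after load
  phases.strip(doc);
  checkpoint(signal); // after the strip phase
  return phases.save(doc); // once entered, runs to completion
}
```

Because the signal is only inspected between phases, a timeout bounds when the next phase may start rather than interrupting a phase that is already running.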

CleanError

Thrown by clean for inputs it cannot process. Inherits from Error, supports Error.cause for wrapping.

class CleanError extends Error {
  readonly name: 'CleanError';
  readonly code: CleanErrorCode;
  constructor(code: CleanErrorCode, message: string, options?: { cause?: unknown });
}

The code field is a stable string discriminant safe for runtime branching. See Errors for the code list.

CleanErrorCode

type CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';
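Since the code is a closed string union, a `switch` over it can be checked for exhaustiveness at compile time. The sketch below repeats the union so that it stands alone; `messageFor` and its message wording are illustrative, not part of the package.

```typescript
type CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';

// Map each documented code to a user-facing message.
function messageFor(code: CleanErrorCode): string {
  switch (code) {
    case 'INVALID_INPUT':
      return 'No PDF bytes were provided.';
    case 'PARSE_FAILED':
      return 'The file is not a valid PDF.';
    case 'ENCRYPTED':
      return 'The PDF is encrypted; decrypt it first.';
    case 'ABORTED':
      return 'Cleaning was cancelled.';
    default: {
      // Exhaustiveness check: a code added to the union but not handled above
      // makes this assignment a compile error.
      const unreachable: never = code;
      return unreachable;
    }
  }
}
```

In a `catch` block, the same function can be fed `err.code` after an `err instanceof CleanError` check.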

Cleaning

clean(input, options?)

Strip metadata and links from a PDF and return the cleaned bytes.

Parameters

Parameter	Type	Default	Description
`input`	`CleanInput`	(required)	The PDF bytes. Must be non-empty.
`options?`	`CleanOptions`	`{}`	Per-call overrides. See the type for each field.

Returns — Promise<Uint8Array>. The cleaned PDF bytes. clean() is idempotent on the observable surface — calling it on its own output is a no-op.

Throws — CleanError. INVALID_INPUT when the input is not bytes, is null, or is empty. PARSE_FAILED when the bytes do not parse as a valid PDF; the underlying parser error is preserved on Error.cause. ENCRYPTED when the PDF carries an /Encrypt entry — decrypt before cleaning. ABORTED when options.signal fires; signal.reason is preserved on Error.cause.

Notes — see bench/baseline.md for the round-trip numbers and the regression budget.

Examples

// Default — strip both links and metadata
const cleaned = await clean(bytes);

// Wipe metadata, keep working hyperlinks
const cleaned = await clean(bytes, { keepLinks: true });

// Pre-publish CV: strip metadata, keep links so the LinkedIn URL still clicks
import { readFile, writeFile } from 'node:fs/promises';
import { clean } from '@coroboros/pdf-cleaner';

const original = await readFile('cv.pdf');
const cleaned = await clean(original, { keepLinks: true });
await writeFile('cv_public.pdf', cleaned);

// Server-side use — bound the work with an AbortSignal
const cleaned = await clean(bytes, { signal: AbortSignal.timeout(5000) });

Errors

Code	Description
`INVALID_INPUT`	`input` is missing, `null`, not a `CleanInput`, or empty.
`PARSE_FAILED`	The bytes do not parse as a valid PDF. The original parser error is available via `Error.cause`.
`ENCRYPTED`	The PDF carries an `/Encrypt` trailer entry. Decrypt before cleaning.
`ABORTED`	`options.signal` fired during the operation. `signal.reason` is preserved on `Error.cause`.

Limitations

Stripping is limited to /Link annotations and the standard metadata surfaces (Info dictionary plus any XMP metadata stream). Other annotation subtypes are preserved.
Encrypted PDFs are rejected with ENCRYPTED. Decrypt them first.
Directory mode walks the top level only — subdirectories are not traversed.
Text content, embedded images, page geometry, fonts, bookmarks, and form fields are preserved untouched.
Out of scope: text redaction, watermark removal, compression, OCR, JavaScript action stripping, attachment removal.

Compared to alternatives

Feature	`pdf-lib` (raw)	`qpdf` / `node-qpdf2`	`exiftool-vendored`	`muhammara`	`@coroboros/pdf-cleaner`
Strip Info dictionary	DIY	DIY (binary flags)	yes (`-all=`)	DIY	yes
Strip XMP metadata stream	DIY	DIY (binary flags)	yes (`-all=`)	DIY	yes
Strip `/Link` annotations	DIY	DIY	no	DIY	yes
Pure JS — no native binary	yes	no (qpdf binary)	no (Perl binary)	no (C++ bindings)	yes
In-process — no network upload	yes	yes	yes	yes	yes
CLI included	no	no (lib only)	no (lib only)	no	yes
`AbortSignal` cancellation	no	no	no	no	yes
Coded `ENCRYPTED` rejection	throws (no code)	no	n/a	unknown	yes

The market gap is an in-process strip plus a bundled CLI. pdf-lib ships the engine but no strip helper; every byte you remove, you write the code for. qpdf and muhammara carry native binaries, and the npm wrappers focus on encryption rather than metadata. exiftool clears the Info dict and XMP cleanly but never touches the annotation array, so /Link rectangles stay clickable in the output. Hosted cleaners cover everything except the one rule that mattered first: the file leaves your machine. @coroboros/pdf-cleaner runs the three strips in-process on pdf-lib. The same install ships a coded CleanError, AbortSignal cancellation checked between phases, an npx CLI, and an ENCRYPTED rejection code for password-protected PDFs.

Contributing

Bug reports and PRs welcome.

Open an issue before submitting non-trivial PRs.
Commits follow Conventional Commits.
Run pnpm lint && pnpm typecheck && pnpm test before pushing.
Run pnpm bench against bench/baseline.md when touching src/clean.ts — no regression > 10 % at fixed feature set.
Target the main branch.

License

MIT
