@shebinleo shebinleo commented Jul 13, 2025

Extract all embedded images from PDFs.

const fs = require('fs');
const pdf2html = require('pdf2html');

// Note: the await calls below assume an enclosing async function (or an ES module).

// From file path
const imagePaths = await pdf2html.extractImages('path/to/document.pdf');
console.log('Extracted images:', imagePaths);
// Output: ['/absolute/path/to/files/image/document1.jpg', '/absolute/path/to/files/image/document2.png', ...]

// From buffer
const pdfBuffer = fs.readFileSync('path/to/document.pdf');
const imagesFromBuffer = await pdf2html.extractImages(pdfBuffer);

// With custom output directory
const imagesInCustomDir = await pdf2html.extractImages(pdfBuffer, {
    outputDirectory: './extracted-images', // Images are written here instead of the default location
});

// With custom buffer size for large PDFs
const imagesFromLargePdf = await pdf2html.extractImages('large-document.pdf', {
    outputDirectory: './output',
    maxBuffer: 1024 * 1024 * 10, // 10MB buffer
});
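
Because `extractImages` resolves to an array of file paths (as in the output shown above), a caller may want to sort the results by image type before further processing. The following is a minimal sketch under that assumption; `groupByExtension` is a hypothetical helper written for illustration and is not part of pdf2html, and the sample paths are made up to mirror the documented output shape.

```javascript
const path = require('path');

// Hypothetical helper (not part of pdf2html): groups image paths
// by their lowercased file extension.
function groupByExtension(imagePaths) {
    const groups = {};
    for (const imagePath of imagePaths) {
        const ext = path.extname(imagePath).toLowerCase() || '(none)';
        if (!groups[ext]) {
            groups[ext] = [];
        }
        groups[ext].push(imagePath);
    }
    return groups;
}

// Illustrative input shaped like the array extractImages is shown to return.
const sample = [
    '/absolute/path/to/files/image/document1.jpg',
    '/absolute/path/to/files/image/document2.png',
    '/absolute/path/to/files/image/document3.JPG',
];

const grouped = groupByExtension(sample);
console.log(grouped);
```

In real use, the `sample` array would be replaced by the value resolved from `pdf2html.extractImages(...)`.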

@github-actions

Coverage after merging extract-images-from-pdf into main will be 98.48%.

Coverage Report
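| File | Stmts | Branches | Funcs | Lines | Uncovered Lines |
| --- | --- | --- | --- | --- | --- |
| index.js | 100% | 100% | 100% | 100% | |
| lib/CommandExecutor.js | 94.74% | 85.71% | 100% | 96% | 34–35 |
| lib/FileManager.js | 96.97% | 100% | 71.43% | 100% | |
| lib/HTMLParser.js | 100% | 100% | 100% | 100% | |
| lib/ImageProcessor.js | 100% | 100% | 100% | 100% | |
| lib/PDFBoxWrapper.js | 86.44% | 61.54% | 88.89% | 94.59% | 53–54, 60, 83, 83, 83, 83 |
| lib/PDFProcessor.js | 98.65% | 94.44% | 100% | 100% | 82 |
| lib/TikaWrapper.js | 100% | 100% | 100% | 100% | |
| lib/config.js | 100% | 100% | 100% | 100% | |
| lib/errors.js | 100% | 100% | 100% | 100% | |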

@shebinleo shebinleo merged commit ae61af5 into main Jul 13, 2025
5 checks passed
@shebinleo shebinleo deleted the extract-images-from-pdf branch July 13, 2025 12:02