Skip to content

fix: add PDF-only file type validation to multer upload middleware#461

Open
ChetanSenta wants to merge 2 commits into
FireFistisDead:masterfrom
ChetanSenta:fix/pdf-upload-file-type-validation
Open

fix: add PDF-only file type validation to multer upload middleware#461
ChetanSenta wants to merge 2 commits into
FireFistisDead:masterfrom
ChetanSenta:fix/pdf-upload-file-type-validation

Conversation

@ChetanSenta
Copy link
Copy Markdown

@ChetanSenta ChetanSenta commented Jun 1, 2026

Summary

The /upload endpoint in server.js was using multer without any file type filter, allowing users to upload any file format (.txt, .png, .zip, .docx, etc.). These files were silently forwarded to the FastAPI RAG service which attempted to parse them as PDFs via PyPDFLoader, causing unhandled crashes and unhelpful 500 errors with no feedback to the user.

This PR adds a fileFilter to the multer configuration that validates both the MIME type (application/pdf) and file extension (.pdf) before accepting the upload. A dedicated Express error handler is also added to return a clean 400 Bad Request JSON response when an invalid file type is submitted.

Related issue

Closes #459

Testing

  • I ran the relevant checks locally
  • I verified the app still starts
  • I tested the affected flow end-to-end

Manual test cases performed:

Uploaded a .txt file → received 400 with { "error": "Only PDF files are allowed." } ✅
Uploaded a .png file → received 400 with { "error": "Only PDF files are allowed." } ✅
Uploaded a .zip file → received 400 with { "error": "Only PDF files are allowed." } ✅
Uploaded a valid .pdf file → processed correctly as before ✅
Asked a question after valid upload → RAG service responded correctly ✅

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas

Notes

Only server.js is modified — no changes to the RAG service, frontend, or dependencies
The fileFilter checks both mimetype AND file extension to handle edge cases where browsers misreport MIME types
This fix is a prerequisite for any future file management features (e.g. Issue #1 — delete uploaded PDFs after processing)

Security

  • No sensitive data included

Summary by CodeRabbit

  • New Features

    • Uploads now allow multiple files per request and are stored on the server.
  • Bug Fixes

    • PDF uploads now require both PDF MIME type and a .pdf extension for stronger validation.
    • Invalid file uploads return a clear HTTP 400 JSON error when a non-PDF is sent.
  • Chores

    • Uploaded files use a new unique timestamped filename convention.

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 1, 2026

Someone is attempting to deploy a commit to the firefistisdead's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Reworks Multer to store uploads in uploads/ with timestamp+UUID filenames, enforces PDF MIME type and .pdf extension via a fileFilter (removing prior limits), adds an Express error middleware returning 400 JSON for the specific PDF validation error, and adds a duplicate top-level crypto import.

Changes

Upload Validation & Error Handling

Layer / File(s) Summary
Duplicate top-level crypto import
server.js
Adds a second const crypto = require(...) near the top of server.js, creating a duplicate module-scope declaration.
Multer disk storage and PDF fileFilter
server.js
Multer diskStorage now writes to uploads/ with filenames ${Date.now()}-${crypto.randomUUID()}<ext>, enforces file.mimetype === 'application/pdf' and .pdf extension in fileFilter, and removes previous limits (size/single-file).
Upload error response handler
server.js
Adds global Express error middleware that intercepts errors with message "Only PDF files are allowed." and returns 400 JSON { error: err.message }; other errors are forwarded.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • FireFistisDead/pdf-qa-bot#293: The main PR’s server.js reworks the Multer PDF fileFilter and adds an error handler keyed off err.message === "Only PDF files are allowed.", which overlaps directly with the retrieved PR’s Multer error sanitization that maps the same “Only PDF files are allowed” failure—both touch the same server-side PDF-type validation/error path.
  • FireFistisDead/pdf-qa-bot#183: Both PRs modify server.js’s /upload Multer validation to reject non-.pdf/non-application/pdf uploads and align on the same "Only PDF files are allowed." 400-error behavior.
  • FireFistisDead/pdf-qa-bot#276: Both PRs modify server.js’s PDF upload handling/validation (Multer fileFilter and the exact “Only PDF files are allowed.” error behavior).

Suggested labels

level:intermediate, quality:clean

Suggested reviewers

  • FireFistisDead

Poem

A rabbit nibbles through the code, so spry, 🐇
Rejecting shapes that just don't qualify,
.pdf alone may cross the gate,
Errors now return at status eight (four oh oh!),
The upload path is tidy — hop, hop, hi!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Out of Scope Changes check ❓ Inconclusive The PR includes a change to use crypto.randomUUID() for collision-resistant filenames, which is not explicitly mentioned in issue #459 requirements and may be an out-of-scope addition. Clarify whether the filename randomization via crypto.randomUUID() is a separate fix or if it was mentioned in related discussions; if unrelated to issue #459, consider moving to a separate PR.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding PDF-only file type validation to the multer upload middleware.
Description check ✅ Passed The description covers all key template sections: summary, related issue, comprehensive testing, checklist items, and notes; all required information is provided.
Linked Issues check ✅ Passed The PR fully implements all requirements from issue #459: multer fileFilter with MIME type and extension validation, error handling that returns 400 with proper JSON response, and comprehensive manual testing confirming expected behavior.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added backend Express or API gateway work bug Something isn't working docs Documentation only enhancement New feature or request feature A new feature or improvement fix A targeted fix or cleanup frontend Frontend-related work invalid This doesn't seem right level:advanced question Further information is requested rag-service FastAPI / model service work type:security type:testing labels Jun 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server.js`:
- Around line 449-450: The current multer.diskStorage filename function uses
Date.now() + '-' + file.originalname which can collide; update the filename
callback (the filename function in multer.diskStorage) to generate a
collision-resistant name by using a UUID or crypto.randomBytes (e.g.,
crypto.randomUUID() or random hex) combined with a timestamp and the file
extension from path.extname(file.originalname), then pass that string to
cb(null, ...) instead of Date.now() + '-' + file.originalname so concurrent
uploads cannot overwrite each other.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 68f5cb26-b3c3-4700-9965-e12c404eb05a

📥 Commits

Reviewing files that changed from the base of the PR and between 055d3af and dec6e8e.

📒 Files selected for processing (1)
  • server.js

Comment thread server.js Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server.js (2)

600-605: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The new 400 PDF handler is currently bypassed for /upload.

/upload already runs multerErrorHandler right after upload.single("file"), and that handler returns a 415 response for the same "Only PDF files are allowed." error. So callers still get the 415 payload, not the 400 JSON this new middleware is trying to standardize.

Suggested fix
-  if (err && err.message && err.message.includes("Only PDF files are allowed")) {
-    console.warn(`[upload] Invalid file type attempted: ${err.message}`);
-    return res.status(415).json({
-      error: "Invalid file type",
-      message: "Only PDF files are allowed. Please upload a valid PDF document.",
-    });
-  }
+  if (err?.message === "Only PDF files are allowed.") {
+    console.warn(`[upload] Invalid file type attempted: ${err.message}`);
+    return res.status(400).json({ error: err.message });
+  }

Then remove the duplicate app-level handler at Lines 1125-1129.

Also applies to: 1125-1129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.js` around lines 600 - 605, The multer-level error handler
multerErrorHandler currently sends a 415 for the "Only PDF files are allowed."
message and prevents the new standardized 400 PDF middleware from running for
the /upload route (which calls upload.single("file") then multerErrorHandler).
Change multerErrorHandler so that when err.message includes "Only PDF files are
allowed" it does not send the response but instead calls next(err) to let the
app-level standardized PDF handler produce the 400 JSON; then remove the
duplicate app-level 415 handler that also checks for that message (the duplicate
currently returning 415) so only the standardized 400 PDF handler emits the
response.

449-467: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Re-add the Multer size limit.

upload no longer passes limits, even though Line 391 still computes MAX_PDF_SIZE_BYTES and multerErrorHandler still handles LIMIT_FILE_SIZE. As written, an arbitrarily large PDF will be fully written to uploads/ instead of being rejected early.

Suggested fix
-const upload = multer({ storage, fileFilter });
+const upload = multer({
+  storage,
+  fileFilter,
+  limits: {
+    fileSize: MAX_PDF_SIZE_BYTES,
+  },
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server.js` around lines 449 - 467, The multer instance created as upload no
longer enforces a file size limit so large PDFs can be fully written; add the
limits option when creating upload to reject oversized files early by passing
limits: { fileSize: MAX_PDF_SIZE_BYTES } to multer({ storage, fileFilter, limits
}) (ensure MAX_PDF_SIZE_BYTES referenced earlier is used), so multer will
trigger LIMIT_FILE_SIZE and be handled by multerErrorHandler instead of writing
huge files to uploads/.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@server.js`:
- Around line 600-605: The multer-level error handler multerErrorHandler
currently sends a 415 for the "Only PDF files are allowed." message and prevents
the new standardized 400 PDF middleware from running for the /upload route
(which calls upload.single("file") then multerErrorHandler). Change
multerErrorHandler so that when err.message includes "Only PDF files are
allowed" it does not send the response but instead calls next(err) to let the
app-level standardized PDF handler produce the 400 JSON; then remove the
duplicate app-level 415 handler that also checks for that message (the duplicate
currently returning 415) so only the standardized 400 PDF handler emits the
response.
- Around line 449-467: The multer instance created as upload no longer enforces
a file size limit so large PDFs can be fully written; add the limits option when
creating upload to reject oversized files early by passing limits: { fileSize:
MAX_PDF_SIZE_BYTES } to multer({ storage, fileFilter, limits }) (ensure
MAX_PDF_SIZE_BYTES referenced earlier is used), so multer will trigger
LIMIT_FILE_SIZE and be handled by multerErrorHandler instead of writing huge
files to uploads/.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a8b48b2b-3fff-490e-b842-7ec2e44aa641

📥 Commits

Reviewing files that changed from the base of the PR and between dec6e8e and 46bc979.

📒 Files selected for processing (1)
  • server.js

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend Express or API gateway work bug Something isn't working docs Documentation only enhancement New feature or request feature A new feature or improvement fix A targeted fix or cleanup frontend Frontend-related work invalid This doesn't seem right level:advanced question Further information is requested rag-service FastAPI / model service work type:security type:testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: No file type validation on upload allows non-PDF files to crash the RAG service

1 participant