Fix prompt injection vulnerability in AI chat endpoint #1599
The file content endpoint at /api/repositories/[id]/files/content accepted an unsanitized 'path' query parameter that was interpolated directly into raw.githubusercontent.com URLs. This allowed:
1. Path traversal via ../ to read sensitive files (.env, config, secrets)
2. Null byte injection to bypass extension checks
3. Binary file downloads for data exfiltration
4. Unbounded file reads, enabling DoS via memory exhaustion
Changes to app/api/repositories/[id]/files/content/route.ts:
- Added validateFilePath(): regex-based path traversal detection
- Added encodePathSegments(): per-segment URL encoding that preserves the path structure
- Added isTextFile(): binary file rejection (only text files are allowed)
- Added a 1MB content size limit (checked against both the Content-Length header and the actual body size)
- Added a 10s fetch timeout via AbortSignal.timeout
- Path segments validated: no .., ., null bytes, backslashes, or leading /
- Path characters restricted to letters, digits, '.', '_', '-', and '/'
Changes to app/api/auth/signup/route.ts:
- Removed the hardcoded 'fallback_secret' used for IP fingerprinting
- Falls back to JWT_SECRET if NEXTAUTH_SECRET is not set
- Gracefully skips fingerprinting if no secret is available
Tests: 59 tests covering all attack vectors and edge cases
- Path traversal: 9 tests (../, encoded, backslash, null bytes, dots, absolute)
- Input validation: 6 tests (missing, invalid ID, long path, valid paths)
- Binary rejection: 13 tests (PNG, JPG, PDF, EXE, DOCX, ZIP, MP3, etc.)
- URL construction: 3 tests (segment encoding, branch encoding, defaults)
- Error handling: 5 tests (404, GitHub errors, non-GitHub, internal, timeout)
- Auth: 2 tests
- GitHub URL parsing: 6 tests (.git suffix, trailing slash, non-GitHub)
- Content size limits: 3 tests (Content-Length, actual size, within limit)
- Path edge cases: 7 tests (consecutive slashes, trailing slash, special chars)
- GitHub API responses: 3 tests (rate limit, server error, unauthorized)
Closes nisshchayarathi#1581
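A minimal TypeScript sketch of the path checks listed above, to make their ordering concrete. The names validateFilePath and encodePathSegments come from the PR description, but the bodies, MAX_PATH_LENGTH, and the rejection strings are assumptions; the actual route.ts may order or word the checks differently. Note that the character class escapes "-" so it is a literal hyphen rather than a range.

```typescript
// Sketch only: bodies and limits are assumed, not copied from route.ts.
const MAX_PATH_LENGTH = 1024; // hypothetical limit

// Letters, digits, ".", "_", "-" and "/"; "-" is escaped so it is not a range.
const ALLOWED_PATH_CHARS = /^[a-zA-Z0-9._\-\/]+$/;

/** Returns null for an acceptable path, otherwise a short rejection reason. */
function validateFilePath(path: string): string | null {
  if (path.length === 0 || path.length > MAX_PATH_LENGTH) return "invalid length";
  if (path.includes("\0")) return "null byte";
  if (path.includes("\\")) return "backslash";
  if (path.startsWith("/")) return "absolute path";
  // Rejects "%" as well, so percent-encoded traversal never reaches the segment check.
  if (!ALLOWED_PATH_CHARS.test(path)) return "disallowed character";
  // Reject "." and ".." anywhere in the path, not just at the start.
  for (const segment of path.split("/")) {
    if (segment === "." || segment === "..") return "dot segment";
  }
  return null;
}

/** Encodes each segment separately so the "/" separators survive unchanged. */
function encodePathSegments(path: string): string {
  return path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}
```

Because "%" is outside the allowed set, an input like src/%2e%2e/secret is rejected by the character check before any decoding could turn it into "..".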
Add prompt injection defense utilities and update the AI chat route to:
1. Sanitize repository context to remove dangerous instruction patterns
2. Prepend a safety system prompt intended to take precedence over any injected instructions
3. Use structured delimiters to separate user questions from repository data
4. Build fully grounded prompts that treat file contents as read-only reference material
The fix addresses the root cause: malicious files in a repository could contain instructions that override the AI's safety guidelines. The solution implements defense in depth with multiple layers of protection.
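A rough sketch of layers 1 and 3 (pattern-based sanitization plus delimiters), assuming "sanitize" means replacing matched phrases with a placeholder. The REPOSITORY_DATA and USER_QUESTION tag names are the ones discussed in the review; INJECTION_PATTERNS, wrapRepositoryData, and buildPrompt are hypothetical, and the real lib/utils/promptSanitization.ts presumably uses a longer pattern list.

```typescript
// Hypothetical pattern list; a production list would be longer and tuned for false positives.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior|above) instructions/gi,
  /disregard (all )?(previous|prior|above) instructions/gi,
  // Forged delimiters would let file content "close" its own data block early.
  /<\/?(REPOSITORY_DATA|USER_QUESTION)[^>]*>/gi,
];

function sanitizeTextContent(text: string): string {
  let result = text;
  for (const pattern of INJECTION_PATTERNS) {
    result = result.replace(pattern, "[filtered]");
  }
  return result;
}

function wrapRepositoryData(source: string, content: string): string {
  // JSON.stringify quotes the label and escapes any double quotes inside it.
  return `<REPOSITORY_DATA source=${JSON.stringify(source)}>\n${sanitizeTextContent(content)}\n</REPOSITORY_DATA>`;
}

function buildPrompt(safetyPrompt: string, files: Record<string, string>, question: string): string {
  const dataBlocks = Object.entries(files).map(([path, content]) => wrapRepositoryData(path, content));
  // Safety prompt first, data in the middle, the question last and outside every data block.
  return [
    safetyPrompt,
    ...dataBlocks,
    `<USER_QUESTION>\n${sanitizeTextContent(question)}\n</USER_QUESTION>`,
  ].join("\n\n");
}
```

Regex filtering alone is easy to evade with paraphrases, which is why the delimiters and the system prompt are layered on top of it rather than relied on individually.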
@anshika1179 is attempting to deploy a commit to the Nisshchaya's projects Team on Vercel. A member of the Team first needs to authorize it.
📝 Walkthrough
This PR adds multi-layered defense against prompt-injection attacks on the AI chat endpoint. It introduces a sanitization utility library with injection-pattern detection, integrates it into the chat route to neutralize embedded malicious instructions, hardens file-content retrieval with path validation and type filtering, and removes a hardcoded secret from signup authentication.
Changes
Prompt Injection Defense & Content Security
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🎉 Thanks for your contribution, @atul-upadhyay-7! Your PR has passed our automated GSSoC quality checks. Here's a quick summary:
A maintainer will review your PR soon. Please be patient and available for feedback. 💪 GSSoC'26 automation · Maintainer: @nisshchayarathi
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/api/ai/chat/route.ts`:
- Around line 275-286: The prompt currently concatenates knowledgeContext
directly into enhancedPrompt, bypassing sanitization and the repository data
envelope; update assembleChatPrompt (or its call) to accept a dedicated
maintainerContext/knowledgeContext parameter and ensure knowledgeContext is run
through sanitizeTextContent and wrapped in the same <REPOSITORY_DATA> (or
labeled) block before being combined with buildSafetySystemPrompt and
contextPayload so all maintainer-provided text passes the same prompt-injection
defenses.
In `@app/api/repositories/[id]/files/content/route.ts`:
- Around line 103-108: The current check (using lastSlash, filename and if
(!filename.includes(".")) return true) is too permissive for extensionless
files; replace that unconditional allow with an explicit allowlist: introduce a
Set of allowed extensionless filenames (e.g., "README", "LICENSE", "Makefile",
"Dockerfile", "CHANGELOG", "TODO", "Procfile", etc.) and change the branch to
return true only if filename exists in that Set (e.g.,
ALLOWED_EXTENSIONLESS.has(filename)); keep the existing dot-based rejection for
other files. Ensure you update any tests or callers of this logic that expect
extensionless files to be filtered accordingly.
In `@lib/utils/promptSanitization.ts`:
- Around line 153-156: The truncation in promptSanitization (variable joined and
MAX_TOTAL_CONTEXT_CHARS) can cut off the closing </REPOSITORY_DATA> tag; change
the truncation logic to detect whether the truncated substring would leave an
unmatched opening <REPOSITORY_DATA> (e.g., count or search for last
"<REPOSITORY_DATA>" and "</REPOSITORY_DATA>" in joined.substring(0,
MAX_TOTAL_CONTEXT_CHARS)) and if so append a closing "</REPOSITORY_DATA>" before
adding the "\n[additional context truncated]" marker so the tags remain balanced
and the subsequent <USER_QUESTION> block stays outside repository data.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 3822de47-a988-4ef9-954a-c5c18c909432
📒 Files selected for processing (6)
- app/api/ai/chat/route.ts
- app/api/auth/signup/route.ts
- app/api/repositories/[id]/files/content/route.ts
- lib/services/__tests__/file-content-security.test.ts
- lib/utils/__tests__/promptSanitization.test.ts
- lib/utils/promptSanitization.ts
const safetySystemPrompt = buildSafetySystemPrompt(repository.name);
const contextPayload = assembleChatPrompt({
  repositoryName: repository.name,
  repositoryDescription: repository.description || "N/A",
  languages: langText,
  stats: statsText,
  retrievedFilesContent,
  crossRepoContext: "",
  question,
});

User Question: ${question}
`;
const enhancedPrompt = `${safetySystemPrompt}\n\n${knowledgeContext}${contextPayload}`;
Route maintainer knowledge through the same sanitized context boundary.
Line 286 injects knowledgeContext straight into the prompt, outside <REPOSITORY_DATA> and without sanitizeTextContent. That means instruction-like text in repository knowledge fields still bypasses the new prompt-injection defenses.
Suggested direction
- const contextPayload = assembleChatPrompt({
+ const contextPayload = assembleChatPrompt({
repositoryName: repository.name,
repositoryDescription: repository.description || "N/A",
languages: langText,
stats: statsText,
retrievedFilesContent,
- crossRepoContext: "",
+ crossRepoContext: knowledgeContext,
question,
});
- const enhancedPrompt = `${safetySystemPrompt}\n\n${knowledgeContext}${contextPayload}`;
+ const enhancedPrompt = `${safetySystemPrompt}\n\n${contextPayload}`;
If knowledgeContext needs its own priority/label, I'd extend assembleChatPrompt to accept a dedicated maintainerContext block instead of concatenating it raw.
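One way the maintainerContext suggestion could look, sketched under assumptions: ChatPromptInput, stripDelimiters, and assembleChatPromptWithMaintainer are hypothetical stand-ins (the real assembleChatPrompt takes more fields and would call sanitizeTextContent). The point is the shape: maintainer knowledge is treated as data, passes through the same sanitizer, and gets its own labeled envelope ahead of the question.

```typescript
// Hypothetical input shape; the real assembleChatPrompt also takes name, languages, stats, etc.
interface ChatPromptInput {
  retrievedFilesContent: string;
  maintainerContext?: string; // where knowledgeContext would go instead of raw concatenation
  question: string;
}

// Stand-in for sanitizeTextContent: only neutralizes forged delimiter tags.
function stripDelimiters(text: string): string {
  return text.replace(/<\/?(REPOSITORY_DATA|USER_QUESTION)[^>]*>/gi, "[filtered]");
}

function dataBlock(label: string, content: string): string {
  return `<REPOSITORY_DATA label="${label}">\n${stripDelimiters(content)}\n</REPOSITORY_DATA>`;
}

function assembleChatPromptWithMaintainer(input: ChatPromptInput): string {
  const blocks: string[] = [];
  if (input.maintainerContext && input.maintainerContext.trim() !== "") {
    // Maintainer knowledge is data, not instructions: same sanitizer, same envelope.
    blocks.push(dataBlock("maintainer-knowledge", input.maintainerContext));
  }
  blocks.push(dataBlock("retrieved-files", input.retrievedFilesContent));
  // The question stays last and outside every data block.
  blocks.push(`<USER_QUESTION>\n${stripDelimiters(input.question)}\n</USER_QUESTION>`);
  return blocks.join("\n\n");
}
```

The caller would then build enhancedPrompt from the safety prompt and this payload only, with no raw knowledgeContext spliced in between.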
// Allow files with no extension (often config files)
const lastSlash = filePath.lastIndexOf("/");
const filename = lastSlash >= 0 ? filePath.substring(lastSlash + 1) : filePath;
if (!filename.includes(".")) {
  return true;
}
Allowing all extension-less files is overly permissive.
Binary executables on Unix often have no extension. The current logic allows any file without a dot in its name, which could let binary content through (e.g., compiled binaries, data files stored without extensions).
Consider using an explicit allowlist for common extensionless text files instead of allowing all:
Proposed fix: use explicit allowlist for extensionless files
+const ALLOWED_EXTENSIONLESS_FILES = new Set([
+ "makefile", "dockerfile", "procfile", "gemfile", "rakefile",
+ "license", "readme", "changelog", "contributing", "authors",
+ "codeowners", "vagrantfile", "brewfile", "justfile",
+]);
+
function isTextFile(filePath: string): boolean {
const textExtensions = [
".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs",
// ... existing extensions ...
- "Makefile", "Dockerfile", "Procfile",
- "LICENSE", "README", "CHANGELOG", "CONTRIBUTING",
];
const lowerPath = filePath.toLowerCase();
// Check if path ends with a known text extension
for (const ext of textExtensions) {
if (lowerPath.endsWith(ext)) {
return true;
}
}
- // Allow files with no extension (often config files)
+ // Allow specific known extensionless text files
const lastSlash = filePath.lastIndexOf("/");
const filename = lastSlash >= 0 ? filePath.substring(lastSlash + 1) : filePath;
- if (!filename.includes(".")) {
- return true;
+ if (!filename.includes(".") && ALLOWED_EXTENSIONLESS_FILES.has(filename.toLowerCase())) {
+ return true;
}
return false;
}
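A sketch of the allowlist idea with the filename lowercased once, assuming an exact-extension lookup is acceptable in place of the endsWith loop. It also sidesteps a quirk in the current list: entries such as "Makefile" and "README" are compared against lowerPath, so endsWith can never match them because of their capital letters. TEXT_EXTENSIONS below is abbreviated and hypothetical; the real array in route.ts is longer.

```typescript
// Abbreviated, hypothetical extension list; lookups use the lowercased extension.
const TEXT_EXTENSIONS = new Set([
  ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs",
  ".json", ".md", ".txt", ".yml", ".yaml", ".css", ".html", ".gitignore",
]);

// Stored lowercase so a single toLowerCase() on the filename is enough.
const ALLOWED_EXTENSIONLESS_FILES = new Set([
  "makefile", "dockerfile", "procfile", "license", "readme", "changelog",
]);

function isTextFile(filePath: string): boolean {
  // lastIndexOf returns -1 when there is no "/", so substring(0) keeps the whole path.
  const filename = filePath.substring(filePath.lastIndexOf("/") + 1).toLowerCase();
  const dot = filename.lastIndexOf(".");
  if (dot === -1) {
    // Extensionless: only well-known text files, never arbitrary (possibly binary) names.
    return ALLOWED_EXTENSIONLESS_FILES.has(filename);
  }
  // Dotfiles such as ".gitignore" have dot === 0, so their "extension" is the whole name.
  return TEXT_EXTENSIONS.has(filename.substring(dot));
}
```

With this shape a dotfile like .env is rejected simply because ".env" is not in the set, which fits the PR's goal of keeping secrets out of the response.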
const joined = blocks.join("\n\n");
if (joined.length > MAX_TOTAL_CONTEXT_CHARS) {
  return joined.substring(0, MAX_TOTAL_CONTEXT_CHARS) + "\n[additional context truncated]";
}
Keep <REPOSITORY_DATA> balanced when truncating.
Line 154 truncates the already-wrapped payload mid-string, so an oversized context can lose its closing </REPOSITORY_DATA> tag. That lets the following <USER_QUESTION> block appear inside repository data, which breaks the separation this defense depends on.
Possible fix
- const joined = blocks.join("\n\n");
- if (joined.length > MAX_TOTAL_CONTEXT_CHARS) {
- return joined.substring(0, MAX_TOTAL_CONTEXT_CHARS) + "\n[additional context truncated]";
- }
- return joined;
+ let total = 0;
+ const boundedBlocks: string[] = [];
+
+ for (const block of blocks) {
+ const separator = boundedBlocks.length > 0 ? "\n\n" : "";
+ const remaining = MAX_TOTAL_CONTEXT_CHARS - total - separator.length;
+
+ if (remaining <= 0) break;
+
+ if (block.length <= remaining) {
+ boundedBlocks.push(`${separator}${block}`);
+ total += separator.length + block.length;
+ continue;
+ }
+
+ const closingTag = "\n</REPOSITORY_DATA>";
+ const truncationNote = "\n[additional context truncated]";
+ const openTagEnd = block.indexOf(">\n") + 2; // 1 if no opening tag was found
+ const openTag = block.slice(0, openTagEnd);
+ // Budget for the opening tag too, so the output stays within MAX_TOTAL_CONTEXT_CHARS.
+ const maxBodyLength = remaining - openTag.length - truncationNote.length - closingTag.length;
+
+ if (openTagEnd > 1 && maxBodyLength > 0) {
+ const body = block.slice(openTagEnd, -closingTag.length).slice(0, maxBodyLength);
+ boundedBlocks.push(`${separator}${openTag}${body}${truncationNote}${closingTag}`);
+ }
+ break;
+ }
+
+ return boundedBlocks.join("");
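For comparison, a smaller change along the lines of the agent prompt (count tags in the truncated prefix and close the block if needed) might look like this. truncateContext is a hypothetical name, and the sketch assumes content was sanitized first, so literal tag text appears only as real delimiters. As with the original, the result can exceed maxChars by the length of the appended closing tag and marker.

```typescript
const OPEN_TAG = "<REPOSITORY_DATA"; // prefix match also covers tags with attributes
const CLOSE_TAG = "</REPOSITORY_DATA>";
const TRUNCATION_NOTE = "\n[additional context truncated]";

function countOccurrences(haystack: string, needle: string): number {
  return haystack.split(needle).length - 1;
}

// Assumes content was sanitized first, so tag text only appears as real delimiters.
function truncateContext(joined: string, maxChars: number): string {
  if (joined.length <= maxChars) return joined;
  let cut = joined.substring(0, maxChars);
  // If the cut landed inside a tag (e.g. "</REPOSITORY_DA"), drop the partial tag.
  // An ordinary "<" near the end triggers this too, costing a little extra context.
  const lastLt = cut.lastIndexOf("<");
  if (lastLt > cut.lastIndexOf(">")) {
    cut = cut.substring(0, lastLt);
  }
  // More openings than closings means the cut is inside repository data: close it.
  if (countOccurrences(cut, OPEN_TAG) > countOccurrences(cut, CLOSE_TAG)) {
    cut += "\n" + CLOSE_TAG;
  }
  return cut + TRUNCATION_NOTE;
}
```

Closing the tag before the marker keeps the following USER_QUESTION block outside repository data, which is the property the review comment is protecting.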
What this fixes
The AI chat endpoint was vulnerable to prompt injection attacks where malicious files in a repository could contain instructions that override the AI's safety guidelines. The endpoint combined user input with repository context without proper sanitization or separation, allowing attackers to inject harmful instructions.
Root cause
The chat route constructed prompts by directly concatenating repository content (from files retrieved via RAG) with user questions and system prompts. This allowed:
What changed
How to verify
Edge cases considered
Closes #1592
Summary by CodeRabbit
Release Notes
Bug Fixes