Skip to content

fix(rollback): use dynamic repository default branch instead of hardcoded 'main'#1604

Open
atul-upadhyay-7 wants to merge 3 commits into
nisshchayarathi:mainfrom
atul-upadhyay-7:fix/issue-1591
Open

fix(rollback): use dynamic repository default branch instead of hardcoded 'main'#1604
atul-upadhyay-7 wants to merge 3 commits into
nisshchayarathi:mainfrom
atul-upadhyay-7:fix/issue-1591

Conversation

@atul-upadhyay-7
Copy link
Copy Markdown
Collaborator

@atul-upadhyay-7 atul-upadhyay-7 commented Jun 1, 2026

This PR fixes the emergency rollback service which was previously using a hardcoded 'main' base branch when creating the emergency rollback PR. Now it dynamically fetches the repository's default branch using the GitHub API.

Closes #1591

Summary by CodeRabbit

  • New Features

    • Rollback PRs now dynamically determine base branch from repository settings.
  • Bug Fixes

    • File content endpoint now validates file paths, restricts to text files, and enforces 1MB content size limits.
    • Auth error logging now properly respects environment configuration.
    • Chat API now sanitizes repository context in prompts before processing.
  • Tests

    • Added comprehensive test coverage for file content security validations and chat prompt handling.

The file content endpoint at /api/repositories/[id]/files/content accepted
an unsanitized 'path' query parameter that was interpolated directly into
raw.githubusercontent.com URLs. This allowed:

1. Path traversal via ../ to read sensitive files (.env, config, secrets)
2. Null byte injection to bypass extension checks
3. Binary file downloads for data exfiltration
4. Unbounded file reads for DoS via memory exhaustion

Changes to app/api/repositories/[id]/files/content/route.ts:
- Added validateFilePath(): regex-based path traversal detection
- Added encodePathSegments(): per-segment URL encoding preserving structure
- Added isTextFile(): binary file rejection (only text files allowed)
- Added 1MB content size limit (both Content-Length header and actual size)
- Added 10s fetch timeout via AbortSignal.timeout
- Path segments validated: no .., ., null bytes, backslashes, leading /
- Path characters restricted to [a-zA-Z0-9._-\/]

Changes to app/api/auth/signup/route.ts:
- Removed hardcoded 'fallback_secret' for IP fingerprinting
- Falls back to JWT_SECRET if NEXTAUTH_SECRET is not set
- Gracefully skips fingerprinting if no secret is available

Tests: 59 tests covering all attack vectors and edge cases
- Path traversal: 9 tests (../, encoded, backslash, null bytes, dots, absolute)
- Input validation: 6 tests (missing, invalid ID, long path, valid paths)
- Binary rejection: 13 tests (PNG, JPG, PDF, EXE, DOCX, ZIP, MP3, etc.)
- URL construction: 3 tests (segment encoding, branch encoding, defaults)
- Error handling: 5 tests (404, GitHub errors, non-GitHub, internal, timeout)
- Auth: 2 tests
- GitHub URL parsing: 6 tests (.git suffix, trailing slash, non-GitHub)
- Content size limits: 3 tests (Content-Length, actual size, within limit)
- Path edge cases: 7 tests (consecutive slashes, trailing slash, special chars)
- GitHub API responses: 3 tests (rate limit, server error, unauthorized)

Closes nisshchayarathi#1581
Add prompt injection defense utilities and update the AI chat route to:
1. Sanitize repository context to remove dangerous instruction patterns
2. Prepend safety system prompt that overrides any potential injection
3. Use structured delimiters to separate user questions from repository data
4. Build fully grounded prompts that treat file contents as read-only reference material

The fix addresses the root cause where malicious files in a repository could contain instructions that override the AI's safety guidelines. The solution implements defense-in-depth with multiple layers of protection.
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 1, 2026

@anshika1179 is attempting to deploy a commit to the Nisshchaya's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR hardens three critical security paths: RAG chat prompts (injection defense), file-content fetching (path/size validation), and incident rollback (default-branch fix). It adds 1,200+ lines of sanitization utilities and tests while fixing the hardcoded "main" branch bug affecting all incident response on non-main repositories.

Changes

Prompt Injection Defense for RAG Chat

Layer / File(s) Summary
Prompt sanitization utility module
lib/utils/promptSanitization.ts
Five core utilities (sanitizeTextContent, buildDelimitedContextBlock, buildSafetySystemPrompt, wrapUserQuestion, assembleChatPrompt) define regex-based injection-phrase detection, per-file/total-context character limits, and unified grounded-prompt assembly with isolated repository context blocks.
Prompt sanitization test suite
lib/utils/__tests__/promptSanitization.test.ts
797 lines of Jest coverage validating injection-pattern removal, legitimate-text preservation, truncation, delimited-block isolation, safety-rule completeness, and end-to-end assembleChatPrompt integration across multiple injection vectors and edge cases.
Chat API endpoint integration
app/api/ai/chat/route.ts
Imports sanitization helpers, sanitizes retrieved file contents and cross-repo context before embedding, computes repository metadata (languages, stats) into variables, and constructs enhancedPrompt using safety system prompt plus structured assembleChatPrompt output instead of hardcoded template.

File-Content Endpoint Path Validation and Encoding

Layer / File(s) Summary
Path validation and encoding helpers
app/api/repositories/[id]/files/content/route.ts (lines 5–112)
Introduces constants for maximum path length, dangerous-pattern detection regex, character/extension allowlists, and helper functions (validateFilePath, encodePathSegment) to reject traversal vectors, enforce text-file-only serving, and safely encode path segments.
GET handler refactor with security controls
app/api/repositories/[id]/files/content/route.ts (lines 124–226)
Validates id/path upfront with granular 400s, applies path/binary validation before repository lookup, parses repository URL to confirm GitHub, constructs raw URLs using per-segment encoding, fetches with text headers and 10s timeout, enforces 1MB cap via headers and body inspection, and updates error responses.
File-content security test suite
lib/services/__tests__/file-content-security.test.ts
691 lines of Jest coverage validating path traversal prevention, input validation, binary-file rejection with allowlists, URL-construction safety, GitHub URL parsing, error handling without leakage, authentication/userId propagation, size limits and mismatches, edge cases, and GitHub API response handling.

Rollback PR Default Branch Dynamic Resolution

Layer / File(s) Summary
Dynamic default-branch resolution in rollback service
lib/services/rollback-pr.ts
Calls githubService.getRepository(owner, repo) to read repository.default_branch and use it as PR base instead of hardcoded "main", reorganizing PR creation to include repository lookup before body generation. Fixes critical incident-response failure for repos using master, develop, or other default branches.
Incident pipeline test mocking for default-branch
lib/services/__tests__/incident-pipeline.test.ts
Extends githubService Jest mock with getRepository function and wires mocks returning default_branch: "main" into three rollback test scenarios.

Auth Signup Environment-Aware IP Fingerprinting

Layer / File(s) Summary
IP fingerprinting conditional on secret availability
app/api/auth/signup/route.ts
HMAC-based ipFingerprint in error logging now derives from NEXTAUTH_SECRET/JWT_SECRET only when present; falls back to "unknown" instead of hardcoded "fallback_secret".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • nisshchayarathi/gitverse-nextjs#1392: Modifies the RAG retrieval flow in app/api/ai/chat/route.ts for embedding GitHub file contents, directly related to this PR's prompt-sanitization integration.
  • nisshchayarathi/gitverse-nextjs#1497: Introduces the rollback service that hardcodes base: "main", which this PR fixes by using repository.default_branch dynamically.
  • nisshchayarathi/gitverse-nextjs#646: Changes how default_branch is fetched and persisted from GitHub, supporting this PR's dynamic branch-resolution in rollback PR creation.

Suggested labels

bug, security, level:critical

Poem

🐰 A rabbit springs forth with a shield so bright,
Prompt injections scatter in the sanitized night,
Files are validated, branches run true,
Default homes found where repos brew—
Security hardens from root through the sky! 🔒✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Multiple changes beyond the stated objective detected: prompt sanitization utilities, chat route security improvements, signup auth fingerprinting, and file content endpoint path validation are unrelated to fixing hardcoded 'main' branch issue. Isolate rollback PR fix into a focused PR containing only rollback-pr.ts, related incident-pipeline tests, and rollback-pr tests. Submit prompt sanitization, chat security, auth, and file-content security changes in separate PRs.
Docstring Coverage ⚠️ Warning Docstring coverage is 53.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing the rollback service to use dynamic repository default branch instead of hardcoded 'main'.
Linked Issues check ✅ Passed The PR meets the core requirement of issue #1591: dynamically fetching repository default_branch via GitHub API for rollback PR creation instead of hardcoding 'main'. Implementation in rollback-pr.ts uses repositoryService to fetch metadata and applies it to PR base branch logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the GSSoC'26 Part of GirlScript Summer of Code 2026 label Jun 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🎉 Thanks for your contribution, @atul-upadhyay-7!

Your PR has passed our automated GSSoC quality checks. Here's a quick summary:

Check Status
PR description ✅ Provided
PR title ✅ Meaningful
Linked issue ✅ Found
Change size ✅ Looks good (2018 lines across 8 file(s))

A maintainer will review your PR soon. Please be patient and available for feedback. 💪

GSSoC'26 automation · Maintainer: @nisshchayarathi

Comment thread lib/services/rollback-pr.ts Dismissed
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🎉 Thanks for your contribution, @atul-upadhyay-7!

Your PR has passed our automated GSSoC quality checks. Here's a quick summary:

Check Status
PR description ✅ Provided
PR title ✅ Meaningful
Linked issue ✅ Found
Change size ✅ Looks good (2018 lines across 8 file(s))

A maintainer will review your PR soon. Please be patient and available for feedback. 💪

GSSoC'26 automation · Maintainer: @nisshchayarathi

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (2)
app/api/ai/chat/route.ts (1)

233-237: ⚡ Quick win

Pass cross-repo context through crossRepoContext instead of merging into source_code.

You currently append cross-repo text into retrievedFilesContent and then pass crossRepoContext: "". This bypasses the dedicated cross_repository labeling in assembleChatPrompt.

Suggested refactor
-    let retrievedFilesContent = "";
+    let retrievedFilesContent = "";
+    let crossRepoContextText = "";
@@
-            const sanitizedCross = crossRepoContext.map(ctx => sanitizeTextContent(ctx)).join("\n\n");
-            retrievedFilesContent += "\n\n--- CROSS-REPOSITORY CONTEXT ---\n" + sanitizedCross;
+            crossRepoContextText = crossRepoContext
+              .map(ctx => sanitizeTextContent(ctx))
+              .join("\n\n");
@@
-      crossRepoContext: "",
+      crossRepoContext: crossRepoContextText,

Also applies to: 276-283

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/ai/chat/route.ts` around lines 233 - 237, The code is merging
cross-repository text into retrievedFilesContent and then calling
assembleChatPrompt with crossRepoContext: ""—instead, preserve the cross-repo
results from orgRagIndex.retrieveCrossRepositoryContext (stored in
crossRepoContext), stop appending them into retrievedFilesContent (remove the
retrievedFilesContent += "... CROSS-REPOSITORY CONTEXT ..." concat), sanitize
items with sanitizeTextContent as you already do, and pass the sanitized
crossRepoContext array/string into assembleChatPrompt via the
cross_repository/crossRepoContext parameter so assembleChatPrompt can label it
separately (also apply the same change to the second occurrence around the other
block that calls orgRagIndex.retrieveCrossRepositoryContext at lines referenced
in the comment).
lib/utils/__tests__/promptSanitization.test.ts (1)

201-208: ⚡ Quick win

Add regression tests for label escaping and well-formed truncation.

Current tests validate wrapping/truncation presence, but not that source attributes are escaped or that truncation keeps tags balanced. Add both to lock down the safety boundary.

Also applies to: 239-247, 575-582

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/utils/__tests__/promptSanitization.test.ts` around lines 201 - 208,
Extend the tests for buildDelimitedContextBlock to assert that the source
attribute is properly escaped (e.g., passes through escaping for quotes/angle
brackets) and that when content is truncated the returned string still contains
balanced/opening and closing REPOSITORY_DATA tags; add assertions that the
output contains the escaped source attribute value and that opening and closing
tags counts match (or that closing tag exists after opening) to verify
well-formed truncation and label escaping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/api/ai/chat/route.ts`:
- Around line 248-249: Guard against missing or non-array repository.languages
before mapping: update the logic that builds langText (currently using
repository.languages.map in the langText expression) to first check
repository.languages is an array (e.g., repository.languages?.length ?
repository.languages : []) or use optional chaining with a safe fallback so map
is never called on undefined; ensure similar defensive checks where
repository.commits/contributors/files are read (used in statsText) if those
fields can be absent.

In `@app/api/repositories/`[id]/files/content/route.ts:
- Around line 13-14: The ALLOWED_PATH_SEGMENTS regex is too restrictive (it
rejects spaces and Unicode) so remove its use and stop filtering path segments
by that character class; instead keep the existing traversal checks (reject
"..") and empty-segment checks and rely on encodePathSegments() to safely encode
arbitrary segment characters. Locate references to ALLOWED_PATH_SEGMENTS in the
route handler and replace the character-filtering logic with a simple validation
that enforces non-empty segments and no path traversal, leaving encoding to
encodePathSegments() and keeping the rest of the path normalization logic
unchanged.
- Around line 208-225: The post-download size check uses content.length (UTF-16
code units) which miscounts bytes for multi-byte UTF-8; change the check after
const content = await response.text() to measure actual bytes (e.g., use new
TextEncoder().encode(content).length or fetch response.arrayBuffer() and use
.byteLength) and use that byte count to enforce the 1MB limit and return the
same 413 NextResponse when exceeded; keep the initial header-based check as-is
but replace the content.length comparison in route.ts accordingly.
- Around line 103-108: The current branch treats any filename without a dot as
text (using filePath and filename) which lets extensionless binaries bypass the
binary rejection; change this to check filename against an explicit allowlist
(e.g., ALLOWED_TEXT_FILENAMES array of known text-only names) instead of
returning true for all filenames with no '.' — replace the unconditional if
(!filename.includes(".")) return true with a lookup like if
(ALLOWED_TEXT_FILENAMES.includes(filename)) return true, otherwise continue
normal binary-file checks; update any tests that assumed the old behavior.

In `@lib/services/__tests__/incident-pipeline.test.ts`:
- Around line 65-70: Add a test case to validate dynamic base-branch behavior by
stubbing githubService.getRepository to return a non-"main" default_branch
(e.g., "develop") and then assert that githubService.client.post was called with
that branch as the PR base; specifically update the tests around
getRollbackPrService / githubService usage to include a scenario where
getRepository.mockResolvedValueOnce({ default_branch: "develop" }) and verify
the call to githubService.client.post includes an argument with base: "develop"
(in the same places you mock/inspect client.post at tests near the
getRollbackPrService usage and the other mentioned ranges).

In `@lib/utils/promptSanitization.ts`:
- Around line 185-187: The assembleChatPrompt implementation currently omits the
safety system prompt but its contract/comments state it performs "full prompt
assembly"; update assembleChatPrompt to include buildSafetySystemPrompt output
as the first part of the assembled prompt (safety rules + delimited context +
user question), and similarly ensure the other assembly helper at the 223-227
area also prepends buildSafetySystemPrompt; reference function names
assembleChatPrompt and buildSafetySystemPrompt when making the change so callers
get a single safe entry point.
- Around line 153-156: The current code truncates the final joined string
(`joined`) directly which can cut closing tags like `</REPOSITORY_DATA>` and
produce malformed blocks; instead iterate through `blocks` and build the output
by appending each block only if adding it (including the "\n\n" separator) keeps
the total length <= MAX_TOTAL_CONTEXT_CHARS, stop before adding a block that
would overflow, and then return the accumulated string plus "\n[additional
context truncated]" if any blocks were omitted; update the logic around
`joined`, `blocks`, and `MAX_TOTAL_CONTEXT_CHARS` so delimiters always remain
intact.
- Around line 148-151: The loop in promptSanitization.ts that pushes
`<REPOSITORY_DATA source="${label}">` can be broken by quotes or angle brackets
in label; update the code in the for (const { label, content } of contextParts)
block to HTML/XML-escape label (at minimum replace " with &quot; and > with &gt;
and & with &amp;) before interpolating it into the source attribute, then use
the escaped value when building the block (alongside the existing
sanitizeTextContent(content)) so attribute delimiter break-out cannot occur.

---

Nitpick comments:
In `@app/api/ai/chat/route.ts`:
- Around line 233-237: The code is merging cross-repository text into
retrievedFilesContent and then calling assembleChatPrompt with crossRepoContext:
""—instead, preserve the cross-repo results from
orgRagIndex.retrieveCrossRepositoryContext (stored in crossRepoContext), stop
appending them into retrievedFilesContent (remove the retrievedFilesContent +=
"... CROSS-REPOSITORY CONTEXT ..." concat), sanitize items with
sanitizeTextContent as you already do, and pass the sanitized crossRepoContext
array/string into assembleChatPrompt via the cross_repository/crossRepoContext
parameter so assembleChatPrompt can label it separately (also apply the same
change to the second occurrence around the other block that calls
orgRagIndex.retrieveCrossRepositoryContext at lines referenced in the comment).

In `@lib/utils/__tests__/promptSanitization.test.ts`:
- Around line 201-208: Extend the tests for buildDelimitedContextBlock to assert
that the source attribute is properly escaped (e.g., passes through escaping for
quotes/angle brackets) and that when content is truncated the returned string
still contains balanced/opening and closing REPOSITORY_DATA tags; add assertions
that the output contains the escaped source attribute value and that opening and
closing tags counts match (or that closing tag exists after opening) to verify
well-formed truncation and label escaping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 09952428-978f-48bd-97c3-966fae9233bf

📥 Commits

Reviewing files that changed from the base of the PR and between 63ecb90 and 4ce95a1.

📒 Files selected for processing (8)
  • app/api/ai/chat/route.ts
  • app/api/auth/signup/route.ts
  • app/api/repositories/[id]/files/content/route.ts
  • lib/services/__tests__/file-content-security.test.ts
  • lib/services/__tests__/incident-pipeline.test.ts
  • lib/services/rollback-pr.ts
  • lib/utils/__tests__/promptSanitization.test.ts
  • lib/utils/promptSanitization.ts

Comment thread app/api/ai/chat/route.ts
Comment on lines +248 to +249
const langText = repository.languages.map((l: any) => `${l.name} (${l.percentage}%)`).join(", ");
const statsText = `${repository.commits?.length || 0} commits, ${repository.contributors?.length || 0} contributors, ${repository.files?.length || 0} files`;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard repository.languages before .map() to avoid runtime 500s.

If language stats are missing on some repos, this line throws and fails the request path.

Suggested fix
-    const langText = repository.languages.map((l: any) => `${l.name} (${l.percentage}%)`).join(", ");
+    const langText = Array.isArray((repository as any).languages)
+      ? (repository as any).languages.map((l: any) => `${l.name} (${l.percentage}%)`).join(", ")
+      : "Unknown";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const langText = repository.languages.map((l: any) => `${l.name} (${l.percentage}%)`).join(", ");
const statsText = `${repository.commits?.length || 0} commits, ${repository.contributors?.length || 0} contributors, ${repository.files?.length || 0} files`;
const langText = Array.isArray((repository as any).languages)
? (repository as any).languages.map((l: any) => `${l.name} (${l.percentage}%)`).join(", ")
: "Unknown";
const statsText = `${repository.commits?.length || 0} commits, ${repository.contributors?.length || 0} contributors, ${repository.files?.length || 0} files`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/ai/chat/route.ts` around lines 248 - 249, Guard against missing or
non-array repository.languages before mapping: update the logic that builds
langText (currently using repository.languages.map in the langText expression)
to first check repository.languages is an array (e.g.,
repository.languages?.length ? repository.languages : []) or use optional
chaining with a safe fallback so map is never called on undefined; ensure
similar defensive checks where repository.commits/contributors/files are read
(used in statsText) if those fields can be absent.

Comment on lines +13 to +14
const ALLOWED_PATH_SEGMENTS = /^[a-zA-Z0-9._\-\/]+$/;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't reject spaces and Unicode in otherwise safe filenames.

This allowlist now blocks valid GitHub paths like docs/Design Doc.md or src/naïve.ts, even though encodePathSegments() can safely encode those segments. Keep the traversal and empty-segment checks, but this character filter is stricter than necessary and will break legitimate file viewing.

Also applies to: 49-52

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/repositories/`[id]/files/content/route.ts around lines 13 - 14, The
ALLOWED_PATH_SEGMENTS regex is too restrictive (it rejects spaces and Unicode)
so remove its use and stop filtering path segments by that character class;
instead keep the existing traversal checks (reject "..") and empty-segment
checks and rely on encodePathSegments() to safely encode arbitrary segment
characters. Locate references to ALLOWED_PATH_SEGMENTS in the route handler and
replace the character-filtering logic with a simple validation that enforces
non-empty segments and no path traversal, leaving encoding to
encodePathSegments() and keeping the rest of the path normalization logic
unchanged.

Comment on lines +103 to +108
// Allow files with no extension (often config files)
const lastSlash = filePath.lastIndexOf("/");
const filename = lastSlash >= 0 ? filePath.substring(lastSlash + 1) : filePath;
if (!filename.includes(".")) {
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The no-extension fallback bypasses the text-file allowlist.

Any path without a . is treated as text here, so files like bin/tool or other extensionless binaries skip the binary-file rejection entirely. Restrict this branch to an explicit allowlist of known text filenames instead of allowing every extensionless path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/repositories/`[id]/files/content/route.ts around lines 103 - 108, The
current branch treats any filename without a dot as text (using filePath and
filename) which lets extensionless binaries bypass the binary rejection; change
this to check filename against an explicit allowlist (e.g.,
ALLOWED_TEXT_FILENAMES array of known text-only names) instead of returning true
for all filenames with no '.' — replace the unconditional if
(!filename.includes(".")) return true with a lookup like if
(ALLOWED_TEXT_FILENAMES.includes(filename)) return true, otherwise continue
normal binary-file checks; update any tests that assumed the old behavior.

Comment on lines +208 to +225
// Limit content size to prevent memory exhaustion
const contentLength = response.headers.get("content-length");
if (contentLength && parseInt(contentLength) > 1024 * 1024) {
return NextResponse.json(
{ error: "File too large to display (max 1MB)" },
{ status: 413 }
);
}

const content = await response.text();

// Double-check content size after reading
if (content.length > 1024 * 1024) {
return NextResponse.json(
{ error: "File too large to display (max 1MB)" },
{ status: 413 }
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Measure the downloaded body in bytes, not string.length.

response.text() gives you a JS string, so content.length counts UTF-16 code units rather than actual response bytes. A multibyte UTF-8 file can exceed the 1MB cap while still passing this check. Use new TextEncoder().encode(content).length here, or stream and count bytes before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/repositories/`[id]/files/content/route.ts around lines 208 - 225, The
post-download size check uses content.length (UTF-16 code units) which miscounts
bytes for multi-byte UTF-8; change the check after const content = await
response.text() to measure actual bytes (e.g., use new
TextEncoder().encode(content).length or fetch response.arrayBuffer() and use
.byteLength) and use that byte count to enforce the 1MB limit and return the
same 413 NextResponse when exceeded; keep the initial header-based check as-is
but replace the content.length comparison in route.ts accordingly.

Comment on lines +65 to +70
const rollbackSvc = getRollbackPrService();
const { githubService } = require("../githubService");

githubService.getRepository.mockResolvedValueOnce({ default_branch: "main" });
githubService.client.post.mockResolvedValueOnce({ data: { html_url: "http://github.com/pr/1", number: 1 } });
githubService.client.put.mockResolvedValueOnce({});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tests currently don’t prove dynamic base-branch behavior.

Because mocks use "main" and assertions don’t check base, a hardcoded "main" implementation would still pass. Add at least one scenario with a non-main default branch (e.g., "develop") and assert client.post receives that exact base.

Suggested test tightening
- githubService.getRepository.mockResolvedValueOnce({ default_branch: "main" });
+ githubService.getRepository.mockResolvedValueOnce({ default_branch: "develop" });

  githubService.client.post.mockResolvedValueOnce({ data: { html_url: "url" } });

  const result = await rollbackSvc.executeRollback(1, "o", "r", { id: "1", source: "generic", severity: "high", title: "Err" } as any, {
    likelyPrNumber: 999, confidenceScore: 99, impactedFiles: [], impactedServices: [], analysisDetails: ""
  });

  expect(result.success).toBe(true);
+ expect(githubService.getRepository).toHaveBeenCalledWith("o", "r");
  expect(githubService.client.post).toHaveBeenCalledWith(
    `/repos/o/r/pulls`,
-   expect.objectContaining({ title: expect.stringContaining("Revert PR `#999`") })
+   expect.objectContaining({
+     title: expect.stringContaining("Revert PR `#999`"),
+     base: "develop",
+   })
  );

Also applies to: 127-138, 147-160

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/services/__tests__/incident-pipeline.test.ts` around lines 65 - 70, Add a
test case to validate dynamic base-branch behavior by stubbing
githubService.getRepository to return a non-"main" default_branch (e.g.,
"develop") and then assert that githubService.client.post was called with that
branch as the PR base; specifically update the tests around getRollbackPrService
/ githubService usage to include a scenario where
getRepository.mockResolvedValueOnce({ default_branch: "develop" }) and verify
the call to githubService.client.post includes an argument with base: "develop"
(in the same places you mock/inspect client.post at tests near the
getRollbackPrService usage and the other mentioned ranges).

Comment on lines +148 to +151
for (const { label, content } of contextParts) {
if (!content || !content.trim()) continue;
blocks.push(`<REPOSITORY_DATA source="${label}">\n${sanitizeTextContent(content)}\n</REPOSITORY_DATA>`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape label before embedding in source="..." to prevent delimiter break-out.

label is interpolated as raw attribute text. A value containing " or > can break the wrapper and weaken the context-isolation contract.

Suggested fix
+function escapeAttr(value: string): string {
+  return value
+    .replace(/&/g, "&amp;")
+    .replace(/"/g, "&quot;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;");
+}
+
 export function buildDelimitedContextBlock(contextParts: Array<{ label: string; content: string }>): string {
   const blocks: string[] = [];

   for (const { label, content } of contextParts) {
     if (!content || !content.trim()) continue;
-    blocks.push(`<REPOSITORY_DATA source="${label}">\n${sanitizeTextContent(content)}\n</REPOSITORY_DATA>`);
+    blocks.push(
+      `<REPOSITORY_DATA source="${escapeAttr(label)}">\n${sanitizeTextContent(content)}\n</REPOSITORY_DATA>`
+    );
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/utils/promptSanitization.ts` around lines 148 - 151, The loop in
promptSanitization.ts that pushes `<REPOSITORY_DATA source="${label}">` can be
broken by quotes or angle brackets in label; update the code in the for (const {
label, content } of contextParts) block to HTML/XML-escape label (at minimum
replace " with &quot; and > with &gt; and & with &amp;) before interpolating it
into the source attribute, then use the escaped value when building the block
(alongside the existing sanitizeTextContent(content)) so attribute delimiter
break-out cannot occur.

Comment on lines +153 to +156
const joined = blocks.join("\n\n");
if (joined.length > MAX_TOTAL_CONTEXT_CHARS) {
return joined.substring(0, MAX_TOTAL_CONTEXT_CHARS) + "\n[additional context truncated]";
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Avoid truncating the final joined string mid-tag.

Truncating joined directly can cut </REPOSITORY_DATA> and leave malformed blocks. Truncate per-block (or cap before append) so delimiters stay structurally valid.

Suggested fix
-  const joined = blocks.join("\n\n");
-  if (joined.length > MAX_TOTAL_CONTEXT_CHARS) {
-    return joined.substring(0, MAX_TOTAL_CONTEXT_CHARS) + "\n[additional context truncated]";
-  }
-  return joined;
+  let total = 0;
+  const kept: string[] = [];
+  for (const block of blocks) {
+    const sep = kept.length ? 2 : 0; // "\n\n"
+    if (total + sep + block.length > MAX_TOTAL_CONTEXT_CHARS) break;
+    if (sep) total += sep;
+    kept.push(block);
+    total += block.length;
+  }
+  const joined = kept.join("\n\n");
+  if (kept.length < blocks.length) {
+    return `${joined}\n[additional context truncated]`;
+  }
+  return joined;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/utils/promptSanitization.ts` around lines 153 - 156, The current code
truncates the final joined string (`joined`) directly which can cut closing tags
like `</REPOSITORY_DATA>` and produce malformed blocks; instead iterate through
`blocks` and build the output by appending each block only if adding it
(including the "\n\n" separator) keeps the total length <=
MAX_TOTAL_CONTEXT_CHARS, stop before adding a block that would overflow, and
then return the accumulated string plus "\n[additional context truncated]" if
any blocks were omitted; update the logic around `joined`, `blocks`, and
`MAX_TOTAL_CONTEXT_CHARS` so delimiters always remain intact.

Comment on lines +185 to +187
* Full prompt assembly: safety prompt + delimited context + user question.
* This is the single entry point that all chat-route code should use
* instead of string-concatenating repository content directly.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

assembleChatPrompt contract says “full prompt assembly” but omits safety rules.

The comments describe this as the single safe entry point, but it does not include buildSafetySystemPrompt. This mismatch is easy to misuse in future call sites.

Suggested fix (align implementation with contract)
 export function assembleChatPrompt(opts: {
   repositoryName: string;
@@
   const contextBlock = buildDelimitedContextBlock(contextParts);
   const userQuestion = wrapUserQuestion(question);
+  const safetyPrompt = buildSafetySystemPrompt(repositoryName);

-  return `Answer the user question using the repository data provided below. Ground your answer in the file contents and metadata. If code snippets are relevant, reference them. If no relevant files are found, say so.
+  return `${safetyPrompt}
+
+Answer the user question using the repository data provided below. Ground your answer in the file contents and metadata. If code snippets are relevant, reference them. If no relevant files are found, say so.
 
 ${contextBlock}
 
 ${userQuestion}`;

Also applies to: 223-227

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/utils/promptSanitization.ts` around lines 185 - 187, The
assembleChatPrompt implementation currently omits the safety system prompt but
its contract/comments state it performs "full prompt assembly"; update
assembleChatPrompt to include buildSafetySystemPrompt output as the first part
of the assembled prompt (safety rules + delimited context + user question), and
similarly ensure the other assembly helper at the 223-227 area also prepends
buildSafetySystemPrompt; reference function names assembleChatPrompt and
buildSafetySystemPrompt when making the change so callers get a single safe
entry point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

GSSoC'26 Part of GirlScript Summer of Code 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Critical: Hardcoded "main" Base Branch in Rollback Service Causes Failed PR Creation for Repos Using Different Default Branches

3 participants