Skip to content

Feat/static code analysis#37

Merged
KevinGruber2001 merged 6 commits into
mainfrom
feat/static-code-analysis
Jun 8, 2026
Merged

Feat/static code analysis#37
KevinGruber2001 merged 6 commits into
mainfrom
feat/static-code-analysis

Conversation

@KevinGruber2001

@KevinGruber2001 KevinGruber2001 commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Static analysis for uploads: configurable checks for network, exec, reflection, and filesystem APIs; can be enabled/disabled and tuned via new config flags.
    • Analyzer inspects common archive formats (JAR/ZIP and gzip tar variants) and may reject uploads with HTTP 403 including detected violation details; analyzer errors are logged and do not block uploads.
  • Chores

    • Helm chart version bumped to 0.5.3.

@KevinGruber2001 KevinGruber2001 self-assigned this Jun 8, 2026
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This pull request adds static API analysis to detect and reject artifacts containing forbidden APIs. Configuration flags control per-category checks (network, exec, reflection, filesystem), a new analysis engine scans gzip/ZIP artifacts and class constant pools, and the cache handler enforces violations with 403 responses.

Changes

Static Analysis Feature

Layer / File(s) Summary
Configuration Schema & Defaults
chart/values.yaml, src/configs/config.yaml, src/internal/config/config.go
Static analysis configuration is defined with per-check toggles (network, exec, reflection, filesystem) and integrated into the Config struct. Load sets defaults for these keys.
Bytecode Analysis Engine
src/internal/analysis/patterns.go, src/internal/analysis/classfile.go, src/internal/analysis/analyzer.go
Analyzer detects gzip tar and ZIP/JAR inputs, iterates entries, extracts constant-pool UTF-8 strings from .class files, and matches them against categorized prefix lists; violations are de-duplicated and parse errors for individual entries are skipped.
Handler Integration with Analysis
src/internal/handler/handler.go, src/internal/handler/cache_put.go
CacheHandler accepts an optional Analyzer. Cache PUT unconditionally buffers request body, validates size, runs analysis if configured, returns 403 with violation details on detection, logs analyzer errors, and writes buffered bytes to storage.
Server Initialization & Analyzer Injection
src/internal/server/server.go
Server setupRoutes conditionally constructs an Analyzer when enabled, logs enabled-check booleans, and passes the analyzer (or nil) into NewCacheHandler.
Helm ConfigMap & Chart Metadata
chart/templates/configmap.yaml, chart/Chart.yaml
ConfigMap template conditionally renders static_analysis block in config.yaml when enabled; chart version bumped from 0.5.2 to 0.5.3.

🎯 3 (Moderate) | ⏱️ ~22 minutes

A rabbit hops through the cache,
Checking classes byte by byte,
No forbidden APIs pass through—
Just safe artifacts, shining bright! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Feat/static code analysis' directly describes the main feature being added: a static code analysis capability that scans Java artifacts for forbidden API usage across network, execution, reflection, and filesystem categories.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/static-code-analysis

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Chart Preview Ready

helm install test oci://ghcr.io/eduide/charts/eduide-shared-cache --version 0.5.3-pr.37

Updated: 2fec5d9

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/configs/config.yaml`:
- Around line 30-34: Update the static-analysis defaults in
src/configs/config.yaml so they match the Helm chart and loader defaults: set
check_network and check_exec to false (to align with existing defaults in
chart/values.yaml and src/internal/config/config.go) while keeping
check_reflection and check_filesystem as currently defined; ensure the
static_analysis.enabled behavior is consistent across all sources by making
these keys identical to the Helm/loader defaults.

In `@src/go.mod`:
- Line 27: The go.mod entry for github.com/dgryski/go-rendezvous has a typo in
the trailing comment ("// indirectrelo"); update that comment to the correct "//
indirect" so the transitive dependency is marked properly—locate the line
containing "github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f"
and replace "// indirectrelo" with "// indirect".

In `@src/internal/handler/cache_put.go`:
- Around line 34-37: The PUT handler currently unconditionally buffers the
entire request body via io.ReadAll(io.LimitReader(c.Request.Body,
h.maxEntrySize+1)), causing large allocations even when no analysis is needed;
change the fast path so that when h.analyzer == nil and the request provides a
Content-Length that is <= h.maxEntrySize you stream c.Request.Body directly to
the storage/write path instead of buffering it, and only use the io.ReadAll
LimitReader path when analysis is required or Content-Length is
missing/indicates chunked transfer (or exceeds h.maxEntrySize); keep the
over-limit detection logic (+1) for the buffering branch and ensure behavior for
analyzer != nil remains unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14c872bd-7cb0-4133-a7f7-64db1005ff59

📥 Commits

Reviewing files that changed from the base of the PR and between 5524389 and 4a9fc6f.

📒 Files selected for processing (11)
  • chart/templates/configmap.yaml
  • chart/values.yaml
  • src/configs/config.yaml
  • src/go.mod
  • src/internal/analysis/analyzer.go
  • src/internal/analysis/classfile.go
  • src/internal/analysis/patterns.go
  • src/internal/config/config.go
  • src/internal/handler/cache_put.go
  • src/internal/handler/handler.go
  • src/internal/server/server.go

Comment thread src/configs/config.yaml
Comment thread src/go.mod Outdated
Comment thread src/internal/handler/cache_put.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/internal/analysis/analyzer.go`:
- Around line 89-92: Analyzer.Check currently uses unbounded io.ReadAll on
tar/zip entry readers (entryData := io.ReadAll(tr) and io.ReadAll(rc)), and
isZIP only accepts the exact 4‑byte sequence PK\x03\x04 which misses other valid
ZIP/JAR signatures. Replace the unbounded reads with bounded reads using
io.ReadAll(io.LimitReader(tr, maxEntrySize)) and io.ReadAll(io.LimitReader(rc,
maxEntrySize)), and treat reads that reach the limit as "entry too large" (skip
or return a handled error) to avoid memory blowups; reference the variables
entryData, tr, rc and the existing maxEntrySize constant. Also broaden isZIP to
detect ZIP/JAR by checking for the "PK" prefix (e.g., bytes.HasPrefix(buf,
[]byte("PK"))) rather than only PK\x03\x04 so ZIP/JAR archives aren't
incorrectly skipped; reference the isZIP function and Analyzer.Check where it's
used.
- Around line 55-63: The ZIP detection is too strict: in Analyzer.Check the
switch currently only calls checkZIP when isZIP(data) is true, causing valid
ZIP/JAR payloads that lack the local file header at the start to be skipped;
instead, mirror the gzip path by removing the isZIP gate and always attempt ZIP
parsing for non-gzip data—create the same seen map (seen :=
make(map[string]struct{})) and call a.checkZIP(data, seen) in the non-gzip
branch (keep the isGzip -> a.checkGzipTar(data) branch and the default error
handling), so checkZIP is attempted for any non-gzip input just like .jar
entries handled inside checkGzipTar.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab629b1b-9d78-4eb6-ae2f-f19500da4803

📥 Commits

Reviewing files that changed from the base of the PR and between 945994a and 8ebb975.

📒 Files selected for processing (2)
  • src/internal/analysis/analyzer.go
  • src/internal/handler/cache_put.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/internal/handler/cache_put.go

Comment on lines +55 to +63
switch {
case isGzip(data):
return a.checkGzipTar(data)
case isZIP(data):
seen := make(map[string]struct{})
return a.checkZIP(data, seen)
default:
return nil, nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and print the analyzer + helpers around the referenced lines
file="src/internal/analysis/analyzer.go"
echo "== $file =="
nl -ba "$file" | sed -n '1,220p' | sed -n '45,110p'

# Find isZIP definition and checkZIP usage
echo "== Search for isZIP definition =="
rg -n "func\s+isZIP\s*\(" -S src/internal/analysis || true
rg -n "isZIP\(" -S src/internal/analysis || true

echo "== Search for checkZIP definition =="
rg -n "func\s+\(a\s+\*\s*Analyzer\)\s+checkZIP\s*\(" -S src/internal/analysis || true
rg -n "checkZIP\(" -S src/internal/analysis || true

# Also inspect any custom zip error handling if present
echo "== Search for zip.ErrFormat usage =="
rg -n "zip\.ErrFormat|errors\.Is\(.*ErrFormat|ErrFormat" -S src/internal/analysis || true

Repository: EduIDE/EduIDE-shared-cache

Length of output: 171


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="src/internal/analysis/analyzer.go"
echo "== $file =="
awk 'NR>=45 && NR<=110 {printf "%5d  %s\n", NR, $0}' "$file"

echo "== Search for isZIP definition =="
rg -n "func\s+isZIP\s*\(" -S src/internal/analysis || true

echo "== Search for isZIP\(\) usage =="
rg -n "isZIP\(" -S src/internal/analysis || true

echo "== Search for checkZIP definition =="
rg -n "func\s+\(a\s+\*\s*Analyzer\)\s+checkZIP\s*\(" -S src/internal/analysis || true

echo "== Search for checkZIP\(\) usage =="
rg -n "checkZIP\(" -S src/internal/analysis || true

echo "== Search for zip.ErrFormat / errors.Is handling =="
rg -n "zip\.ErrFormat|errors\.Is\(|ErrFormat" -S src/internal/analysis || true

Repository: EduIDE/EduIDE-shared-cache

Length of output: 3170


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/internal/analysis/analyzer.go"

echo "== isZIP (around 170-220) =="
awk 'NR>=165 && NR<=230 {printf "%5d  %s\n", NR, $0}' "$file"

echo "== checkZIP (around 100-170) =="
awk 'NR>=95 && NR<=170 {printf "%5d  %s\n", NR, $0}' "$file"

echo "== gzip/zip imports context =="
awk 'NR>=1 && NR<=45 {printf "%5d  %s\n", NR, $0}' "$file"

Repository: EduIDE/EduIDE-shared-cache

Length of output: 4913


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for Analyzer.Check call sites
rg -n "New\(cfg\).*Analyzer|\.Check\(" -S src | head -n 200

# Search for Analyzer usage specifically with shared-cache scanning
rg -n "analy[sz]er" -S src/internal | head -n 200

# Search for any tests mentioning ZIP/JAR signatures or isZIP
fd -t f . test tests __tests__ 2>/dev/null | head -n 50 || true
rg -n "isZIP|PK\\\\x03\\\\x04|zip.NewReader|jar" -S . | head -n 200

# Display full analyzer.go to see any additional context around comments (small file already ~ maybe > 220 lines)
wc -l src/internal/analysis/analyzer.go

Repository: EduIDE/EduIDE-shared-cache

Length of output: 3256


🌐 Web query:

Go archive/zip NewReader required signatures does it require first bytes to be PK\x03\x04 local file header

💡 Result:

No, the Go archive/zip package's NewReader function does not require the archive to begin with the PK\x03\x04 (local file header) signature [1][2]. Instead, NewReader initializes by locating and reading the Central Directory, which is typically found at or near the end of the ZIP file [1][3][4]. The ZIP specification defines the Central Directory as the authoritative source for the archive's contents, and Go's implementation follows this by: 1. Searching for the End of Central Directory (EOCD) record at the end of the provided io.ReaderAt [1][3]. 2. Using the information in the EOCD record to locate and parse the Central Directory [1][4]. 3. Populating the list of files based on the entries found in the Central Directory, rather than scanning from the beginning of the file [1][5][3]. Because the Central Directory contains the offsets to the local file headers for each individual file, NewReader does not need to start at the beginning of the stream [1][6]. This architecture allows the package to support reading ZIP files that may be appended to other data or exist as part of a larger stream where the local file headers are not positioned at the start [6].

Citations:


🌐 Web query:

ZIP file signatures empty zip starts with PK\x05\x06 end of central directory header

💡 Result:

In the ZIP file format, the signature PK\x05\x06 (hexadecimal 50 4B 05 06) identifies the End of Central Directory (EOCD) record [1][2][3]. According to the official PKWARE ZIP File Format Specification, every ZIP file must contain exactly one EOCD record [4][5]. An empty ZIP archive is defined as a file containing only this EOCD record [4][5]. Because the EOCD record is mandatory and serves as the anchor for the central directory, a ZIP file consisting solely of this structure is considered a valid, albeit empty, archive [4]. The EOCD record follows a specific structure [2][3]: - Signature: 4 bytes (PK\x05\x06) [2][3] - Disk numbers and entry counts: These fields (e.g., total number of entries in the central directory) are set to zero in an empty ZIP file [2][6]. - Total size and offset: The size of the central directory and its offset are also set to zero [2][3]. - Comment: An optional field for the ZIP file comment follows the fixed fields [2][3]. A common hexdump for a minimal empty ZIP file is: 50 4B 05 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [6]. This represents the 22-byte structure (4-byte signature + 18 bytes of fixed data) [2][7][6].

Citations:


🌐 Web query:

ZIP local file header signature 0x04034b50 end of central directory 0x06054b50 central directory header 0x02014b50

💡 Result:

The ZIP file format utilizes specific 4-byte magic number signatures to identify its various structural components. These signatures are stored in little-endian byte order, which is why they appear as 0x50 0x4b 0x03 0x04 in a hex editor, corresponding to the integer value 0x04034b50 [1][2]. The primary signatures are: Local File Header: 0x04034b50 This signature marks the beginning of a local file header, which precedes the actual compressed or stored data for each individual file within the archive [3][4][5]. It contains metadata such as the file name, compression method, and CRC-32 checksum [3][1][4]. Central Directory Header: 0x02014b50 This signature marks the beginning of a central directory header [6][7]. The central directory is located at the end of the ZIP archive and serves as a master index of all files contained within the ZIP, allowing for faster directory listing without scanning the entire file [7]. End of Central Directory Record (EOCD): 0x06054b50 This signature marks the beginning of the end of central directory record [1][2]. This record is found at the very end of the ZIP archive [1][7]. It provides critical information for the ZIP reader, such as the number of central directory records and the relative offset where the central directory starts, effectively anchoring the archive structure [1][7]. These signatures are formal parts of the PKWARE.ZIP File Format Specification (APPNOTE), which defines the architecture of ZIP archives [4][8][5].

Citations:


🌐 Web query:

archive/zip NewReader invalid zip returns zip.ErrFormat

💡 Result:

In Go's archive/zip package, zip.NewReader will return the error zip.ErrFormat when it cannot identify the provided input as a valid ZIP archive [1][2]. This error message is defined as "zip: not a valid zip file" [1][2]. The library returns this error in several specific scenarios, primarily when it fails to locate or parse required ZIP structure signatures [3]: - Signature Mismatches: The parser does not find the expected signatures for file headers, directory headers, or the end of the central directory record [3][4]. - Truncated or Malformed Data: The input may be missing critical components, or the data might not align with the offsets specified in the archive's central directory [3]. - Incorrect Input Size: A common cause for this error is providing an incorrect size to zip.NewReader [5]. The function requires the size of the ZIP file in bytes; if the provided size is inaccurate, the reader may fail to locate the end-of-central-directory record, leading to zip.ErrFormat [5]. While zip.ErrFormat is the standard indicator of an invalid or unrecognized ZIP format, it is not always easy to determine exactly which part of the archive structure caused the failure without inspecting the Go source code or using debug logging [3]. If you encounter this error, verify that the input is a complete, valid ZIP file and that the size passed to NewReader accurately matches the total byte length of the reader's underlying data source [5].

Citations:


🌐 Web query:

archive/zip ErrFormat error value 'zip: not a valid zip file'

💡 Result:

In the Go standard library, ErrFormat is an error variable defined in the archive/zip package with the message "zip: not a valid zip file" [1][2]. It is returned by various methods, such as zip.OpenReader or while iterating through zip files, when the input data does not conform to the expected ZIP file format [3][4][5]. This error typically occurs because the reader failed to find or validate essential ZIP file structures, such as [4]: * Missing or invalid ZIP signature headers (file, directory, or end-of-central-directory signatures) [6][4][5]. * Data that is truncated or shorter than the metadata claims [4][5]. * An invalid directory offset (e.g., negative or exceeding the file size) [4]. * The provided file is not actually a ZIP archive, or contains unexpected "garbage" data at the beginning or within the file structure [4]. Because ErrFormat is a sentinel error, it provides a generic message to prevent the exposure of implementation details or excessive information during parsing, which has been the subject of community discussion regarding the difficulty of debugging specific causes of format failures [3][4]. To diagnose why a specific file is triggering this error, developers often use external tools like the system's unzip command to check if the file can be validated or to see more descriptive error messages [3]. Adding local debugging logs to the internal archive/zip package source code is another common approach for identifying which specific header or offset verification is failing [4].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/internal/handler/cache_put.go"
echo "== $file =="
awk 'NR>=1 && NR<=140 {printf "%5d  %s\n", NR, $0}' "$file" | sed -n '40,120p'

file2="src/internal/handler/handler.go"
echo "== $file2 =="
awk 'NR>=1 && NR<=120 {printf "%5d  %s\n", NR, $0}' "$file2" | sed -n '1,80p'

file3="src/internal/server/server.go"
echo "== $file3 =="
awk 'NR>=60 && NR<=130 {printf "%5d  %s\n", NR, $0}' "$file3"

Repository: EduIDE/EduIDE-shared-cache

Length of output: 6002


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find Go tests mentioning Analyzer/isZIP/checkZIP
rg -n --hidden --glob '*_test.go' "Analyzer|isZIP|checkZIP|archive/zip|PK\x03\x04" . || true

# List any test files under internal/analysis
fd -t f --hidden --follow "*analysis*" . | head -n 200
fd -t f --hidden --follow "*_test.go" src/internal/analysis || true

# Show full analyzer.go line count already known, but ensure no other ZIP detection elsewhere
rg -n "func\s+isZIP|PK\\x03\\x04|isZIP\(" src/internal/analysis/analyzer.go

Repository: EduIDE/EduIDE-shared-cache

Length of output: 360


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for tests or fixtures mentioning analyzer/zip/jar/isZIP
rg -n --hidden --glob '*_test.go' "Analyzer|isZIP|checkZIP|archive/zip|PK\x03\x04|jar" . || true

echo "== Files in src/internal/analysis matching *_test.go =="
fd -t f --hidden -g "*_test.go" src/internal/analysis || true

echo "== Any non-test files referencing isZIP =="
rg -n --hidden "isZIP\(" src/internal/analysis || true

Repository: EduIDE/EduIDE-shared-cache

Length of output: 294


ZIP detection is too strict and can skip parseable ZIP/JAR inputs.

Line 58 gates ZIP scanning on isZIP, which only returns true for buffers starting with the local file header signature PK\x03\x04. For ZIP/JAR data that is still valid but doesn’t start with that signature, Analyzer.Check falls through to default and returns (nil, nil), causing silent skipping (no parse attempt). This is also inconsistent with the gzip-tar path, where .jar entries call checkZIP without an isZIP gate.

Suggested robustness diff
 import (
 	"archive/tar"
 	"archive/zip"
 	"bytes"
 	"compress/gzip"
+	"errors"
 	"fmt"
 	"io"
 	"strings"
@@
 func (a *Analyzer) Check(data []byte) ([]Violation, error) {
-	switch {
-	case isGzip(data):
+	if isGzip(data) {
 		return a.checkGzipTar(data)
-	case isZIP(data):
-		seen := make(map[string]struct{})
-		return a.checkZIP(data, seen)
-	default:
-		return nil, nil
 	}
+
+	seen := make(map[string]struct{})
+	v, err := a.checkZIP(data, seen)
+	if err != nil {
+		if errors.Is(err, zip.ErrFormat) {
+			return nil, nil // unknown/non-zip format
+		}
+		return nil, err
+	}
+	return v, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
switch {
case isGzip(data):
return a.checkGzipTar(data)
case isZIP(data):
seen := make(map[string]struct{})
return a.checkZIP(data, seen)
default:
return nil, nil
}
if isGzip(data) {
return a.checkGzipTar(data)
}
seen := make(map[string]struct{})
v, err := a.checkZIP(data, seen)
if err != nil {
if errors.Is(err, zip.ErrFormat) {
return nil, nil // unknown/non-zip format
}
return nil, err
}
return v, nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/internal/analysis/analyzer.go` around lines 55 - 63, The ZIP detection is
too strict: in Analyzer.Check the switch currently only calls checkZIP when
isZIP(data) is true, causing valid ZIP/JAR payloads that lack the local file
header at the start to be skipped; instead, mirror the gzip path by removing the
isZIP gate and always attempt ZIP parsing for non-gzip data—create the same seen
map (seen := make(map[string]struct{})) and call a.checkZIP(data, seen) in the
non-gzip branch (keep the isGzip -> a.checkGzipTar(data) branch and the default
error handling), so checkZIP is attempted for any non-gzip input just like .jar
entries handled inside checkGzipTar.

Comment on lines +89 to +92
entryData, err := io.ReadAll(tr)
if err != nil {
continue
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Files ==="
ls -la src/internal/analysis/analyzer.go

echo "=== Locate io.ReadAll usage in analyzer.go ==="
rg -n "io\.ReadAll\(" src/internal/analysis/analyzer.go

echo "=== Show analyzer.go around reported lines (70-160) ==="
nl -ba src/internal/analysis/analyzer.go | sed -n '40,190p'

echo "=== Show analyzer.go around checkZIP / checkGzipTar definitions (if present) ==="
rg -n "func \(a \*Analyzer\) checkGzipTar|func \(a \*Analyzer\) checkZIP|func isGzip|func isZIP" src/internal/analysis/analyzer.go
nl -ba src/internal/analysis/analyzer.go | sed -n '1,260p'

echo "=== Check handler call sites for Analyzer.Check and whether they validate/limit upload sizes ==="
rg -n "Analyzer\.Check\(" -S src/internal | head
rg -n "cache_put\.go|cache_put\.go" -S src/internal/handler | head -n 50
nl -ba src/internal/handler/cache_put.go | sed -n '1,220p' || true

Repository: EduIDE/EduIDE-shared-cache

Length of output: 409


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Files ==="
ls -la src/internal/analysis/analyzer.go

echo "=== Locate io.ReadAll usage in analyzer.go ==="
rg -n "io\.ReadAll\(" src/internal/analysis/analyzer.go

echo "=== Show analyzer.go around reported lines (70-160) ==="
nl -ba src/internal/analysis/analyzer.go | sed -n '70,160p'

echo "=== Show analyzer.go around checkZIP / checkGzipTar definitions ==="
rg -n "checkGzipTar|checkZIP|scanClassBytes|isGzip|isZIP" src/internal/analysis/analyzer.go
nl -ba src/internal/analysis/analyzer.go | sed -n '1,240p'

echo "=== Check handler call sites for Analyzer.Check and whether they validate/limit upload sizes ==="
rg -n "Analyzer\.Check\(" -S src/internal | head -n 50
for f in $(fd -e go src/internal/handler | tr '\n' ' '); do :; done
if [ -f src/internal/handler/cache_put.go ]; then
  nl -ba src/internal/handler/cache_put.go | sed -n '1,260p'
fi

# Search for other upload size limits
rg -n "max.*(upload|request|body|bytes|size)" src/internal | head -n 50
rg -n "http\.MaxBytesReader|MaxBytesReader|io\.LimitReader|LimitedReader" src/internal | head -n 50

Repository: EduIDE/EduIDE-shared-cache

Length of output: 409


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Locate io.ReadAll usage in analyzer.go ==="
rg -n "io\.ReadAll\(" src/internal/analysis/analyzer.go

echo "=== Show analyzer.go around lines 60-160 ==="
nl -ba src/internal/analysis/analyzer.go | sed -n '55,160p'

echo "=== Show analyzer.go around lines 160-240 (to include ZIP/tar helpers if any) ==="
nl -ba src/internal/analysis/analyzer.go | sed -n '160,280p'

echo "=== Show relevant helper funcs (checkGzipTar, checkZIP, isGzip, isZIP) ==="
for pat in "checkGzipTar" "checkZIP" "isGzip" "isZIP"; do
  echo "--- matching $pat ---"
  rg -n "$pat" src/internal/analysis/analyzer.go
done

echo "=== Show handler cache_put.go that calls analyzer.Check ==="
if [ -f src/internal/handler/cache_put.go ]; then
  rg -n "Analyzer\.Check" src/internal/handler/cache_put.go
  nl -ba src/internal/handler/cache_put.go | sed -n '1,260p'
else
  echo "cache_put.go not found"
fi

echo "=== Search for size limits on uploads/bodies ==="
rg -n "MaxBytesReader|http\.MaxBytesReader|max.*upload|max.*request|max.*body|io\.LimitReader|LimitedReader" src/internal | head -n 100

Repository: EduIDE/EduIDE-shared-cache

Length of output: 300


🏁 Script executed:

[Cannot proceed: no verification evidence (tool outputs) was provided in this chat. I need the repository inspection results to rewrite the comment accurately.]

Repository: EduIDE/EduIDE-shared-cache

Length of output: 318


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/internal/analysis/analyzer.go"

echo "=== Line count ==="
wc -l "$FILE"

echo "=== analyzer.go: lines 1-120 ==="
sed -n '1,120p' "$FILE"

echo "=== analyzer.go: lines 120-200 ==="
sed -n '120,200p' "$FILE"

echo "=== analyzer.go: where helpers gate archive type ==="
rg -n "isGzip|isZIP|checkGzipTar|checkZIP|scanClassBytes" "$FILE"

echo "=== analyzer.go: any existing bounds/limits ==="
rg -n "LimitReader|MaxBytes|max.*(bytes|size)|io\.CopyN|ReadFull" "$FILE" || true

echo "=== Find handler(s) that call Analyzer.Check ==="
fd -e go "cache_put.go" src/internal/handler -t f || true
rg -n "Analyzer\.Check" src/internal/handler -S || true

echo "=== Show cache_put.go if present ==="
if [ -f "src/internal/handler/cache_put.go" ]; then
  sed -n '1,240p' "src/internal/handler/cache_put.go"
fi

echo "=== Search for request/body size limits in internal HTTP handlers/middleware ==="
rg -n "MaxBytesReader|http\.MaxBytesReader|max.*(upload|request|body)|LimitReader|LimitedReader" src/internal -S || true

Repository: EduIDE/EduIDE-shared-cache

Length of output: 9067


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "maxEntrySize" -S src/internal | head -n 50
rg -n "maxEntrySize" -S . | head -n 50
fd -e go src/internal | wc -l

Repository: EduIDE/EduIDE-shared-cache

Length of output: 3677


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for maxEntrySize definition ==="
rg -n "maxEntrySize" -S src/internal

echo "=== Search for CacheHandler struct (maxEntrySize field) ==="
rg -n "type CacheHandler" -S src/internal/handler
rg -n "maxEntrySize" -S src/internal/handler

echo "=== Search for defaults / config wiring ==="
rg -n "StaticAnalysisConfig|CheckNetwork|CheckExec|CheckReflection|CheckFilesystem" -S src/internal | head -n 50
rg -n "max.*entry.*size|max.*Entry.*Size|max.*upload.*size" -S src/internal src | head -n 50

Repository: EduIDE/EduIDE-shared-cache

Length of output: 8019


Bound decompressed reads from untrusted archives.

src/internal/analysis/analyzer.go uses unbounded io.ReadAll on decompressed archive contents (io.ReadAll(tr) at ~89 for tar entries and io.ReadAll(rc) at ~132 for ZIP entries). While the HTTP handler caps the uploaded artifact size (maxEntrySize via io.ReadAll(io.LimitReader(...))), gzip/zip bombs can still expand to extremely large per-entry sizes and exhaust memory during analysis.

Additionally, isZIP only returns true for PK\x03\x04 at byte 0; otherwise Analyzer.Check falls back to (nil, nil), skipping static analysis for ZIP/JAR files that don’t match this narrow signature gate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/internal/analysis/analyzer.go` around lines 89 - 92, Analyzer.Check
currently uses unbounded io.ReadAll on tar/zip entry readers (entryData :=
io.ReadAll(tr) and io.ReadAll(rc)), and isZIP only accepts the exact 4‑byte
sequence PK\x03\x04 which misses other valid ZIP/JAR signatures. Replace the
unbounded reads with bounded reads using io.ReadAll(io.LimitReader(tr,
maxEntrySize)) and io.ReadAll(io.LimitReader(rc, maxEntrySize)), and treat reads
that reach the limit as "entry too large" (skip or return a handled error) to
avoid memory blowups; reference the variables entryData, tr, rc and the existing
maxEntrySize constant. Also broaden isZIP to detect ZIP/JAR by checking for the
"PK" prefix (e.g., bytes.HasPrefix(buf, []byte("PK"))) rather than only
PK\x03\x04 so ZIP/JAR archives aren't incorrectly skipped; reference the isZIP
function and Analyzer.Check where it's used.

@KevinGruber2001 KevinGruber2001 merged commit 4b658ab into main Jun 8, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant