Fix/security and logic issues#21
Merged
Merged
Conversation
ComputeDiff's processFile only handled os.IsNotExist after fsys.Stat, so any other stat failure (permission denied, I/O error) left info nil and the subsequent info.Size() call panicked. Surface non-ENOENT stat errors instead of dereferencing the nil FileInfo.
A signature with EntropyTolerance == 0 was treated as demanding an exact entropy match, silently dropping valid fuzzy candidates. MatchSignature and the PebbleDB ScanCandidates implementation both treat 0 as "unspecified" and substitute the scanner's default tolerance. Align the JSON backend with that contract so both implementations of the SignatureProvider interface behave identically.
The fail-closed verdict switch listed models.StatusPreserved ("preserved")
as a passing value. StatusPreserved belongs to the diff-status domain;
an LLM verdict can only ever be MATCH, SUSPICIOUS, LIE, or ERROR (the
output schema and validateOutput both enforce this). The case was dead
and conflated two unrelated string-constant domains.
llm.validateOutput accepts verdict strings case-insensitively (it upper-cases before checking), but the fail-closed switch in RunAudit compared against the canonical upper-case constants directly. A valid "match"/"Match" verdict therefore passed validation yet hit the default branch and was reported as "unknown verdict received". Upper-case the verdict before the switch so both checks agree.
The audit step exposed the key as SFW_API_KEY, but the sfw CLI only resolves the audit key from --api-key, OPENAI_API_KEY, or GEMINI_API_KEY. SFW_API_KEY is read by no code, so audit mode always exited with "API Key is required". Re-export the supplied key as OPENAI_API_KEY or GEMINI_API_KEY, picked by the model name to match the CLI's own logic.
foldSCEV previously wrapped every binary operation in an opaque
SCEVGenericExpr, so a value derived affinely from a loop counter
(e.g. j := i*2 + 3) was never recognised as the recurrence {3, +, 2}.
The canonicalizer's derived-IV virtualization pass therefore never
fired, and loops computing a strided value in different ways produced
different fingerprints.
foldSCEV now applies the standard affine identities -- AddRec +/- a
loop-invariant, AddRec +/- AddRec over the same loop, AddRec scaled by
a loop-invariant, and constant folding -- collapsing the result into a
canonical SCEVAddRec whenever it stays affine. Recurrence-times-
recurrence is quadratic and is deliberately left opaque.
This changes the canonical IR (and therefore fingerprints) of any
function containing loops, so it ships as part of the v4 release.
NewPebbleScanner only rejected schemas newer than CurrentSchemaVersion; an older schema was silently accepted. Stored TopologyHash values from a prior canonicalizer are stale and will never match code fingerprinted by this binary, so opening such a database produced an empty scanner that quietly returned no alerts. Fail closed when dbVer < CurrentSchemaVersion with an error directing the user to re-index. Update the surrounding comment to document that the schema constant tracks both the gob serialization shape AND the fingerprint/canonicalization algorithm, since either being out of sync makes stored signatures unmatchable.
LoadDatabase only handled os.IsNotExist explicitly and fell through to info.Mode().IsRegular() on every other error. A permission-denied or EIO from os.Stat returned a nil FileInfo, so the IsRegular() call panicked instead of surfacing a useful error -- the same defect we just fixed in pkg/diff (commit 4697e56). Return early with a wrapped stat error for any non-ENOENT failure and add a regression test that strips read access from the containing directory to exercise the EACCES path.
AddSignature (singular) updated the ID->index map; AddSignatures (the batch path used by MigrateFromJSON and any caller importing many signatures at once) appended to s.db.Signatures but never touched s.sigMap. GetSignature consults the map first, so after a batch insert every signature reported "not found" until the database was reloaded from disk. Initialize sigMap lazily and record each batched signature's slice index. Add a regression test that inserts via the batch path and verifies GetSignature can resolve each ID.
ProcessFilesParallel guarded each goroutine with a recover() but only printed a warning. The result slot stayed at its zero value (empty File, empty ErrorMessage) and hasErrors was never flipped, so a panic during SSA generation produced a JSON output with a phantom blank entry and "strict mode" silently exited 0 -- exactly the case strict mode is meant to fail on. Inside recover, write a real FileOutput with the original filename and a "panic during analysis: ..." message under the same mutex, and set hasErrors. Add a regression test using a FileSystem that panics from Stat to drive the recovery path without needing pathological Go source.
RunAudit unconditionally delegated to SandboxExec. Inside any environment that already counts as sandboxed (SFW_SANDBOX_ID set, nested CI containers, runsc-in-runsc) SandboxExec refuses with "process is already sandboxed; nested sandboxing is not supported" and the audit fails closed with no way for the operator to override -- check, diff, and scan all expose --no-sandbox for exactly this case but audit did not. Add the flag to the audit subcommand and a corresponding noSandbox parameter on RunAudit. When set, or when we detect we are already sandboxed, run RunDiffLogic directly into a bytes.Buffer instead of forking a worker. As part of the refactor RunDiffLogic now takes an io.Writer so audit can capture the diff JSON without piping through os.Stdout; existing callers pass os.Stdout to preserve current behaviour.
The "legacy 5-arg" fallback in runWorker's diff case was reachable in theory only -- no caller ever emits `internal-worker diff -old <p> -new <p>` (SandboxExec packs args positionally), and the index math (args[2], args[4]) would have parsed wrong if it were called. The 3-arg path is the only contract we ship. Collapse to a single len-check and keep the original error string so the existing TestWorkerModeDispatch substring assertion still holds.
The next release ships breaking changes to the fingerprinting algorithm (SCEV affine-recurrence folding, 34f1b47) and the schema-version contract (81b5d80 rejects databases written under any older schema). Both make signatures produced by v3 binaries unusable in v4, so Go's module rules require a new major version. Rewrite every github.com/BlackVectorOps/semantic_firewall/v3 import to /v4 across go.mod, source, tests, README, and the Copilot agent doc. action.yml is intentionally untouched -- its 'go install' line and SFW_VERSION still point at v3.2.0, the last released tag under /v3, and will be bumped in the v4.0.0 release commit alongside the actual tag. The release itself remains gated on the MCP server rebuild that replaces the one-shot LLM audit; until that lands, this branch carries the v4 source path but does not yet tag v4.0.0.
The MCP server in semantic_firewall_mcp drives sfw as a library, but
the diff/check/scan helpers it needs all lived in internal/cli, which
Go refuses to import across modules. Move the parts that are pure
library logic -- no flag parsing, no os.Args, no sandbox shellout --
into a new public pkg/api package:
pkg/api/filesystem.go FileSystem interface + RealFileSystem
(bounded ReadFile that rejects oversized
input)
pkg/api/names.go ShortFunctionName + containsSpecial, the
SSA-name canonicaliser used by diff/scan
output
pkg/api/diff.go Diff / DiffWithFS as the public entry point,
plus CompareFunctions and
CalculateTopologyDelta which used to be
internal helpers of ComputeDiff
pkg/api/doc.go package-level docs
pkg/api/api_test.go covers ShortFunctionName edge cases and the
two key Diff branches (modification detected,
missing-file-treated-as-empty)
internal/cli stops re-implementing those symbols and instead:
- type-aliases FileSystem and RealFileSystem to the pkg/api versions
so every existing call site (cli.FileSystem, cli.RealFileSystem,
MockFileSystem satisfying the interface) keeps compiling
unchanged;
- wraps ShortFunctionName as a thin pass-through (functions can't be
aliased like types);
- rewrites RunDiffLogic to call api.DiffWithFS directly, dropping
the local ComputeDiff/CompareFunctions/CalculateTopologyDelta
definitions.
No behavioural change. Every existing test passes (internal/cli,
cmd/sfw, pkg/diff, pkg/storage), plus three new tests in pkg/api lock
in the moved surface.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.