Skip to content

Fix/security and logic issues#21

Merged
xkilldash9x merged 14 commits into
mainfrom
fix/security-and-logic-issues
May 21, 2026
Merged

Fix/security and logic issues#21
xkilldash9x merged 14 commits into
mainfrom
fix/security-and-logic-issues

Conversation

@xkilldash9x
Copy link
Copy Markdown
Collaborator

No description provided.

ComputeDiff's processFile only handled os.IsNotExist after fsys.Stat,
so any other stat failure (permission denied, I/O error) left info nil
and the subsequent info.Size() call panicked. Surface non-ENOENT stat
errors instead of dereferencing the nil FileInfo.
A signature with EntropyTolerance == 0 was treated as demanding an exact
entropy match, silently dropping valid fuzzy candidates. MatchSignature
and the PebbleDB ScanCandidates implementation both treat 0 as
"unspecified" and substitute the scanner's default tolerance. Align the
JSON backend with that contract so both implementations of the
SignatureProvider interface behave identically.
The fail-closed verdict switch listed models.StatusPreserved ("preserved")
as a passing value. StatusPreserved belongs to the diff-status domain;
an LLM verdict can only ever be MATCH, SUSPICIOUS, LIE, or ERROR (the
output schema and validateOutput both enforce this). The case was dead
and conflated two unrelated string-constant domains.
llm.validateOutput accepts verdict strings case-insensitively (it
upper-cases before checking), but the fail-closed switch in RunAudit
compared against the canonical upper-case constants directly. A valid
"match"/"Match" verdict therefore passed validation yet hit the default
branch and was reported as "unknown verdict received". Upper-case the
verdict before the switch so both checks agree.
The audit step exposed the key as SFW_API_KEY, but the sfw CLI only
resolves the audit key from --api-key, OPENAI_API_KEY, or GEMINI_API_KEY.
SFW_API_KEY is read by no code, so audit mode always exited with
"API Key is required". Re-export the supplied key as OPENAI_API_KEY or
GEMINI_API_KEY, picked by the model name to match the CLI's own logic.
foldSCEV previously wrapped every binary operation in an opaque
SCEVGenericExpr, so a value derived affinely from a loop counter
(e.g. j := i*2 + 3) was never recognised as the recurrence {3, +, 2}.
The canonicalizer's derived-IV virtualization pass therefore never
fired, and loops computing a strided value in different ways produced
different fingerprints.

foldSCEV now applies the standard affine identities -- AddRec +/- a
loop-invariant, AddRec +/- AddRec over the same loop, AddRec scaled by
a loop-invariant, and constant folding -- collapsing the result into a
canonical SCEVAddRec whenever it stays affine. Recurrence-times-
recurrence is quadratic and is deliberately left opaque.

This changes the canonical IR (and therefore fingerprints) of any
function containing loops, so it ships as part of the v4 release.
NewPebbleScanner only rejected schemas newer than CurrentSchemaVersion;
an older schema was silently accepted. Stored TopologyHash values from a
prior canonicalizer are stale and will never match code fingerprinted by
this binary, so opening such a database produced an empty scanner that
quietly returned no alerts.

Fail closed when dbVer < CurrentSchemaVersion with an error directing the
user to re-index. Update the surrounding comment to document that the
schema constant tracks both the gob serialization shape AND the
fingerprint/canonicalization algorithm, since either being out of sync
makes stored signatures unmatchable.
LoadDatabase only handled os.IsNotExist explicitly and fell through to
info.Mode().IsRegular() on every other error. A permission-denied or
EIO from os.Stat returned a nil FileInfo, so the IsRegular() call
panicked instead of surfacing a useful error -- the same defect we just
fixed in pkg/diff (commit 4697e56).

Return early with a wrapped stat error for any non-ENOENT failure and
add a regression test that strips read access from the containing
directory to exercise the EACCES path.
AddSignature (singular) updated the ID->index map; AddSignatures (the
batch path used by MigrateFromJSON and any caller importing many
signatures at once) appended to s.db.Signatures but never touched
s.sigMap. GetSignature consults the map first, so after a batch insert
every signature reported "not found" until the database was reloaded
from disk.

Initialize sigMap lazily and record each batched signature's slice
index. Add a regression test that inserts via the batch path and
verifies GetSignature can resolve each ID.
ProcessFilesParallel guarded each goroutine with a recover() but only
printed a warning. The result slot stayed at its zero value (empty File,
empty ErrorMessage) and hasErrors was never flipped, so a panic during
SSA generation produced a JSON output with a phantom blank entry and
"strict mode" silently exited 0 -- exactly the case strict mode is meant
to fail on.

Inside recover, write a real FileOutput with the original filename and a
"panic during analysis: ..." message under the same mutex, and set
hasErrors. Add a regression test using a FileSystem that panics from
Stat to drive the recovery path without needing pathological Go source.
RunAudit unconditionally delegated to SandboxExec. Inside any
environment that already counts as sandboxed (SFW_SANDBOX_ID set, nested
CI containers, runsc-in-runsc) SandboxExec refuses with "process is
already sandboxed; nested sandboxing is not supported" and the audit
fails closed with no way for the operator to override -- check, diff,
and scan all expose --no-sandbox for exactly this case but audit did
not.

Add the flag to the audit subcommand and a corresponding noSandbox
parameter on RunAudit. When set, or when we detect we are already
sandboxed, run RunDiffLogic directly into a bytes.Buffer instead of
forking a worker. As part of the refactor RunDiffLogic now takes an
io.Writer so audit can capture the diff JSON without piping through
os.Stdout; existing callers pass os.Stdout to preserve current
behaviour.
The "legacy 5-arg" fallback in runWorker's diff case was reachable in
theory only -- no caller ever emits `internal-worker diff -old <p> -new
<p>` (SandboxExec packs args positionally), and the index math
(args[2], args[4]) would have parsed wrong if it were called. The
3-arg path is the only contract we ship.

Collapse to a single len-check and keep the original error string so
the existing TestWorkerModeDispatch substring assertion still holds.
The next release ships breaking changes to the fingerprinting algorithm
(SCEV affine-recurrence folding, 34f1b47) and the schema-version
contract (81b5d80 rejects databases written under any older schema).
Both make signatures produced by v3 binaries unusable in v4, so Go's
module rules require a new major version.

Rewrite every github.com/BlackVectorOps/semantic_firewall/v3 import to
/v4 across go.mod, source, tests, README, and the Copilot agent doc.
action.yml is intentionally untouched -- its 'go install' line and
SFW_VERSION still point at v3.2.0, the last released tag under /v3,
and will be bumped in the v4.0.0 release commit alongside the actual
tag.

The release itself remains gated on the MCP server rebuild that
replaces the one-shot LLM audit; until that lands, this branch carries
the v4 source path but does not yet tag v4.0.0.
The MCP server in semantic_firewall_mcp drives sfw as a library, but
the diff/check/scan helpers it needs all lived in internal/cli, which
Go refuses to import across modules. Move the parts that are pure
library logic -- no flag parsing, no os.Args, no sandbox shellout --
into a new public pkg/api package:

  pkg/api/filesystem.go   FileSystem interface + RealFileSystem
                          (bounded ReadFile that rejects oversized
                          input)
  pkg/api/names.go        ShortFunctionName + containsSpecial, the
                          SSA-name canonicaliser used by diff/scan
                          output
  pkg/api/diff.go         Diff / DiffWithFS as the public entry point,
                          plus CompareFunctions and
                          CalculateTopologyDelta which used to be
                          internal helpers of ComputeDiff
  pkg/api/doc.go          package-level docs
  pkg/api/api_test.go     covers ShortFunctionName edge cases and the
                          two key Diff branches (modification detected,
                          missing-file-treated-as-empty)

internal/cli stops re-implementing those symbols and instead:

  - type-aliases FileSystem and RealFileSystem to the pkg/api versions
    so every existing call site (cli.FileSystem, cli.RealFileSystem,
    MockFileSystem satisfying the interface) keeps compiling
    unchanged;
  - wraps ShortFunctionName as a thin pass-through (functions can't be
    aliased like types);
  - rewrites RunDiffLogic to call api.DiffWithFS directly, dropping
    the local ComputeDiff/CompareFunctions/CalculateTopologyDelta
    definitions.

No behavioural change. Every existing test passes (internal/cli,
cmd/sfw, pkg/diff, pkg/storage), plus three new tests in pkg/api lock
in the moved surface.
@xkilldash9x xkilldash9x merged commit 165add5 into main May 21, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant