security: harden output file creation with symlink checks and 0600 mode [5/5]#173
security: harden output file creation with symlink checks and 0600 mode [5/5]#173
Conversation
When the agent runs as root, output files were created via os.Create which uses the process umask (typically 0022), producing world-readable files. Additionally, a local attacker could place a symlink at an expected output path to redirect writes to a sensitive system file. Add safeCreateFile() in both the file writer and CSV converter that: - Checks for symlinks via os.Lstat before opening (rejects symlinks) - Creates files with explicit 0600 mode (owner read/write only) Also chmod the audit log file to 0600 after lumberjack creates it. Signed-off-by: Rodrigo Sampaio Vaz <rvaz@nvidia.com>
📝 WalkthroughWalkthroughSecurity-critical updates introducing symlink-safe file creation across three modules (CSV exporter, file writer, and audit logging). Each module implements helper functions using Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e112cae27b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if err := prepareAuditLogFile(logFile); err != nil { | ||
| w = zapcore.AddSync(os.Stdout) |
There was a problem hiding this comment.
Keep audit logs on file sink after transient preflight errors
Falling back to os.Stdout on any prepareAuditLogFile error permanently disables file-backed audit logging for this process, even when the configured path becomes valid later (for example, the directory is created after startup). Because NewAuditLogger does not return an error, this silently breaks retention/rotation expectations for callers that explicitly provided logFile; before this change, the logger still targeted the file path. Consider only rejecting known unsafe cases (like symlink detection) or at least surfacing the failure while preserving file sink behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/exporter/converter/csv.go (1)
100-115: Consider centralizingsafeCreateFileinto a shared internal utility.This helper is now duplicated here and in
internal/exporter/writer/file.go; centralizing reduces security drift risk if behavior needs future updates.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/exporter/converter/csv.go` around lines 100 - 115, The safeCreateFile implementation is duplicated (in safeCreateFile in internal/exporter/converter/csv.go and the same helper in internal/exporter/writer/file.go); extract and centralize it into a single internal utility (e.g., internal/fs or internal/util) as a public or package-exported function (e.g., SafeCreateFile) preserving the exact behavior (use os.OpenFile with unix.O_NOFOLLOW, ELOOP check, Chmod to 0o600, close on error and wrapped errors), then update both caller sites to import and call that centralized function (replace local safeCreateFile usages in csv.go and writer/file.go with the new utility function) to avoid duplication and security drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@third_party/fleet-intelligence-sdk/pkg/log/audit.go`:
- Around line 99-109: The current code does a one-time O_NOFOLLOW preflight in
prepareAuditLogFile but then hands lumberjack.Logger a path (Filename) which
opens the file later without O_NOFOLLOW, creating a TOCTOU/symlink swap risk and
silently falling back to stdout on preflight failure; fix by (1) stopping use of
lumberjack.Logger with Filename: open the audit file yourself with O_NOFOLLOW
(use syscall.Open or os.OpenFile with the platform-specific flag) and pass the
resulting *os.File to zapcore.AddSync(w) (use the file descriptor for all
writes), or replace lumberjack with a rotation implementation that accepts an
already-open FD and performs rotations by safely creating/truncating new files
with O_NOFOLLOW, and (2) remove the silent fallback to stdout in the error
branch of prepareAuditLogFile so prepareAuditLogFile errors are returned (or
cause startup failure) instead of defaulting to os.Stdout; update references to
prepareAuditLogFile, lumberjack.Logger, and variable w accordingly.
- Around line 99-101: Currently the code silently falls back to
zapcore.AddSync(os.Stdout) when prepareAuditLogFile(...) returns an error, which
can expose audit logs and hide failures; instead, on error from
prepareAuditLogFile return or propagate the error to the caller (do not reassign
w to stdout), and log the failure to stderr or the process logger so the
deployment can detect misconfiguration. Concretely: in the block handling
prepareAuditLogFile in audit.go, replace the silent assignment to
zapcore.AddSync(os.Stdout) with code that logs the prepareAuditLogFile error
(including the error value) and returns that error (or wraps it) from the
enclosing initializer function so the caller can handle startup failure, keeping
use of w only when the file setup succeeds.
---
Nitpick comments:
In `@internal/exporter/converter/csv.go`:
- Around line 100-115: The safeCreateFile implementation is duplicated (in
safeCreateFile in internal/exporter/converter/csv.go and the same helper in
internal/exporter/writer/file.go); extract and centralize it into a single
internal utility (e.g., internal/fs or internal/util) as a public or
package-exported function (e.g., SafeCreateFile) preserving the exact behavior
(use os.OpenFile with unix.O_NOFOLLOW, ELOOP check, Chmod to 0o600, close on
error and wrapped errors), then update both caller sites to import and call that
centralized function (replace local safeCreateFile usages in csv.go and
writer/file.go with the new utility function) to avoid duplication and security
drift.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6e100627-6feb-4fb1-a56f-bbf68005c675
📒 Files selected for processing (6)
internal/exporter/converter/csv.gointernal/exporter/converter/csv_test.gointernal/exporter/writer/file.gointernal/exporter/writer/file_test.gothird_party/fleet-intelligence-sdk/pkg/log/audit.gothird_party/fleet-intelligence-sdk/pkg/log/audit_test.go
| if err := prepareAuditLogFile(logFile); err != nil { | ||
| w = zapcore.AddSync(os.Stdout) | ||
| } else { | ||
| w = zapcore.AddSync(&lumberjack.Logger{ | ||
| Filename: logFile, | ||
| MaxSize: 128, // megabytes | ||
| MaxBackups: 5, | ||
| MaxAge: 3, // days | ||
| Compress: true, // compress the rotated files | ||
| }) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In gopkg.in/natefinch/lumberjack.v2, what flags and mode are used when opening/creating log files? Does it support O_NOFOLLOW or custom open flags for every open/rotate operation?
💡 Result:
In gopkg.in/natefinch/lumberjack.v2 (original natefinch/lumberjack v2.0 branch, latest release v2.2.1 from 2023), log files are opened using os.OpenFile with hardcoded flags and modes, no support for O_NOFOLLOW or custom open flags. - Existing files (open/append): os.O_APPEND | os.O_WRONLY, mode 0644 Code: file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY, 0644) - New files (create/rotate): os.O_CREATE | os.O_WRONLY | os.O_TRUNC, mode os.FileMode(0600) (copied from existing if present) Code: f, err := os.OpenFile(name, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, mode) These occur in openExistingOrNew (for first write or append) and openNew (for rotation/create new). No configuration options for custom flags/modes in Logger struct or functions. Forks like chaseSpace/lumberjack add FileMode but not flags, and original does not support O_NOFOLLOW or customization.
Citations:
- 1: https://github.com/natefinch/lumberjack/blob/v2.0/lumberjack.go
- 2: https://github.com/natefinch/lumberjack/blob/v2.2.1/lumberjack.go
- 3: https://www.mynewflag.com
- 4: https://pkg.go.dev/github.com/wangyougui/lumberjack.v2
- 5: https://git.linux.ucla.edu/di-gital/matterbridge/src/commit/c2b8e298d8c0a15a0e6489b658efe58030a18164/vendor/gopkg.in/natefinch/lumberjack.v2/lumberjack.go
TOCTOU vulnerability: preflight O_NOFOLLOW check is bypassed by lumberjack's path-based file operations.
prepareAuditLogFile validates the path with O_NOFOLLOW once, then closes the fd. However, lumberjack.Logger opens/reopens Filename by path using hardcoded flags (O_APPEND|O_WRONLY for existing, O_CREATE|O_WRONLY|O_TRUNC for rotation) without O_NOFOLLOW support. A symlink swap between the preflight validation and lumberjack's first open—or during any rotation cycle—bypasses the guard entirely.
Additionally, silent fallback to stdout when preflight fails can mask configuration errors and silently alter data handling behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@third_party/fleet-intelligence-sdk/pkg/log/audit.go` around lines 99 - 109,
The current code does a one-time O_NOFOLLOW preflight in prepareAuditLogFile but
then hands lumberjack.Logger a path (Filename) which opens the file later
without O_NOFOLLOW, creating a TOCTOU/symlink swap risk and silently falling
back to stdout on preflight failure; fix by (1) stopping use of
lumberjack.Logger with Filename: open the audit file yourself with O_NOFOLLOW
(use syscall.Open or os.OpenFile with the platform-specific flag) and pass the
resulting *os.File to zapcore.AddSync(w) (use the file descriptor for all
writes), or replace lumberjack with a rotation implementation that accepts an
already-open FD and performs rotations by safely creating/truncating new files
with O_NOFOLLOW, and (2) remove the silent fallback to stdout in the error
branch of prepareAuditLogFile so prepareAuditLogFile errors are returned (or
cause startup failure) instead of defaulting to os.Stdout; update references to
prepareAuditLogFile, lumberjack.Logger, and variable w accordingly.
| if err := prepareAuditLogFile(logFile); err != nil { | ||
| w = zapcore.AddSync(os.Stdout) | ||
| } else { |
There was a problem hiding this comment.
Avoid silent downgrade to stdout when secure file setup fails.
At Line 99-Line 101, failures are swallowed and logging silently switches destinations. This can hide misconfiguration/security incidents and unexpectedly expose audit logs in stdout sinks.
Suggested minimal visibility fix
if logFile != "" {
if err := prepareAuditLogFile(logFile); err != nil {
+ _, _ = fmt.Fprintf(os.Stderr, "audit logger: failed to prepare secure log file %q: %v; falling back to stdout\n", logFile, err)
w = zapcore.AddSync(os.Stdout)
} else {
w = zapcore.AddSync(&lumberjack.Logger{🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@third_party/fleet-intelligence-sdk/pkg/log/audit.go` around lines 99 - 101,
Currently the code silently falls back to zapcore.AddSync(os.Stdout) when
prepareAuditLogFile(...) returns an error, which can expose audit logs and hide
failures; instead, on error from prepareAuditLogFile return or propagate the
error to the caller (do not reassign w to stdout), and log the failure to stderr
or the process logger so the deployment can detect misconfiguration. Concretely:
in the block handling prepareAuditLogFile in audit.go, replace the silent
assignment to zapcore.AddSync(os.Stdout) with code that logs the
prepareAuditLogFile error (including the error value) and returns that error (or
wraps it) from the enclosing initializer function so the caller can handle
startup failure, keeping use of w only when the file setup succeeds.
| @@ -4,10 +4,13 @@ | |||
| package log | |||
|
|
|||
| import ( | |||
There was a problem hiding this comment.
Are we using this sdk pkg/log? Would like to see if we can remove this.
There was a problem hiding this comment.
Yes we do extensively @jingxiang-z, it may be worth to refactor our logging strategy and remove this from the SDK, the SDK should mostly have the metrics collection packages
When the agent runs as root, output files were created via os.Create which uses the process umask (typically 0022), producing world-readable files. Additionally, a local attacker could place a symlink at an expected output path to redirect writes to a sensitive system file.
Add safeCreateFile() in both the file writer and CSV converter that:
Also chmod the audit log file to 0600 after lumberjack creates it.
Description
Checklist
Summary by CodeRabbit
Bug Fixes