feat(sandbox): integrate OCSF structured logging for sandbox events by johntmyers · Pull Request #720 · NVIDIA/OpenShell

johntmyers · 2026-04-01T04:46:08Z

Summary

Replace ad-hoc tracing log calls across the sandbox with OCSF v1.7.0 structured events using the openshell-ocsf crate (PR #489). Every network connection, process lifecycle event, filesystem policy decision, SSH authentication, and configuration change now emits a typed OCSF event with a human-readable shorthand format and optional JSONL export.

Related Issue

Builds on PR #489 (openshell-ocsf crate) and PR #474 (settings system).

Changes

Foundation

Register ocsf_json_enabled setting (Bool, defaults false) for JSONL export toggle
Replace stdout and file tracing::fmt layers with OcsfShorthandLayer
Add conditional OcsfJsonlLayer with Arc<AtomicBool> hot-toggle (daily rotation, 3 files)
Update LogPushLayer to use format_shorthand() for OCSF events and set level to OCSF
Create process-wide SandboxContext via OnceLock with test fallback
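
The hot-toggle in the list above can be pictured with a minimal, std-only sketch. It is not the actual OcsfJsonlLayer: `JsonlSink`, `on_event`, and `demo_toggle` are hypothetical names, and the real layer sits behind the tracing subscriber rather than a plain `Vec`. It shows only how a shared `Arc<AtomicBool>` lets a settings change enable export without rebuilding the sink.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a JSONL export layer (not the real OcsfJsonlLayer).
struct JsonlSink {
    enabled: Arc<AtomicBool>,
    lines: Vec<String>,
}

impl JsonlSink {
    fn new(enabled: Arc<AtomicBool>) -> Self {
        Self { enabled, lines: Vec::new() }
    }

    /// Records the event only while the shared flag is set.
    /// Returns whether the event was written.
    fn on_event(&mut self, json: &str) -> bool {
        if !self.enabled.load(Ordering::Relaxed) {
            return false;
        }
        self.lines.push(json.to_string());
        true
    }
}

/// Small scenario: one event while off, flip the flag, one event while on.
fn demo_toggle() -> Vec<String> {
    // Starts disabled, mirroring the "defaults false" setting.
    let flag = Arc::new(AtomicBool::new(false));
    let mut sink = JsonlSink::new(Arc::clone(&flag));

    sink.on_event(r#"{"class":"NET","activity":"OPEN"}"#); // dropped
    flag.store(true, Ordering::Relaxed); // a settings change would do this
    sink.on_event(r#"{"class":"NET","activity":"CLOSE"}"#); // kept

    sink.lines
}
```

Only the second event survives in `demo_toggle()`, since the flag is read on every event rather than once at construction.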

Event Migration (~110 call sites)

Network (proxy.rs, l7/, bypass_monitor.rs, mechanistic_mapper.rs): CONNECT/FORWARD allow/deny, SSRF blocks, L7 decisions, bypass detection (dual-emit with DetectionFinding), DNS failures, inference interception
SSH (ssh.rs): Handshake accepted/denied, nonce replay (dual-emit), direct-tcpip, listen
Process (lib.rs, process.rs): Launch, exit, timeout kill, SIGTERM failure
Filesystem (landlock.rs, sandbox/mod.rs, lib.rs): Landlock apply/unavailable, disk policy load/validate, platform sandbox warning
Config (lib.rs, opa.rs, netns.rs): Policy load/reload, inference routes, TLS setup, settings changes, bypass detection rules, provider env, network namespace lifecycle
Lifecycle (lib.rs): Supervisor start, SSH server ready/failed, poll loop exit

Shorthand Format

2026-04-01T04:04:32.118Z OCSF NET:OPEN [INFO] ALLOWED /usr/bin/curl(58) -> api.github.com:443 [policy:github_api engine:opa]
2026-04-01T04:04:32.690Z OCSF NET:OPEN [MED] DENIED /usr/bin/curl(64) -> httpbin.org:443 [policy:- engine:opa]
2026-04-01T04:04:13.058Z INFO openshell_sandbox: Starting sandbox

OCSF / INFO level label at column 25 for visual scanning
Severity as bracketed suffix after CLASS:ACTIVITY: [INFO], [MED], [HIGH], [CRIT]
Timestamps from the shorthand layer (UTC ISO 8601)
One SSH event per connection (intermediate handshake steps downgraded to debug)
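
As an illustration of the layout described above, here is a minimal sketch of a line formatter. It is an assumption-laden stand-in, not the crate's format_shorthand(): the `Severity` enum, `shorthand_line`, and its parameters are hypothetical, and the timestamp is passed in as a preformatted string. The [LOW] and [FATAL] labels are taken from the commit notes further down.

```rust
/// Severity labels as they appear in the shorthand examples (hypothetical enum).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
    Fatal,
}

impl Severity {
    fn label(self) -> &'static str {
        match self {
            Severity::Info => "[INFO]",
            Severity::Low => "[LOW]",
            Severity::Medium => "[MED]",
            Severity::High => "[HIGH]",
            Severity::Critical => "[CRIT]",
            Severity::Fatal => "[FATAL]",
        }
    }
}

/// Builds `<timestamp> OCSF CLASS:ACTIVITY [SEV] ACTION detail`.
/// Empty parts are skipped so an absent action does not leave a double space.
fn shorthand_line(
    timestamp: &str,
    class_activity: &str,
    severity: Severity,
    action: &str,
    detail: &str,
) -> String {
    let parts = [timestamp, "OCSF", class_activity, severity.label(), action, detail];
    parts
        .iter()
        .filter(|p| !p.is_empty())
        .copied()
        .collect::<Vec<_>>()
        .join(" ")
}
```

Called with the DENIED example's fields, this reproduces the second sample line above; called with an empty action it yields `... OCSF NET:LISTEN [INFO] 10.200.0.1` with single spaces throughout.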

Docker

Add openshell-ocsf to skeleton, stub, and source COPY stages in all Dockerfiles
Touch openshell-ocsf/src/lib.rs in supervisor-workspace to invalidate cargo cache

Docs

New Observability section in user-facing docs: Logging, Accessing Logs, OCSF JSON Export
OCSF logging guidance added to AGENTS.md (decision framework, class selection, severity, builder examples)

Smoke Tests

Attach provider to all phases to avoid GitHub API rate limiting
Add auth headers for credential injection in L4/L7 test phases
Accept 200 or 403 for tls:skip raw tunnel test

Testing

mise run pre-commit passes
110 openshell-ocsf tests pass
397 openshell-sandbox tests pass
Smoke tests pass (5/5 after rate limit fix)
Deployed and verified shorthand format in live sandbox logs
Verified OCSF JSONL output when ocsf_json_enabled is toggled on
Principal engineer review completed and all warnings addressed
E2E tests (require cluster)

Checklist

Follows Conventional Commits
Architecture docs updated (AGENTS.md OCSF guidance)
User-facing docs updated (docs/observability/)
No secrets logged (query params excluded, redacted targets used)
Docker build contexts updated for all Dockerfiles
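
The "query params excluded" item in the checklist can be sketched as a small helper. This is an assumption about the general approach, not the code in this PR; `redact_target` is a hypothetical name, and the real implementation may redact more than the query string.

```rust
/// Returns the target with any query string or fragment removed, so tokens
/// passed as query parameters never reach a log line.
/// Hypothetical helper for illustration only.
fn redact_target(url: &str) -> &str {
    // Cut at the first '?' or '#'; keep the whole string if neither appears.
    let end = url
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(url.len());
    &url[..end]
}
```

For example, `redact_target("https://api.github.com/repos?token=abc")` returns `https://api.github.com/repos`, while a bare `host:port` target passes through unchanged.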

WIP: Replace ad-hoc tracing calls with OCSF event builders across all sandbox subsystems (network, SSH, process, filesystem, config, lifecycle).

- Register ocsf_logging_enabled setting (defaults false)
- Replace stdout/file fmt layers with OcsfShorthandLayer
- Add conditional OcsfJsonlLayer for /var/log/openshell-ocsf.log
- Update LogPushLayer to extract OCSF shorthand for gRPC push
- Migrate ~106 log sites to OCSF builders (NetworkActivity, HttpActivity, SshActivity, ProcessActivity, DetectionFinding, ConfigStateChange, AppLifecycle)
- Add openshell-ocsf to all Docker build contexts

…limits

GitHub's unauthenticated API rate limit (60/hour) causes flaky 403s for Phases 1, 2, and 4. Fix by attaching the provider to all sandboxes and upgrading the Phase 1 policy to L7 so credential injection works. Phase 4 (tls:skip) cannot inject credentials by design, so relax the assertion to accept either 200 or 403 from upstream; both prove the proxy forwarded the request.

…estamp

The display layer (gateway logs, TUI, sandbox logs CLI) already prepends a timestamp. Having one in the shorthand output too produces redundant double-timestamps like:

  15:49:11 sandbox INFO 15:49:11.649 I NET:OPEN ALLOWED ...

Now the shorthand is just the severity + structured content:

  15:49:11 sandbox INFO I NET:OPEN ALLOWED ...

Replace cryptic single-character severity codes (I/L/M/H/C/F) with readable bracketed labels: [LOW], [MED], [HIGH], [CRIT], [FATAL]. Informational severity (the happy-path default) is omitted entirely to keep normal log output clean and avoid redundancy with the tracing-level INFO that the display layer already provides.

Before: sandbox INFO I NET:OPEN ALLOWED ...
After:  sandbox INFO NET:OPEN ALLOWED ...

Before: sandbox INFO M NET:OPEN DENIED ...
After:  sandbox INFO [MED] NET:OPEN DENIED ...

Set the level field to 'OCSF' instead of 'INFO' for OCSF events in the gRPC log push. This visually distinguishes structured OCSF events from plain tracing output in the TUI and CLI sandbox logs:

  sandbox OCSF NET:OPEN [INFO] ALLOWED python3(42) -> api.example.com:443
  sandbox OCSF NET:OPEN [MED] DENIED python3(42) -> blocked.com:443
  sandbox INFO Fetching sandbox policy via gRPC

PR #677 added a warn!() for inaccessible Landlock paths in best-effort mode. Convert to ConfigStateChangeBuilder with degraded state so it flows through the OCSF shorthand format consistently.

Match the main openshell.log rotation mechanics (daily, 3 files max) instead of a single unbounded append-only file. Prevents disk exhaustion when ocsf_logging_enabled is left on in long-running sandboxes.
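
The "daily, 3 files max" retention rule can be sketched as a pruning step. This is only an illustration under stated assumptions: the file names with a `YYYY-MM-DD` suffix and the `files_to_prune` helper are hypothetical, and the PR text does not show how the real rolling appender names or deletes files.

```rust
/// Given rotated log file names that end in a sortable date suffix
/// (assumed `YYYY-MM-DD`), returns the names that fall outside the
/// newest `max_files` and would therefore be deleted.
fn files_to_prune(mut names: Vec<String>, max_files: usize) -> Vec<String> {
    // ISO dates sort lexicographically in chronological order, oldest first.
    names.sort();
    let excess = names.len().saturating_sub(max_files);
    // Keep only the oldest `excess` entries: those are the ones to remove.
    names.truncate(excess);
    names
}
```

With four daily files and `max_files = 3`, only the oldest file is returned, which keeps disk usage bounded no matter how long the setting stays on.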

W1: Remove redundant 'OCSF' prefix from shorthand file layer. The class name (NET:OPEN, HTTP:GET) already identifies structured events, and the LogPushLayer separately sets the level field.

W2: Log a debug message when OCSF_CTX.set() is called a second time instead of silently discarding via let _.

W3: Document the boundary between OCSF-migrated events and intentionally plain tracing calls (DEBUG/TRACE, transient, internal plumbing).

W4: Migrate remaining iptables LOG rule failure warnings in netns.rs (IPv4 TCP/UDP, IPv6 TCP/UDP) to ConfigStateChangeBuilder for consistency with the IPv4 bypass rule failure already migrated.

W5: Migrate malformed inference request warn to NetworkActivity with ActivityId::Refuse and SeverityId::Medium.

W6: Use Medium severity for L7 deny decisions (both CONNECT tunnel and FORWARD proxy paths) to match the CONNECT deny severity pattern. Allows and audits remain Informational.
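
W2 and the "OnceLock with test fallback" item from the Foundation list can be sketched together. This is a simplified stand-in: the `SandboxContext` field, `init_ctx`, and `ctx` are hypothetical, and `eprintln!` takes the place of a `debug!()` call so the sketch needs only the standard library.

```rust
use std::sync::OnceLock;

/// Hypothetical, pared-down context; the real SandboxContext fields
/// are not shown in this PR text.
#[derive(Debug, Clone, PartialEq)]
struct SandboxContext {
    sandbox_id: String,
}

static OCSF_CTX: OnceLock<SandboxContext> = OnceLock::new();

/// Sets the process-wide context once. A second call is reported instead of
/// being silently discarded, and the function returns false.
fn init_ctx(ctx: SandboxContext) -> bool {
    match OCSF_CTX.set(ctx) {
        Ok(()) => true,
        Err(rejected) => {
            // Stand-in for a debug!() trace in the real code.
            eprintln!("OCSF context already set, ignoring {:?}", rejected);
            false
        }
    }
}

/// Returns the context, or a fallback value for tests that never initialised it.
fn ctx() -> SandboxContext {
    OCSF_CTX.get().cloned().unwrap_or_else(|| SandboxContext {
        sandbox_id: "test".to_string(),
    })
}
```

`OnceLock::set` hands the rejected value back in its `Err` variant, which is what makes the second-call message possible without extra bookkeeping.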

The shorthand logs are already OCSF-structured events. The setting specifically controls the JSONL file export, so the name should reflect that: ocsf_json_enabled.

The OcsfShorthandLayer writes directly to the log file with no outer display layer to supply timestamps. Add a UTC timestamp prefix to every line so the file output matches what tracing::fmt used to provide.

Before: CONFIG:VALIDATED [INFO] Validated 'sandbox' user exists in image
After:  2026-04-01T15:49:11.649Z CONFIG:VALIDATED [INFO] Validated ...

The supervisor-workspace stage touches sandbox and core sources to force recompilation over the rust-deps dummy stubs, but openshell-ocsf was missing. This caused the Docker cargo cache to use stale ocsf objects from the deps stage, preventing changes to the ocsf crate (like the timestamp fix) from appearing in the final binary. Also adds a shorthand layer test verifying timestamp output, and drafts the observability docs section.

Without a level prefix, OCSF events in the log file have no visual anchor at the position where standard tracing lines show INFO/WARN. This makes scanning the file harder since the eye has nothing consistent to lock onto after the timestamp.

Before: 2026-04-01T04:04:13.065Z CONFIG:DISCOVERY [INFO] ...
After:  2026-04-01T04:04:13.065Z OCSF CONFIG:DISCOVERY [INFO] ...

- Fix double space in NET:LISTEN, SSH:LISTEN, and other events where action is empty (e.g., 'NET:LISTEN [INFO]  10.200.0.1' -> 'NET:LISTEN [INFO] 10.200.0.1')
- Add listen address to SSH:LISTEN event (was empty)
- Downgrade SSH handshake intermediate steps (reading preface, verifying) from OCSF events to debug!() traces. Only the final verdict (accepted/denied) is an OCSF event now, reducing noise from 3 events to 1 per SSH connection.
- Apply same spacing fix to HTTP shorthand for consistency.

…fixes

Align doc examples with the deployed output:

- Add OCSF level prefix to all shorthand examples in the log file
- Show mixed OCSF + standard tracing in the file format section
- Update listen events (no double space, SSH includes address)
- Show one SSH:OPEN per connection instead of three
- Update grep patterns to use 'OCSF NET:' etc.

Add a Sandbox Logging (OCSF) section to AGENTS.md so agents have in-context guidance for deciding whether new log emissions should use OCSF structured logging or plain tracing. Covers event class selection, severity guidelines, builder API usage, dual-emit pattern for security findings, and the no-secrets rule. Also adds openshell-ocsf to the Architecture Overview table.

These files were already merged to main in separate PRs. They got pulled into our branch during rebase conflict resolution for the deleted docs-preview-pr.yml file.

github-actions · 2026-04-01T04:47:01Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/OpenShell/pr-preview/pr-720/
Built to branch `gh-pages` at 2026-04-01 05:35 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Users access sandboxes via 'openshell sandbox connect', not direct SSH.

The settings CLI requires --key and --value named flags, not positional arguments. Also fix the per-sandbox form: the sandbox name is a positional argument, not a --sandbox flag.

The E2E tests asserted on the old tracing::fmt key=value format (action=allow, l7_decision=audit, FORWARD, L7_REQUEST, always-blocked). Update to match the new OCSF shorthand (ALLOWED/DENIED, HTTP:, NET:, engine:ssrf, policy:).
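
A log assertion against the new format could work on parsed fields rather than raw substrings. The sketch below is hypothetical and not taken from the E2E suite: `Shorthand` and `parse_shorthand` are illustrative names, and the layout assumed is the file format shown in the Shorthand Format section (`<timestamp> OCSF CLASS:ACTIVITY [SEV] ...`).

```rust
/// Fields recovered from one shorthand line (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct Shorthand<'a> {
    class: &'a str,
    activity: &'a str,
    severity: &'a str,
    rest: &'a str,
}

/// Parses `<timestamp> OCSF CLASS:ACTIVITY [SEV] rest`.
/// Returns None for plain tracing lines or anything that does not match.
fn parse_shorthand(line: &str) -> Option<Shorthand<'_>> {
    // At most five pieces: timestamp, level label, class:activity, severity, rest.
    let mut it = line.splitn(5, ' ');
    let _timestamp = it.next()?;
    if it.next()? != "OCSF" {
        return None;
    }
    let (class, activity) = it.next()?.split_once(':')?;
    let severity = it.next()?.strip_prefix('[')?.strip_suffix(']')?;
    let rest = it.next().unwrap_or("");
    Some(Shorthand { class, activity, severity, rest })
}
```

A test can then assert, for instance, that a denied request produced a line whose `class` is `NET`, whose `severity` is `MED`, and whose `rest` starts with `DENIED`, while standard tracing lines are skipped because they parse to `None`.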

johntmyers added 16 commits March 31, 2026 21:41

fix(sandbox): convert new Landlock path-skip warning to OCSF

d1f8b00


fix(sandbox): use rolling appender for OCSF JSONL file

dec0e79


refactor(sandbox): rename ocsf_logging_enabled to ocsf_json_enabled

7ae5f31


fix: remove workflow files accidentally included during rebase

8773024


johntmyers requested a review from a team as a code owner April 1, 2026 04:46

johntmyers self-assigned this Apr 1, 2026

johntmyers added the test:e2e Requires end-to-end coverage label Apr 1, 2026

johntmyers added 3 commits March 31, 2026 21:58

docs(observability): use sandbox connect instead of raw SSH

e9b1c06


fix(docs): correct settings CLI syntax in OCSF JSON export page

1cc028c


fix(e2e): update log assertions for OCSF shorthand format

1b67e4f


davidpeden3 mentioned this pull request Apr 1, 2026

fix(sandbox): relay WebSocket frames after HTTP 101 Switching Protocols #718


johntmyers wants to merge 19 commits into main from feat/ocsf-log-integration