fix(network-agent): allow dots in sandboxID validation by rogeroger-yu · Pull Request #163 · TencentCloud/CubeSandbox

rogeroger-yu · 2026-05-08T08:29:27Z

Summary

- Fix overly strict sandboxID validation in network-agent's state_store, which rejected any ID containing `.` (dot). IDs like `cubesandbox-python-slim-3.11-nydus_0` (containing version numbers) were incorrectly flagged as path traversal. The check now rejects only `/`, `\`, and `..`, consistent with Cubelet's `pathutil.ValidateID()`.
- Fix ansible control role crash when `/etc/NetworkManager/dnsmasq.d` does not exist (hosts using systemd-resolved instead of NM-dnsmasq).
- Skip nydus-image tarball download in nydus-uffd role when the binary is already pre-staged locally.

Root Cause

state_store.go:path() used strings.ContainsAny(sandboxID, "/\\.") which
treats a single . the same as path separators. A dot in a version string
(e.g. 3.11) is perfectly safe — only .. is a traversal risk.
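
To make the difference concrete, here is a minimal sketch comparing the two predicates. The function names `rejectsOld` and `rejectsNew` are hypothetical and are not the actual state_store code; only the `strings` calls mirror the expressions quoted in this PR.

```go
package main

import "strings"

// rejectsOld mirrors the original check: an empty ID, or any '/', '\' or '.'
// anywhere in the ID, is treated as unsafe.
// Hypothetical name, not the real state_store function.
func rejectsOld(sandboxID string) bool {
	return sandboxID == "" || strings.ContainsAny(sandboxID, "/\\.")
}

// rejectsNew mirrors the narrowed check proposed here: path separators and
// the ".." sequence are rejected, while a single '.' is allowed.
func rejectsNew(sandboxID string) bool {
	return sandboxID == "" ||
		strings.ContainsAny(sandboxID, "/\\") ||
		strings.Contains(sandboxID, "..")
}
```

With these definitions, `cubesandbox-python-slim-3.11-nydus_0` is rejected by `rejectsOld` but accepted by `rejectsNew`, while `../etc/passwd` is rejected by both.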

Test Plan

Template creation with dotted image tag succeeds end-to-end
Full multi-node ansible deploy passes (0 failures across 5 nodes)
All compute nodes report HEALTHY after network-agent restart

Original Error

<ie-training/cubesandbox-python-slim:3.11 --writable-layer-size 1G --storage-media nydus --expose-port 49983 --probe 49983 --probe-path /health --template-id cubesandbox-python-slim-3.11-nydus
job_id: d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
template_id: cubesandbox-python-slim-3.11-nydus
attempt_no: 1
operation: CREATE
artifact_id:
status: PENDING
phase: PULLING
progress: 0%
distribution: 0/0 ready, 0 failed
root@ak-ai-engine-sandbox05:~/Workspace/code/cube-sandbox# /usr/local/services/cubetoolbox/CubeMaster/bin/cubemastercli tpl watch --job-id d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
job_id: d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
template_id: cubesandbox-python-slim-3.11-nydus
attempt_no: 1
operation: CREATE
artifact_id: rfs-85671e04f4cdfcd700f581a1
status: FAILED
phase: CREATING_TEMPLATE
progress: 100%
distribution: 1/1 ready, 0 failed
template_spec_fingerprint: 85671e04f4cdfcd700f581a19887ccdf7de95430f820b195973b99d2687f8ab3
template_status: FAILED
error: template cubesandbox-python-slim-3.11-nydus creation failed: network-agent EnsureNetwork failed: grpc_code=Unknown grpc_msg=invalid sandboxID "cubesandbox-python-slim-3.11-nydus_0": contains path separators or traversal characters
cubemastercli run fail: template cubesandbox-python-slim-3.11-nydus creation failed: network-agent EnsureNetwork failed: grpc_code=Unknown grpc_msg=invalid sandboxID "cubesandbox-python-slim-3.11-nydus_0": contains path separators or traversal characters

fslongjin · 2026-05-08T13:43:44Z

/cubebot review

github-actions · 2026-05-08T13:44:08Z

Claude finished @fslongjin's task in 1m 10s

PR Review: fix(network-agent): allow dots in sandboxID validation

Read and understand the code change
Review for correctness, security, and consistency
Post review feedback

Verdict: LGTM — the fix is correct and well-aligned with existing patterns

The change (state_store.go:95):

// Before:
if strings.ContainsAny(sandboxID, `/\.`) || sandboxID == "" {

// After:
if sandboxID == "" || strings.ContainsAny(sandboxID, `/\`) || strings.Contains(sandboxID, "..") {

Correctness

The fix correctly distinguishes between a single . (safe, used in version strings like 3.11) and .. (path traversal). The original ContainsAny(sandboxID, `/\.`) treated any dot as dangerous, which broke IDs like cubesandbox-python-slim-3.11-nydus_0.

Consistency with codebase

This now matches exactly the pattern used in Cubelet/pkg/pathutil/validate.go:17:

if strings.ContainsAny(id, `/\`) || strings.Contains(id, "..") {

It also aligns with Cubelet/pkg/utils/pathsec.go:21-22, which rejects a bare ., a bare .., and any string containing ..:

if strings.ContainsAny(untrusted, `/\`) || untrusted == "." || untrusted == ".." ||
    strings.Contains(untrusted, "..") {

One subtle difference: pathsec.SafeJoinPath additionally rejects the bare . component (untrusted == "."). The state_store.path() function doesn't reject a bare . sandboxID. This is also consistent with pathutil.ValidateSafeID() which doesn't reject bare . either. Since the path() function appends .json, a bare . would produce <dir>/.json — a hidden file, not a traversal. This is a pre-existing behavior and not a regression introduced by this PR.

Security

Path traversal protection remains intact:

/ and \ are still rejected (path separators)
.. is still rejected (traversal sequence)
Single . in version strings like 3.11 is correctly allowed
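
As an illustration of why a single dot is harmless, the sketch below joins an ID onto a base directory and checks whether the result is still located under that directory. It assumes the state file is named `<dir>/<id>.json`, as described above; `staysInside` is a hypothetical helper, not part of the codebase.

```go
package main

import (
	"path/filepath"
	"strings"
)

// staysInside reports whether joining id (plus a ".json" suffix) onto dir
// yields a path that is still located under dir.
// Hypothetical helper for illustration only.
func staysInside(dir, id string) bool {
	joined := filepath.Join(dir, id+".json")
	rel, err := filepath.Rel(dir, joined)
	if err != nil {
		return false
	}
	// A relative path that is ".." or starts with "../" has left dir.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}
```

A dotted ID such as `cubesandbox-python-slim-3.11-nydus_0` stays inside the directory, whereas an ID like `../../etc/passwd` does not, which is why only separators and `..` need to be rejected.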

Suggestion (non-blocking)

The existing state_store_test.go only tests the happy path. Adding a few validation test cases would prevent regressions and document the expected behavior:

func TestStateStorePathValidation(t *testing.T) {
    store, err := newStateStore(t.TempDir())
    if err != nil {
        t.Fatalf("newStateStore: %v", err)
    }
    tests := []struct {
        id      string
        wantErr bool
    }{
        {"valid-id-123", false},
        {"cubesandbox-python-slim-3.11-nydus_0", false},  // single dots ok
        {"", true},                                        // empty
        {"foo/bar", true},                                 // forward slash
        {"foo\\bar", true},                                // backslash
        {"foo..bar", true},                                // double dot
        {"../etc/passwd", true},                           // traversal prefix
    }
    for _, tt := range tests {
        _, err := store.path(tt.id)
        if (err != nil) != tt.wantErr {
            t.Errorf("path(%q) err=%v, wantErr=%v", tt.id, err, tt.wantErr)
        }
    }
}

This mirrors the test coverage that already exists for pathutil.ValidateSafeID() in Cubelet/pkg/pathutil/validate_test.go.

fslongjin · 2026-05-08T13:47:10Z

@rogeroger-yu Hi, could you please add some unit tests to stabilize this behavior?

rogeroger-yu · 2026-05-08T15:31:43Z

@fslongjin okay.

fslongjin · 2026-05-09T01:52:20Z

After reviewing this, I don't think this is the right way to fix it. The sandbox ID should be designed as a UUID. The root cause here is that when CubeMaster generates the template, it sends the template OCI image name to Cubelet as the sandbox ID.
A possible solution would be to modify CubeMaster to either generate a UUID or pass down the template ID to the Cubelet layer.
Note: For normal sandbox creation requests, the sandbox ID is generated by Cubelet. However, the current step is creating a sandbox template (for app snapshots), which is why CubeMaster specifies the sandbox ID.
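
One way to picture this proposal is to keep the sandbox ID opaque and carry the template ID as a separate field. The sketch below is only an illustration under that assumption: `snapshotRequest`, `newOpaqueID`, and `newSnapshotRequest` are hypothetical names, and the real CubeMaster and Cubelet request types are not shown in this thread.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// snapshotRequest is a hypothetical request shape that keeps the two
// identifiers apart instead of deriving one from the other.
type snapshotRequest struct {
	SandboxID  string // opaque, generated, safe to use in file names
	TemplateID string // user-facing, may contain dots or other characters
}

// newOpaqueID returns 32 lowercase hex characters built from 16 random bytes.
func newOpaqueID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate sandbox ID: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// newSnapshotRequest pairs a freshly generated sandbox ID with the
// caller-supplied template ID, which is passed through unchanged.
func newSnapshotRequest(templateID string) (snapshotRequest, error) {
	id, err := newOpaqueID()
	if err != nil {
		return snapshotRequest{}, err
	}
	return snapshotRequest{SandboxID: id, TemplateID: templateID}, nil
}
```

With this shape, a template ID such as `cubesandbox-python-slim-3.11-nydus` never reaches code that turns the sandbox ID into a file name.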

rogeroger-yu · 2026-05-09T06:39:55Z

@fslongjin I agree with your assessment and will resubmit a revised version. Please review it once I have finished the rework.

chenhengqi · 2026-05-09T07:08:27Z

-
-		templateID, ok := opts.GetSnapshotTemplateID()
-		if !ok {
+		if _, ok := opts.GetSnapshotTemplateID(); !ok {


This if statement should be dropped.

Cubelet: generate UUID-based sandboxID for snapshot creation

chenhengqi · 2026-05-11T07:49:02Z

cc @fslongjin

kinwin-ustc · 2026-05-16T01:52:37Z

/cubebot review

github-actions · 2026-05-16T01:52:47Z

Claude finished @kinwin-ustc's task in 1m 58s

Review: fix: use UUID-based sandboxID

Read and understand the code changes
Review for correctness, security, and consistency
Post review feedback

Verdict: Core fix is correct, but has unaddressed downstream breakage

The UUID-based approach (as suggested by @fslongjin) is the right fix — prevent dotted IDs from ever reaching state_store rather than weakening validation. However, the change leaves two stale templateID + "_0" references elsewhere in the codebase that will break after this commit.

Change 1: `Cubelet/plugins/cube/internals/createid/plugin.go` — Correct

Before: opts.SandboxID = templateID + "_" + "0" — produced dotted IDs like cubesandbox-python-slim-3.11-nydus_0 when the OCI image tag contained a version number.

After: opts.SandboxID = utils.GenerateID() + "_snapshot" — utils.GenerateID() returns a hex-encoded UUID (e.g. a1b2c3d4e5f6...). No dots, no slashes, no traversal risk.

The outer if opts.IsCreateSnapshot() is preserved, which is correct — non-snapshot IDs are UUIDs without suffix, snapshot IDs carry _snapshot for traceability.

Security: The dropped GetSnapshotTemplateID() call removed the only code path where external (OCI-image-derived) string data entered the sandboxID. This is a genuine hardening improvement.
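
The `_snapshot` suffix convention can be sketched as follows. The helper names are hypothetical, and the generated ID is passed in as a parameter because the implementation of `utils.GenerateID()` is not shown in this thread.

```go
package main

import "strings"

// snapshotSuffix is the marker appended to snapshot sandbox IDs, as
// described in this review.
const snapshotSuffix = "_snapshot"

// snapshotSandboxID builds a snapshot sandbox ID from an already generated
// ID. Hypothetical helper; the real code calls utils.GenerateID() inline.
func snapshotSandboxID(generatedID string) string {
	return generatedID + snapshotSuffix
}

// isSnapshotSandboxID reports whether an ID carries the snapshot marker
// and has a non-empty part in front of it.
func isSnapshotSandboxID(sandboxID string) bool {
	return strings.HasSuffix(sandboxID, snapshotSuffix) &&
		len(sandboxID) > len(snapshotSuffix)
}
```

For example, `snapshotSandboxID("a1b2c3d4e5f6")` yields `a1b2c3d4e5f6_snapshot`, which `isSnapshotSandboxID` recognizes, while the bare `a1b2c3d4e5f6` is not treated as a snapshot ID.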

Change 2: `network-agent/internal/service/state_store_test.go` — Tests are correct

The test cases align with the current state_store.path() validation (strings.ContainsAny(sandboxID, "/\\.") || sandboxID == ""):

| Test case | ID | wantErr | Correct? |
| --- | --- | --- | --- |
| valid simple id | `sb-1` | false | ✓ UUID-safe format |
| valid uuid | `a1b2c3d4e5f6` | false | ✓ matches GenerateID output |
| valid snapshot id | `a1b2c3d4e5f6_snapshot` | false | ✓ matches new snapshot format |
| empty string | `""` | true | ✓ |
| contains dot | `sb.1` | true | ✓ still rejected by state_store |
| forward slash | `a/b` | true | ✓ |
| backslash | `a\\b` | true | ✓ |
| dot prefix | `.hidden` | true | ✓ |
| double dot traversal | `..` | true | ✓ |
| oci image name | `registry.example.com/path/image:tag` | true | ✓ |

The oci image name case is good documentation: it proves state_store remains strict — but since Cubelet no longer sends IDs in this format, the validation will never fire in normal operation.
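
The cases in the table can be expressed as a small table-driven check. This is a sketch, not the test file from the PR: `strictReject` only mirrors the validation expression quoted above, and `idCase`, `tableCases`, and `mismatches` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// idCase describes one row of the table above.
type idCase struct {
	name    string
	id      string
	wantErr bool
}

// strictReject mirrors the unchanged state_store expression quoted above.
func strictReject(sandboxID string) bool {
	return strings.ContainsAny(sandboxID, "/\\.") || sandboxID == ""
}

// tableCases reproduces the rows listed in the review.
var tableCases = []idCase{
	{"valid simple id", "sb-1", false},
	{"valid uuid", "a1b2c3d4e5f6", false},
	{"valid snapshot id", "a1b2c3d4e5f6_snapshot", false},
	{"empty string", "", true},
	{"contains dot", "sb.1", true},
	{"forward slash", "a/b", true},
	{"backslash", "a\\b", true},
	{"dot prefix", ".hidden", true},
	{"double dot traversal", "..", true},
	{"oci image name", "registry.example.com/path/image:tag", true},
}

// mismatches returns one message per case whose outcome differs from wantErr.
func mismatches(cases []idCase) []string {
	var out []string
	for _, c := range cases {
		if got := strictReject(c.id); got != c.wantErr {
			out = append(out, fmt.Sprintf("%s: rejected=%v, want %v", c.name, got, c.wantErr))
		}
	}
	return out
}
```

Calling `mismatches(tableCases)` should return an empty slice if every row agrees with the strict check.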

⚠️ Issue: Stale `templateID + "_0"` references will break

Two files still hardcode the old sandboxID format and will become stale after this commit:

1. Cubelet/services/cubebox/appsnapshot.go:116

expectedSandboxID := templateID + "_0"

This cleanup code (triggered when cubebox.Create returns PreConditionFailed) reconstructs the sandbox ID in the old format. With UUID-based IDs, there's no deterministic mapping from templateID to sandboxID, so the cleanup will attempt to destroy a non-existent sandbox. The correct approach would be to use createRsp.SandboxID (the actual sandboxID from the workflow response).

2. Cubelet/plugins/cbri/cubeboxcbri/cubebox.go:210

annotations[constants.AnnotationAppSnapshotContainerID] = templateID + "_0"

This sets an annotation in the old format. If anything downstream reads this annotation and expects it to match the sandboxID, it will break. The annotation value is no longer correlated with the actual sandboxID.

These files are NOT modified by this PR, but they will be logically inconsistent after this change lands.

Suggestion: Fix `appsnapshot.go:116` to use the response sandboxID

The cleanup path should reference createRsp.SandboxID instead of reconstructing the ID:

// Before (appsnapshot.go:116):
expectedSandboxID := templateID + "_0"

// After:
expectedSandboxID := createRsp.SandboxID
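
If the response can be nil or can lack an ID when creation fails, the cleanup path may also need a guard. The sketch below assumes a response type with a `SandboxID` field; `createResponse`, `errNoSandboxID`, and `cleanupTarget` are hypothetical names, and whether the real response is populated on `PreConditionFailed` is not confirmed in this thread.

```go
package main

import "errors"

// createResponse is a hypothetical stand-in for the workflow response type.
type createResponse struct {
	SandboxID string
}

// errNoSandboxID signals that there is nothing that can safely be cleaned up.
var errNoSandboxID = errors.New("create response carries no sandbox ID")

// cleanupTarget returns the sandbox ID to destroy, taken from the response
// rather than reconstructed from the template ID.
func cleanupTarget(rsp *createResponse) (string, error) {
	if rsp == nil || rsp.SandboxID == "" {
		return "", errNoSandboxID
	}
	return rsp.SandboxID, nil
}
```

Returning an error instead of a guessed ID means the caller can skip or log the cleanup rather than try to destroy a sandbox that does not exist.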

Minor: PR description may need updating

The PR body still references "Fix overly strict sandboxID validation in network-agent's state_store" but state_store.go:95 was not changed — it still uses strings.ContainsAny(sandboxID, "/\\."). The actual fix is at the Cubelet level (use UUIDs instead of template IDs). The description could be updated to reflect the actual approach.

Summary

| Aspect | Assessment |
| --- | --- |
| Core fix | ✅ Correct |
| Test coverage | ✅ Good |
| Security hardening | ✅ Improved (no external string data enters sandboxID) |
| Consistency with codebase | ✅ Aligns with reviewer guidance |
| Downstream correctness | ⚠️ `appsnapshot.go:116` and `cubebox.go:210` need updating |

rogeroger-yu requested a review from fslongjin as a code owner May 8, 2026 08:29

rogeroger-yu force-pushed the fix-net-name-verify branch from 164bd78 to 1713a7c Compare May 8, 2026 17:57

rogeroger-yu requested review from chenhengqi, ls-ggg, tinklone and up2wing as code owners May 8, 2026 17:57

rogeroger-yu force-pushed the fix-net-name-verify branch from 1713a7c to e15e0a4 Compare May 8, 2026 17:59

rogeroger-yu force-pushed the fix-net-name-verify branch from e15e0a4 to 16fd291 Compare May 9, 2026 06:38

chenhengqi reviewed May 9, 2026

rogeroger-yu force-pushed the fix-net-name-verify branch from 16fd291 to 5a7bee8 Compare May 9, 2026 08:44

chenhengqi reviewed May 9, 2026

Comment thread network-agent/internal/service/state_store_test.go

fix: use UUID-based sandboxID

4f8cedb

Cubelet: generate UUID-based sandboxID for snapshot creation

rogeroger-yu force-pushed the fix-net-name-verify branch from 5a7bee8 to 4f8cedb Compare May 9, 2026 09:47

rogeroger-yu requested a review from chenhengqi May 11, 2026 02:48

rogeroger-yu wants to merge 1 commit into TencentCloud:master from rogeroger-yu:fix-net-name-verify

4 participants