Handle non-root Kubernetes exec user context by mfreeman451 · Pull Request #113 · GoogleCloudPlatform/scion

mfreeman451 · 2026-04-10T00:10:09Z

Summary

wrap Kubernetes exec and attach commands in a user-aware shell instead of always forcing su - scion
reuse the pod's scion.username annotation when choosing the target user for execs
add parity coverage for the wrapper and username lookup

Testing

go test ./pkg/runtime

Validation

deployed the stacked server image in CarverAuto K8s
started a hosted agent in namespace scion on donbot
confirmed scion look returns terminal output instead of failing with su: Authentication failure

mfreeman451

lgtm

ptone · 2026-04-11T21:17:43Z

PR #113 Review: Handle non-root Kubernetes exec user context

Executive Summary

This PR refactors Kubernetes exec/attach to dynamically detect the running user instead of unconditionally wrapping every command in su - scion. The change is medium risk — the core logic is sound and well-tested, but there is one notable issue around annotation-sourced usernames being passed unsanitized into shell commands.

Critical Issues

1. Unsanitized annotation value flows into shell command (Medium-High Severity)

File: pkg/runtime/k8s_runtime.go

execTargetUsername() reads the scion.username annotation directly from the pod and passes it through buildExecCommandForUser() into a su invocation. If an attacker can control pod annotations (e.g., via a compromised hub or RBAC misconfiguration), the username value is embedded in a su - command without validation.

While su itself treats the argument as a username (not a shell command), a username like --help could cause unexpected behavior, and the value also flows into shellJoin where it becomes part of the -c argument.

// Current code — no validation
func execTargetUsername(pod *corev1.Pod) string {
    if pod != nil {
        if u := strings.TrimSpace(pod.Annotations["scion.username"]); u != "" {
            return u
        }
    }
    return "scion"
}

Suggested Fix:

import "regexp"

var validUsername = regexp.MustCompile(`^[a-z_][a-z0-9_-]{0,31}$`)

func execTargetUsername(pod *corev1.Pod) string {
    if pod != nil {
        if u := strings.TrimSpace(pod.Annotations["scion.username"]); u != "" && validUsername.MatchString(u) {
            return u
        }
    }
    return "scion"
}

2. `commandForExec` performs an extra exec per call (Performance / Reliability)

File: pkg/runtime/k8s_runtime.go, commandForExec()

Every Exec() and Attach() call now triggers two API round-trips: one id -un exec (via currentExecUser) and one for the actual command. This doubles the latency for every exec operation and introduces a new failure mode — if the id -un probe fails transiently, the fallback path is used, which may produce incorrect behavior (e.g., wrapping in su when the container is already non-root, causing su: Authentication failure).

The currentExecUser result is deterministic for a given pod lifecycle. Consider caching it per (namespace, podName), or computing it once during Run() and storing it in an annotation.

Observations

3. Hardcoded container name `"agent"` in `podRunsAsNonRoot`

File: pkg/runtime/k8s_runtime.go, line in commandForExec()

if podRunsAsNonRoot(pod, "agent") {

The container name "agent" is hardcoded in the call site. The PodExecOptions in both Exec() and Attach() also hardcode Container: "agent", so this is consistent — but if the container name ever becomes configurable, this will silently break. A named constant would be more robust.

4. `chown` refactoring eliminates unnecessary shell wrapping — good

The changes to Run() that replace sh -c "chown ..." with a direct []string{"chown", ...} argv are a strict improvement. They eliminate a shell-injection surface and are cleaner.

5. `buildExecCommandForUser` edge case: `targetUser == "root"`

if currentUser == "" || currentUser == targetUser || targetUser == "root" {
    return append([]string(nil), cmd...)
}

When targetUser == "root", the function returns the raw command. This is correct because K8s exec defaults to root, but it means if a pod annotation says scion.username=root, no su is used. This seems intentional but should be documented.

6. Test coverage is solid

The new tests cover:

Same-user passthrough (no su wrapping)
Different-user su fallback
Single-quote escaping in shellJoin
Annotation-based username lookup
podRunsAsNonRoot with pod-level and container-level security contexts, and the nil case

Positive Feedback

Clean decomposition: Breaking the exec-wrapping logic into execTargetUsername, buildExecCommandForUser, shellQuote, shellJoin, and podRunsAsNonRoot makes each concern independently testable. This is a significant improvement over the inline logic it replaces.
Defensive copy: append([]string(nil), cmd...) correctly avoids mutating the caller's slice.
The chown argv refactoring is a nice incidental security hardening — removing sh -c wrappers where they aren't needed.

Final Verdict

Needs minor rework. The core architecture is sound and well-tested. The two items to address before merge:

Must fix: Validate the scion.username annotation value against a POSIX username pattern before using it in exec commands.
Should address: The extra id -un exec per call is a latency and reliability concern. At minimum, document this trade-off; ideally, cache the result or compute it once at pod creation.

Once item (1) is addressed, this PR is safe to merge.

mfreeman451 · 2026-04-11T23:46:59Z

Addressed the annotation-safety feedback on this branch.

Changes:

validate scion.username against a constrained POSIX-style username pattern before using it in exec/attach command construction
fall back to scion when the annotation is empty or invalid instead of passing arbitrary annotation values through to su
replaced the test's duplicated username resolution logic with the real execTargetUsername(...) helper and added an invalid-annotation case
added a short comment documenting why the live id -un probe remains uncached for now

Validation:

go test ./pkg/runtime -run 'TestK8s(Exec|Attach|PodRunsAsNonRoot)' -count=1\n\nCurrent head: 77a0be04

mfreeman451

lgtm

ptone · 2026-04-12T14:40:26Z

PR-113 Review: Handle non-root Kubernetes exec user context

Executive Summary

This PR replaces a hardcoded su - scion wrapper for K8s exec/attach with a user-aware strategy that probes the running container's effective user and conditionally wraps commands. The change is low-to-medium risk — it introduces clean helper functions with good test coverage, but has one performance concern (an extra exec per command) and one subtle logic edge case worth discussing.

Critical Issues

1. Extra API round-trip on every Exec/Attach call

File: pkg/runtime/k8s_runtime.go, commandForExec (line 148-165)

Every call to Exec() or Attach() now performs:

A Pods().Get() API call to fetch the pod spec.
An execInPod(["id", "-un"]) to probe the live user.

The Exec() path is used for tmux send-keys and other high-frequency operations (e.g., scion look which calls Exec to capture terminal output). The additional latency of a full SPDY exec round-trip for id -un on every invocation could be noticeable.

Severity: Medium — not blocking, but worth considering caching the result for the lifetime of the pod (the exec user won't change while a container is running).

Suggested mitigation:

// Cache on KubernetesRuntime, keyed by namespace/podName.
// Invalidate on pod restart (detectable via pod UID or resourceVersion).
type execUserCacheEntry struct {
    user            string
    resourceVersion string
}

2. `podRunsAsNonRoot` checks pod-level `RunAsUser` but ignores container-level override

File: pkg/runtime/k8s_runtime.go, lines 110-138

The function checks pod.Spec.SecurityContext.RunAsUser first, and if set, returns immediately. However, a container-level SecurityContext.RunAsUser: 0 can override a pod-level RunAsUser: 1000, meaning the container actually runs as root. The current logic would incorrectly return true (non-root) in that scenario.

Severity: Low — this code path is only reached when id -un fails, which is an uncommon fallback. But it's worth noting as a correctness gap.

Suggested fix:

func podRunsAsNonRoot(pod *corev1.Pod, containerName string) bool {
    if pod == nil {
        return false
    }
    // Check container-level first (overrides pod-level)
    for _, container := range pod.Spec.Containers {
        if container.Name != containerName {
            continue
        }
        if container.SecurityContext != nil {
            if container.SecurityContext.RunAsUser != nil {
                return *container.SecurityContext.RunAsUser != 0
            }
            if container.SecurityContext.RunAsNonRoot != nil && *container.SecurityContext.RunAsNonRoot {
                return true
            }
        }
        // Fall through to pod-level
        break
    }
    // Pod-level fallback
    if pod.Spec.SecurityContext != nil {
        if pod.Spec.SecurityContext.RunAsUser != nil {
            return *pod.Spec.SecurityContext.RunAsUser != 0
        }
        if pod.Spec.SecurityContext.RunAsNonRoot != nil && *pod.Spec.SecurityContext.RunAsNonRoot {
            return true
        }
    }
    return false
}

Observations

3. `buildExecCommandForUser` skips `su` when `targetUser == "root"`

File: pkg/runtime/k8s_runtime.go, line 104

if currentUser == "" || currentUser == targetUser || targetUser == "root" {
    return append([]string(nil), cmd...)
}

The targetUser == "root" guard is sensible (you wouldn't su - root when you're already root), but it means if the annotation scion.username is set to "root", exec runs as whatever user the container is running as. This is fine since the annotation is validated against validExecUsername and root would match, but worth documenting the intent.

4. `chown` refactoring is a nice cleanup

File: pkg/runtime/k8s_runtime.go, lines 396, 411

Replacing sh -c "chown -R ..." with direct argv ["chown", "-R", ...] eliminates a shell injection surface. Even though config.UnixUsername is likely controlled, removing the shell layer is strictly better.

5. Hardcoded container name `"agent"` in `commandForExec`

Line 161: podRunsAsNonRoot(pod, "agent") — this is consistent with the rest of the codebase (the container is always named "agent"), but ideally it would be a constant. Not a blocker, just a minor style note for the existing codebase.

6. `shellQuote` does not handle NUL bytes or multi-byte edge cases

The quoting logic strings.ReplaceAll(arg, "'", "'\"'\"'") is the standard POSIX single-quote escaping idiom and is correct for all printable strings. Arguments with embedded NUL bytes would be truncated by the shell, but that's not a realistic concern for tmux commands.

Positive Feedback

Username validation regex (^[a-z_][a-z0-9_-]{0,31}$) is well-chosen — matches POSIX username rules and prevents injection via malicious annotation values. Good security hygiene.
Clean separation of concerns: execTargetUsername, buildExecCommandForUser, shellQuote, shellJoin, and podRunsAsNonRoot are all pure functions that are independently testable.
Graceful fallback: The currentExecUser error path falls back to podRunsAsNonRoot spec inspection, and ultimately defaults to the safe su wrapping. Defense in depth.
Test refactoring: Tests now exercise the extracted functions directly (buildExecCommandForUser, shellJoin, execTargetUsername) instead of duplicating inline logic. Much better test design.
New test cases for annotation validation (invalid username "--help", invalid "bad user") are good adversarial coverage.

Final Verdict

Approve with suggestions. The core logic is correct, well-tested, and fixes a real bug (non-root containers failing with su: Authentication failure). The two issues raised (extra API round-trip and podRunsAsNonRoot precedence) are non-blocking but should be tracked for follow-up.

ptone · 2026-04-12T14:44:21Z

@mfreeman451 Let me know your thoughts on the cache idea if you think it is worth it. My concern is that trip could be multi hop like:

Hub → Broker → GKE API server → kubelet on node

mfreeman451 · 2026-04-12T15:34:03Z

@mfreeman451 Let me know your thoughts on the cache idea if you think it is worth it. My concern is that trip could be multi hop like:

Hub → Broker → GKE API server → kubelet on node

I think caching makes sense here, will whip something up.

mfreeman451 marked this pull request as ready for review April 10, 2026 00:47

mfreeman451 commented Apr 10, 2026

View reviewed changes

mfreeman451 force-pushed the fix/k8s-nonroot-exec-user-context branch from cc8f6be to dc20c10 Compare April 11, 2026 05:15

ptone added the Needs rework label Apr 11, 2026

mfreeman451 added 2 commits April 11, 2026 17:49

Handle non-root Kubernetes exec user context

ef43158

Remove Kubernetes exec shell wrappers

f304bd3

mfreeman451 force-pushed the fix/k8s-nonroot-exec-user-context branch from dc20c10 to f304bd3 Compare April 11, 2026 22:51

Validate Kubernetes exec username annotation

77a0be0

mfreeman451 commented Apr 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle non-root Kubernetes exec user context#113

Handle non-root Kubernetes exec user context#113
mfreeman451 wants to merge 3 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/k8s-nonroot-exec-user-context

mfreeman451 commented Apr 10, 2026

Uh oh!

mfreeman451 left a comment

Uh oh!

ptone commented Apr 11, 2026

Uh oh!

mfreeman451 commented Apr 11, 2026

Uh oh!

mfreeman451 left a comment

Uh oh!

ptone commented Apr 12, 2026

Uh oh!

ptone commented Apr 12, 2026

Uh oh!

mfreeman451 commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mfreeman451 commented Apr 10, 2026

Summary

Testing

Validation

Uh oh!

mfreeman451 left a comment

Choose a reason for hiding this comment

Uh oh!

ptone commented Apr 11, 2026

PR #113 Review: Handle non-root Kubernetes exec user context

Executive Summary

Critical Issues

1. Unsanitized annotation value flows into shell command (Medium-High Severity)

2. commandForExec performs an extra exec per call (Performance / Reliability)

Observations

3. Hardcoded container name "agent" in podRunsAsNonRoot

4. chown refactoring eliminates unnecessary shell wrapping — good

5. buildExecCommandForUser edge case: targetUser == "root"

6. Test coverage is solid

Positive Feedback

Final Verdict

Uh oh!

mfreeman451 commented Apr 11, 2026

Uh oh!

mfreeman451 left a comment

Choose a reason for hiding this comment

Uh oh!

ptone commented Apr 12, 2026

PR-113 Review: Handle non-root Kubernetes exec user context

Executive Summary

Critical Issues

1. Extra API round-trip on every Exec/Attach call

2. podRunsAsNonRoot checks pod-level RunAsUser but ignores container-level override

Observations

3. buildExecCommandForUser skips su when targetUser == "root"

4. chown refactoring is a nice cleanup

5. Hardcoded container name "agent" in commandForExec

6. shellQuote does not handle NUL bytes or multi-byte edge cases

Positive Feedback

Final Verdict

Uh oh!

ptone commented Apr 12, 2026

Uh oh!

mfreeman451 commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

2. `commandForExec` performs an extra exec per call (Performance / Reliability)

3. Hardcoded container name `"agent"` in `podRunsAsNonRoot`

4. `chown` refactoring eliminates unnecessary shell wrapping — good

5. `buildExecCommandForUser` edge case: `targetUser == "root"`

2. `podRunsAsNonRoot` checks pod-level `RunAsUser` but ignores container-level override

3. `buildExecCommandForUser` skips `su` when `targetUser == "root"`

4. `chown` refactoring is a nice cleanup

5. Hardcoded container name `"agent"` in `commandForExec`

6. `shellQuote` does not handle NUL bytes or multi-byte edge cases