Skip to content

Handle non-root Kubernetes exec user context#113

Open
mfreeman451 wants to merge 3 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/k8s-nonroot-exec-user-context
Open

Handle non-root Kubernetes exec user context#113
mfreeman451 wants to merge 3 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/k8s-nonroot-exec-user-context

Conversation

@mfreeman451
Copy link
Copy Markdown
Contributor

Summary

  • wrap Kubernetes exec and attach commands in a user-aware shell instead of always forcing su - scion
  • reuse the pod's scion.username annotation when choosing the target user for execs
  • add parity coverage for the wrapper and username lookup

Testing

  • go test ./pkg/runtime

Validation

  • deployed the stacked server image in CarverAuto K8s
  • started a hosted agent in namespace scion on donbot
  • confirmed scion look returns terminal output instead of failing with su: Authentication failure

@mfreeman451 mfreeman451 marked this pull request as ready for review April 10, 2026 00:47
Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mfreeman451 mfreeman451 force-pushed the fix/k8s-nonroot-exec-user-context branch from cc8f6be to dc20c10 Compare April 11, 2026 05:15
@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 11, 2026

PR #113 Review: Handle non-root Kubernetes exec user context

Executive Summary

This PR refactors Kubernetes exec/attach to dynamically detect the running user instead of unconditionally wrapping every command in su - scion. The change is medium risk — the core logic is sound and well-tested, but there is one notable issue around annotation-sourced usernames being passed unsanitized into shell commands.

Critical Issues

1. Unsanitized annotation value flows into shell command (Medium-High Severity)

File: pkg/runtime/k8s_runtime.go

execTargetUsername() reads the scion.username annotation directly from the pod and passes it through buildExecCommandForUser() into a su invocation. If an attacker can control pod annotations (e.g., via a compromised hub or RBAC misconfiguration), the username value is embedded in a su - command without validation.

While su itself treats the argument as a username (not a shell command), a username like --help could cause unexpected behavior, and the value also flows into shellJoin where it becomes part of the -c argument.

// Current code — no validation
func execTargetUsername(pod *corev1.Pod) string {
    if pod != nil {
        if u := strings.TrimSpace(pod.Annotations["scion.username"]); u != "" {
            return u
        }
    }
    return "scion"
}

Suggested Fix:

import "regexp"

var validUsername = regexp.MustCompile(`^[a-z_][a-z0-9_-]{0,31}$`)

func execTargetUsername(pod *corev1.Pod) string {
    if pod != nil {
        if u := strings.TrimSpace(pod.Annotations["scion.username"]); u != "" && validUsername.MatchString(u) {
            return u
        }
    }
    return "scion"
}

2. commandForExec performs an extra exec per call (Performance / Reliability)

File: pkg/runtime/k8s_runtime.go, commandForExec()

Every Exec() and Attach() call now triggers two API round-trips: one id -un exec (via currentExecUser) and one for the actual command. This doubles the latency for every exec operation and introduces a new failure mode — if the id -un probe fails transiently, the fallback path is used, which may produce incorrect behavior (e.g., wrapping in su when the container is already non-root, causing su: Authentication failure).

The currentExecUser result is deterministic for a given pod lifecycle. Consider caching it per (namespace, podName), or computing it once during Run() and storing it in an annotation.

Observations

3. Hardcoded container name "agent" in podRunsAsNonRoot

File: pkg/runtime/k8s_runtime.go, line in commandForExec()

if podRunsAsNonRoot(pod, "agent") {

The container name "agent" is hardcoded in the call site. The PodExecOptions in both Exec() and Attach() also hardcode Container: "agent", so this is consistent — but if the container name ever becomes configurable, this will silently break. A named constant would be more robust.

4. chown refactoring eliminates unnecessary shell wrapping — good

The changes to Run() that replace sh -c "chown ..." with a direct []string{"chown", ...} argv are a strict improvement. They eliminate a shell-injection surface and are cleaner.

5. buildExecCommandForUser edge case: targetUser == "root"

if currentUser == "" || currentUser == targetUser || targetUser == "root" {
    return append([]string(nil), cmd...)
}

When targetUser == "root", the function returns the raw command. This is correct because K8s exec defaults to root, but it means if a pod annotation says scion.username=root, no su is used. This seems intentional but should be documented.

6. Test coverage is solid

The new tests cover:

  • Same-user passthrough (no su wrapping)
  • Different-user su fallback
  • Single-quote escaping in shellJoin
  • Annotation-based username lookup
  • podRunsAsNonRoot with pod-level and container-level security contexts, and the nil case

Positive Feedback

  • Clean decomposition: Breaking the exec-wrapping logic into execTargetUsername, buildExecCommandForUser, shellQuote, shellJoin, and podRunsAsNonRoot makes each concern independently testable. This is a significant improvement over the inline logic it replaces.
  • Defensive copy: append([]string(nil), cmd...) correctly avoids mutating the caller's slice.
  • The chown argv refactoring is a nice incidental security hardening — removing sh -c wrappers where they aren't needed.

Final Verdict

Needs minor rework. The core architecture is sound and well-tested. The two items to address before merge:

  1. Must fix: Validate the scion.username annotation value against a POSIX username pattern before using it in exec commands.
  2. Should address: The extra id -un exec per call is a latency and reliability concern. At minimum, document this trade-off; ideally, cache the result or compute it once at pod creation.

Once item (1) is addressed, this PR is safe to merge.

@mfreeman451 mfreeman451 force-pushed the fix/k8s-nonroot-exec-user-context branch from dc20c10 to f304bd3 Compare April 11, 2026 22:51
@mfreeman451
Copy link
Copy Markdown
Contributor Author

Addressed the annotation-safety feedback on this branch.

Changes:

  • validate scion.username against a constrained POSIX-style username pattern before using it in exec/attach command construction
  • fall back to scion when the annotation is empty or invalid instead of passing arbitrary annotation values through to su
  • replaced the test's duplicated username resolution logic with the real execTargetUsername(...) helper and added an invalid-annotation case
  • added a short comment documenting why the live id -un probe remains uncached for now

Validation:

  • go test ./pkg/runtime -run 'TestK8s(Exec|Attach|PodRunsAsNonRoot)' -count=1\n\nCurrent head: 77a0be04

Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 12, 2026

PR-113 Review: Handle non-root Kubernetes exec user context

Executive Summary

This PR replaces a hardcoded su - scion wrapper for K8s exec/attach with a user-aware strategy that probes the running container's effective user and conditionally wraps commands. The change is low-to-medium risk — it introduces clean helper functions with good test coverage, but has one performance concern (an extra exec per command) and one subtle logic edge case worth discussing.


Critical Issues

1. Extra API round-trip on every Exec/Attach call

File: pkg/runtime/k8s_runtime.go, commandForExec (line 148-165)

Every call to Exec() or Attach() now performs:

  1. A Pods().Get() API call to fetch the pod spec.
  2. An execInPod(["id", "-un"]) to probe the live user.

The Exec() path is used for tmux send-keys and other high-frequency operations (e.g., scion look which calls Exec to capture terminal output). The additional latency of a full SPDY exec round-trip for id -un on every invocation could be noticeable.

Severity: Medium — not blocking, but worth considering caching the result for the lifetime of the pod (the exec user won't change while a container is running).

Suggested mitigation:

// Cache on KubernetesRuntime, keyed by namespace/podName.
// Invalidate on pod restart (detectable via pod UID or resourceVersion).
type execUserCacheEntry struct {
    user            string
    resourceVersion string
}

2. podRunsAsNonRoot checks pod-level RunAsUser but ignores container-level override

File: pkg/runtime/k8s_runtime.go, lines 110-138

The function checks pod.Spec.SecurityContext.RunAsUser first, and if set, returns immediately. However, a container-level SecurityContext.RunAsUser: 0 can override a pod-level RunAsUser: 1000, meaning the container actually runs as root. The current logic would incorrectly return true (non-root) in that scenario.

Severity: Low — this code path is only reached when id -un fails, which is an uncommon fallback. But it's worth noting as a correctness gap.

Suggested fix:

func podRunsAsNonRoot(pod *corev1.Pod, containerName string) bool {
    if pod == nil {
        return false
    }
    // Check container-level first (overrides pod-level)
    for _, container := range pod.Spec.Containers {
        if container.Name != containerName {
            continue
        }
        if container.SecurityContext != nil {
            if container.SecurityContext.RunAsUser != nil {
                return *container.SecurityContext.RunAsUser != 0
            }
            if container.SecurityContext.RunAsNonRoot != nil && *container.SecurityContext.RunAsNonRoot {
                return true
            }
        }
        // Fall through to pod-level
        break
    }
    // Pod-level fallback
    if pod.Spec.SecurityContext != nil {
        if pod.Spec.SecurityContext.RunAsUser != nil {
            return *pod.Spec.SecurityContext.RunAsUser != 0
        }
        if pod.Spec.SecurityContext.RunAsNonRoot != nil && *pod.Spec.SecurityContext.RunAsNonRoot {
            return true
        }
    }
    return false
}

Observations

3. buildExecCommandForUser skips su when targetUser == "root"

File: pkg/runtime/k8s_runtime.go, line 104

if currentUser == "" || currentUser == targetUser || targetUser == "root" {
    return append([]string(nil), cmd...)
}

The targetUser == "root" guard is sensible (you wouldn't su - root when you're already root), but it means if the annotation scion.username is set to "root", exec runs as whatever user the container is running as. This is fine since the annotation is validated against validExecUsername and root would match, but worth documenting the intent.

4. chown refactoring is a nice cleanup

File: pkg/runtime/k8s_runtime.go, lines 396, 411

Replacing sh -c "chown -R ..." with direct argv ["chown", "-R", ...] eliminates a shell injection surface. Even though config.UnixUsername is likely controlled, removing the shell layer is strictly better.

5. Hardcoded container name "agent" in commandForExec

Line 161: podRunsAsNonRoot(pod, "agent") — this is consistent with the rest of the codebase (the container is always named "agent"), but ideally it would be a constant. Not a blocker, just a minor style note for the existing codebase.

6. shellQuote does not handle NUL bytes or multi-byte edge cases

The quoting logic strings.ReplaceAll(arg, "'", "'\"'\"'") is the standard POSIX single-quote escaping idiom and is correct for all printable strings. Arguments with embedded NUL bytes would be truncated by the shell, but that's not a realistic concern for tmux commands.


Positive Feedback

  • Username validation regex (^[a-z_][a-z0-9_-]{0,31}$) is well-chosen — matches POSIX username rules and prevents injection via malicious annotation values. Good security hygiene.
  • Clean separation of concerns: execTargetUsername, buildExecCommandForUser, shellQuote, shellJoin, and podRunsAsNonRoot are all pure functions that are independently testable.
  • Graceful fallback: The currentExecUser error path falls back to podRunsAsNonRoot spec inspection, and ultimately defaults to the safe su wrapping. Defense in depth.
  • Test refactoring: Tests now exercise the extracted functions directly (buildExecCommandForUser, shellJoin, execTargetUsername) instead of duplicating inline logic. Much better test design.
  • New test cases for annotation validation (invalid username "--help", invalid "bad user") are good adversarial coverage.

Final Verdict

Approve with suggestions. The core logic is correct, well-tested, and fixes a real bug (non-root containers failing with su: Authentication failure). The two issues raised (extra API round-trip and podRunsAsNonRoot precedence) are non-blocking but should be tracked for follow-up.

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 12, 2026

@mfreeman451 Let me know your thoughts on the cache idea if you think it is worth it. My concern is that trip could be multi hop like:

Hub → Broker → GKE API server → kubelet on node

@mfreeman451
Copy link
Copy Markdown
Contributor Author

@mfreeman451 Let me know your thoughts on the cache idea if you think it is worth it. My concern is that trip could be multi hop like:

Hub → Broker → GKE API server → kubelet on node

I think caching makes sense here, will whip something up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants