
Detect Kubernetes runtime namespace in-cluster#112

Open
mfreeman451 wants to merge 1 commit into GoogleCloudPlatform:main from mfreeman451:fix/k8s-incluster-default-namespace

Conversation

@mfreeman451
Contributor

Summary

  • detect the Kubernetes runtime namespace from in-cluster context instead of hard-defaulting to default
  • respect explicit namespace env overrides before falling back to the service account namespace file
  • add focused runtime coverage for namespace detection and runtime initialization

Problem

Hosted broker operations like look, logs, and exec-based attach rely on the runtime's default namespace when resolving pod access. In combined in-cluster deployments, hard-defaulting to the default namespace breaks those operations whenever Scion is actually deployed into another namespace, such as scion.

Validation

  • go test ./pkg/runtime -run 'Test(DefaultKubernetesNamespace|NewKubernetesRuntime_UsesDetectedNamespace|KubernetesRuntime_List|KubernetesRuntime_BuildPod_Env)$'

Contributor Author

@mfreeman451 mfreeman451 left a comment


lgtm

@mfreeman451 mfreeman451 marked this pull request as ready for review April 9, 2026 23:57
@mfreeman451 mfreeman451 force-pushed the fix/k8s-incluster-default-namespace branch from aff781d to 79de838 on April 11, 2026 05:15
@ptone
Member

ptone commented Apr 11, 2026

PR #112 Review: Detect Kubernetes runtime namespace in-cluster

Executive Summary

This PR is a medium-risk, multi-concern changeset that bundles the advertised namespace-detection fix with several unrelated cleanups (removing updateAgentAfterDispatch retry logic, simplifying OAuth provider defaults, dropping HubID, removing RunAsUser/RunAsGroup from pod security context, changing podman rootless exec user, and dropping InlineConfig merging). The core namespace detection is well-implemented and low-risk; the bundled removals carry moderate risk because they silently change concurrency semantics and security posture without separate validation.


Critical Issues

1. resolveHubEndpoint: --base-url flag silently dropped (Bug)

File: cmd/server_foreground.go

The old code had a priority path: cfg.Hub.Endpoint > webBaseURL (the --base-url CLI flag) > SCION_SERVER_BASE_URL env > computed localhost.

The new code removes the webBaseURL check entirely. The variable webBaseURL is still referenced elsewhere in the file (line ~897) so it's a valid flag, but resolveHubEndpoint no longer respects it. Users passing --base-url to override the hub endpoint will silently get the wrong value.

Suggested Fix:

if hubEndpoint == "" && enableHub {
    if webBaseURL != "" {
        hubEndpoint = strings.TrimRight(webBaseURL, "/")
    } else if baseURL := os.Getenv("SCION_SERVER_BASE_URL"); baseURL != "" {
        hubEndpoint = strings.TrimRight(baseURL, "/")
    } else {
        // ... localhost fallback
    }
}

2. updateAgentAfterDispatch removal drops version-conflict retry (Regression Risk)

File: pkg/hub/handlers.go

The removed updateAgentAfterDispatch performed a single retry on ErrVersionConflict, re-fetching the agent and merging dispatched fields. This was replaced with bare s.store.UpdateAgent(ctx, agent) calls at 5 call sites. If a concurrent status update bumps StateVersion during dispatch (a realistic race in multi-broker setups), the update now fails silently (the error is only logged as a warning, not returned to the client).

This is a semantic change, not just a cleanup. If the version-conflict handling was unnecessary, the PR should explain why. The previous tests (TestCreateAgent_RetriesVersionConflictAfterDispatch, etc.) were also removed rather than adapted.

3. InlineConfig merge removed — hub-dispatched agent volumes silently dropped (Regression Risk)

File: pkg/agent/run.go

The old code merged opts.InlineConfig (carrying volumes, env, image from templates/harness) with --harness-auth. The new code ignores opts.InlineConfig entirely, only creating a ScionConfig if HarnessAuth is set. The comment in the removed code explicitly warned:

"Without this merge, custom volumes from hub-dispatched agents are silently dropped."

The InlineConfig field appears fully removed from the codebase, suggesting it was removed from StartOptions as well, but this warrants verification that no caller was relying on it.


Observations

4. serviceAccountNamespacePath package-level var is test-safe but not production-safe

File: pkg/runtime/k8s_runtime.go

The package-level var serviceAccountNamespacePath is mutated in tests via save/restore. This works for sequential subtests but would race if tests ever run in parallel (t.Parallel()). Consider using a function parameter or a sync.Once pattern for production code. This is minor since the tests don't use t.Parallel() today.

5. Removal of RunAsUser/RunAsGroup from pod security context

File: pkg/runtime/k8s_runtime.go

Removing RunAsUser: 1000 and RunAsGroup: 1000 means the container will run as whatever user the image's USER directive specifies (or root if none). Combined with RunAsNonRoot: true, the kubelet will refuse to start the container if it cannot verify a non-root user (e.g. no USER, or a non-numeric one). The comment says FSGroup alignment is sufficient, but this weakens defense-in-depth: the previous explicit UID pinning prevented drift if the image's Dockerfile changed. Ensure the container image has a numeric non-root USER baked in.

6. Podman rootless exec user change: scion -> root

File: pkg/runtime/podman.go

The exec user for rootless podman is changed from "scion" to "root". The comment explains the reasoning (host UID maps to container UID 0, no privilege drop). This is a correct fix for the tmux socket path mismatch described, but it's a behavioral change that should be validated with rootless podman end-to-end tests.

7. HOME/USER/LOGNAME env vars no longer injected into pods

File: pkg/runtime/k8s_runtime.go

The pod no longer gets HOME, USER, or LOGNAME environment variables. If the container image's entrypoint or sciontool relies on these being set (rather than reading /etc/passwd), this could cause failures. Ensure the image's init process sets these from the container user.

8. OAuth default provider simplification is correct

File: pkg/hub/handlers_auth.go, pkg/hub/oauth.go

Replacing s.oauthService.DefaultProviderForClient(...) with hardcoded "google" is functionally equivalent — the removed method itself fell back to "google" when no providers were configured, and the standard provider order put Google first. The redirect-URI heuristic for GitHub is preserved. This is a safe simplification.


Positive Feedback

  • Namespace detection logic (defaultKubernetesNamespace) is clean, well-ordered (explicit env > service account file > default), and properly trims whitespace from the SA file.
  • Test coverage for namespace detection is thorough — covers env override, file-based detection with trailing newline, and fallback. The test properly uses t.TempDir() and t.Setenv().
  • Dead code cleanup of int64Ptr, DefaultProviderForClient, and HubID is complete — no dangling references remain.
  • The base64 decode removal from file secrets aligns with the prior commit (5b57c64) that moved decoding to the Kubernetes runtime layer.

Final Verdict

Needs Rework. The namespace detection (the stated purpose of this PR) is solid and ready to merge. However, the PR bundles at least three unrelated behavioral changes (issues #1, #2, #3) that carry regression risk and should either:

  1. Be split into separate PRs with their own validation, or
  2. Have the PR description updated to explain why these removals are safe, with evidence that the dropped webBaseURL handling, version-conflict retry, and InlineConfig merge are no longer needed.

Issue #1 (webBaseURL dropped) appears to be a bug and should be fixed before merging.

@mfreeman451 mfreeman451 force-pushed the fix/k8s-incluster-default-namespace branch from 79de838 to 53975cd on April 11, 2026 22:51
@mfreeman451
Contributor Author

This branch has been rebuilt on top of current main, and the earlier bundled concerns no longer apply to the current diff.

Current branch scope is now only:

  • pkg/runtime/k8s_runtime.go
  • pkg/runtime/k8s_runtime_test.go

So the previous review items about cmd/server_foreground.go, updateAgentAfterDispatch, InlineConfig, Podman exec user changes, and the pod security/env cleanup are no longer part of this PR. The surviving branch content is just the in-cluster namespace detection logic plus its focused tests.

Validation on the narrowed branch:

  • go test ./pkg/runtime -run 'TestDefaultKubernetesNamespace' -count=1

Current head: 53975cdd
