Skip to content

fix: proxy local storage uploads through hub HTTP endpoint#102

Closed
meatballs wants to merge 5 commits intoGoogleCloudPlatform:mainfrom
Empiria:fix/template-sync-local-storage-upload
Closed

fix: proxy local storage uploads through hub HTTP endpoint#102
meatballs wants to merge 5 commits intoGoogleCloudPlatform:mainfrom
Empiria:fix/template-sync-local-storage-upload

Conversation

@meatballs
Copy link
Copy Markdown
Contributor

@meatballs meatballs commented Apr 9, 2026

Summary

Fixes #100scion template sync fails with mkdir /home/scion: permission denied when the hub uses local filesystem storage.

Root cause: Local storage generates file:// signed URLs containing server-side absolute paths (e.g. file:///home/scion/.scion/storage/local/templates/...). The client receives these URLs and attempts to write/read at those paths on the local machine, which fails because the hub's filesystem isn't accessible remotely.

Fix: When the hub's storage backend is local, rewrite file:// signed URLs to HTTP proxy URLs pointing to the hub's own template files endpoint. Both uploads (PUT) and downloads (GET) are proxied. The client's existing HTTP code handles these transparently.

Commit 1: Upload proxy

  • storage_helpers.go: Add requestBaseURL() (derives external URL from request headers, supports reverse proxies) and rewriteLocalUploadURLs() (converts file:// to HTTP PUT URLs with auth headers)
  • template_file_handlers.go: Add raw binary PUT support via handleTemplateFileWriteRaw — detects non-JSON Content-Type and writes body directly to storage
  • template_handlers.go: Wire up URL rewriting in template create and upload handlers

Commit 2: Download proxy + authenticated transfer clients

  • storage_helpers.go: Add rewriteLocalDownloadURLs() — converts file:// to HTTP GET URLs with ?raw=1 param
  • template_file_handlers.go: Add raw binary streaming to handleTemplateFileRead — when ?raw=1 or Accept: application/octet-stream, streams file content directly instead of JSON wrapper (bypasses 1MB limit)
  • template_handlers.go: Wire up download URL rewriting
  • apiclient/transport.go: Add AuthenticatedHTTPClient() — returns an *http.Client with an auth-injecting RoundTripper, so proxy URLs carry proper auth
  • hubclient/{templates,workspace,harness_configs}.go: Use AuthenticatedHTTPClient() for transfer clients

No client-side changes required

The client already handles HTTP upload/download URLs (used with GCS storage). The fix is entirely server-side — the hub returns different URLs, and existing client code works unchanged.

Test plan

  • go build ./... — compiles cleanly
  • go vet — no warnings
  • Tested with self-hosted hub (Podman, ARM64) + remote client — scion template sync uploads all files successfully
  • Template download by broker works — template hydration succeeds with correct content hashes
  • Verify GCS storage path is unaffected (no file:// URLs generated, rewrite is a no-op)

Local storage generates file:// signed URLs that reference server-side
filesystem paths. When the client and hub are on different machines, the
client cannot write to these paths, causing "mkdir /home/scion:
permission denied" errors during template sync.

For local storage, rewrite file:// upload URLs to HTTP proxy URLs
pointing to the hub's own template files endpoint
(PUT /api/v1/templates/<id>/files/<path>). The client's authenticated
HTTP transport handles auth automatically.

Changes:
- Add requestBaseURL() to derive external URL from request headers
- Add rewriteLocalUploadURLs() to convert file:// URLs to HTTP proxy URLs
- Add raw binary PUT support to handleTemplateFileWrite (Content-Type
  detection: non-JSON bodies are written directly to storage)
- Wire up URL rewriting in template create and upload handlers

Fixes GoogleCloudPlatform#100
Extends the upload proxy from the previous commit to cover downloads
and fix authentication for both directions.

Downloads: When local storage generates file:// download URLs, rewrite
them to HTTP proxy URLs (GET /api/v1/templates/<id>/files/<path>?raw=1).
The raw query parameter triggers streaming binary content instead of
the default JSON-wrapped response with a 1MB size limit.

Auth: The transfer client (used for signed URL uploads/downloads) was
created with a plain *http.Client that didn't carry authentication.
This worked for GCS signed URLs (auth is in the URL) and file:// URLs
(no HTTP), but proxy URLs to the hub need auth. Add an
AuthenticatedHTTPClient() method to apiclient.Transport that returns
an *http.Client with an auth-injecting RoundTripper. All three service
transfer clients (templates, workspaces, harness configs) now use it.
Copy link
Copy Markdown
Contributor

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments inline

for i := range urls {
if strings.HasPrefix(urls[i].URL, "file://") {
urls[i].URL = hubEndpoint + "/api/v1/" + resourceType + "/" + resourceID + "/files/" + urls[i].Path
urls[i].Method = "PUT"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use http.MethodPut here instead of plaintext "PUT"

return urls
}
hubEndpoint = strings.TrimRight(hubEndpoint, "/")
for i := range urls {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a better approach:

type UploadURL struct {
    URL    string
    Method string
    Path   string
    // ...
}

for i := range urls {
    if strings.HasPrefix(urls[i].URL, "file://") {
        urls[i].URL = fmt.Sprintf("%s/api/v1/%s/%s/files/%s",
            strings.TrimRight(hubEndpoint, "/"),
            resourceType,
            resourceID,
            urls[i].Path,
        )
        urls[i].Method = http.MethodPut
    }
}

// Raw binary download for local storage proxy flow
if r.URL.Query().Get("raw") != "" || strings.Contains(r.Header.Get("Accept"), "application/octet-stream") {
objectPath := template.StoragePath + "/" + filePath
reader, _, err := stor.Download(ctx, objectPath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer errors.Is here, if errors.Is(err, storage.ErrNotFound) {


w.Header().Set("Content-Type", "application/octet-stream")
w.WriteHeader(http.StatusOK)
io.Copy(w, reader)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should log error here,

        // Log the error but don't send another response (headers already sent)
        log.Printf("Error streaming file %s: %v", objectPath, err)
    }

}
defer reader.Close()

w.Header().Set("Content-Type", "application/octet-stream")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add
w.Header().Set("Content-Disposition", attachment; filename="+filePath+"`)

below the Content-type line


w.Header().Set("Content-Type", "application/octet-stream")
w.WriteHeader(http.StatusOK)
io.Copy(w, reader)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using http.ServeContent(w, r, filePath, meta.ModTime, seeker) instead here

- Use http.MethodPut instead of string literal "PUT"
- Use fmt.Sprintf for URL construction
- Use errors.Is for error comparison
- Add Content-Disposition header on raw file downloads
- Log io.Copy errors during file streaming
- Add errors and slog imports
// back to the request's TLS state and Host header.
func requestBaseURL(r *http.Request) string {
scheme := "http"
if proto := r.Header.Get("X-Forwarded-Proto"); proto != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using guard clauses here

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 11, 2026

PR-102 Review: fix: proxy local storage uploads through hub HTTP endpoint

Reviewer: Scion Code Review Agent
Date: 2026-04-11
PR: #102 — Fix scion template sync with local filesystem storage


Executive Summary

This PR solves a real architectural gap where local-storage file:// signed URLs are unusable by remote clients, by proxying uploads/downloads through the hub's HTTP API. The change is moderate risk: the approach is sound and well-structured, but there are two security issues (one critical, one medium) and a Go API contract violation that should be addressed before merge.


Critical Issues

1. Header Injection via Content-Disposition (Security — High)

File: pkg/hub/template_file_handlers.go (new raw download path)

w.Header().Set("Content-Disposition", `attachment; filename="`+filePath+`"`)

filePath is derived from the URL path (strings.TrimPrefix(action, "files/") in template_handlers.go:319) and is only validated against the template manifest (existence check). A file path containing " or CRLF characters could break the header value or inject additional headers.

While the manifest check limits exploitability (the path must match an existing file), the manifest itself is user-created — a malicious user could register a file with a crafted name like foo"\r\nX-Injected: evil.

Suggested Fix:

import "mime"

// Sanitize filename for Content-Disposition
safeName := filepath.Base(filePath)
w.Header().Set("Content-Disposition", mime.FormatMediaType("attachment", map[string]string{"filename": safeName}))

2. authRoundTripper.RoundTrip Mutates the Request (Correctness — High)

File: pkg/apiclient/transport.go

func (rt *authRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	if rt.ua != "" && req.Header.Get("User-Agent") == "" {
		req.Header.Set("User-Agent", rt.ua)       // ← mutates original request
	}
	if rt.auth != nil {
		if err := rt.auth.ApplyAuth(req); err != nil {  // ← mutates original request

The http.RoundTripper contract states: "RoundTrip should not modify the request, except for consuming and closing the Body." Mutating headers on the original request is a violation that can cause data races when requests are retried or used concurrently.

Suggested Fix:

func (rt *authRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone the request to avoid mutating the original per RoundTripper contract.
	clone := req.Clone(req.Context())
	if rt.ua != "" && clone.Header.Get("User-Agent") == "" {
		clone.Header.Set("User-Agent", rt.ua)
	}
	if rt.auth != nil {
		if err := rt.auth.ApplyAuth(clone); err != nil {
			return nil, fmt.Errorf("failed to apply auth: %w", err)
		}
	}
	base := rt.base
	if base == nil {
		base = http.DefaultTransport
	}
	return base.RoundTrip(clone)
}

3. Authorization Header Leaked to External Hosts (Security — Medium)

File: pkg/hub/storage_helpers.gorewriteLocalUploadURLs

if authHeader != "" {
    urls[i].Headers["Authorization"] = authHeader
}

The Authorization header from the incoming request is embedded directly into the UploadURLInfo response sent to the client. For local storage this works because the URLs point back to the hub, but:

  1. The auth header is now serialized into a JSON response and sent over the wire — expanding its exposure surface.
  2. If any code path accidentally calls rewriteLocalUploadURLs without the ProviderLocal guard, or if the URLs are logged/cached, the credential leaks.

This is partially mitigated by the ProviderLocal check at the call sites. However, consider whether this is even necessary given that AuthenticatedHTTPClient() is now used on the client side (which already injects auth). The auth header in the URL response may be redundant and could be removed entirely.

Suggested investigation: Verify whether the transfer client on the client side uses the Headers from UploadURLInfo when making PUT requests. If AuthenticatedHTTPClient() already injects auth, remove the Authorization header from the rewritten URLs to minimize credential exposure.


Observations

4. requestBaseURL Trusts Proxy Headers Without Validation (Low)

File: pkg/hub/storage_helpers.go

func requestBaseURL(r *http.Request) string {
	host := r.Header.Get("X-Forwarded-Host")
	if host == "" {
		host = r.Host
	}
	if proto := r.Header.Get("X-Forwarded-Proto"); proto != "" {
		return proto + "://" + host
	}

X-Forwarded-Host and X-Forwarded-Proto are trivially spoofable by any client. A malicious client could set X-Forwarded-Host: evil.com and receive rewritten URLs pointing to http://evil.com/api/v1/templates/..., potentially redirecting subsequent uploads to an attacker-controlled server.

This is a known pattern in Go HTTP servers and is acceptable only if the hub is deployed behind a trusted reverse proxy that strips/overwrites these headers. Consider documenting this assumption or adding an allowlist.

5. Missing Content-Length on Raw Download

File: pkg/hub/template_file_handlers.go

The raw download path streams via io.Copy without setting Content-Length. The found.Size is available from the manifest. Setting it would enable clients to show download progress and detect truncation:

w.Header().Set("Content-Length", strconv.FormatInt(found.Size, 10))

6. handleTemplateFileWriteRaw Reads Entire File Into Memory

File: pkg/hub/template_file_handlers.go

data, err := io.ReadAll(io.LimitReader(r.Body, maxUploadFileSize+1))

For files up to 50MB, this allocates the full content in memory. The SHA256 hash computation also requires the full content, so streaming isn't straightforward here. This is acceptable for the current 50MB limit but worth noting if the limit increases.

7. No r.Body.Close() in Raw Upload Path

The raw upload path reads r.Body via io.ReadAll but doesn't explicitly close it. While Go's HTTP server closes the body after the handler returns, explicitly closing after ReadAll would release resources sooner for large uploads.

8. AuthenticatedHTTPClient Doesn't Copy All http.Client Fields

File: pkg/apiclient/transport.go

return &http.Client{
    Timeout:   base.Timeout,
    Transport: &authRoundTripper{...},
}

CheckRedirect and Jar from the base client are not copied. This is likely fine for the current use case (simple PUT/GET to hub URLs), but worth noting if redirect behavior matters in the future.


Positive Feedback

  • Clean architectural approach: Rewriting URLs server-side so the client needs zero changes is elegant and minimizes blast radius.
  • Consistent application: The ProviderLocal check + URL rewrite is applied uniformly across create, upload, and download paths.
  • Good use of existing patterns: The raw binary handlers mirror the existing JSON handlers' structure (manifest update, content hash recomputation, storage upload).
  • Proper LimitReader usage: The maxUploadFileSize+1 pattern to detect oversized uploads is correct and idiomatic.
  • Well-structured PR description: Clear root cause, commit-by-commit breakdown, and test plan.

Final Verdict

Needs Rework — two issues should be fixed before merge:

Priority Issue Effort
Must fix #2: RoundTrip must clone the request before mutating headers Small (~5 lines)
Must fix #1: Sanitize filePath in Content-Disposition header Small (~3 lines)
Should fix #3: Investigate whether auth header in URL response is redundant Investigation
Consider #4: Document proxy header trust assumption Documentation
Consider #5: Set Content-Length on raw downloads Small (~1 line)

@mfreeman451
Copy link
Copy Markdown
Contributor

Addressed the must-fix review items locally, but I cannot push directly to the PR head branch because Empiria/scion denied write access for mfreeman451 even though maintainer edits are enabled.\n\nPrepared fix commit: dd1a246a\nFallback branch on my fork: mfreeman451:review/pr102-local-storage-upload\n\nChanges in that branch:\n- clone the request before mutating headers in authRoundTripper.RoundTrip\n- remove serialized Authorization headers from rewritten local upload/download URL responses\n- document the trusted-proxy assumption in requestBaseURL\n- sanitize Content-Disposition with mime.FormatMediaType and filepath.Base\n- set Content-Length on raw downloads\n- close r.Body in the raw upload path\n\nValidation:\n- go test ./pkg/apiclient -count=1\n- go test ./pkg/hub -run ^ -count=1

  • go test ./pkg/hubclient -count=1

@meatballs
Copy link
Copy Markdown
Contributor Author

Thanks @mfreeman451 — I've cherry-picked your review-fix commit (dd1a246a) onto the PR head as 5f65744f. Apologies for the maintainer-edits friction; I've enabled 'Allow edits by maintainers' going forward. All five review items are now addressed on the branch:

  • Request cloned before header mutation in authRoundTripper.RoundTrip
  • Authorization header dropped from rewritten local upload/download URL responses
  • Trusted-proxy assumption documented in requestBaseURL
  • Content-Disposition sanitized via mime.FormatMediaType + filepath.Base
  • Content-Length set on raw downloads; r.Body closed in raw upload path

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 12, 2026

merged commits directly after fixing lint

@ptone ptone closed this Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Template sync fails with 'mkdir /home/scion: permission denied' on hub storage

3 participants