Skip to content

Allow GET for runtime broker agent logs#128

Merged
ptone merged 2 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/runtimebroker-agent-logs-get
Apr 12, 2026
Merged

Allow GET for runtime broker agent logs#128
ptone merged 2 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/runtimebroker-agent-logs-get

Conversation

@mfreeman451
Copy link
Copy Markdown
Contributor

Summary

  • allow GET for read-only runtime broker agent actions
  • keep mutating agent actions restricted to POST
  • add coverage for GET agent logs requests

Testing

  • go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$' -count=1

Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I think I'm going to have merge conflicts from #120 but will deal with that once it gets merged

@mfreeman451 mfreeman451 marked this pull request as ready for review April 10, 2026 18:27
@mfreeman451 mfreeman451 force-pushed the fix/runtimebroker-agent-logs-get branch 2 times, most recently from 430e1d2 to a84af11 Compare April 11, 2026 06:23
Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mfreeman451
Copy link
Copy Markdown
Contributor Author

This branch was restacked and narrowed.

Changes:

  • removed the temporary local/shared-action duplication approach
  • restacked the branch on top of #120, which now owns the shared agent-action vocabulary
  • this branch now contains only the runtime-broker GET-for-logs behavior and its focused test

Current head: a84af119

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 11, 2026

PR #128 Review: Allow GET for runtime broker agent logs

Executive Summary

This PR introduces an exec action for remote command execution on agents, refactors magic action strings into typed constants, and corrects HTTP method enforcement on the runtime broker to allow GET for read-only actions (logs, stats, has-prompt). The change is medium risk — the string-to-constant refactor and GET method fix are clean, but the new exec endpoint introduces a remote code execution surface that warrants careful scrutiny.


Critical Issues

1. Hardcoded exitCode: 0 in Hub Handler — Misleading Response

File: pkg/hub/handlers.go (new handleAgentExec function)

writeJSON(w, http.StatusOK, map[string]interface{}{
    "output":   output,
    "exitCode": 0,
})

The hub handler always returns exitCode: 0 regardless of the actual command outcome. If the command fails (e.g., ls /nonexistent), the caller receives exitCode: 0 with an error message in output, which is semantically incorrect. The broker-side ExecResponse struct already has an ExitCode field, but neither the broker's execCommand handler nor the hub propagates it.

Impact: Consumers of this API cannot distinguish successful from failed command executions without parsing stdout/stderr.

Suggested Fix: Propagate the exit code from the runtime through the broker response and into the hub response, or at minimum remove the exitCode field until it can report accurately rather than returning a misleading hardcoded 0.

2. Timeout Field Accepted but Never Enforced

Files: pkg/hub/handlers.go (handleAgentExec), pkg/hub/broker_http_transport.go (ExecAgent), pkg/runtimebroker/handlers.go (execCommand)

The timeout is accepted in the hub request body, marshaled and sent to the broker, deserialized into ExecRequest.Timeout, and then completely ignored. The broker calls rt.Exec(ctx, id, req.Command) without any deadline derived from the timeout.

Impact: A caller sending {"command": ["sleep", "3600"], "timeout": 5} would block the HTTP handler for up to an hour (or until the default HTTP timeout). This is a potential resource exhaustion vector.

Suggested Fix: Either enforce the timeout via context.WithTimeout in the broker's execCommand:

if req.Timeout > 0 {
    var cancel context.CancelFunc
    ctx, cancel = context.WithTimeout(ctx, time.Duration(req.Timeout)*time.Second)
    defer cancel()
}

Or remove the timeout field from the API contract until it's implemented, to avoid giving callers a false sense of safety.


Observations

3. RuntimeBrokerAgentActionMethod Does Not Cover All Runtime Broker Actions

File: pkg/api/agent_actions.go

The function maps only 9 of 18 declared constants. Actions like AgentActionStatus, AgentActionMessage(s), AgentActionEnv, etc., return ("", false). This is correct for the current runtime broker handler (which only routes those 9), but the function name suggests broader coverage. If a future developer adds a new runtime broker action and forgets to update this switch, the default case returns ("", false)NotFound, which is safe but could be confusing.

Suggestion: Add a brief doc comment to RuntimeBrokerAgentActionMethod clarifying it only covers actions routed through the runtime broker's handleAgentAction, not all agent actions globally.

4. Double Blank Line in pkg/api/agent_actions.go

Minor: there's a double blank line between the const block and the RuntimeBrokerAgentActionMethod function (lines 30-31 in the diff). Trivial style nit.

5. requireRuntimeBrokerAssigned Changes Error String

File: pkg/hub/agent_actions.gopkg/hub/httpdispatcher.go

The refactor changes all fmt.Errorf("agent has no runtime broker assigned") calls to errors.New("agent has no runtime broker assigned") via the new helper. While semantically equivalent, any code doing string-based error matching would see no difference. This is clean. However, the sentinel error errNoRuntimeBrokerAssigned is unexported — if any external package needs to check for this specific error condition, it won't be able to use errors.Is. For now this is fine since it's only used internally.

6. exec Added to Grove Authorization Gate — Correct

File: pkg/hub/handlers.go (line ~4298 in the diff)

case api.AgentActionStart, api.AgentActionStop, api.AgentActionRestart, api.AgentActionMessage, api.AgentActionExec:

The exec action is properly gated behind the ActionAttach policy check for grove-scoped requests, consistent with other interactive actions. This is the correct authorization behavior.

7. Hub handleAgentAction Does Not Enforce ActionAttach for Exec via Direct Agent Route

File: pkg/hub/handlers.go (handleAgentAction ~line 1673)

In the direct /api/v1/agents/{id}/exec route, the authorization check (lines 1673–1675 in the base) only applies the agent-identity grove-scoping check for agent callers. For user callers, there is no explicit ActionAttach check in handleAgentAction — unlike handleGroveAgentAction which does enforce it. This means a user with valid authentication but without ActionAttach authorization could potentially exec into any agent via the direct route.

Note: This pattern pre-exists for start, stop, restart, message on the direct route as well, so it's not a regression introduced by this PR. But given that exec is a higher-privilege operation, it may be worth adding the ActionAttach check on the direct route too.


Positive Feedback

  • Constant extraction is well done. Moving magic strings to pkg/api/agent_actions.go is a solid improvement that prevents typo-driven bugs and makes the codebase grep-friendly.
  • requireRuntimeBrokerAssigned helper is a clean DRY refactor, reducing 10+ identical guard clauses to a single sentinel error.
  • The core GET-for-read-only-actions fix is correct and minimal. The RuntimeBrokerAgentActionMethod function is a clear, maintainable approach.
  • Test coverage for both the direct agent route and grove agent route for exec is thorough. The TestMethodNotAllowed and TestAgentLogsAllowsGet tests validate the HTTP method enforcement change.
  • All mock dispatchers updated across test files to satisfy the new interface method — no compilation issues.

Final Verdict

Needs Minor Rework. The PR title and description undersell the scope — this also adds a full exec endpoint, which is the most sensitive part. The two critical issues (hardcoded exit code, ignored timeout) should be addressed before merge:

  1. Exit code: Either propagate it or remove the field from the response to avoid misleading consumers.
  2. Timeout: Either enforce it or strip it from the API contract. Accepting-but-ignoring a safety parameter is a footgun.

The GET method fix and string constant refactor are clean and ready to merge as-is.

@mfreeman451 mfreeman451 force-pushed the fix/runtimebroker-agent-logs-get branch from a84af11 to 43f0cd0 Compare April 11, 2026 23:21
@mfreeman451
Copy link
Copy Markdown
Contributor Author

This branch has been rebuilt and narrowed on top of current main. The older review text about the broader exec surface no longer matches the current diff.

Current branch scope is now only:

  • pkg/api/agent_actions.go
  • pkg/runtimebroker/handlers.go
  • pkg/runtimebroker/handlers_test.go

So the surviving change here is the GET-method fix for read-only runtime broker actions, plus the helper that maps those routed actions to their HTTP methods. I also added a doc comment clarifying that RuntimeBrokerAgentActionMethod(...) intentionally covers only actions handled through the runtime broker path, not every agent action in the package.

Validation:

  • go test ./pkg/runtimebroker -run 'Test(MethodNotAllowed|AgentLogsAllowsGet)$' -count=1\n\nCurrent head: 86db4c36

Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ptone ptone merged commit b94cba6 into GoogleCloudPlatform:main Apr 12, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants