feat(server): add runtime auth and namespace scoping#214
feat(server): add runtime auth and namespace scoping#214abhinav-galileo wants to merge 12 commits into
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
09cb289 to
19fa65c
Compare
ad586bb to
3a5b7e4
Compare
af54543 to
479ca86
Compare
8312b99 to
e75cbb7
Compare
dce333a to
69aaa49
Compare
| ``context_builder`` on the dependency surfaces ``target_type`` / | ||
| ``target_id``, the provider also enforces that they match the token's | ||
| binding — runtime endpoints get the request-target check for free. | ||
| binding - runtime endpoints get the request-target check for free. |
There was a problem hiding this comment.
P1:
Line 58-76:
Target-binding check is conditional on the request supplying a target, not on the token carrying one. Today a runtime token bound to (env, prod) will pass authorization for /api/v1/evaluation if the caller simply omits target_type/target_id from the body — evaluation_context returns {"target_type": None, "target_id": None}, and lines 61/69 short-circuit on the requested_target* is not None guard. EvaluationRequest.target_type is Optional ([models/evaluation.py:46], so Pydantic will not reject this. Net effect: a target-bound token can be used for a namespace-wide eval that ignores the binding.
Suggested fix — invert the check so the token's binding is the source of truth:
if claims.target_type is not None or claims.target_id is not None:
if context is None:
raise ForbiddenError(...)
if context.get("target_type") != claims.target_type:
raise ForbiddenError("target_type does not match")
if context.get("target_id") != claims.target_id:
raise ForbiddenError("target_id does not match")
Add a regression test in test_runtime_token_exchange_endpoint.py next to test_evaluation_rejects_runtime_jwt_for_wrong_target that omits target_type/target_id from the eval body and asserts 403.
There was a problem hiding this comment.
Fixed in d0dd12c. The verifier now treats the token binding as authoritative, so missing or mismatched request target context is denied. Added a regression test for target-less evaluation.
| dependencies=[Depends(get_api_key_from_header)], | ||
| ) | ||
|
|
||
| # Evaluator discovery still uses the local credential dependency. |
There was a problem hiding this comment.
Two routers were left on the legacy Depends(require_api_key) gate — evaluator_router (with the new "Evaluator discovery still uses the local credential dependency." comment) and observability_router. That means AGENT_CONTROL_AUTH_MODE=none and AGENT_CONTROL_AUTH_MODE=http_upstream silently do not apply to either route. An operator running in upstream-auth mode will still find /api/v1/evaluators / /api/v1/observability/... requiring an X-API-Key, and a none-mode deployment will return 401 from these even though the rest of the API is open.
Either migrate them in this PR (each is a one-line require_operation swap with a new EVALUATORS_READ / OBSERVABILITY_WRITE Operation member), or land a follow-up issue link in the comment so the deferral is explicit, not implied.
There was a problem hiding this comment.
Same applies to server/endpoints/observability line 45
There was a problem hiding this comment.
Agreed this should be explicit. I am leaving evaluator discovery and observability on the existing local-credential gate in this stack because migrating them adds new generic operation names to the upstream auth contract. I will track that as a separate auth-framework follow-up.
There was a problem hiding this comment.
Covered in the parent thread. Same follow-up: observability needs a generic operation added before moving it to the provider contract.
There was a problem hiding this comment.
Done in this stack. Evaluator discovery and observability now declare explicit auth-framework operations (evaluators.read, observability.read, observability.write). The router-level API-key extractor is kept only for OpenAPI security metadata, while local API-key mode preserves the previous authenticated-access behavior.
| ) | ||
|
|
||
|
|
||
| async def _evaluation_context(request: Request) -> dict[str, object]: |
There was a problem hiding this comment.
except Exception swallows every parse failure and returns {}, so a malformed body removes the target check from the runtime authorizer entirely (cf. Comment 1 above). FastAPI's body validator will eventually 422 the request, but the authorizer has already accepted the token by then — useful signal lost from logs, and harder to spot if the auth ordering ever changes.
Suggested: narrow the catch to (json.JSONDecodeError, UnicodeDecodeError), log at debug, and once Comment 1 lands the fail-open hole closes for free. Don't widen _evaluation_context to enforce — keep the policy in LocalJwtVerifyProvider, the context builder should just supply data.
There was a problem hiding this comment.
Fixed in d0dd12c. The context builder now only catches JSON/Unicode decode failures and logs at debug; target enforcement stays in the JWT provider.
| policy_id = int(policy.json()["policy_id"]) | ||
| attach_to_policy = ns_a.post(f"/api/v1/policies/{policy_id}/controls/{control_id}") | ||
| assert attach_to_policy.status_code == 200, attach_to_policy.text | ||
|
|
There was a problem hiding this comment.
The new test pins create / list / get / delete cross-namespace isolation, but skips the mutating verbs on controls (PATCH /controls/{id}, PUT /controls/{id}/data), the binding by-key paths (PUT /control-bindings/by-key, POST /control-bindings/by-key:delete), and the agent association endpoints (POST /agents/{name}/policies/{policy_id} etc.). Each of those threads principal.namespace_key independently and a future refactor that drops the kwarg on one of them won't be caught.
Suggested: parametrize a single test_namespace_isolation_for_writes over (method, url, payload) tuples covering PATCH /controls, PUT /controls/{id}/data, PUT /control-bindings/by-key, POST /agents/{name}/policies/{policy_id}, and POST /agents/{name}/controls/{control_id} — assert each rejects (404 or empty list) when called from ns-b for an ns-a resource. ~40 lines.
There was a problem hiding this comment.
Added coverage in d0dd12c for the cross-namespace write paths: control patch/data update, binding by-key upsert/delete, and agent policy/control associations.
| self, | ||
| control_id: int, | ||
| *, | ||
| namespace_key: str | None = None, |
There was a problem hiding this comment.
The sibling get_active_control_or_404 was tightened to make namespace_key required in this PR ([line 163], but get_control_or_404 still accepts namespace_key: str | None = None. Both internal callers (lines 240, 279) pass it, so making it required is a no-op today but closes the door on a future caller forgetting it.
Suggested: drop the default and make the parameter required, same as get_active_control_or_404.
There was a problem hiding this comment.
Fixed in d0dd12c. get_control_or_404 now requires namespace_key and always scopes the query by it.
| timeout = float(os.environ.get(_UPSTREAM_TIMEOUT_ENV, "5.0")) | ||
| token = os.environ.get(_UPSTREAM_TOKEN_ENV) | ||
| token_header = os.environ.get(_UPSTREAM_TOKEN_HEADER_ENV, "X-Agent-Control-Service-Token") | ||
| extra_forward_headers = _parse_extra_forward_headers( |
There was a problem hiding this comment.
_build_default_provider accepts header as an alias for api_key (good, backwards-compatible), but the RuntimeError lists only 'none', 'api_key', or 'http_upstream' — operators reading the error after a typo will think header is invalid. Same nit for _resolve_runtime_mode. Add 'header' to the error string, or drop the alias from the accepted set if you intend to retire it.
There was a problem hiding this comment.
Fixed in d0dd12c. The accepted-mode error text now includes header wherever the alias is accepted.
Add explicit none, api_key, and jwt runtime auth modes, including a generic no-auth provider. Move controls, bindings, policies, agents, and evaluation storage lookups onto principal namespace scoping. Cover auth mode selection and principal namespace isolation with server tests.
…stream The default forward set (X-API-Key, Authorization, Cookie) only covers credential headers Agent Control itself reads. Deployments whose upstream authenticates against a different header name (e.g., a deployer-specific API-key header) had no way to surface that credential through HttpUpstreamAuthProvider — the inbound header reached AC but never crossed the upstream call. Add an extra_forward_headers config field on HttpUpstreamConfig (defaulting to the empty tuple) that operators populate via the new AGENT_CONTROL_AUTH_UPSTREAM_EXTRA_FORWARD_HEADERS env var (comma- separated). The provider's _forward_headers iterates over the union of the default set and the extras, deduplicating case-insensitively so a duplicate name (cross-set or within extras) does not produce two copies on the wire. Tests: - forwards a configured extra header alongside defaults - default forward set unchanged when extras are empty - extras dedupe against defaults case-insensitively - _parse_extra_forward_headers parametric: None / empty / single / multiple / whitespace / empty-entries / case-folded duplicates - configure_auth_from_env threads the parsed tuple onto the provider Lint clean, typecheck clean, full server suite (747) green.
e75cbb7 to
2935d2d
Compare
69aaa49 to
4b778e3
Compare
Summary
none,api_key, andjwt.Stack
Testing
make prepushon the stacked branch in feat(sdk): add runtime token auth #215.