[Feature]: Admin-gated GET /api/v1/admin/sessions — assemble the global issue-derived session list + live in-memory overlay (H3) #212

@chronoai-shining

Description

Part of milestone #6 (Admin global session observability).

File-only spec. No implementation lands from this issue itself — it specifies the consume-and-overlay layer for the sibling implement loop. This is the H3 issue of milestone #6; it depends on the H1 (App-installation enumeration) and H2 (global cross-repo aggregation reader) issues filed under the same milestone.

Background

fkst-hosted is database-free: a controller only knows the sessions and goals it created. There is no datastore and no reconcile-from-GitHub path, so today nobody — not even an fkst:admin — can see the platform-wide set of sessions. The state that is durable lives in each goal's GitHub Issue.

  • GROUND (DB-free, controller-local only): the live session store is an in-memory Arc<Mutex<HashMap<bson::Uuid, SessionDoc>>> owned by the authoritative controller — backend/fkst-control-plane/src/sessions/repo.rs:39-42 (SessionRepo). Its module doc states that a controller restart loses in-flight in-memory sessions and that they are recovered from running workers' OS-truth re-adoption (feat: worker engine execution + re-adopt live engines on restart (OS-truth runtime tracking) #136) and the GitHub journal skip-set (feat: journaling to committed file + rolling issue comment (remove session_progress & run_journals) #139) — never from a datastore (repo.rs:1-19). So the controller's in-memory view is partial by construction — it covers only sessions this controller currently runs. SessionRepo exposes a per-id lookup get(id) (repo.rs:104) but no list/snapshot method — the overlay must therefore be driven by the issue-derived rows, looking up each row's live doc by id (see Implementation).
  • GROUND (session-state IS recoverable from an issue): each goal issue carries the #180 lifecycle labels (feat: session-lifecycle goal-issue labels + persisted terminal cause (user-stop vs graceful completion)): fkst-goal plus, once a session spawns, fkst-session-<id> and exactly one of fkst-running / fkst-terminated / fkst-completed / fkst-failed (backend/fkst-control-plane/src/goals/labels.rs:18-34). The legacy status:* scheme is gone (confirmed by goals/issue_store.rs:15 "status is in-memory only and no longer mirrored as a status:* label" and its assertions at issue_store.rs:898-902). Goal metadata (owner / org / packages / repo) is recoverable from the hidden marker, GoalMarker { v, goal_id, owner_user_id, org_id, package_names, repo }, parsed by parse_marker (backend/fkst-control-plane/src/goals/marker.rs:27-34, 63-86). The engine prompt is never in GitHub (marker.rs:6-8, test marker_never_contains_the_prompt at marker.rs:126-128).
  • GROUND (live fields exist only in memory): SessionDoc holds pod_id, fencing_token, pid, runtime_dir, status, goal_id, owner_user_id, org_id, repo, plus created_at / started_at / stopped_at and terminal_cause (backend/fkst-shared/src/models/mod.rs:55-131; the live fields pod_id/fencing_token/pid/runtime_dir are at models/mod.rs:60-63). SessionStatus { Pending, Validating, Running, Stopping, Stopped, Failed } is at models/mod.rs:27-36. The goal→session link is controller memory only: GoalIssueStore::set_active_session / active_session (goals/issue_store.rs:607, 670).
  • GROUND (the App cannot enumerate installations today): github_app/api.rs exposes exactly two endpoints — installation_for_repo (GET /repos/{o}/{r}/installation, api.rs:165) and create_installation_token (POST /app/installations/{id}/access_tokens, api.rs:216). There is no GET /app/installations (list) nor GET /installation/repositories, so the global repo set the reader needs does not exist yet (that is H1). The App mints issues:write (⊇ issues:read) installation tokens (github_app/mod.rs:164 default_permissions(); asserted by api.rs test token_mint_serializes_admin_and_pull_requests). Today's only issue aggregation, github_hub/fanout.rs:117 aggregate_issues, fans out over the user's own linked accounts via the NyxID GithubProxy (fanout.rs:27, 125) — not the App, and not global. No GitHub Search API (/search/issues) is used anywhere.
  • GROUND (no admin surface, ADMIN is bypass-only): authz/permissions.rs:30 defines ADMIN = "fkst:admin" as an escape hatch that bypasses both authz layers (require_permission at permissions.rs:79-91, bypass branch permissions.rs:80). There is no admin route anywhere; build_router merges only sessions, goals, github, catalog, repos (backend/fkst-control-plane/src/router.rs:45-49).
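As a rough illustration of the second GROUND point, the label scheme alone is enough to recover a session id and a lifecycle state from a goal issue. The sketch below is std-only and hypothetical: `IssueState`, `IssueLifecycle`, and `parse_lifecycle` are names invented here, not items from goals/labels.rs, and treating conflicting state labels as "unknown" is an assumption of this sketch rather than documented behaviour.

```rust
/// Lifecycle state recoverable from labels (hypothetical enum for this sketch;
/// the real label constants live in goals/labels.rs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueState {
    Running,
    Terminated,
    Completed,
    Failed,
}

/// What the labels alone say about one goal issue.
#[derive(Debug, PartialEq, Eq)]
pub struct IssueLifecycle {
    pub session_id: Option<String>,
    pub state: Option<IssueState>,
}

/// Returns `None` when the issue has no `fkst-goal` label.
/// Assumption: if zero or several state labels are present, `state` is `None`
/// instead of a guess.
pub fn parse_lifecycle(labels: &[&str]) -> Option<IssueLifecycle> {
    if !labels.iter().any(|l| *l == "fkst-goal") {
        return None;
    }
    // `fkst-session-<id>` appears only once a session has spawned.
    let session_id = labels
        .iter()
        .find_map(|l| l.strip_prefix("fkst-session-"))
        .filter(|id| !id.is_empty())
        .map(str::to_owned);
    let mut states = labels.iter().filter_map(|l| match *l {
        "fkst-running" => Some(IssueState::Running),
        "fkst-terminated" => Some(IssueState::Terminated),
        "fkst-completed" => Some(IssueState::Completed),
        "fkst-failed" => Some(IssueState::Failed),
        _ => None,
    });
    // Accept the state only when exactly one state label is present.
    let state = match (states.next(), states.next()) {
        (Some(s), None) => Some(s),
        _ => None,
    };
    Some(IssueLifecycle { session_id, state })
}
```

A goal issue that has not spawned a session yet yields `session_id: None` and `state: None`, which is the case where the overlay has to fall back to the in-memory goal→session link.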

This issue (closes H3) adds the admin-only HTTP surface: it consumes the global issue-derived list produced by the sibling aggregation reader and overlays the controller's live in-memory fields for the sessions this controller currently runs. It is the consumer/assembly layer; it does not itself talk to GitHub.

Purpose

Give an fkst:admin (or a read-only admin observer) a single endpoint returning all sessions' state + metadata platform-wide — the global, issue-derived list merged with a live in-memory overlay (pid, runtime_dir, pod_id, fencing_token, live status) for the subset of sessions this controller runs — without ever exposing the engine prompt or any secret/env value, and without an object-layer owner check (admin sees everything by design).

Relationships

Affected Files

| File | New/Modify | Why |
| --- | --- | --- |
| backend/fkst-control-plane/src/routes/admin.rs | new | The admin router + handler(s): GET /api/v1/admin/sessions (and optionally GET /api/v1/admin/sessions/:session_id); calls the H2 aggregation reader, applies the live overlay, serializes AdminSessionView. |
| backend/fkst-control-plane/src/routes/mod.rs | modify | Declare pub mod admin;. |
| backend/fkst-control-plane/src/router.rs | modify | .merge(routes::admin::router()) alongside the existing merges (router.rs:45-49). |
| backend/fkst-control-plane/src/authz/permissions.rs | modify | Add the dedicated ADMIN_READ = "fkst:admin:read" const (see Implementation) with its doc comment + a unit test; ADMIN continues to bypass. |
| backend/fkst-control-plane/src/state.rs | modify (if needed) | Expose the H2 aggregation-reader handle on AppState (behind a trait for test injection). The live SessionRepo (via AppState.sessions.repo()) and GoalIssueStore (AppState.goals) are already present. |
| docs/api-reference.md | modify | Document the new admin endpoint(s), response shape, the partial-overlay semantics, the per-repo errors[]/scanned_repos contract, and the "no prompt / no secret" guarantee; note the GLOBAL-vs-LOCAL distinction relative to #144. |
| backend/fkst-control-plane/src/routes/admin.rs (#[cfg(test)]) | new tests | Fake aggregation reader + seeded SessionRepo → assert the merge, the gating, the partial-failure surfacing, and the absence of secret fields. |

(All paths are current — fkst-control-plane, never the renamed-away fkst-hosted-api.)

Implementation Instructions

Land these as small, atomic, individually-buildable commits, in order. Each commit must compile and keep the suite green.

  1. Add the dedicated read permission. In backend/fkst-control-plane/src/authz/permissions.rs, add:

    /// Read the GLOBAL admin observability surface — `GET /api/v1/admin/sessions`
    /// (milestone #6). A read-only admin capability: it grants visibility into
    /// every session platform-wide WITHOUT the full `fkst:admin` escape hatch, so
    /// NyxID can hand observers least-privilege read access. `ADMIN` still bypasses.
    pub const ADMIN_READ: &str = "fkst:admin:read";

    Justification (recommended choice): gate the endpoint on ADMIN_READ, not on ADMIN directly. require_permission already lets ADMIN bypass any required permission (permissions.rs:80), so a full admin still gets in, while a read-only observer can be granted fkst:admin:read alone. This is least privilege, consistent with how #181 (feat(repos): add POST /api/v1/repos/:owner/:name/fkst-setup to scaffold .fkst/ via GitHub App contents-write) split out fkst:repo:setup (permissions.rs:65) rather than reusing a broad capability. Add a unit test mirroring repo_setup_is_a_distinct_grantable_permission (permissions.rs:140-152): assert ADMIN_READ == "fkst:admin:read", that a ctx with only SESSION_READ is denied, that a ctx with ADMIN_READ is allowed, and that ADMIN bypasses it.

    • Verify: cargo test -p fkst-control-plane authz::permissions → all permission tests pass.
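The gating behaviour described in step 1 can be sketched in isolation as follows. This is a simplified stand-in, not the real permissions.rs: `AuthContext` is reduced to a set of permission strings, and `Forbidden` and `ctx_with` are hypothetical helpers introduced only for illustration. The two constant values are the ones the spec names.

```rust
use std::collections::HashSet;

pub const ADMIN: &str = "fkst:admin";
pub const ADMIN_READ: &str = "fkst:admin:read";

/// Simplified stand-in for the real AuthContext (assumption: the real type
/// carries more than a permission set).
pub struct AuthContext {
    pub permissions: HashSet<String>,
}

/// Hypothetical error type; the real handler maps a denial to HTTP 403.
#[derive(Debug, PartialEq, Eq)]
pub struct Forbidden {
    pub missing: &'static str,
}

/// Allow when the caller holds the required permission, or holds ADMIN,
/// which bypasses any required permission.
pub fn require_permission(ctx: &AuthContext, required: &'static str) -> Result<(), Forbidden> {
    if ctx.permissions.contains(ADMIN) || ctx.permissions.contains(required) {
        Ok(())
    } else {
        Err(Forbidden { missing: required })
    }
}

/// Test helper: build a context from a list of permission strings.
pub fn ctx_with(perms: &[&str]) -> AuthContext {
    AuthContext {
        permissions: perms.iter().map(|p| p.to_string()).collect(),
    }
}
```

Because the bypass lives in the check itself, gating the endpoint on `ADMIN_READ` needs no special case for full admins.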
  2. Define the response DTOs in routes/admin.rs. Pure serde types, no I/O. Shape:

    #[derive(serde::Serialize)]
    pub struct AdminSessionsResponse {
        pub sessions: Vec<AdminSessionView>,
        pub errors: Vec<RepoScanError>,   // per-repo partial failures, surfaced not swallowed
        pub scanned_repos: usize,
    }
    #[derive(serde::Serialize)]
    pub struct AdminSessionView {
        pub session_id: String,
        pub state: String,                // issue-derived lifecycle (from #180 labels)
        pub live: Option<LiveOverlay>,    // None for issue-only rows (partial by design)
        pub goal_id: String,
        pub owner_user_id: String,
        pub org_id: Option<String>,
        pub package_names: Vec<String>,
        pub repo: Option<RepoRef>,        // owner/name only
        pub issue_url: String,
        pub created_at: Option<String>,
        pub updated_at: Option<String>,
        pub source: AdminSessionSource,   // IssueDerived | LiveEnriched
    }
    #[derive(serde::Serialize)]
    pub struct LiveOverlay {
        pub pid: Option<i32>,
        pub runtime_dir: Option<String>,
        pub pod_id: Option<String>,
        pub fencing_token: Option<i64>,
        pub status: Option<String>,       // live SessionStatus, may differ from issue label
    }

    AdminSessionSource is a #[serde(rename_all = "snake_case")] enum { IssueDerived, LiveEnriched }. There is deliberately NO prompt/description field and NO secret/env field — assert this in tests (instruction 6). Reuse RepoRef from fkst-shared (models/mod.rs:20-24) for repo.

  3. Implement the overlay/merge function (kept testable in isolation). Signature, e.g.:

    async fn apply_overlay(
        issue_rows: Vec<GlobalSessionRow>,
        live: &SessionRepo,           // from AppState.sessions.repo()
        links: &GoalIssueStore,       // AppState.goals — the goal->session link
    ) -> Vec<AdminSessionView>

    The issue-derived rows drive the iteration (there is no SessionRepo list/snapshot method, only get(id) at repo.rs:104). For each row, resolve its session_id: use the row's session id directly where the #180 fkst-session-<id> label provided it; otherwise resolve it from the goal id via GoalIssueStore::active_session(goal_id) at issue_store.rs:670. Then look up the controller's live SessionDoc with SessionRepo::get(session_id). When a live doc is present, set live = Some(LiveOverlay { pid, runtime_dir, pod_id, fencing_token, status }) from SessionDoc (models/mod.rs:60-63 + status) and source = LiveEnriched; otherwise live = None, source = IssueDerived. The overlay is partial by design: only sessions this controller runs are enriched; document this on the function. (For pure unit-testing without async plumbing, factor the field-copy step, SessionDoc → LiveOverlay, into a small synchronous helper and test it directly.)

    • Verify: cargo test -p fkst-control-plane routes::admin::tests::overlay → merge unit tests pass.
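A minimal, synchronous sketch of the merge in step 3, under loud assumptions: plain `HashMap`s stand in for `SessionRepo` (session id → doc) and for the `GoalIssueStore` goal→session link, `IssueRow` and `View` are cut-down hypothetical versions of `GlobalSessionRow` and `AdminSessionView`, and `session_id` is kept as an `Option` here so an unresolved row stays visible. The real function is async and works against the real stores.

```rust
use std::collections::HashMap;

/// Cut-down stand-in for the live fields of SessionDoc.
#[derive(Debug, Clone)]
pub struct SessionDoc {
    pub pid: Option<i32>,
    pub runtime_dir: Option<String>,
    pub pod_id: Option<String>,
    pub fencing_token: Option<i64>,
    pub status: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LiveOverlay {
    pub pid: Option<i32>,
    pub runtime_dir: Option<String>,
    pub pod_id: Option<String>,
    pub fencing_token: Option<i64>,
    pub status: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    IssueDerived,
    LiveEnriched,
}

/// Hypothetical issue-derived row: only the fields the merge needs.
pub struct IssueRow {
    pub goal_id: String,
    pub session_id: Option<String>, // Some when the fkst-session-<id> label was present
}

#[derive(Debug)]
pub struct View {
    pub goal_id: String,
    pub session_id: Option<String>,
    pub live: Option<LiveOverlay>,
    pub source: Source,
}

/// The synchronous field-copy helper: SessionDoc -> LiveOverlay.
pub fn overlay_from_doc(doc: &SessionDoc) -> LiveOverlay {
    LiveOverlay {
        pid: doc.pid,
        runtime_dir: doc.runtime_dir.clone(),
        pod_id: doc.pod_id.clone(),
        fencing_token: doc.fencing_token,
        status: Some(doc.status.clone()),
    }
}

/// Partial by design: only rows whose session this controller runs get `live`.
pub fn apply_overlay(
    rows: Vec<IssueRow>,
    live: &HashMap<String, SessionDoc>,
    links: &HashMap<String, String>,
) -> Vec<View> {
    rows.into_iter()
        .map(|row| {
            let IssueRow { goal_id, session_id } = row;
            // Prefer the label-provided id, else the in-memory goal->session link.
            let session_id = session_id.or_else(|| links.get(&goal_id).cloned());
            let overlay = session_id
                .as_ref()
                .and_then(|id| live.get(id))
                .map(overlay_from_doc);
            let source = if overlay.is_some() {
                Source::LiveEnriched
            } else {
                Source::IssueDerived
            };
            View { goal_id, session_id, live: overlay, source }
        })
        .collect()
}
```

Driving the loop from the issue rows means a session that exists only in memory, with no goal issue in the scanned set, never appears in the output; that follows from the absence of a list/snapshot method on the live store.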
  4. Implement the handler async fn list_sessions(...) in routes/admin.rs:

    • Gate first, exactly as the existing protected handlers do (extract AuthContext as a handler argument — see routes/repos.rs:83,87): require_permission(&ctx, permissions::ADMIN_READ)?;. No object-layer check — admin sees everything by design.
    • Call the H2 aggregation reader (injected via AppState behind a trait) to get (rows, errors, scanned_repos). The reader owns all GitHub I/O: the H1 installation enumeration, reading each repo's fkst-goal issues with the App installation token (issues:read), the FKST_ADMIN_SCAN_ORGS scope, REST pagination (follow Link: rel="next" / per_page=100), rate-limit handling (the reader lives in the github_app module and reuses its pub(super) reset_seconds / is_rate_limited at github_app/api.rs:128,154), and per-repo partial-failure isolation (one repo's read failing must not fail the whole scan — mirror the resilient fan-out in github_hub/fanout.rs). This handler must propagate the reader's errors into the response errors[] (never swallow them) and report scanned_repos. This handler does not call GitHub or those pub(super) rate-limit helpers directly.
    • Apply apply_overlay (instruction 3) and return Json(AdminSessionsResponse { ... }).
    • Optionally implement GET /api/v1/admin/sessions/:session_id returning a single AdminSessionView (404 when absent across both the global list and the live store).
    • Support optional query filters (state, org, repo) and basic pagination of the assembled list; apply them after assembly so issue-derived and live-enriched rows filter uniformly.
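The last bullet of step 4 (filter and paginate after assembly) might look roughly like this. The spec only says "basic pagination", so the `offset` / `limit` parameters, the `ListQuery` and `Row` types, and the exact-match filter semantics are all assumptions of this sketch, not the endpoint's defined contract.

```rust
/// Cut-down assembled row (hypothetical; stands in for AdminSessionView).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub session_id: String,
    pub state: String,
    pub org_id: Option<String>,
    pub repo: Option<String>, // "owner/name" in this sketch
}

/// Hypothetical query parameters; `None` means "do not filter on this field".
#[derive(Debug, Default)]
pub struct ListQuery {
    pub state: Option<String>,
    pub org: Option<String>,
    pub repo: Option<String>,
    pub offset: usize,
    pub limit: Option<usize>,
}

/// A filter passes when it is unset or equals the row's value exactly.
fn matches(want: &Option<String>, have: Option<&str>) -> bool {
    match want {
        None => true,
        Some(w) => have == Some(w.as_str()),
    }
}

/// Runs on the already-assembled list, so issue-derived and live-enriched
/// rows go through the same filters and the same page window.
pub fn filter_and_page(rows: Vec<Row>, q: &ListQuery) -> Vec<Row> {
    rows.into_iter()
        .filter(|r| matches(&q.state, Some(r.state.as_str())))
        .filter(|r| matches(&q.org, r.org_id.as_deref()))
        .filter(|r| matches(&q.repo, r.repo.as_deref()))
        .skip(q.offset)
        .take(q.limit.unwrap_or(usize::MAX))
        .collect()
}
```

Filtering before the overlay would be cheaper, but filtering afterwards keeps one code path for both row kinds, which is the uniformity the step asks for.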
  5. Wire the router. Add pub fn router() -> Router<AppState> in routes/admin.rs (mirror routes/repos.rs:253-254) exposing /admin/sessions (so it nests under /api/v1), declare pub mod admin; in routes/mod.rs, and add .merge(routes::admin::router()) in router.rs next to the existing merges (router.rs:45-49). It lives inside the protect() nest so it carries NyxID proxy-trusted identity (unlike the unauthenticated webhook at router.rs:77-79).

    • Verify: cargo build -p fkst-control-plane → builds clean; cargo clippy -p fkst-control-plane -- -D warnings → no warnings.
  6. Tests in routes/admin.rs (#[cfg(test)]):

    • Merge: a fake aggregation reader returning two issue-derived rows (one whose session_id matches a seeded SessionRepo entry, one that does not) → assert the matching row has live = Some(..) with the seeded pid/runtime_dir/pod_id/fencing_token/status and source = LiveEnriched, and the non-matching row has live = None, source = IssueDerived.
    • Gating: request with no fkst:admin:read and no fkst:admin → 403; with fkst:admin:read → 200; with fkst:admin (bypass) → 200.
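    • No secrets: serialize the response and assert the JSON contains none of: the goal prompt/description, any env/secret value, or any field named prompt/description/secret/env/token (negative substring assertions over serde_json::to_string). Match field names as quoted keys (e.g. "token":) rather than bare substrings, because the legitimate fencing_token field contains the substring token.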
    • Partial failure surfaced: fake reader returns one row + one RepoScanError → assert the error appears in errors[] and scanned_repos is reported (the request still returns 200).
    • Verify: cargo test -p fkst-control-plane routes::admin → all pass.
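For the "No secrets" test, one way to phrase the negative assertion is a key-exact scan of the serialized JSON. The helper below is a hypothetical std-only sketch (`forbidden_keys_in` and `FORBIDDEN_KEYS` are names invented here). It checks field names only; asserting that a seeded prompt or secret value is absent would be a separate plain `!json.contains(value)` check.

```rust
/// Field names that must never appear as keys in the admin response.
const FORBIDDEN_KEYS: &[&str] = &["prompt", "description", "secret", "env", "token"];

/// Returns the forbidden keys found in `json`. Keys are matched in their
/// quoted form followed by a colon (`"token":`), so `fencing_token` does not
/// trip the check. Assumption: no whitespace between the key and the colon,
/// which holds for serde_json's compact and pretty output.
pub fn forbidden_keys_in(json: &str) -> Vec<&'static str> {
    FORBIDDEN_KEYS
        .iter()
        .copied()
        .filter(|key| json.contains(&format!("\"{}\":", key)))
        .collect()
}
```

In the real test this would run over `serde_json::to_string(&response)` and assert that the returned list is empty.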
  7. Document in docs/api-reference.md: the endpoint(s), the AdminSessionsResponse / AdminSessionView / LiveOverlay shape, the fkst:admin:read requirement (admin bypasses), the partial-overlay semantics (only locally-run sessions are live-enriched), the per-repo errors[] + scanned_repos contract, and the explicit "prompt and secrets are never returned" guarantee. Note the GLOBAL-vs-LOCAL distinction relative to the controller-local observability work in #144 (chore: config, observability & deployment topology for the controller/worker model).

    • Verify: cargo test -p fkst-control-plane && cargo clippy -p fkst-control-plane -- -D warnings → green.

Constraints / Non-goals

  • NEVER make repos public / no visibility change. This feature reads private repos via the App installation token (the H2 reader); it does not alter repo visibility. Do not add any visibility-change code (goals/repo_create.rs already sets private at create and there is no post-create change).
  • NEVER expose the goal prompt or any secret/env value. The prompt lives only in controller memory; the marker never carries it (marker.rs:6-8). The admin response must contain only non-sensitive metadata + the live overlay fields enumerated above.
  • No object-layer owner check in this handler — admin observability is global by design; the only gate is the action-layer permission.
  • This handler does NOT call GitHub directly. All GitHub enumeration/aggregation/pagination/rate-limit/partial-failure logic belongs to the H1/H2 reader; this issue is the consume-and-overlay layer only.
  • Never modify the kernel engine (fkst-substrate) or any upstream repo; this is purely a hosted user-facing surface.
  • File-only spec: no implementation lands from this issue itself; it specifies the work for the sibling implement loop.
  • Keep every source file under 500 lines; split admin.rs (e.g. DTOs / overlay / handler) if it approaches the limit.

Definition of Done

  • GET /api/v1/admin/sessions returns the global issue-derived list merged with the live in-memory overlay, gated by fkst:admin:read (fkst:admin bypasses), with errors[] + scanned_repos.
  • Overlay is partial by design (only locally-run sessions enriched) and documented as such.
  • Response contains no prompt/description and no secret/env value (asserted by a test).
  • Tests added/updated and green (merge, gating 403/200, no-secrets, partial-failure surfaced); coverage stays ≥ 80%.
  • docs/api-reference.md updated and notes the GLOBAL-vs-LOCAL distinction relative to #144 (chore: config, observability & deployment topology for the controller/worker model).
  • No kernel-engine / upstream change; no repo-visibility change.
  • No Co-Authored-By (or any co-author) trailer in any commit.
  • Commits are small, atomic, and buildable per commit; tree compiles + tests pass at each commit.
  • gitleaks clean; no secrets in code, tests, fixtures, or logs.
  • PR targets develop (or develop-auto), links this issue with Closes #<n>, includes a changeset (npx changeset), and CI is green (auto-merge on green).

Metadata

Labels

  • api (HTTP API surface)
  • backend (Rust/Axum backend work)
  • priority:P1 (High. Should be done this cycle.)
  • size:M (Medium: a few files, one concern. Size is informational.)
  • status:ready (Fully specced (size + priority + acceptance criteria). Ready to implement.)
  • type:feature (New user-facing capability.)
