Admin global session observability — App-installation cross-repo goal-issue aggregation reader (H2) #211

@chronoai-shining

Description

Part of milestone #6 (Admin global session observability).

Background

fkst-hosted is DB-free: a controller's only knowledge of goals/sessions is what it created, held in the per-controller in-memory GoalIssueStore (backend/fkst-control-plane/src/goals/issue_store.rs) — Arc<Mutex<HashMap<…>>> plus a GitHub-Issue mirror, with no reconcile-from-GitHub path and no registry of connected repos. GROUND: there is therefore no way to build a platform-wide view; an admin can see only the sessions that happen to live in one controller's memory.

The durable, non-secret facts needed for a global view DO already exist on each goal's GitHub Issue, and the parsers for them already exist:

  • Marker (goal metadata). parse_marker(issue_body) -> Result<GoalMarker, MarkerError> (backend/fkst-control-plane/src/goals/marker.rs:63) extracts the hidden <!-- fkst-hosted:goal\n{json}\n--> block into GoalMarker { v: u8, goal_id: String, owner_user_id: String, org_id: Option<String>, package_names: Vec<String>, repo: Option<RepoRef> } (marker.rs:27). GROUND: marker.rs:6-8 and the marker_never_contains_the_prompt test (marker.rs:126) prove the engine prompt is NEVER written to GitHub — it lives only in controller memory. The marker is the server-controlled source of truth and package_names/repo are re-validated on read (validate_goal_fields at marker.rs:83). Note repo is Option<RepoRef> (RepoRef is defined in backend/fkst-shared/src/models/mod.rs:21 and re-exported as goals::model::RepoRef).
  • Labels (session lifecycle, #180: "feat: session-lifecycle goal-issue labels + persisted terminal cause (user-stop vs graceful completion)"). backend/fkst-control-plane/src/goals/labels.rs defines GOAL_LABEL = "fkst-goal" (labels.rs:19), the per-session link session_label(id: bson::Uuid) -> "fkst-session-<uuid>" (labels.rs:32), and exactly one lifecycle word: LABEL_RUNNING = "fkst-running" (labels.rs:22), LABEL_TERMINATED = "fkst-terminated", LABEL_COMPLETED = "fkst-completed", LABEL_FAILED = "fkst-failed" (labels.rs:24-28). GROUND: a session's state AND its session_id are recoverable from an issue's labels alone.
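As an illustration of the labels-only recovery claimed above, here is a minimal sketch of mapping an issue's label names to a lifecycle state plus a session id. The constants are redeclared only so the snippet stands alone (real code should import them from labels.rs). SESSION_PREFIX is a hypothetical name for the prefix that session_label produces. Letting the last lifecycle label win is an assumption, since the source states that exactly one is present.

```rust
// Stand-ins for the constants in goals/labels.rs (redeclared here only for self-containment).
const LABEL_RUNNING: &str = "fkst-running";
const LABEL_TERMINATED: &str = "fkst-terminated";
const LABEL_COMPLETED: &str = "fkst-completed";
const LABEL_FAILED: &str = "fkst-failed";
// Hypothetical constant name for the prefix emitted by session_label.
const SESSION_PREFIX: &str = "fkst-session-";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Running,
    Terminated,
    Completed,
    Failed,
    Unknown,
}

/// Resolve the lifecycle state and the session id from an issue's label names.
/// No lifecycle label yields Unknown; no session link yields None.
fn from_labels(labels: &[String]) -> (SessionState, Option<String>) {
    let mut state = SessionState::Unknown;
    let mut session_id = None;
    for label in labels {
        match label.as_str() {
            LABEL_RUNNING => state = SessionState::Running,
            LABEL_TERMINATED => state = SessionState::Terminated,
            LABEL_COMPLETED => state = SessionState::Completed,
            LABEL_FAILED => state = SessionState::Failed,
            other => {
                // Anything else is either the session link or an unrelated label.
                if let Some(id) = other.strip_prefix(SESSION_PREFIX) {
                    session_id = Some(id.to_string());
                }
            }
        }
    }
    (state, session_id)
}
```

Unrelated labels such as fkst-goal fall through the last arm untouched, so the helper tolerates extra labels on the issue.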

So a goal issue carries everything a global session row needs (non-secret), but nothing reads it cross-repo:

  • Issue aggregation today is user-scoped, not global, not App-based. aggregate_issues (backend/fkst-control-plane/src/github_hub/fanout.rs:117) fans out over the caller's own linked GitHub accounts via the NyxID proxy (proxy.accounts() at fanout.rs:125, proxy.request(&connection.connection_id, …) at fanout.rs:197). GROUND: this is the user's connected account, not the App installation, and it only covers repos that user can see — it cannot enumerate platform-wide. No GitHub Search API (/search/issues) usage exists anywhere in the crate.
  • The App can read private repos but has no enumeration/aggregation reader. backend/fkst-control-plane/src/github_app/ mints PER-REPO installation tokens via GithubAppTokens::token_for_repo (github_app/mod.rs:344) over GithubApi::installation_for_repo + create_installation_token (github_app/api.rs:77,87). default_permissions() (github_app/mod.rs:164) requests issues: Some("write") (⊇ read) per #110 (chore: grant substrate session tokens administration:write (+ pull_requests) for the whole session). The Contents READ helper (github_app/contents.rs, get_contents at contents.rs:112) already demonstrates the direct-reqwest request/classify pattern for reading a private repo via the App token, propagating NotInstalled/InstallationGone unchanged (contents.rs:117-119). GROUND: the App layer has NO GET /app/installations and NO GET /installation/repositories call, and no issue-list reader; only the per-installation POST /app/installations/{id}/access_tokens token mint exists. So global cross-repo aggregation via the App is entirely new.
  • No scan-scope config. backend/fkst-control-plane/src/config.rs loads variables via three fail-closed envy passes (FKST_HOSTED_* prefixed, FKST_* journal/auth, and the unprefixed pass; config.rs:3-5,17-20). GROUND: no org-allowlist / scan-orgs env var exists.

Purpose

Add a single backend service that builds the platform-wide session list by: enumerating connected repos (via the sibling install-enum service, gap H1), optionally scoped by a new FKST_ADMIN_SCAN_ORGS; for each repo, reading its fkst-goal-labeled issues with the App INSTALLATION token (private-safe, issues:read per #110 — NOT the user NyxID proxy); parsing each issue with the existing parse_marker + the labels.rs constants into a non-secret global session model. Repos stay private (no visibility change). This closes gap H2 and is consumed by the sibling admin-API issue (H3).
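To make the optional scope concrete, here is a minimal sketch of the FKST_ADMIN_SCAN_ORGS parse rule (split on commas, trim, drop empties, unset or empty means all installations). Both function names are hypothetical. Taking the raw value as Option<&str> sidesteps the envy wiring, which is not shown. The case-insensitive match in in_scope is an assumption (GitHub logins are case-insensitive), not something the issue specifies.

```rust
/// Parse a raw FKST_ADMIN_SCAN_ORGS value into a list of org logins.
/// `None` (unset) and "" both produce an empty list, which means "all installations".
fn parse_scan_orgs(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|org| !org.is_empty())
        .map(str::to_string)
        .collect()
}

/// Decide whether a repo owner falls inside the configured scope.
/// Assumption: an empty scope admits every owner, and logins compare case-insensitively.
fn in_scope(scope: &[String], owner: &str) -> bool {
    scope.is_empty() || scope.iter().any(|org| org.eq_ignore_ascii_case(owner))
}
```

Keeping the parse a pure function of the raw string lets the unit tests in step 1 run without touching the process environment.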

This issue is file-only / spec-only: no implementation is performed here.

Relationships

Affected Files

| File | Action | Why |
| --- | --- | --- |
| backend/fkst-control-plane/src/config.rs | modify | Add FKST_ADMIN_SCAN_ORGS (comma-separated org logins; unset/empty = all installations), fail-closed parse + doc comment + unit test. |
| backend/fkst-control-plane/src/github_app/api.rs | modify | Add the App issue-list transport method (GET /repos/{o}/{r}/issues?labels=fkst-goal&state=all&per_page=100, paginated) to the GithubApi trait + HttpGithubApi, reusing reset_seconds/is_rate_limited. |
| backend/fkst-control-plane/src/github_app/mod.rs | modify | Add a public issues_for_repo(owner_repo, labels) method on GithubAppTokens that mints an issues:read token and drives the paginated read (mirroring get_contents). |
| backend/fkst-control-plane/src/admin/mod.rs | new | Admin module root (re-exports the reader + its view types). |
| backend/fkst-control-plane/src/admin/session_reader.rs | new | The global aggregation reader: enumerate connected repos → read fkst-goal issues via App token → parse marker + labels → SessionView, resilient per-repo. |
| backend/fkst-control-plane/src/admin/session_view.rs | new | SessionView + SessionState enum + the per-repo RepoReadError (non-secret model). |
| backend/fkst-control-plane/src/lib.rs | modify | Declare pub mod admin; (alongside the existing pub mod …; list, e.g. after pub mod authz;). |

Implementation Instructions

Each numbered item is one atomic, buildable, independently-reviewable commit. The repo must compile and cargo test -p fkst-control-plane must pass after every commit.

  1. Add the FKST_ADMIN_SCAN_ORGS config. In config.rs, add a field (e.g. pub admin_scan_orgs: Vec<String>) populated from FKST_ADMIN_SCAN_ORGS, following the existing fail-closed envy-pass pattern (it rides the FKST_/unprefixed pass like the other operational vars; match whichever pass the surrounding admin/operational vars use). Parse: split on ",", trim() each, drop empties; unset or empty string ⇒ empty Vec (meaning "all installations", documented). Add a doc comment explaining the why: this is an OPTIONAL scope on the installation set, NOT a separate public-repo scan, and the empty default means all installations. Add unit tests for: unset ⇒ empty; " a , ,b " ⇒ ["a","b"]; a single value.

    • Test: cargo test -p fkst-control-plane config:: → all green, including the new scan-orgs cases.
  2. Define the non-secret view types. Add admin/session_view.rs:

    • SessionState enum { Running, Terminated, Completed, Failed, Unknown }, derived from the single lifecycle label via the labels.rs constants (LABEL_RUNNING → Running, LABEL_TERMINATED → Terminated, LABEL_COMPLETED → Completed, LABEL_FAILED → Failed); Unknown when no lifecycle label is present (goal created, no session yet).
    • SessionView { session_id: Option<String>, state: SessionState, goal_id: String, owner_user_id: String, org_id: Option<String>, package_names: Vec<String>, repo: Option<RepoRef>, issue_number: u64, issue_url: String, created_at: String, updated_at: String } — non-secret ONLY. repo is Option<RepoRef> because GoalMarker.repo is itself optional (marker.rs:33); do NOT unwrap it. Derive Serialize. Add a doc comment stating it MUST NEVER carry the goal prompt/description or any env/secret value (and that none are present in the issue to begin with — see marker.rs:6-8).
    • RepoReadError { repo: String, kind: String, message: String, retry_after_secs: Option<u64> } — structured, credential-free, mirroring fanout.rs's AccountError.
    • Provide a SessionState::from_labels(&[String]) -> (SessionState, Option<String>) helper that resolves the lifecycle state AND extracts the session_id by stripping the fkst-session- prefix (use labels::session_label / the label consts as the single source of truth — do NOT hardcode the strings).
    • Unit tests: from_labels(&["fkst-goal","fkst-session-<id>","fkst-completed"]) ⇒ (Completed, Some("<id>")); a label set with no lifecycle word ⇒ (Unknown, None); a set with the goal+session link but no lifecycle word ⇒ (Unknown, Some("<id>")).
    • Test: cargo test -p fkst-control-plane admin::session_view → green.
  3. Add the App issue-list transport. In github_app/api.rs, extend the GithubApi trait with list_repo_issues(&self, token: &SecretString, owner: &str, repo: &str, labels: &str, page: u32) -> Result<RawIssuePage, GithubAppError> and implement it on HttpGithubApi:

    • GET {api_base}/repos/{owner}/{repo}/issues?labels={labels}&state=all&per_page=100&page={page}, accept: application/vnd.github+json, bearer_auth(token) (the installation token, NOT the app JWT — quote this distinction in the doc comment: the App-JWT is for installation_for_repo / token mint; the per-repo INSTALLATION token reads issues).
    • Classify exactly like installation_for_repo (api.rs:182-203): reuse reset_seconds/is_rate_limited for 403 disambiguation → GithubAppError::RateLimited(secs); 401/plain-403 → AppAuth; 404 → a typed "repo not readable" (reuse NotFound or Http); non-success → Http.
    • RawIssuePage { issues: Vec<RawIssue>, has_more: bool } where RawIssue { number: u64, html_url: String, body: String, labels: Vec<String>, created_at: String, updated_at: String } (tolerant Deserialize, labels decoded from GitHub's { "name": … } objects). Set has_more from the Link rel="next" header by reusing the existing public github_hub::service::has_next_page(&HeaderMap) -> bool (github_hub/service.rs:149) — it is pub and already covers RFC-5988 rel="next"; do NOT add a new parser or a dependency.
    • wiremock unit tests (mirror api.rs's existing wiremock style): page-1 returns 2 issues + a Link next ⇒ has_more=true; last page ⇒ has_more=false; a 403 with x-ratelimit-remaining: 0 ⇒ RateLimited(_); a 401 ⇒ AppAuth.
    • Test: cargo test -p fkst-control-plane github_app::api → green.
  4. Add the per-repo App issue reader. In github_app/mod.rs, add pub async fn issues_for_repo(&self, owner_repo: &str, labels: &str) -> Result<Vec<RawIssue>, GithubAppError>:

    • Mint an installation token via the existing token_for_repo(owner_repo, Some(perms)) (mod.rs:344) where perms is a TokenPermissions with issues: Some("read".into()) and everything else None (least privilege; the App holds issues:write ⊇ read per #110, "chore: grant substrate session tokens administration:write (+ pull_requests) for the whole session", so the subset mint succeeds).
    • Loop pages via list_repo_issues until has_more == false; bound the loop with a MAX_PAGES const (document the read cost: ≤ MAX_PAGES × 100 issues per repo; behavior at large installation counts is bounded by this and the caller's concurrency). Propagate NotInstalled/InstallationGone UNCHANGED (mirror get_contents at contents.rs:117-119).
    • wiremock test: installation + token-mint mounts (reuse the existing mount_token_mint / /app/installations/{id}/access_tokens pattern from api.rs/contents.rs tests) + a 2-page issues mock ⇒ the merged Vec<RawIssue> has all items; assert the minted-token request body carries issues: "read".
    • Test: cargo test -p fkst-control-plane github_app:: → green.
  5. Implement the global reader (resilient aggregation). Add admin/session_reader.rs with an injectable design mirroring fanout.rs:

    • A trait the reader depends on for repo enumeration (provided by the H1 install-enum sibling), e.g. ConnectedRepos { async fn connected_repos(&self, scope_orgs: &[String]) -> Result<Vec<RepoRef>, AppError> }, and a trait for reading a repo's goal issues (implemented by GithubAppTokens::issues_for_repo), e.g. IssueReader { async fn issues_for_repo(&self, owner_repo: &str, labels: &str) -> Result<Vec<RawIssue>, GithubAppError> }. Inject both so the reader is unit-testable against fakes with no live GitHub.
    • pub async fn read_global_sessions(repos: &dyn ConnectedRepos, app: Arc<impl IssueReader>, scope_orgs: &[String]) -> GlobalSessionsResult where GlobalSessionsResult { sessions: Vec<SessionView>, errors: Vec<RepoReadError>, skipped: u64 }.
    • Behavior: call connected_repos(scope_orgs) (the only call that may bubble up a non-partial error). For each repo, issues_for_repo("{owner}/{name}", GOAL_LABEL) concurrently via JoinSet (mirror fanout.rs:138) with a per-repo timeout; a per-repo failure is collected into errors as a RepoReadError, NEVER fatal. For each issue: parse_marker(&body) → on Err, increment skipped and tracing::warn! with the repo + issue number (do NOT crash — an issue may have a missing/hand-broken marker); resolve labels → SessionState + session_id via SessionState::from_labels; build a SessionView from marker + labels + issue fields (non-secret only). Respect the pagination + rate-limit + partial-failure contract from steps 3–4.
    • Unit tests (fakes only): two fake repos, one returning two well-formed goal issues (one running, one completed) across pages and one returning a transport error ⇒ result has 2 sessions from the good repo, 1 RepoReadError for the bad repo, correct SessionState mapping; an issue with no marker ⇒ skipped += 1 and absent from sessions; assert no SessionView field ever contains the prompt (use a fake issue body whose marker has no prompt — the marker never does — and assert the serialized JSON has no description/prompt/secret keys).
    • Test: cargo test -p fkst-control-plane admin::session_reader → green.
  6. Wire the module. Add pub mod admin; to lib.rs (alongside the existing pub mod …; declarations) and re-export the reader + view types from admin/mod.rs. No HTTP route is added here — the route belongs to the sibling H3 issue, so do NOT touch router.rs or add a routes/admin.rs.

    • Test: cargo build -p fkst-control-plane && cargo test -p fkst-control-plane → green.
  7. Pagination / rate-limit / partial-failure summary doc. Add a module-level //! doc to admin/session_reader.rs stating the explicit, bounded read cost (≤ MAX_PAGES × 100 per repo × repo count, concurrency-capped), the per-repo partial-failure contract (a slow/failing repo records a RepoReadError, never aborts the aggregate), and the rate-limit handling (RateLimited surfaced, bounded retries). This is documentation-of-intent for the H3 consumer; no behavior change.

    • Test: cargo test -p fkst-control-plane → green (doc-only).
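The bounded-read and partial-failure contract from steps 4, 5 and 7 can be sketched with fakes alone. This is a deliberately simplified, synchronous stand-in, not the specified design: the real reader is async and fans out via JoinSet with a per-repo timeout. PageSource, FakeSource, read_repo, read_all and GlobalResult are hypothetical names. The MAX_PAGES value of 3 is arbitrary. The substring check stands in for parse_marker, which the real code must reuse.

```rust
use std::collections::HashMap;

/// Hypothetical bound; the issue names MAX_PAGES but leaves its value open.
const MAX_PAGES: u32 = 3;

#[derive(Debug, Clone, PartialEq)]
struct RawIssue {
    number: u64,
    body: String,
}

struct Page {
    issues: Vec<RawIssue>,
    has_more: bool,
}

#[derive(Debug, PartialEq)]
struct RepoReadError {
    repo: String,
    kind: String,
    message: String,
}

#[derive(Debug, Default)]
struct GlobalResult {
    issues: Vec<RawIssue>,
    errors: Vec<RepoReadError>,
    skipped: u64,
}

/// Synchronous stand-in for the async page transport (assumption).
trait PageSource {
    fn page(&self, repo: &str, page: u32) -> Result<Page, String>;
}

/// Read one repo, stopping at the last page or at MAX_PAGES, whichever comes first.
fn read_repo(src: &dyn PageSource, repo: &str) -> Result<Vec<RawIssue>, String> {
    let mut all = Vec::new();
    for page in 1..=MAX_PAGES {
        let fetched = src.page(repo, page)?;
        all.extend(fetched.issues);
        if !fetched.has_more {
            break;
        }
    }
    Ok(all)
}

/// Aggregate every repo; a failing repo becomes an error entry, never a fatal error.
fn read_all(src: &dyn PageSource, repos: &[&str]) -> GlobalResult {
    let mut out = GlobalResult::default();
    for &repo in repos {
        match read_repo(src, repo) {
            Ok(issues) => {
                for issue in issues {
                    // Crude stand-in for parse_marker: only checks for the marker opener.
                    if issue.body.contains("<!-- fkst-hosted:goal") {
                        out.issues.push(issue);
                    } else {
                        out.skipped += 1;
                    }
                }
            }
            Err(message) => out.errors.push(RepoReadError {
                repo: repo.to_string(),
                kind: "transport".to_string(),
                message,
            }),
        }
    }
    out
}

/// In-memory fake: repo name -> list of pages. Unknown repos fail to read.
struct FakeSource {
    pages: HashMap<String, Vec<Vec<RawIssue>>>,
}

impl PageSource for FakeSource {
    fn page(&self, repo: &str, page: u32) -> Result<Page, String> {
        let pages = self.pages.get(repo).ok_or_else(|| format!("{repo}: not readable"))?;
        let idx = (page as usize).saturating_sub(1);
        Ok(Page {
            issues: pages.get(idx).cloned().unwrap_or_default(),
            has_more: idx + 1 < pages.len(),
        })
    }
}
```

With this shape the worst-case read cost per repo is MAX_PAGES page fetches, and one unreadable repo costs one errors entry while the other repos still contribute their issues.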

Constraints / Non-goals

  • NEVER make repos public / no visibility change. Reads go through the App installation token (issues:read) so private repos are read in place. Do NOT touch goals/repo_create.rs's private: spec.private or add any visibility-change code.
  • NEVER expose the goal prompt or any secret/env value. The prompt and env/secrets are not in GitHub by design (marker.rs:6-8); SessionView must carry non-secret fields only, and a test must assert no prompt/secret key appears in the serialized output.
  • Reuse, do not redefine. Use the existing parse_marker (goals/marker.rs), the labels.rs constants, and github_hub::service::has_next_page as the single sources of truth for the marker/label/pagination format. Do not re-implement any of them.
  • App token, not user proxy. Read via GithubAppTokens, never via github_hub's NyxID-proxy user-account path (fanout.rs:197) — that is the caller's own account and cannot see platform-wide private repos.
  • Never modify the kernel engine (fkst-substrate) or any upstream reference repo.
  • File-only spec, no implementation in this issue. Scope is exactly H2; the admin HTTP surface + live overlay (H3) and the install enumeration (H1) are separate issues.

Definition of Done

  • FKST_ADMIN_SCAN_ORGS parses fail-closed (trim, drop empties; unset/empty = all installations) with a doc comment and unit tests.
  • App issue-list transport (list_repo_issues) + GithubAppTokens::issues_for_repo read fkst-goal issues via the installation token, paginated (reusing has_next_page), with rate-limit handling, behind wiremock tests.
  • The global reader enumerates connected repos (scoped by FKST_ADMIN_SCAN_ORGS), parses marker + lifecycle labels into SessionView, and collects per-repo failures as RepoReadError without aborting; unparseable-marker issues are skipped + counted.
  • A test asserts no SessionView field carries the goal prompt or any secret/env value.
  • parse_marker, the labels.rs consts, and has_next_page are reused (not re-defined); no repo visibility change; no kernel-engine change.
  • Tests added/updated and green: cargo test -p fkst-control-plane passes; success AND failure/edge paths covered; coverage ≥ 80% for new code.
  • No Co-Authored-By (or any co-author trailer) in commits.
  • Commits are small, atomic, and buildable per commit (the tree compiles and tests pass at each).
  • gitleaks clean (no secrets committed).
  • PR targets develop (or develop-auto), links this issue with Closes #N, includes a changeset (npx changeset), and CI is green before auto-merge.
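For the rate-limit handling named in the checklist above, the status classification described in step 3 can be sketched as a pure function. This is an illustration under assumptions, not the crate's actual code. ReadError is a hypothetical stand-in for GithubAppError. The two header-derived parameters stand in for whatever is_rate_limited and reset_seconds compute. Treating "403 with x-ratelimit-remaining: 0" as the rate-limit signal follows the wiremock case in step 3. The issue leaves the 404 choice open (NotFound or Http); NotFound is picked here arbitrarily.

```rust
/// Hypothetical stand-in for the relevant GithubAppError variants.
#[derive(Debug, PartialEq, Eq)]
enum ReadError {
    RateLimited(u64),
    AppAuth,
    NotFound,
    Http(u16),
}

/// Classify a response by status code.
/// `remaining` models the x-ratelimit-remaining header; `reset_secs` models the
/// seconds until the limit resets (both are assumptions about the inputs).
fn classify(status: u16, remaining: Option<u64>, reset_secs: Option<u64>) -> Result<(), ReadError> {
    match status {
        200..=299 => Ok(()),
        // A 403 with an exhausted quota is a rate limit, not an auth failure.
        403 if remaining == Some(0) => Err(ReadError::RateLimited(reset_secs.unwrap_or(0))),
        401 | 403 => Err(ReadError::AppAuth),
        404 => Err(ReadError::NotFound),
        other => Err(ReadError::Http(other)),
    }
}
```

The guard arm must come before the plain 401 | 403 arm; reversed, every 403 would be reported as an auth failure and the caller would never see the retry hint.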

Metadata

Labels

backend (Rust/Axum backend work)
engine-integration (Encapsulating/invoking the fkst-substrate engine)
priority:P2 (Normal. Default priority.)
size:L (Large: cross-cutting, multiple modules. Size is informational.)
status:ready (Fully specced (size + priority + acceptance criteria). Ready to implement.)
type:feature (New user-facing capability.)
