feat: knowledge graph creation, GMA, extraction jobs, and maintenance pipeline #737
Merged
* chore(skills): add subagent delivery execution protocol

  Add a reusable subagent skill that standardizes issue-based branching, TDD execution, PR structure, and merge/conflict handling into feature/manage-knowledge-graph.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

  Implement schema_bootstrap as the default workspace mode and persist irreversible transition state to extraction_operations across domain, repository, API responses, and migration coverage.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…681) Add a workspace-status API projection with mode, readiness flags, transition eligibility, and session pointers, including service and route authorization coverage for manage workspace rendering. Co-authored-by: Cursor <cursoragent@cursor.com>
…#682) Enforce workspace readiness checks for minimum entity/relationship type coverage and prepopulated type instance presence, and project blocking reasons so validate/transition workflows can render actionable feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
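A minimal sketch of how a readiness check could collect blocking reasons for the validate/transition UI. The thresholds (`MIN_ENTITY_TYPES`, `MIN_RELATIONSHIP_TYPES`), the `TypeSummary` shape, and the reason strings are hypothetical; the commit does not state the actual minimums.

```python
from dataclasses import dataclass, field

# Hypothetical minimums; the real thresholds are not stated in the PR.
MIN_ENTITY_TYPES = 1
MIN_RELATIONSHIP_TYPES = 1


@dataclass(frozen=True)
class TypeSummary:
    name: str
    prepopulated: bool = False
    instance_count: int = 0


@dataclass
class ReadinessReport:
    ready: bool
    blocking_reasons: list[str] = field(default_factory=list)


def check_workspace_readiness(
    entity_types: list[TypeSummary],
    relationship_types: list[TypeSummary],
) -> ReadinessReport:
    """Collect every blocking reason instead of stopping at the first one."""
    reasons: list[str] = []
    if len(entity_types) < MIN_ENTITY_TYPES:
        reasons.append(f"at least {MIN_ENTITY_TYPES} entity type(s) required")
    if len(relationship_types) < MIN_RELATIONSHIP_TYPES:
        reasons.append(
            f"at least {MIN_RELATIONSHIP_TYPES} relationship type(s) required"
        )
    # Prepopulated types must already have instances before transition.
    for t in entity_types + relationship_types:
        if t.prepopulated and t.instance_count == 0:
            reasons.append(f"prepopulated type '{t.name}' has no instances")
    return ReadinessReport(ready=not reasons, blocking_reasons=reasons)
```

Returning all reasons at once lets the UI render a complete checklist rather than one error per round trip.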
Expose authorized validate and transition commands for knowledge graph workspaces, persist session pointers, and create an extraction-mode session identifier when moving from bootstrap to extraction operations. Co-authored-by: Cursor <cursoragent@cursor.com>
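The one-way mode transition can be sketched as a small guard on a workspace object. `Workspace`, `transition_to_extraction`, and the use of a random UUID as the new extraction session identifier are assumptions for illustration, not the PR's domain model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

SCHEMA_BOOTSTRAP = "schema_bootstrap"
EXTRACTION_OPERATIONS = "extraction_operations"


class TransitionError(Exception):
    pass


@dataclass
class Workspace:
    knowledge_graph_id: str
    mode: str = SCHEMA_BOOTSTRAP  # default mode, per the PR
    extraction_session_id: Optional[str] = None


def transition_to_extraction(ws: Workspace, ready: bool) -> str:
    """Move a workspace one way, from bootstrap to extraction operations."""
    if ws.mode == EXTRACTION_OPERATIONS:
        # Irreversible: there is no path back, and repeating it is rejected.
        raise TransitionError("workspace already in extraction_operations")
    if not ready:
        raise TransitionError("workspace readiness checks have not passed")
    ws.mode = EXTRACTION_OPERATIONS
    # Hypothetical: a random UUID as the extraction-mode session identifier.
    ws.extraction_session_id = str(uuid.uuid4())
    return ws.extraction_session_id
```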
Add durable run-level mutation metadata storage and lifecycle persistence for session/scope identity, timestamps, token-cost totals, and operation-count summaries linked to each sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log application results into MutationsApplied payloads so downstream sync lifecycle persistence can finalize run-level metadata. Co-authored-by: Cursor <cursoragent@cursor.com>
#686) Scaffold extraction application/presentation package structure and add pytest-archon rules enforcing DDD layer boundaries plus cross-context isolation so subsequent extraction features stay architecturally clean. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session lifecycle behaviors with clear-chat reset semantics and archived-session retention backed by repository ports and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
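A minimal in-memory sketch of the session lifecycle: one active session per (user, knowledge graph, mode) key, with clear-chat archiving the old session instead of deleting it. `InMemorySessionRepository` is a hypothetical stand-in for the repository ports mentioned above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    user_id: str
    knowledge_graph_id: str
    mode: str
    archived: bool = False
    messages: list[str] = field(default_factory=list)


class InMemorySessionRepository:
    """Hypothetical repository: at most one active session per key."""

    def __init__(self) -> None:
        self._active: dict[tuple[str, str, str], Session] = {}
        self.archived: list[Session] = []

    def get_or_create(self, user_id: str, kg_id: str, mode: str) -> Session:
        key = (user_id, kg_id, mode)
        if key not in self._active:
            self._active[key] = Session(str(uuid.uuid4()), user_id, kg_id, mode)
        return self._active[key]

    def clear_chat(self, user_id: str, kg_id: str, mode: str) -> Session:
        """Archive the current session (retained, not deleted), start fresh."""
        key = (user_id, kg_id, mode)
        old = self._active.pop(key, None)
        if old is not None:
            old.archived = True
            self.archived.append(old)
        return self.get_or_create(user_id, kg_id, mode)
```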
Resolve mode-specific extraction skill templates from global defaults and apply deterministic knowledge-graph override merges so session prompts are stable, customizable, and repeatable. Co-authored-by: Cursor <cursoragent@cursor.com>
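One way to make the merge deterministic is to treat each template as named sections, let knowledge-graph overrides replace sections by name, and render in sorted section order. `GLOBAL_TEMPLATES`, the section names, and `resolve_skill_template` are hypothetical; the PR does not describe the template format.

```python
# Hypothetical global defaults: mode -> {section name -> prompt text}.
GLOBAL_TEMPLATES: dict[str, dict[str, str]] = {
    "schema_bootstrap": {
        "role": "You design graph schemas.",
        "rules": "Ask before adding types.",
    },
    "extraction_operations": {
        "role": "You extract entities.",
        "rules": "Only use known types.",
    },
}


def resolve_skill_template(
    mode: str, kg_overrides: dict[str, dict[str, str]]
) -> str:
    """Merge KG overrides over global defaults and render deterministically."""
    if mode not in GLOBAL_TEMPLATES:
        raise KeyError(f"no global template for mode {mode!r}")
    merged = dict(GLOBAL_TEMPLATES[mode])
    merged.update(kg_overrides.get(mode, {}))  # override sections win
    # Sorting section names makes the output independent of dict insertion
    # order, so the same inputs always produce the same prompt.
    return "\n\n".join(f"## {name}\n{merged[name]}" for name in sorted(merged))
```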
Persist clone-head, last-extraction baseline, and tracked-branch head commit references for data sources and expose them in management API responses for downstream ingestion and UI commit-status workflows. Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references, refreshing tracked branch head, and passing baseline commit plus resolved credentials into the ingestion pipeline before packaging begins. Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#   src/api/ingestion/application/services/ingestion_service.py
#   src/api/ingestion/infrastructure/event_handler.py
#   src/api/ingestion/ports/services.py
#   src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction baseline by emitting a completed lifecycle event and recording an explicit no-change audit log entry on the sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
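The no-change short circuit reduces to comparing two commit references. `CommitRefs`, `should_skip_extraction`, `plan_sync`, and the audit message text are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommitRefs:
    tracked_branch_head: Optional[str]
    last_extraction_baseline: Optional[str]


def should_skip_extraction(refs: CommitRefs) -> bool:
    """True only when the tracked head is known and equals the baseline."""
    return (
        refs.tracked_branch_head is not None
        and refs.tracked_branch_head == refs.last_extraction_baseline
    )


def plan_sync(refs: CommitRefs) -> tuple[str, list[str]]:
    """Return (lifecycle outcome, audit log entries) for a sync run."""
    if should_skip_extraction(refs):
        note = f"no-change: head {refs.tracked_branch_head} equals baseline"
        return "completed", [note]
    return "extract", []
```

Treating an unknown head (`None`) as "must extract" keeps the shortcut conservative: it only skips work when both commits are known to match.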
Expose a data-source diff summary API that compares the last extraction baseline to tracked branch head and returns aggregate counts plus a large-list-safe changed-file preview for maintenance decisions. Co-authored-by: Cursor <cursoragent@cursor.com>
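A sketch of a large-list-safe summary: full aggregate counts, but only a capped, sorted preview of paths. The input is assumed to be `(status, path)` pairs, for example parsed from `git diff --name-status <baseline> <head>`; `PREVIEW_LIMIT` and `DiffSummary` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cap on how many changed paths the API returns inline.
PREVIEW_LIMIT = 100


@dataclass(frozen=True)
class DiffSummary:
    counts: dict[str, int]  # e.g. {"added": 3, "modified": 5}
    total_changed: int
    preview: list[str]
    truncated: bool


def summarize_diff(
    changes: list[tuple[str, str]], limit: int = PREVIEW_LIMIT
) -> DiffSummary:
    """changes: (status, path) pairs, e.g. ("modified", "src/app.py")."""
    counts = Counter(status for status, _ in changes)
    paths = sorted(path for _, path in changes)
    return DiffSummary(
        counts=dict(counts),
        total_changed=len(changes),
        preview=paths[:limit],
        truncated=len(paths) > limit,
    )
```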
Show commit-based diff counts immediately on each data source card and render the changed-file list as collapsed-by-default with explicit expand/collapse controls for large-diff safe browsing. Co-authored-by: Cursor <cursoragent@cursor.com>
…695) Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning. Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery. Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698) Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests. Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699) Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
#700) Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701) Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses. Co-authored-by: Cursor <cursoragent@cursor.com>
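The run-control operations can be modeled as a transition table applied across a data source's sync runs. The operation names come from the commit; the run statuses and every transition below are assumptions, since the PR does not spell them out.

```python
# Hypothetical transition table: operation -> {current status -> new status}.
# Operation names are from the PR; statuses and transitions are assumptions.
TRANSITIONS: dict[str, dict[str, str]] = {
    "start": {"pending": "running", "paused": "running"},
    "pause": {"running": "paused"},
    "halt": {"running": "halted", "paused": "halted"},
    "reset_running": {"running": "pending"},
    "reset_failed": {"failed": "pending"},
    "reset_completed": {"completed": "pending"},
}


def apply_run_control(operation: str, runs: dict[str, str]) -> dict[str, str]:
    """Return updated statuses by run id; ineligible runs keep their status."""
    if operation == "reset_all":
        return {run_id: "pending" for run_id in runs}
    if operation not in TRANSITIONS:
        raise ValueError(f"unknown run-control operation: {operation!r}")
    table = TRANSITIONS[operation]
    return {run_id: table.get(status, status) for run_id, status in runs.items()}
```

Returning a new mapping rather than mutating the input keeps the service logic easy to unit test, which matches the service-plus-HTTP test split the commit describes.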
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data. Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI. Co-authored-by: Cursor <cursoragent@cursor.com>
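Timezone-aware cron evaluation means converting the instant into the schedule's zone before matching fields. This hand-rolled matcher is a simplified sketch: it supports only `*`, `*/step`, numbers, and comma lists, and it ANDs day-of-month with day-of-week (real cron ORs them when both are restricted). In practice `tz` would likely be a `zoneinfo.ZoneInfo` built from the schedule's stored timezone name; the PR does not say which cron library it uses.

```python
from datetime import datetime, tzinfo

# Lowest allowed value per field: minute, hour, day-of-month, month,
# day-of-week (cron numbers days of the week from 0 = Sunday).
_FIELD_MINIMUMS = (0, 0, 1, 1, 0)


def _field_matches(spec: str, value: int, lowest: int) -> bool:
    """Match one field: '*', '*/step', a number, or a comma list of these."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - lowest) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, instant: datetime, tz: tzinfo) -> bool:
    """Check an aware instant against a 5-field cron expr in the schedule tz."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    local = instant.astimezone(tz)  # evaluate in the schedule's timezone
    cron_dow = (local.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    values = (local.minute, local.hour, local.day, local.month, cron_dow)
    # Simplification: every field must match.
    return all(
        _field_matches(spec, value, lowest)
        for spec, value, lowest in zip(fields, values, _FIELD_MINIMUMS)
    )
```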
…704) Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract. Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation. Co-authored-by: Cursor <cursoragent@cursor.com>
…pace perms Use filter=data and pre-check tar members before extractall, validate session_id before building sticky-session directories, and stop granting world-writable fallback permissions on agent workspaces. Co-authored-by: Cursor <cursoragent@cursor.com>
Load active sessions through tenant and knowledge graph filters before appending mutation journal entries from workload mutation apply routes. Co-authored-by: Cursor <cursoragent@cursor.com>
…th boundary Reject tenant, knowledge graph, session, and job scope values unless they match a safe alphanumeric/underscore/hyphen pattern before downstream use. Co-authored-by: Cursor <cursoragent@cursor.com>
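A sketch of boundary validation for scope values. This commit describes alphanumeric/underscore/hyphen, while the security re-review quotes `^[0-9A-Za-z_.-]+$`, which also admits dots; the sketch follows the quoted pattern. `validate_scope_value` is a hypothetical name. It uses `re.fullmatch` because a `$` anchor in Python also matches just before a trailing newline, so `"abc\n"` would pass `re.match(r"^...+$", ...)`.

```python
import re

# Pattern quoted in the security re-review; note it also admits ".".
SAFE_ID = re.compile(r"[0-9A-Za-z_.-]+")


def validate_scope_value(kind: str, value: str) -> str:
    """Return value unchanged if it is safe; raise ValueError otherwise."""
    if not SAFE_ID.fullmatch(value):
        raise ValueError(f"invalid {kind} identifier")
    # Extra guard (an assumption, not stated in the PR): the pattern accepts
    # "." and "..", which are dangerous if the value becomes a path segment.
    if value in {".", ".."}:
        raise ValueError(f"invalid {kind} identifier")
    return value
```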
…in scopes Gate graph reads, mutation apply, and schema or job configuration changes on distinct workload scopes while keeping workload:chat as legacy full access for interactive agent runtimes. Co-authored-by: Cursor <cursoragent@cursor.com>
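A sketch of a ranked scope check with a legacy full-access escape hatch. Only `workload:chat` appears in the PR text; the `workload:read`/`write`/`admin` names and the operation-to-scope mapping are hypothetical, modeled on the read/write/admin hierarchy mentioned in the security re-review.

```python
# Hypothetical scope names; only "workload:chat" appears in the PR text.
SCOPE_RANK = {"workload:read": 1, "workload:write": 2, "workload:admin": 3}
LEGACY_FULL_ACCESS = "workload:chat"

# Minimum scope each operation class needs (also hypothetical).
REQUIRED = {
    "graph_read": "workload:read",
    "mutation_apply": "workload:write",
    "schema_or_job_config": "workload:admin",
}


def is_authorized(granted: set[str], operation: str) -> bool:
    """Allow if any granted scope ranks at or above the operation's minimum."""
    if LEGACY_FULL_ACCESS in granted:
        return True  # legacy token for interactive agent runtimes
    needed = SCOPE_RANK[REQUIRED[operation]]
    return any(SCOPE_RANK.get(scope, 0) >= needed for scope in granted)
```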
Pin alpine/git and curlimages/curl by digest, bind COMMIT_SHA through Tekton env instead of shell interpolation, and use netrc-backed curl for GitHub API calls instead of wget with tokens on the command line. Co-authored-by: Cursor <cursoragent@cursor.com>
Clear unused imports, harden maintenance launch rollback, validate pipeline_mode, fix workload credential imports, and drop legacy Docker socket mounts from compose.dev now that local dev is OpenShell-only. Co-authored-by: Cursor <cursoragent@cursor.com>
Collaborator
Author
Review response summary

Master reply for CodeRabbit + JSELL security review triage on #737. Individual threads were updated where we had a concrete fix or decision.

Fixed on branch
CodeRabbit — addressed in
Format the API tree for ruff, resolve pytest-archon boundary violations via domain moves and composition-root hooks, and update unit tests for the injected baseline advancer and ontology authoring payload port. Co-authored-by: Cursor <cursoragent@cursor.com>
Align protocol signatures, fix Annotated Depends parameter ordering, tighten dict typing in workload routes, and add targeted mypy overrides for test fakes so the expanded PR surface passes type checking. Co-authored-by: Cursor <cursoragent@cursor.com>
CI runs pytest over the whole API tree; exclude example scanner templates via testpaths and replace deprecated nested pytest_plugins with explicit Management fixture imports in extraction integration conftest. Co-authored-by: Cursor <cursoragent@cursor.com>
Restore missing return in parse_desired_instances, export management db fixtures for extraction integration tests, wire canonical schema repo in workspace flow test, and enable vertex env in gcloud bind unit test. Co-authored-by: Cursor <cursoragent@cursor.com>
… tests Use explicit --input for scanner JSONL helpers so argparse accepts file paths after optional flags on CI Python builds, and wire canonical schema repository into extraction integration tests that call save_ontology. Co-authored-by: Cursor <cursoragent@cursor.com>
Match the argparse change for scanner JSONL helpers so CI accepts the input file path after relationship type positional arguments. Co-authored-by: Cursor <cursoragent@cursor.com>
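A sketch of the CLI shape the two commits describe: relationship types stay positional, and the JSONL path is a named `--input` option, so it can appear anywhere on the command line without argparse having to decide which positional a trailing path belongs to. The exact arguments of the real scanner helpers are not shown in the PR; `--limit` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: positional relationship types plus a named input file.
    parser = argparse.ArgumentParser(description="scanner JSONL helper (sketch)")
    parser.add_argument("relationship_types", nargs="+")
    parser.add_argument("--input", required=True, help="path to the JSONL file")
    parser.add_argument("--limit", type=int, default=None)
    return parser
```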
Install OpenShell from GitHub release tarball in the API image instead of Fedora dnf, harden data-source host matching, and avoid leaking stack traces in agent-runtime streaming error responses. Co-authored-by: Cursor <cursoragent@cursor.com>
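One common way to harden host matching is to parse the URL and compare the exact hostname against an allowlist instead of using substring or suffix checks. The allowlist and `is_allowed_repo_url` are hypothetical; the commit does not say how the PR's matching works.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of Git hosts a data source may point at.
ALLOWED_HOSTS = {"github.com", "gitlab.com"}


def is_allowed_repo_url(url: str) -> bool:
    """Exact hostname match; substring and suffix checks are easy to spoof."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname drops userinfo and port, so "github.com@evil.example" and
    # "github.com:443" resolve to their real hosts.
    host = (parts.hostname or "").lower()
    return host in ALLOWED_HOSTS
```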
Tighten session retention spec, harden error handling and authz-adjacent code paths, improve dev-ui polling/routing, and resolve review feedback across extraction, management, and graph modules. Co-authored-by: Cursor <cursoragent@cursor.com>
OpenShell 0.0.62 encodes network enforcement per endpoint, not via a global CLI flag. Passing --enforcement caused extraction job sandboxes to fail policy application and run with the restrictive default. Co-authored-by: Cursor <cursoragent@cursor.com>
… stream Validate SyncStarted pipeline_mode at the aggregate boundary with proper Literal typing, and extract the NDJSON stream helper so turn failures log server-side only and return a generic client error payload. Co-authored-by: Cursor <cursoragent@cursor.com>
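`Literal` annotations are not enforced at runtime, so validating at the aggregate boundary needs an explicit check. This sketch derives the allowed values from the type itself with `typing.get_args`. `ingest_only` appears in the PR summary; `full` is a hypothetical second value, and the event's other fields are assumptions.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# "ingest_only" appears in the PR summary; "full" is hypothetical.
PipelineMode = Literal["full", "ingest_only"]


@dataclass(frozen=True)
class SyncStarted:
    data_source_id: str
    pipeline_mode: PipelineMode

    def __post_init__(self) -> None:
        # Type checkers see the Literal; this check enforces it at runtime.
        if self.pipeline_mode not in get_args(PipelineMode):
            raise ValueError(f"unsupported pipeline_mode: {self.pipeline_mode!r}")
```

Deriving the allowed set from `PipelineMode` keeps the static type and the runtime check from drifting apart.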
jsell-rh previously approved these changes on Jun 23, 2026.

jsell-rh (Collaborator) left a comment:
Security Re-Review — All Findings Addressed ✅
Re-reviewed all 15 security findings from the initial review against HEAD 4707ea3b. Every issue has been adequately fixed:
Authorization (AUTHZ-1 through AUTHZ-6)
- Signing key: Dev fallback now gated by `KARTOGRAPH_ENV ∈ {development, dev, local}`; production raises `RuntimeError`. Minimum 32-byte key length enforced via `normalize_workload_token_signing_key()`.
- Tenant isolation: `maintenance_baseline_fetcher.py` SQL now filters on `tenant_id`; session journal uses `get_active_by_id_for_scope()` with tenant + KG scoping.
- JWT claims: Tokens include `iss`/`aud`; verified on decode with `require_exp` and `require_iat`.
- Scope model: Proper read/write/admin hierarchy replaces the flat `workload:chat` gate. All scope-derived identifiers validated against `^[0-9A-Za-z_.-]+$`.
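A sketch of the signing-key gating described above, taking the environment as a mapping so it is testable (in practice, `os.environ`). `normalize_signing_key` is a hypothetical stand-in, not the PR's `normalize_workload_token_signing_key()`; in particular, whether the dev fallback is random or fixed is an assumption.

```python
import secrets
from typing import Mapping, Optional

DEV_ENVIRONMENTS = {"development", "dev", "local"}
MIN_KEY_BYTES = 32


def normalize_signing_key(raw: Optional[str], env: Mapping[str, str]) -> bytes:
    """Return a usable signing key or fail closed outside dev environments."""
    if not raw:
        if env.get("KARTOGRAPH_ENV", "") in DEV_ENVIRONMENTS:
            # Hypothetical dev-only fallback: an ephemeral random key, so
            # tokens stop verifying when the process restarts.
            return secrets.token_bytes(MIN_KEY_BYTES)
        raise RuntimeError("workload token signing key is required outside dev")
    key = raw.encode("utf-8")
    if len(key) < MIN_KEY_BYTES:
        raise RuntimeError(f"signing key must be at least {MIN_KEY_BYTES} bytes")
    return key
```

If the tokens are handled with PyJWT (an assumption), `jwt.decode(token, key, algorithms=["HS256"], audience=..., issuer=..., options={"require": ["exp", "iat"]})` performs the equivalent `iss`/`aud` and required-claim checks on decode.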
Sandbox (SANDBOX-1 through SANDBOX-4)
- Policy enforcement: Both `extraction-job.yaml` and `gma-sticky-base.yaml` set to `hard_requirement`.
- Tar extraction: `_safe_extract_tar()` rejects members that escape the target directory and uses `filter="data"`.
- Path traversal: `validate_session_id()` rejects traversal characters; zip entries validated via `validate_zip_entry_name()`.
- Permissions: No world-writable bits, only `S_IWUSR | S_IWGRP`; symlinks skipped.
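A sketch of the zip-entry and permission rules in the two bullets above. `is_safe_zip_entry`, `workspace_mode`, and `harden_tree` are hypothetical names; the PR's `validate_zip_entry_name()` may apply different rules.

```python
import os
import stat


def is_safe_zip_entry(name: str) -> bool:
    """Reject empty, absolute, drive-prefixed, backslashed, or '..' names."""
    if not name or "\x00" in name or "\\" in name:
        return False
    if name.startswith("/") or (len(name) > 1 and name[1] == ":"):
        return False  # absolute POSIX path or Windows drive prefix
    return ".." not in name.split("/")


def workspace_mode(current: int) -> int:
    """Clear the world-write bit; owner and group keep write access."""
    return (current & ~stat.S_IWOTH) | stat.S_IWUSR | stat.S_IWGRP


def harden_tree(root: str) -> None:
    """Apply workspace_mode to everything under root, skipping symlinks."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would follow the link to its target
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            os.chmod(path, workspace_mode(mode))
```

Skipping symlinks matters because `os.chmod` follows links by default, so a link planted in a workspace could otherwise redirect the permission change to a file outside it.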
CI/CD (CI-1, CI-2)
- Images pinned by digest (`alpine/git:v2.47.2@sha256:…`, `curlimages/curl:8.12.1@sha256:…`).
- `COMMIT_SHA` passed via env binding with hex-only validation.
- Auth via netrc workspace; payload files created `chmod 600` and cleaned up.
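The pipeline steps themselves are shell, but the two checks translate directly into a Python sketch: accept only a hex commit id, and put credentials in a `0600` netrc file that curl can read with `--netrc-file` instead of passing tokens on the command line. The 40-character SHA-1 format and the helper names are assumptions.

```python
import os
import re

# Assumption: full 40-character lowercase hex SHA-1 commit ids.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def require_commit_sha(value: str) -> str:
    """Accept only hex, so the value is inert wherever it is later used."""
    if not _COMMIT_SHA.fullmatch(value):
        raise ValueError("COMMIT_SHA must be 40 lowercase hex characters")
    return value


def write_netrc(path: str, host: str, login: str, token: str) -> None:
    """Write credentials to a 0600 file instead of putting them on argv."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # Tighten permissions before writing, in case the file already existed
    # with a wider mode (the mode argument only applies on creation).
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"machine {host}\nlogin {login}\npassword {token}\n")
```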
Kubernetes (K8S-1, K8S-2, K8S-3)
- Deploy manifests reverted to main (`d26f1ea9`); no longer in PR scope.
No remaining security concerns. Approving.
ON CONFLICT DO UPDATE referenced metadata_json but the database column is metadata, which broke integration tests after the upsert change. Co-authored-by: Cursor <cursoragent@cursor.com>
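A runnable illustration of the fix using `sqlite3`, which supports the same `INSERT ... ON CONFLICT (...) DO UPDATE SET col = excluded.col` shape as PostgreSQL. The table name and schema are hypothetical; the point is that the `SET` clause must name the real column (`metadata`), not a model attribute such as `metadata_json`.

```python
import json
import sqlite3

# Hypothetical table; only the column name "metadata" comes from the PR.
SCHEMA = """
CREATE TABLE sync_run_metadata (
    sync_run_id TEXT PRIMARY KEY,
    metadata TEXT NOT NULL
)
"""

UPSERT = """
INSERT INTO sync_run_metadata (sync_run_id, metadata)
VALUES (?, ?)
ON CONFLICT (sync_run_id) DO UPDATE SET metadata = excluded.metadata
"""


def upsert_metadata(conn: sqlite3.Connection, run_id: str, meta: dict) -> None:
    """Insert or replace the metadata for one sync run."""
    conn.execute(UPSERT, (run_id, json.dumps(meta, sort_keys=True)))
```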
Add an explicit permissions block so CodeQL no longer flags missing workflow permissions on the Python test workflow. Co-authored-by: Cursor <cursoragent@cursor.com>
Log executor failures server-side and return generic client messages so CodeQL no longer flags stack trace exposure via the NDJSON stream. Co-authored-by: Cursor <cursoragent@cursor.com>
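A sketch of an NDJSON stream helper that logs failures server-side and emits a fixed client payload. `ndjson_stream` and `GENERIC_ERROR` are hypothetical names. If the API is FastAPI (as the `Annotated` `Depends` commit suggests), the generator could be wrapped in `StreamingResponse(..., media_type="application/x-ndjson")`.

```python
import json
import logging
from typing import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)

GENERIC_ERROR = {"type": "error", "message": "The request could not be completed."}


def ndjson_stream(events: Callable[[], Iterable[dict]]) -> Iterator[str]:
    """Yield one JSON object per line; on failure, log details server-side."""
    try:
        for event in events():
            yield json.dumps(event) + "\n"
    except Exception:
        # The traceback goes to server logs only; the client gets a fixed
        # payload with no exception text or stack trace.
        logger.exception("executor turn failed")
        yield json.dumps(GENERIC_ERROR) + "\n"
```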
Collaborator
LGTM
This was referenced Jun 23, 2026
Summary
End-to-end Knowledge Graph manage workspace: onboarding data sources, graph design via the Graph Management Assistant (GMA), extraction job orchestration, and automated maintenance — with OpenShell-backed agent runtimes and stage deploy wiring.
Management & data sources

Graph Management Assistant (GMA)
- `inference.local` (credentials stay in OpenShell provider, not sandboxes)

Extraction jobs
- `container` backend

Maintenance pipeline
- `ingest_only` syncs → maintenance job materialization → shared extraction worker pool

Deploy (stage / prod prep)
- `kartograph-api`; API image bundles OpenShell CLI
- `kartograph/stage/*` including new `extraction-runtime` path)
- `KARTOGRAPH_EXTRACTION_RUNTIME_BACKEND=openshell`

Follow-ups outside this repo: sync `deploy/` to `hp-fleet-gitops`, populate Vault `kartograph/stage/extraction-runtime` (workload token signing key + Vertex ADC), build/push `kartograph-openshell-gateway` image.

Test plan
- `make test-unit` (3473 passed; 5 pre-existing architecture-boundary failures on extraction/management imports)
- `make test-integration` against isolated instance
- `make dev` with OpenShell backend: GMA chat, extraction job run, maintenance trigger