User Story
As a backend developer or AI engineer, I want the Elder data layer to use pgvector for embedding storage and a graph-aware relationship model so that discovery data can be queried semantically, relationships traversed efficiently, and the backend easily consumed by AI agents via MCP.
Background
Currently Elder stores infrastructure relationships (entity→networking→services→dependencies) in flat relational tables with explicit FK joins. This works for tabular queries but is:
- Inefficient for multi-hop relationship traversal (e.g., "which workloads share this VPC?")
- Incompatible with semantic/vector search
- Not AI-agent-friendly — no native MCP surface for LLM consumption
Proposed Changes
1. pgvector Extension
- Enable
pgvector in the PostgreSQL deployment (Helm + Kustomize)
- Add
embedding vector(1536) columns to key tables: entities, networking_resources, services, identities
- Store embeddings generated from resource metadata (name, tags, type, provider, region)
- Add
ivfflat or hnsw index per embedding column for ANN search
2. Graph Relationship Layer
- Add a
relationships table: (id, src_id, src_type, dst_id, dst_type, rel_type, weight, metadata jsonb)
- Populate from existing
network_entity_mappings and dependencies tables during migration
- Use recursive CTEs (or pgvector cosine distance + graph walk) for multi-hop traversal
- Expose graph queries via a
RelationshipService in apps/api/services/
3. Migration Path
- Alembic migration to add vector columns +
relationships table
- Backfill script to seed relationships from existing junction tables
- Keep existing tables intact (additive migration, no breaking changes)
- New
DB_ENABLE_VECTOR=true env var to gate pgvector usage (graceful fallback if extension absent)
4. AI Agent Readiness
RelationshipService.search_similar(embedding, k=10) — semantic nearest-neighbor lookup
RelationshipService.traverse(src_id, depth=2) — graph walk returning subgraph
- Both methods return structured JSON suitable for MCP tool responses
- Pairs with the dedicated MCP server (see companion issue)
Acceptance Criteria
Notes
- pgvector version: 0.7.x (supports HNSW)
- Embedding model: defer to caller (openai, local ollama, or WaddleAI) — service accepts pre-computed vectors
- Companion issue: MCP server for relationship/info lookups
- Reference: https://github.com/pgvector/pgvector
User Story
As a backend developer or AI engineer, I want the Elder data layer to use pgvector for embedding storage and a graph-aware relationship model so that discovery data can be queried semantically, relationships traversed efficiently, and the backend easily consumed by AI agents via MCP.
Background
Currently Elder stores infrastructure relationships (entity→networking→services→dependencies) in flat relational tables with explicit FK joins. This works for tabular queries but is:
Proposed Changes
1. pgvector Extension
pgvectorin the PostgreSQL deployment (Helm + Kustomize)embedding vector(1536)columns to key tables:entities,networking_resources,services,identitiesivfflatorhnswindex per embedding column for ANN search2. Graph Relationship Layer
relationshipstable:(id, src_id, src_type, dst_id, dst_type, rel_type, weight, metadata jsonb)network_entity_mappingsanddependenciestables during migrationRelationshipServiceinapps/api/services/3. Migration Path
relationshipstableDB_ENABLE_VECTOR=trueenv var to gate pgvector usage (graceful fallback if extension absent)4. AI Agent Readiness
RelationshipService.search_similar(embedding, k=10)— semantic nearest-neighbor lookupRelationshipService.traverse(src_id, depth=2)— graph walk returning subgraphAcceptance Criteria
relationshipstable created via Alembic migrationentities,networking_resources,services,identitiesivfflat/hnswindex created on each embedding columnRelationshipServiceimplemented withsearch_similar()andtraverse()methodsrelationshipsfrom existing junction tablesDB_ENABLE_VECTORenv var controls vector feature activationNotes