120 changes: 120 additions & 0 deletions doc/developer/catalog-ontology.md
This is not the right place for this file. doc/developer/generated shouldn't be touched by users. I think this is either developer documentation (not in generated), or a design doc.

@@ -0,0 +1,120 @@
# Catalog Ontology Views

The catalog ontology consists of four built-in views in `mz_internal` that
describe the structure and relationships of the Materialize system catalog.
They are designed to help LLMs, diagnostic tools, and developers discover the
right tables, join paths, and ID types when writing catalog queries.

## Views

| View | Columns | Purpose |
|---|---|---|
| `mz_internal.mz_ontology_entity_types` | `name, relation, properties, description` | What kinds of things exist. The `properties` JSONB column holds primary key info, e.g. `{"primary_key": ["id"]}`. |
| `mz_internal.mz_ontology_semantic_types` | `name, sql_type, description` | Typed ID domains and other semantic column types (CatalogItemId, GlobalId, ByteCount, etc.) |
| `mz_internal.mz_ontology_properties` | `entity_type, column_name, semantic_type, description` | Maps every column to its semantic type and describes what it means. |
| `mz_internal.mz_ontology_link_types` | `name, source_entity, target_entity, properties, description` | Named relationships between entity types. |

The views are generated at startup by `generate_views()`, which enumerates all
builtins that have `ontology: Some(...)` annotations and extracts metadata from
their `RelationDesc`, column comments, and semantic type annotations.

## How it works

1. **Entity types** — one row per builtin with an `Ontology` annotation. The
`relation` column is `schema.table_name`, and `properties` contains primary key
info extracted from `RelationDesc::typ().keys`.

2. **Semantic types** — a static reference table of 20 ID/value domains
(e.g., `CatalogItemId`, `GlobalId`, `ReplicaId`, `ByteCount`).

3. **Properties** — one row per column per annotated entity. Joins against
`mz_columns` at runtime to discover column names and types. Semantic type
annotations come from `RelationDesc::get_semantic_type()`. Column
descriptions come from `mz_comments`.

4. **Link types** — one row per `OntologyLink` on each annotated entity.
The `properties` JSONB column contains structured relationship metadata
(kind, source_column, target_column, cardinality, source_id_type, etc.).
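
As a minimal sketch of reading the entity-type metadata described above, the query below lists each entity with its backing relation and declared primary key. It relies only on the columns listed in the Views table and on the `primary_key` key shown there; the `primary_key` output alias is illustrative.

```sql
-- List every ontology entity with its backing relation and primary key columns.
-- properties->'primary_key' is expected to be a JSON array such as ["id"].
SELECT e.name,
       e.relation,
       e.properties->'primary_key' AS primary_key
FROM mz_internal.mz_ontology_entity_types e
ORDER BY e.name;
```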

## Link type properties

The `properties` JSONB column in `mz_ontology_link_types` carries a `"kind"` field with one of these values:

- `"foreign_key"` — column-level join with `source_column`, `target_column`, `cardinality`
- `"measures"` — a measurement/metric relationship
- `"depends_on"` — a dependency relationship
- `"maps_to"` — an ID mapping (e.g., CatalogItemId to GlobalId)
- `"union"` — a UNION view includes another entity type

Common keys in the properties JSONB:

| Key | Description |
|---|---|
| `kind` | Relationship kind: `foreign_key`, `measures`, `depends_on`, `maps_to`, or `union`. |
| `source_column` | Column name on the source entity used for the join. |
| `target_column` | Column name on the target entity used for the join. |
| `cardinality` | Join cardinality: `many_to_one`, `one_to_one`, or `many_to_many`. |
| `nullable` | `true` if the FK column can be NULL (optional relationship). |
| `source_id_type` | Semantic ID type of the source column (e.g., `CatalogItemId`, `GlobalId`). |
| `requires_mapping` | Mapping table needed to bridge ID namespaces (e.g., `mz_internal.mz_object_global_ids`). |
| `from_type` | Source semantic ID type for `maps_to` links (e.g., `CatalogItemId`). |
| `to_type` | Target semantic ID type for `maps_to` links (e.g., `GlobalId`). |
| `via` | Intermediate table or view used to perform a mapping or indirect join. |
| `metric` | Name of the metric or statistic measured by a `measures` link (e.g., `cpu_time_ns`, `materialization_lag`). |
| `discriminator_column` | Column on the `union` view that identifies the member type (e.g., `type`). |
| `discriminator_value` | Value in `discriminator_column` that selects the specific member entity. |
| `note` | Free-text clarification for unusual join semantics or caveats. |
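
The keys above can be unpacked with the `->>` operator. The following is a sketch, not a canonical query: it lists column-level join paths for `foreign_key` links and surfaces `requires_mapping` so that links needing an ID bridge stand out. Keys that a given link does not set come back as `NULL`.

```sql
-- Show column-level join paths for foreign-key links.
SELECT l.name,
       l.source_entity,
       l.properties->>'source_column'    AS source_column,
       l.target_entity,
       l.properties->>'target_column'    AS target_column,
       l.properties->>'cardinality'      AS cardinality,
       l.properties->>'requires_mapping' AS requires_mapping
FROM mz_internal.mz_ontology_link_types l
WHERE l.properties->>'kind' = 'foreign_key'
ORDER BY l.source_entity, l.name;
```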

## For LLMs

If connected to a Materialize instance, query these views **before** writing
catalog queries. They help find the right tables, correct join paths, and avoid
the GlobalId/CatalogItemId trap.

### Key queries

**Find all entities related to X:**
```sql
SELECT l.name, l.source_entity, l.target_entity,
l.properties->>'source_id_type' AS id_type
FROM mz_internal.mz_ontology_link_types l
WHERE l.source_entity = 'X' OR l.target_entity = 'X';
```

**Discover columns and types for entity Z:**
```sql
SELECT p.column_name, p.semantic_type, p.description
FROM mz_internal.mz_ontology_properties p
WHERE p.entity_type = 'Z'
ORDER BY p.column_name;
```

**Look up the actual table name for an entity:**
```sql
SELECT name, relation FROM mz_internal.mz_ontology_entity_types WHERE name = 'mv';
-- mv -> mz_catalog.mz_materialized_views
```

### GlobalId vs CatalogItemId

Many `object_id` columns in `mz_internal` and `mz_introspection` use
**GlobalId**, not **CatalogItemId**. Both are `text`, both look like `u42`,
but they are different ID namespaces. A direct join to `mz_objects.id`
(CatalogItemId) will silently return wrong results after ALTER operations.

Check `mz_ontology_properties.semantic_type` before writing joins. If the
types differ, bridge through `mz_internal.mz_object_global_ids`.
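
A hedged sketch of that workflow follows. The first query checks which ID namespace each `object_id` column uses. The second shows the shape of a bridged join; it assumes that `mz_internal.mz_object_global_ids` exposes `id` (CatalogItemId) and `global_id` (GlobalId) columns, and it uses `mz_internal.mz_compute_dependencies` only as an example of a relation whose `object_id` may be a GlobalId. Confirm both assumptions against the ontology views before relying on the result.

```sql
-- Step 1: check which ID namespace each `object_id` column uses.
SELECT p.entity_type, p.column_name, p.semantic_type
FROM mz_internal.mz_ontology_properties p
WHERE p.column_name = 'object_id'
ORDER BY p.entity_type;

-- Step 2: if the column is a GlobalId, bridge to mz_objects through the mapping
-- relation instead of joining on mz_objects.id directly.
-- Assumed columns on mz_object_global_ids: id (CatalogItemId), global_id (GlobalId).
SELECT o.name, d.dependency_id
FROM mz_internal.mz_compute_dependencies d
JOIN mz_internal.mz_object_global_ids g ON g.global_id = d.object_id
JOIN mz_catalog.mz_objects o ON o.id = g.id;
```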

## Stats

- ~117 entity types (mz_catalog + mz_internal + mz_introspection)
- 20 semantic types
- ~450 column properties
- ~150 named relationships

## Related files

- `src/catalog/src/builtin.rs` — `Ontology` and `OntologyLink` struct definitions, per-builtin annotations
- `src/repr/src/relation.rs` — `semantic_types` field on `RelationDesc`
- `src/storage-client/src/healthcheck.rs` — semantic type annotations on status history tables
- `misc/ontology/` — SQL files for loading the same data as user-space tables
8 changes: 6 additions & 2 deletions doc/user/content/reference/system-catalog/mz_internal.md
@@ -338,7 +338,7 @@ SQL objects that don't exist in the compute layer (such as views) are omitted.
<!-- RELATION_SPEC mz_internal.mz_compute_dependencies -->
| Field | Type | Meaning |
| ----------- | -------- | -------- |
| `object_id` | [`text`] | The ID of a compute object. Corresponds to [`mz_catalog.mz_indexes.id`](../mz_catalog#mz_indexes), [`mz_catalog.mz_materialized_views.id`](../mz_catalog#mz_materialized_views), or [`mz_internal.mz_subscriptions`](#mz_subscriptions). |
| `object_id` | [`text`] | The ID of a compute object. Corresponds to [`mz_catalog.mz_indexes.id`](../mz_catalog#mz_indexes), [`mz_catalog.mz_materialized_views.id`](../mz_catalog#mz_materialized_views), or [`mz_internal.mz_subscriptions.id`](#mz_subscriptions). |
| `dependency_id` | [`text`] | The ID of a compute dependency. Corresponds to [`mz_catalog.mz_indexes.id`](../mz_catalog#mz_indexes), [`mz_catalog.mz_materialized_views.id`](../mz_catalog#mz_materialized_views), [`mz_catalog.mz_sources.id`](../mz_catalog#mz_sources), or [`mz_catalog.mz_tables.id`](../mz_catalog#mz_tables). |

## `mz_compute_hydration_statuses`
@@ -658,6 +658,10 @@ system. The view can be accessed by Materialize _superusers_.
| `object_id` | [`text`] | The ID of the materialized view or index. Corresponds to [`mz_objects.id`](../mz_catalog/#mz_objects). For global notices, this column is `NULL`. |
| `created_at` | [`timestamp with time zone`] | The time at which the notice was created. Note that some notices are re-created on `environmentd` restart. |

<!-- RELATION_SPEC_UNDOCUMENTED mz_internal.mz_ontology_entity_types -->
<!-- RELATION_SPEC_UNDOCUMENTED mz_internal.mz_ontology_link_types -->
<!-- RELATION_SPEC_UNDOCUMENTED mz_internal.mz_ontology_properties -->
<!-- RELATION_SPEC_UNDOCUMENTED mz_internal.mz_ontology_semantic_types -->
<!-- RELATION_SPEC_UNDOCUMENTED mz_internal.mz_optimizer_notices -->

## `mz_notices_redacted`
@@ -835,7 +839,7 @@ in the system.
| Field | Type | Meaning |
| -----------------| ----------------------| -------- |
| `name` | [`text`] | The name of the network policy rule. Can be combined with `policy_id` to form a unique identifier. |
| `policy_id` | [`text`] | The ID of the network policy the rule is part of. Corresponds to [`mz_network_policy_rules.id`](#mz_network_policy_rules). |
| `policy_id` | [`text`] | The ID of the network policy the rule is part of. Corresponds to [`mz_internal.mz_network_policies.id`](#mz_network_policies). |
| `action` | [`text`] | The action of the rule. `allow` is the only supported action. |
| `address` | [`text`] | The address the rule will take action on. |
| `direction` | [`text`] | The direction of traffic the rule applies to. `ingress` is the only supported direction. |
@@ -107,7 +107,7 @@ The `mz_compute_exports` view describes the objects exported by [dataflows][data
<!-- RELATION_SPEC mz_introspection.mz_compute_exports -->
| Field | Type | Meaning |
| -------------- |-----------| -------- |
| `export_id` | [`text`] | The ID of the index, materialized view, or subscription exported by the dataflow. Corresponds to [`mz_catalog.mz_indexes.id`](../mz_catalog#mz_indexes), [`mz_catalog.mz_materialized_views.id`](../mz_catalog#mz_materialized_views), or [`mz_internal.mz_subscriptions`](../mz_internal#mz_subscriptions). |
| `export_id` | [`text`] | The ID of the index, materialized view, or subscription exported by the dataflow. Corresponds to [`mz_catalog.mz_indexes.id`](../mz_catalog#mz_indexes), [`mz_catalog.mz_materialized_views.id`](../mz_catalog#mz_materialized_views), or [`mz_internal.mz_subscriptions.id`](../mz_internal#mz_subscriptions). |
| `dataflow_id` | [`uint8`] | The ID of the dataflow. Corresponds to [`mz_dataflows.id`](#mz_dataflows). |

<!-- RELATION_SPEC_UNDOCUMENTED mz_introspection.mz_compute_exports_per_worker -->
@@ -285,6 +285,7 @@ fn make_builtin_table(name: String) -> (SystemObjectDescription, &'static Builti
column_comments: BTreeMap::new(),
is_retained_metrics_object: false,
access: Vec::new(),
ontology: None,
};
let builtin = leak(Builtin::Table(leak(builtin)));

@@ -309,6 +310,7 @@ fn make_builtin_source(name: String) -> (SystemObjectDescription, &'static Built
column_comments: BTreeMap::new(),
is_retained_metrics_object: false,
access: Vec::new(),
ontology: None,
};
let builtin = leak(Builtin::Source(leak(builtin)));
