[ontology] Add built-in catalog ontology views by mtabebe · Pull Request #36159 · MaterializeInc/materialize

mtabebe · 2026-04-20T13:49:04Z

Add four built-in views in mz_internal that describe the structure and relationships of the Materialize system catalog. These views are designed to help LLMs, diagnostic tools, and developers discover the right tables, join paths, and ID types when writing catalog queries.

Views:

mz_ontology_entity_types: what catalog objects exist and where
mz_ontology_semantic_types: typed ID domains (CatalogItemId, GlobalId, etc.)
mz_ontology_properties: column-level metadata with semantic types
mz_ontology_link_types: named relationships between entity types

The views are generated at startup from annotations on existing builtin definitions. Each builtin can carry an Ontology struct declaring its entity name, description, and FK relationships, plus per-column semantic type annotations via RelationDesc::with_semantic_type().
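
To make the annotation flow concrete, here is a minimal sketch of how rows for a view such as mz_ontology_entity_types could be derived from annotated builtins. The fields `entity_name`, `description`, and `links` follow the diff excerpts in this PR; the `Builtin` stand-in, the `entity_type_rows` helper, and the row shape are assumptions made for illustration, not the PR's actual code.

```rust
// Hypothetical, simplified stand-ins for the PR's types. Field names follow
// the diff excerpts; everything else is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OntologyLink {
    pub name: &'static str,
    pub target: &'static str,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ontology {
    pub entity_name: &'static str,
    pub description: &'static str,
    pub links: &'static [OntologyLink],
}

/// Stand-in for a builtin definition that may carry an annotation.
pub struct Builtin {
    pub schema: &'static str,
    pub name: &'static str,
    pub ontology: Option<Ontology>,
}

/// Returns one (entity name, qualified relation, description) row per
/// annotated builtin; builtins with `ontology: None` are skipped.
pub fn entity_type_rows(builtins: &[Builtin]) -> Vec<(String, String, String)> {
    builtins
        .iter()
        .filter_map(|b| {
            let o = b.ontology.as_ref()?;
            Some((
                o.entity_name.to_string(),
                format!("{}.{}", b.schema, b.name),
                o.description.to_string(),
            ))
        })
        .collect()
}
```

Only builtins with `ontology: Some(...)` contribute rows, which matches the description above that un-annotated builtins are simply absent from the views.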

mtabebe · 2026-04-20T18:22:14Z

I know there are some test failures here, but I think it is worth getting some early feedback on the structure of this change.

@ggevay there are 4 commits here that might be easier to look at individually: (1) defines the interface for the type, (2) adds all the annotations, (3) adds a doc, (4) adds tests

ggevay

I wrote some comments, will continue tomorrow.

ggevay · 2026-04-28T15:46:56Z

@@ -3510,7 +3975,27 @@ FROM
 WHERE data->>'kind' = 'Role'",
        is_retained_metrics_object: false,
        access: vec![PUBLIC_SELECT],
-        ontology: None,
+        ontology: Some(Ontology {
+            entity_name: "role_member",


Maybe role_membership?

ggevay · 2026-04-28T15:48:35Z

+    ontology: Some(Ontology {
+        entity_name: "operator",
+        description: "A built-in SQL operator",
+        links: &[],


Could we add return_type_id as a link, like on mz_functions?

ggevay · 2026-04-28T15:53:56Z

+        description: "An array type with its element type",
+        links: &[
+            OntologyLink {
+                name: "is_subtype_of",


This is not is_subtype_of. I'm not sure what would be a good term here.

And the same issue for list and map.

I renamed to detail_of

ggevay · 2026-04-28T15:59:30Z

+                    name: "granted_by",
+                    target: "role",
+                    properties_json: r#"{"kind": "foreign_key", "source_column": "grantor", "target_column": "id", "cardinality": "many_to_one"}"#,
+                },


member_of_role and has_member here might have the same issue as object dependencies, mentioned above: it sounds as if a role membership would have a has_member, when it's actually a role that has a member.

(granted_by might be ok, though.)

ggevay · 2026-04-28T16:01:03Z

+        entity_name: "role_parameter",
+        description: "A session parameter default set for a role",
+        links: &[OntologyLink {
+            name: "parameter_of",


Maybe something like default_parameter_setting_of?

Add the structural foundation for built-in ontology views that describe the Materialize system catalog. This includes:
- `Ontology` and `OntologyLink` structs on builtin definitions
- `semantic_types` field on `RelationDesc` with `with_semantic_type()` builder
- View generation code in `ontology.rs` that produces 4 views: `mz_ontology_entity_types`, `mz_ontology_semantic_types`, `mz_ontology_properties`, `mz_ontology_link_types`
- OID constants for the new views
- `ontology: None` on all existing builtins (no annotations yet)

The views are generated at startup by enumerating builtins that have `ontology: Some(...)` annotations. This commit only adds the infrastructure; annotations are added in the next commit.

Add ontology annotations to builtin catalog objects and semantic type annotations. This populates the ontology views introduced in the previous commit with:
- Entity types: databases, schemas, roles, clusters, replicas, tables, sources, views, MVs, indexes, sinks, connections, secrets, types, functions, and ~100 internal/introspection objects
- Semantic types: CatalogItemId, GlobalId, ClusterId, ReplicaId, etc.
- Link types: relationships (owned_by, in_schema, runs_on_cluster, depends_on, details_of, etc.)
- Column-level semantic type annotations via with_semantic_type()

Add documentation for the ontology module covering the four built-in views, their schema, link type properties and LLM usage guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…sistent
- Replace SemanticType &'static str with a typed enum; update all call sites in builtin.rs and healthcheck.rs
- Rename OntologyLink names to noun phrases throughout: dependent_object, referenced_object, transitively_dependent_object/referenced_object, detail_of, group_role, member_role, default_parameter_setting_of
- Add missing FK links on source-table detail tables and mz_operators (returns_type), plus a test for ontology_consistency
- Improve SEMANTIC_TYPE_DEFS descriptions with examples and clearer wording (MzTimestamp, ObjectType, ConnectionType, SourceType)

…ties Nineteen annotated builtin relations had empty links columns that point to other annotated entities

antiguru

Leaving some comments below. I think this is valuable to have, so mostly comments around the implementation.

When/how do we include more parts in the ontology? I think specifically the compute introspection could benefit from this.

antiguru · 2026-04-29T18:10:24Z

+use mz_repr::{RelationDesc, SemanticType, SqlScalarType};
+use mz_sql::catalog::NameReference;
+
+use super::{Builtin, BuiltinView, Ontology, PUBLIC_SELECT};


Use crate imports, not super.

antiguru · 2026-04-29T18:11:59Z

+struct Info<'a> {
+    table_name: &'static str,
+    schema_name: &'static str,
+    entity_name: String,
+    desc: &'a RelationDesc,
+    ontology: &'a Ontology,
+}


Please document struct and fields.

antiguru · 2026-04-29T18:12:29Z

+fn leak(v: BuiltinView) -> &'static BuiltinView {
+    Box::leak(Box::new(v))
+}


This'll show up in tooling to detect leaks. Can we use a non-leaking approach here to avoid alarms in the future?

Ok, I saw other uses of this in builtin, so thought this was the pattern, I'll fix

Yeah, there are. Personal opinion is to not add more :)
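
One possible non-leaking shape, sketched under assumptions: keep the generated views in a process-wide `std::sync::OnceLock` and hand out `&'static` borrows from it. The `BuiltinView` stand-in, `generate_views`, and `ontology_views` below are hypothetical names for illustration; whether this fits how builtins are registered in this crate is not something the thread confirms.

```rust
use std::sync::OnceLock;

/// Hypothetical, reduced stand-in for the real `BuiltinView`.
#[derive(Debug)]
pub struct BuiltinView {
    pub name: String,
    pub sql: String,
}

/// Placeholder for the startup-time generation step (assumption).
fn generate_views() -> Vec<BuiltinView> {
    [
        "mz_ontology_entity_types",
        "mz_ontology_semantic_types",
        "mz_ontology_properties",
        "mz_ontology_link_types",
    ]
    .iter()
    .map(|name| BuiltinView {
        name: name.to_string(),
        sql: format!("SELECT 1 /* placeholder for {} */", name),
    })
    .collect()
}

/// Returns the generated views with a `'static` lifetime without calling
/// `Box::leak`: the vector is owned by a static and initialized once.
pub fn ontology_views() -> &'static [BuiltinView] {
    static VIEWS: OnceLock<Vec<BuiltinView>> = OnceLock::new();
    VIEWS.get_or_init(generate_views).as_slice()
}
```

The allocation still lives until process exit, but it has an owner, so there is no explicit `leak` call for tooling or reviewers to trip over.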

antiguru · 2026-04-29T18:16:30Z

+    let keys = desc.typ().keys.first()?;
+    let cols: Vec<_> = keys
+        .iter()
+        .map(|&i| format!("\"{}\"", desc.get_name(i)))


What about columns that contain double quotes?
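
For illustration, a small std-only sketch of what escaping could look like if the column name is emitted as a JSON string rather than wrapped in bare quotes. `json_string` is a hypothetical helper, not a function from this PR; a JSON library would do the same job.

```rust
/// Renders `s` as a JSON string literal, escaping quotes, backslashes, and
/// control characters. Hypothetical helper for illustration only.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

With this, a column named `a"b` renders as `"a\"b"` instead of producing malformed JSON.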

antiguru · 2026-04-30T07:55:34Z

+    /// Optional semantic type annotations for columns.
+    /// Keyed by column index. Only populated for builtin catalog objects.
+    /// Excluded from Eq/Hash/serialization — it's ontology metadata, not schema.
+    #[serde(skip)]
+    semantic_types: BTreeMap<usize, SemanticType>,


Please do not add quirks around ignoring parts of the type for Eq, Hash, etc. From experience, this is a maintenance burden for our future selves.

Ack I was probably too clever for my own good

antiguru · 2026-04-30T07:59:46Z

+    /// Annotates the most recently added column with a semantic type.
+    ///
+    /// Possible values are enumerated in [`SemanticType`].
+    pub fn with_semantic_type(mut self, semantic_type: SemanticType) -> RelationDescBuilder {


I don't think this function is great as it is context-sensitive. Could we add this to a separate with_column_semantic_type?

I agree. A specific danger is that someone editing unrelated stuff near a call site might not realize that this has to be directly after a column that it refers to, and e.g. accidentally add a new column between a with_semantic_type and its with_column.
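
A minimal sketch of the order-independent variant, assuming the annotation names its column explicitly. `DescBuilder` and its methods are simplified, hypothetical stand-ins for `RelationDescBuilder`; the real `with_column_semantic_type` signature is not shown in this thread, and this sketch checks the column at runtime rather than at compile time.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SemanticType {
    CatalogItemId,
    GlobalId,
    ClusterId,
}

/// Hypothetical, reduced builder: only column names and annotations.
#[derive(Debug, Default)]
pub struct DescBuilder {
    columns: Vec<&'static str>,
    semantic_types: BTreeMap<usize, SemanticType>,
}

impl DescBuilder {
    pub fn with_column(mut self, name: &'static str) -> Self {
        self.columns.push(name);
        self
    }

    /// Annotates the named column. Because the column is named explicitly,
    /// the call does not depend on which `with_column` came last.
    /// Panics if the column has not been added yet.
    pub fn with_column_semantic_type(mut self, column: &str, ty: SemanticType) -> Self {
        let idx = self
            .columns
            .iter()
            .position(|c| *c == column)
            .unwrap_or_else(|| panic!("unknown column {:?}", column));
        self.semantic_types.insert(idx, ty);
        self
    }

    /// Looks up the annotation for a column, if any.
    pub fn semantic_type(&self, column: &str) -> Option<SemanticType> {
        let idx = self.columns.iter().position(|c| *c == column)?;
        self.semantic_types.get(&idx).copied()
    }
}
```

Inserting a new `with_column` between a column and its annotation no longer changes which column gets annotated.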

antiguru · 2026-04-30T08:02:39Z

+    ontology: Some(Ontology {
+        entity_name: "kafka_sink",
+        description: "Kafka-specific sink configuration (topic)",
+        links: &[OntologyLink {
+            name: "details_of",
+            target: "sink",
+            properties_json: r#"{"kind": "foreign_key", "source_column": "id", "target_column": "id", "cardinality": "one_to_one"}"#,
+        }],
+    }),


How do we ensure that the ontology doesn't get out-of-sync with the rest of the system?

I agree this is a challenge. I did add an SLT test and based on this feedback will add more unit tests.

The main invariants that I see are:

Column semantic types: the with_column_semantic_type API enforces at compile time that the annotated columns exist (but this doesn't enforce anything about the types).

Link target entity names: test_ontology_consistency asserts every OntologyLink::target references a known annotated entity

source_column values in properties_json: currently only checked implicitly (the FK coverage test reads them). I'm adding an explicit check that each source_column value names an actual column in the entity's RelationDesc, so renames are caught.

I would totally be open to more ideas on how to make this more robust.
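
The invariants listed above could be checked along these lines. This is a sketch with simplified, hypothetical types (`Entity`, `Link`, `check_consistency`); it is not the body of test_ontology_consistency, and the column names in any usage are illustrative.

```rust
use std::collections::BTreeSet;

/// Hypothetical, flattened view of a link for checking purposes.
pub struct Link {
    pub name: &'static str,
    pub target: &'static str,
    pub source_column: &'static str,
}

/// Hypothetical, flattened view of an annotated entity.
pub struct Entity {
    pub entity_name: &'static str,
    pub columns: &'static [&'static str],
    pub links: &'static [Link],
}

/// Returns one message per violated invariant:
/// - every link target names a known entity,
/// - every source_column names a column of the owning entity,
/// - link names are unique within an entity.
pub fn check_consistency(entities: &[Entity]) -> Vec<String> {
    let known: BTreeSet<&str> = entities.iter().map(|e| e.entity_name).collect();
    let mut errors = Vec::new();
    for e in entities {
        let mut seen_names = BTreeSet::new();
        for link in e.links {
            if !known.contains(link.target) {
                errors.push(format!(
                    "{}.{}: unknown target entity {:?}",
                    e.entity_name, link.name, link.target
                ));
            }
            if !e.columns.contains(&link.source_column) {
                errors.push(format!(
                    "{}.{}: source_column {:?} is not a column",
                    e.entity_name, link.name, link.source_column
                ));
            }
            if !seen_names.insert(link.name) {
                errors.push(format!("{}: duplicate link name {:?}", e.entity_name, link.name));
            }
        }
    }
    errors
}
```

Collecting messages instead of asserting on the first failure means a rename that breaks several links shows up in one test run.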

antiguru · 2026-04-30T08:06:25Z

This is not the right place for this file. doc/developer/generated shouldn't be touched by users. I think this is either developer documentation (not in generated), or a design doc.

mtabebe · 2026-04-30T10:32:05Z

Leaving some comments below. I think this is valuable to have, so mostly comments around the implementation.

When/how do we include more parts in the ontology? I think specifically the compute introspection could benefit from this.

Yeah, I intentionally left compute introspection out of this for now, but my feeling is that we can add the same type of annotations to BuiltinLog and update the generate_views function to handle them. This work has sort of spiraled bigger than I was anticipating, so I think extending it to compute introspection should be the next thing (and properly planned).

ggevay

Wrote some more comments. Will continue after lunch.

ggevay · 2026-04-30T08:58:18Z

+        description: "Kafka source table-level details",
+        links: &[OntologyLink {
+            name: "describes_source_table",
+            target: "object",


Could this be target: "table",?

ggevay · 2026-04-30T10:05:12Z

+/// JSON object, e.g. `{"primary_key": ["id", "schema_id"]}`. Returns `None`
+/// if the relation has no keys defined.
+fn pk_json(desc: &RelationDesc) -> Option<String> {
+    let keys = desc.typ().keys.first()?;


Several builtins declare more than one key. If we take only the first key here, then which one this surfaces would depend on the declared key order. Is this intended?

Maybe we could make it do something like

{"primary_key": [...], "alternate_keys": [[...], ...]}

Also, a nit: If it's just one key, could you name the let keys just let key? (I know it's a Vec, so the s at the end feels natural, but it's still just one key at this point, so key is more accurate. We had tons of these key variable naming issues also in the optimizer, and at some point we made a conscious effort to fix all of them.)

Good point about the alternate keys, I'll implement that
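
A sketch of the suggested output shape, assuming keys are already resolved to column names. `keys_json` and its helpers are hypothetical; the escaping here only handles quotes and backslashes, and the first declared key is still the one reported as `primary_key`, so the ordering question above remains a design choice.

```rust
/// Minimal JSON string rendering (quotes and backslashes only); a
/// simplification for this sketch.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Renders a list of column names as a JSON array of strings.
fn json_array(cols: &[&str]) -> String {
    let items: Vec<String> = cols.iter().map(|c| json_str(c)).collect();
    format!("[{}]", items.join(", "))
}

/// Renders all declared keys: the first as `primary_key`, the rest (if any)
/// as `alternate_keys`. Returns `None` if no keys are declared.
pub fn keys_json(keys: &[Vec<&str>]) -> Option<String> {
    let (key, alternates) = keys.split_first()?;
    let mut out = format!("{{\"primary_key\": {}", json_array(key));
    if !alternates.is_empty() {
        let alts: Vec<String> = alternates.iter().map(|k| json_array(k)).collect();
        out.push_str(&format!(", \"alternate_keys\": [{}]", alts.join(", ")));
    }
    out.push('}');
    Some(out)
}
```

For a relation with a single key the output is unchanged from the original shape, so existing consumers of `primary_key` would not need to change.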

ggevay · 2026-04-30T10:15:09Z

@@ -2323,18 +2469,37 @@ pub static MZ_COMPUTE_DEPENDENCIES: LazyLock<BuiltinSource> = LazyLock::new(|| B
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "compute_dependency",
+        description: "Dependency edges within compute dataflows",


Is it intended that edges is plural here? I think many (maybe most) other descriptions have analogous things in the singular.

ggevay · 2026-04-30T10:17:10Z

Typo: .id missing at the end.

ggevay · 2026-04-30T10:23:22Z

+            OntologyLink {
+                name: "dependent_compute_object",
+                target: "object",
+                properties_json: r#"{"source_id_type": "GlobalId", "requires_mapping": "mz_internal.mz_object_global_ids", "kind": "foreign_key", "source_column": "object_id", "target_column": "id", "cardinality": "many_to_one"}"#,


I'm wondering if it would be maybe better to lift this into a typed struct, something like

LinkProperties { kind: LinkKind, source_column: &'static str, target_column: &'static str, cardinality: Cardinality, source_id_type: Option<SemanticType>, requires_mapping: Option<&'static str>, nullable: bool }

I think it would make them more readable (there could be a helper function for common cases), and would also increase the chances of people getting them right when writing new ones. Although, it might also be over-engineering, I'm not sure.

That is a good idea
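
As a rough sketch of the typed shape with a helper for the common case: the field names below follow the struct proposed above, reduced to foreign-key links only. `fk`, `with_mapping`, and `to_json` are hypothetical names, and the hand-rolled JSON assumes column and relation names need no escaping.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cardinality {
    OneToOne,
    ManyToOne,
}

impl Cardinality {
    fn as_str(self) -> &'static str {
        match self {
            Cardinality::OneToOne => "one_to_one",
            Cardinality::ManyToOne => "many_to_one",
        }
    }
}

/// Hypothetical typed replacement for a hand-written properties_json string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LinkProperties {
    pub source_column: &'static str,
    pub target_column: &'static str,
    pub cardinality: Cardinality,
    pub requires_mapping: Option<&'static str>,
    pub nullable: bool,
}

impl LinkProperties {
    /// Helper for the common case: a plain foreign key.
    pub fn fk(source_column: &'static str, target_column: &'static str, cardinality: Cardinality) -> Self {
        LinkProperties { source_column, target_column, cardinality, requires_mapping: None, nullable: false }
    }

    /// Records the relation needed to map between ID domains.
    pub fn with_mapping(mut self, relation: &'static str) -> Self {
        self.requires_mapping = Some(relation);
        self
    }

    /// Renders the JSON form; assumes identifiers contain no characters
    /// that need JSON escaping.
    pub fn to_json(&self) -> String {
        let mut out = format!(
            "{{\"kind\": \"foreign_key\", \"source_column\": \"{}\", \"target_column\": \"{}\", \"cardinality\": \"{}\"",
            self.source_column, self.target_column, self.cardinality.as_str()
        );
        if let Some(m) = self.requires_mapping {
            out.push_str(&format!(", \"requires_mapping\": \"{}\"", m));
        }
        if self.nullable {
            out.push_str(", \"nullable\": true");
        }
        out.push('}');
        out
    }
}
```

A call such as `LinkProperties::fk("grantor", "id", Cardinality::ManyToOne)` replaces a full raw JSON string, and a misspelled cardinality becomes a compile error rather than a silent data bug.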

ggevay

Posting some more comments, but also hitting approve, because these are just minor things. Looks great overall!

ggevay · 2026-04-30T12:06:55Z

@@ -2690,30 +2987,48 @@ pub static MZ_SSH_TUNNEL_CONNECTIONS: LazyLock<BuiltinTable> = LazyLock::new(||
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "ssh_tunnel",
+        description: "SSH tunnel connection with public keys",


(comment from Claude)

entity_name: "ssh_tunnel", description: "SSH tunnel connection with public keys",

Compare its peers:

mz_kafka_connections → entity_name: "kafka_connection"

mz_aws_privatelink_connections → entity_name: "aws_privatelink" (also drops the _connection!)

mz_aws_connections → entity_name: "aws_connection"

So three different conventions for the four connection-detail tables: <x>_connection, <x> (dropping the suffix), and inconsistent ones in between. Pick one — almost certainly <x>_connection to match the <x>_source and <x>_source_table families. Specifically:

aws_privatelink should be aws_privatelink_connection

ssh_tunnel should be ssh_tunnel_connection

ggevay · 2026-04-30T12:08:58Z

+    /// Target entity name (e.g., "role", "schema").
+    pub target: &'static str,
+    /// JSON for the `properties` JSONB column (kind, source_column, target_column, etc.).
+    pub properties_json: &'static str,


I think I raised this somewhere else too, but it would be nice to have more structure around properties_json, or at least more documentation. E.g., requires_mapping doesn't seem like a trivial thing, but its meaning doesn't seem to be documented anywhere.

ggevay · 2026-04-30T12:12:52Z

+                    let key = "\"source_column\": \"";
+                    let start = json.find(key)? + key.len();
+                    let end = json[start..].find('"')? + start;
+                    Some(&json[start..end])


Could we just run it through a json parser instead of string matching? As it is, I worry that someone in the future will
- write the json with slightly different whitespace, and then the test gives a false positive
- write invalid json, but the test doesn't catch it if it's in some other part of the json.

Eliminated as part of the strong typing

ggevay · 2026-04-30T12:15:41Z

+    Some(format!("{{\"primary_key\": [{}]}}", cols.join(", ")))
+}
+
+// ── View builders ────────────────────────────────────────────


(comment from Claude)

The 4 view builders are 95% the same code

entity_types_view, properties_view, semantic_types_view, link_types_view all do:

infos.iter().map(|i| format!("(...)", esc(...), ...)).collect()

format!("SELECT * FROM (VALUES {}) AS t(col1, col2, ...)", vals.join(","))

wrap in view(name, oid, &cols, &keys, sql)

This is begging for a single helper:

fn values_view(name, oid, cols, keys, rows: impl Iterator<Item = Vec<SqlLiteral>>) -> BuiltinView

…with a SqlLiteral enum for typed values (Str, Bool, Json, Null). That removes every esc(...) call site and centralizes the "this is going inside '...'" decision in one place — which would also have prevented the pk_json "-escaping bug in one stroke.
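
A minimal sketch of the centralized-escaping idea, under assumptions: the enum below is named `Lit` and has the four variants suggested above, and `values_sql` builds only the SELECT text. The function bodies are illustrative, assume a non-empty row list, and assume SQL string literals where only single quotes need doubling.

```rust
/// A typed SQL literal; rendering is the single place where quoting happens.
pub enum Lit {
    Str(String),
    Json(String),
    Bool(bool),
    Null,
}

impl Lit {
    /// Renders the literal as SQL text, doubling embedded single quotes.
    pub fn render(&self) -> String {
        match self {
            Lit::Str(s) => format!("'{}'", s.replace('\'', "''")),
            Lit::Json(j) => format!("'{}'::jsonb", j.replace('\'', "''")),
            Lit::Bool(b) => b.to_string(),
            Lit::Null => "NULL".to_string(),
        }
    }
}

/// Builds `SELECT * FROM (VALUES ...) AS t(cols...)` from typed rows.
/// Assumes `rows` is non-empty and every row has `cols.len()` cells.
pub fn values_sql(cols: &[&str], rows: &[Vec<Lit>]) -> String {
    let tuples: Vec<String> = rows
        .iter()
        .map(|row| {
            let cells: Vec<String> = row.iter().map(Lit::render).collect();
            format!("({})", cells.join(", "))
        })
        .collect();
    format!(
        "SELECT * FROM (VALUES {}) AS t({})",
        tuples.join(", "),
        cols.join(", ")
    )
}
```

Each view builder then only decides which variant a cell is; none of them touches quoting directly.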

ggevay · 2026-04-30T12:30:40Z

+#[derive(Clone, Hash, Debug, PartialEq, Eq)]
+pub struct OntologyLink {
+    /// Relationship name (e.g., "owned_by", "in_schema").
+    pub name: &'static str,


OntologyLink somehow seems to be a non-trivial concept, judging by how many times I had to correct my AI agent on this while reviewing the PR, and also how the PR's original version had some of these wrong (e.g., the dependency ones). I'm wondering if we could add more doc commenting here to make it clearer / more explicit what these mean.

It's surprisingly tricky. After some back-and-forth with my AI agent, we arrived at two possibilities:

A. Allow active verbs. This one encompasses all the existing examples in the PR's current state, except for session_on_cluster.

/// A foreign-key relationship from this catalog object to another ontology
/// entity.
///
/// **Contract.** For each row, the value in `properties_json.source_column`
/// references a row of `target`'s primary table via
/// `properties_json.target_column`. `name` is a label for this relationship
/// and must be unique within an `Ontology`.
///
/// **Direction.** A link always points *from* this row's `source_column`
/// *to* the `target` entity's `target_column`. `name` is just a label for
/// that one outgoing edge: it never reverses direction, regardless of how
/// it reads in English. When in doubt, the columns define the direction;
/// the name is descriptive only.
///
/// **Naming.** Several name shapes are in use, each with its own natural
/// reading:
///
/// - **Noun role** (preferred when natural):
///   `dependent_object` on `object_dependency`, `element_type` on
///   `array_type`, `default_parameter_setting_of` on `role_parameter`.
///   Read as: *"the `<target>` that is the `<name>` of this row"*,
///   e.g. "the object that is the *dependent_object* of this dependency edge."
///
/// - **Passive verb / prepositional**:
///   `owned_by` on `database`, `in_schema` on `object`, `details_of` on
///   `kafka_source`, `granted_by` on `role_membership`.
///   Read as: *"the `<target>` this row is `<name>`"*,
///   e.g. "the role this database is *owned_by*", "the schema this object
///   is *in*".
///
/// - **Active verb** (use sparingly, see caveat below):
///   `depends_on`, `has_element_type`, `references_source`, `runs_on_cluster`.
///   Read as: *"this row `<name>` the `<target>`"* (with the **row** as
///   the verb's subject), e.g. "this array_type *has_element_type* a type",
///   "this index *runs_on_cluster* a cluster."
///
/// **Caveat about active verbs.** Active verbs admit more than one English
/// reading (the row, the source-column's referent, or the target can each
/// be read as the subject), and historically every direction bug in this
/// module's review history has been on an active-verb name. The contract
/// above pins direction regardless, but if a natural noun phrase or passive
/// verb exists, prefer it. In particular, **do not** use an active verb on
/// an *edge entity* (a row that itself represents a relationship, e.g.
/// `mz_object_dependencies`, `mz_role_members`); the row is not an actor,
/// so a verb-with-row-as-subject is a category error. Use noun-role
/// endpoint names there (`dependent_object` / `referenced_object`,
/// `member_role` / `group_role`).

B. Disallow active verbs. This would require slight changes in many of the current OntologyLinks.

/// A foreign-key relationship from this catalog object to another ontology
/// entity.
///
/// **Contract.** For each row, the value in `properties_json.source_column`
/// references a row of `target`'s primary table via
/// `properties_json.target_column`. `name` is a label for this relationship
/// and must be unique within an `Ontology`.
///
/// **Direction.** A link always points *from* this row's `source_column`
/// *to* `target`'s `target_column`. The columns define direction; `name`
/// is descriptive only and never reverses it.
///
/// **Naming convention.** `name` denotes the role the `<target>` plays
/// relative to this row. Pick `name` so the link reads as a noun phrase
/// under:
///
/// > *"the `<target>` that is the `<name>` of this row."*
///
/// Three name shapes fit this frame and are the only ones permitted:
///
/// - **Noun role** (preferred): `dependent_object` on `object_dependency`
///   → "the object that is the *dependent_object* of this dependency edge."
/// - **Passive verb**: `owned_by` on `database`
///   → "the role this database is *owned by*."
/// - **Prepositional**: `in_schema` on `object`
///   → "the schema this object is *in*."
///
/// **Active verbs are disallowed** (`depends_on`, `has_member`, `references`,
/// `uses_X`, `has_X`, `returns_X`, `describes_X`, `runs_on_X`, …). An active
/// verb has a subject, and the subject can be read as the row, the
/// source-column's referent, or the target: an interpretive axis on top of
/// direction that has produced every direction bug in this module's review
/// history. Rewrite as a noun role: `has_element_type` → `element_type`,
/// `returns_type` → `return_type`, `references_source` → `referenced_source`,
/// `uses_connection` → `connection` (or `used_connection`),
/// `runs_on_cluster` → `host_cluster`, `depends_on` →
/// `dependent_object` / `referenced_object`, `describes_source_table` →
/// `details_of`.
///
/// **Edge entities.** For catalog objects whose rows represent a relationship
/// between two other things (e.g. `mz_object_dependencies`,
/// `mz_role_members`, `mz_compute_dependencies`), the rule is strictest: the
/// row is not an actor, so a verb-with-row-as-subject (the row "depends on"
/// something, "has" a member) is a category error regardless of direction.
/// Use noun-role endpoint names (`dependent_object` / `referenced_object`,
/// `member_role` / `group_role`), naming each link for the role its
/// endpoint plays in the edge.

I'm not sure which one is better.

I will update the docs, but I don't really want to be restrictive on the verbs that we use.

My intuition is that some of this will be cleaned up by having strongly typed link properties, too

ggevay · 2026-04-30T13:50:44Z

+    /// Keyed by column index. Only populated for builtin catalog objects.
+    /// Excluded from Eq/Hash/serialization — it's ontology metadata, not schema.
+    #[serde(skip)]
+    semantic_types: BTreeMap<usize, SemanticType>,


Should this also use ColumnIndex, like the existing metadata?

ggevay

Some more comments

ggevay · 2026-04-30T14:00:27Z

+        description: "Recent query activity with execution stats",
+        links: &[
+            OntologyLink {
+                name: "session_on_cluster",


ran_on_cluster?

Also, could we add an OntologyLink to "session"?

ggevay · 2026-04-30T14:05:51Z

@@ -4285,6 +5068,15 @@ pub static MZ_SESSION_HISTORY: LazyLock<BuiltinSource> = LazyLock::new(|| Builti
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "session_history",


"session_history" feels a bit awkward here. How about changing it to simply "session", and changing the current "session" to "active_session"? And then anywhere where we have a link to either of these, we should actually have links to both of these (one nullable).

Ah sorry, I think I misunderstood what "nullable": true means. For a moment I thought it means that if you do an outer join, then some rows will come back null. But actually, it probably means that the column is nullable on our side, right?

So, "foreign key" traditionally means that it's ok to do an inner join, you won't lose stuff. But sometimes it can be also interesting to point out a link where you might lose stuff with an inner join, so you need an outer join. Do we have/want a way to express those links as well?

Edit: Or am I misunderstanding this, and "nullable": true can mean that a non-null in our column might not find a match?

Btw. this also ties back to the problem mentioned elsewhere that properties_json is under-documented.

If it's just that the column is nullable, is there an automated test that checks this? Or even better, why not derive it automatically? We could do that if properties_json would be a structured thing with smart constructors.

Ah, on sink_status_history, it seems even non-null values on our side might not find a match on the other side!

But actually, it probably means that the column is nullable on our side, right?

Yes that is what it means. I agree it would be better to derive it automatically, feels like something I could do in the future.

I also agree with what you are saying about adding more annotations around what the joins can see. I am kind of hesitant to add more though... so maybe we can defer?

…x RelationDesc Hash/Eq

ggevay

Sorry, I can't stop, lol

ggevay · 2026-04-30T14:24:03Z

@@ -4673,6 +5492,11 @@ pub static MZ_STATEMENT_LIFECYCLE_HISTORY: LazyLock<BuiltinSource> = LazyLock::n
            MONITOR_REDACTED_SELECT,
            MONITOR_SELECT,
        ],
+        ontology: Some(Ontology {
+            entity_name: "statement_lifecycle",


statement_lifecycle_event

ggevay · 2026-04-30T14:24:36Z

+        ontology: Some(Ontology {
+            entity_name: "statement_lifecycle",
+            description: "Statement lifecycle events (parse, bind, execute)",
+            links: &[],


Missing link to mz_recent_activity_log.execution_id.

ggevay · 2026-04-30T14:30:55Z

@@ -4891,6 +5727,22 @@ pub static MZ_SINK_STATUS_HISTORY: LazyLock<BuiltinSource> = LazyLock::new(|| Bu
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "sink_status_history",


"history" sounds like multiple events, but entity_name is supposed to describe one row of this relation, right? (Maybe this could be explicitly added to its doc comment.) So, maybe rename to sink_status_event?

And the corresponding OntologyLink status_history_of_sink has the same issue.

And the same for all the history relations:

sink_status_history

source_status_history

replica_status_history

wallclock_lag_history

I have updated... and added a note to the doc to clarify it names a single row

ggevay · 2026-04-30T14:35:32Z

@@ -4285,6 +5068,15 @@ pub static MZ_SESSION_HISTORY: LazyLock<BuiltinSource> = LazyLock::new(|| Builti
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "session_history",


Ah, on sink_status_history, it seems even non-null values on our side might not find a match on the other side!

ggevay

meta-comment: I found the following issues by asking Claude to correlate the Ontology::description field values with what we have in our existing docs, e.g. in mz_internal.md and mz_catalog.md. We should look into unifying these, e.g. by sourcing the descriptions in our docs from Ontology::description. This could also be a follow-up PR. (But the issues below need to be fixed here.)

ggevay · 2026-04-30T14:44:54Z

+        links: &[OntologyLink {
+            name: "status_of_replica",
+            target: "replica",
+            properties_json: r#"{"source_id_type": "CatalogItemId", "kind": "foreign_key", "source_column": "replica_id", "target_column": "id", "cardinality": "one_to_one"}"#,


It's not one-to-one, because a replica can have multiple processes, and we have a row here for each process. The other fields are also wrong.

ggevay · 2026-04-30T15:03:09Z

@@ -6195,6 +7403,15 @@ pub static MZ_OBJECT_LIFETIMES: LazyLock<BuiltinView> = LazyLock::new(|| Builtin
    FROM mz_catalog.mz_audit_events a
    WHERE a.event_type = 'create' OR a.event_type = 'drop'",
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "object_lifetime",
+        description: "Computed lifetime span (created_at to dropped_at) for objects",


There are no created_at and dropped_at columns. Maybe object_lifetime is switched/confused with object_history?

ggevay · 2026-04-30T15:07:31Z

@@ -5280,6 +6198,15 @@ pub static MZ_FRONTIERS: LazyLock<BuiltinSource> = LazyLock::new(|| BuiltinSourc
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "frontier",
+        description: "Current read/write frontiers per object (source)",


Why mention "source"? The old docs say

the frontiers of each source, sink, table, materialized view, index, and subscription

ggevay · 2026-04-30T15:08:30Z

            .with_column("redacted_sql", SqlScalarType::String.nullable(false))
            .with_key(vec![0, 1, 2])
            .finish(),
        column_comments: BTreeMap::new(),
        sql: "SELECT DISTINCT sql_hash, sql, redacted_sql FROM mz_internal.mz_sql_text WHERE prepared_day + INTERVAL '4 days' >= mz_now()",
        access: vec![MONITOR_SELECT],
+        ontology: Some(Ontology {
+            entity_name: "recent_sql_text",
+            description: "Recent SQL text (indexed, last 3 days)",


Claude:

"last 3 days" understates the actual retention

Verified in builtin.rs 4990–5013:

Ontology: "Recent SQL text (indexed, last 3 days)."

SQL: WHERE prepared_day + INTERVAL '4 days' >= mz_now()

Inline comment immediately above: "This should always be 1 day more than the interval in MZ_RECENT_THINNED_ACTIVITY_LOG, because prepared_day is rounded down to the nearest day. Thus something that actually happened three days ago could have a prepared_day anywhere from 3 to 4 days back."

So the actual retention is 3–4 days, with 4 days being the filter constant. Suggested: "(indexed, last ~3–4 days)", or just "recent". Mild but real — the description gives a tighter bound than the implementation guarantees.
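
The 3 to 4 day range can be checked with a small worked example. This sketch models timestamps as whole seconds and days as fixed 86,400-second buckets, which is an assumption made for illustration; `prepared_day` and `retained` are hypothetical helpers that mirror the quoted predicate, not code from builtin.rs.

```rust
const DAY: u64 = 86_400;

/// Rounds a timestamp (in seconds) down to the start of its day.
fn prepared_day(ts: u64) -> u64 {
    ts - ts % DAY
}

/// Mirrors `prepared_day + INTERVAL '4 days' >= mz_now()`.
fn retained(prepared_at: u64, now: u64) -> bool {
    prepared_day(prepared_at) + 4 * DAY >= now
}

/// Age in seconds of the oldest timestamp that still passes the filter.
fn max_retained_age(now: u64) -> u64 {
    // The oldest retained row sits at the start of the earliest day whose
    // `prepared_day + 4 days` is still >= now.
    let cutoff = now.saturating_sub(4 * DAY);
    let first_day = if cutoff % DAY == 0 { cutoff } else { cutoff - cutoff % DAY + DAY };
    now - first_day
}
```

At exactly midnight the filter keeps rows up to 4 days old; at midday it keeps rows up to 3.5 days old; just after midnight the bound drops to barely over 3 days. Anything at most 3 days old always passes, which is why "last 3 days" is a guaranteed lower bound rather than the actual cutoff.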

ggevay · 2026-04-30T15:10:33Z

@@ -2323,18 +2469,37 @@ pub static MZ_COMPUTE_DEPENDENCIES: LazyLock<BuiltinSource> = LazyLock::new(|| B
    ]),
    is_retained_metrics_object: false,
    access: vec![PUBLIC_SELECT],
+    ontology: Some(Ontology {
+        entity_name: "compute_dependency",
+        description: "Dependency edges within compute dataflows",


Claude:

"within compute dataflows" mischaracterizes the rows:

one row = compute_object → input source edge, and mz_internal.md says the relation "describes the dependency structure between each compute object (index, materialized view, or subscription) and the sources of its data." "Within compute dataflows" reads as if the rows describe operator-to-operator edges inside one dataflow; they don't. Suggested: "A dependency edge from a compute object (index, materialized view, or subscription) to one of the sources of its data." (Combines naturally with the plural-vs-singular fix.)

ggevay · 2026-04-30T15:16:17Z

@@ -3733,6 +4395,22 @@ WHERE
    mz_internal.parse_catalog_create_sql(data->'value'->'definition'->'V1'->>'create_sql')->>'type' = 'secret'",
        is_retained_metrics_object: false,
        access: vec![PUBLIC_SELECT],
+        ontology: Some(Ontology {
+            entity_name: "secret",
+            description: "An encrypted secret value used by connections",


The secret entity description here says "An encrypted secret value used by connections", but secrets aren't only used by connections — webhook sources also reference secrets directly via CHECK ... WITH (SECRET ...) (see WebhookValidationSecret in src/sql/src/plan.rs). Suggest broadening to something like:

"A user-defined secret containing sensitive configuration (e.g., credentials)."

or, if you want to enumerate consumers:

"A user-defined secret containing sensitive configuration (e.g., credentials), referenced by connections and webhook sources."

(This is also wrong in the old docs.)

… enum Replace properties_json (raw JSON blob) with a typed LinkProperties enum that has 5 variants (ForeignKey, Union, MapsTo, DependsOn, Measures), each with documented fields and serde::Serialize so the JSONB output is identical to the old hand-written strings.

Introduce Lit enum (Str/Json/Null), values_sql(), and values_view() helpers in ontology.rs so all SQL literal escaping is centralized in Lit::render(). The three static view builders (entity_types, semantic_types, link_types) and the two inline VALUES lists in properties_view now use Lit instead of direct esc() calls at each site.

…xes)

…uction

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch 4 times, most recently from 1ecb51e to 2f07bba Compare April 20, 2026 18:20

mtabebe marked this pull request as ready for review April 20, 2026 18:22

mtabebe requested review from a team as code owners April 20, 2026 18:22

mtabebe requested a review from ggevay April 20, 2026 18:22

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch 2 times, most recently from ebd3c9a to 8a82efd Compare April 22, 2026 15:33

mtabebe requested a review from a team as a code owner April 22, 2026 15:33

mtabebe requested a review from aljoscha April 22, 2026 17:08

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch from 8a82efd to d7191b0 Compare April 22, 2026 17:17

ggevay reviewed Apr 28, 2026

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch from d7191b0 to ff78461 Compare April 28, 2026 18:39

antiguru self-requested a review April 29, 2026 15:22

mtabebe and others added 5 commits April 29, 2026 15:40

[ontology] add developer documentation for catalog ontology views

8e6ce3f

Add documentation for the ontology module covering the four built-in views, their schema, link type properties and LLM usage guide

[ontology] tests for ontology

a5d72b4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch 3 times, most recently from 5beda2b to 3a5fc04 Compare April 29, 2026 20:20

[ontology] fill in more missing OntologyLink entries for builtin enti…

ab6fccc

…ties

Nineteen annotated builtin relations had empty links columns that point to other annotated entities

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch from 3a5fc04 to ab6fccc Compare April 29, 2026 21:24

antiguru reviewed Apr 30, 2026

ggevay reviewed Apr 30, 2026

ggevay approved these changes Apr 30, 2026

ggevay reviewed Apr 30, 2026

[ontology] rename with_semantic_type to with_column_semantic_type; fi…

5c909e6

…x RelationDesc Hash/Eq

ggevay reviewed Apr 30, 2026

mtabebe added 3 commits April 30, 2026 11:34

address ontology review feedback (entity names, descriptions, link fi…

5031254

…xes)

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch 5 times, most recently from 4567b72 to 2700db4 Compare April 30, 2026 21:31

Add semantic_types in proto roundtrip and builtin catalog item constr…

057397b

…uction

mtabebe force-pushed the ma/ontology/sql-built-in-v2 branch from 2700db4 to 057397b Compare April 30, 2026 23:38

Conversation
