[ontology] Add built-in catalog ontology views#36159
mtabebe wants to merge 11 commits into MaterializeInc:main
1ecb51e to 2f07bba
I know there are some test failures here, but I think it is worth getting some early feedback on the structure of this change. @ggevay there are 4 commits here that might be easier to look at individually: (1) defines the interface for the type, (2) adds all the annotations, (3) adds a doc, (4) adds tests
ebd3c9a to 8a82efd
8a82efd to d7191b0
ggevay
left a comment
There was a problem hiding this comment.
I wrote some comments, will continue tomorrow.
| @@ -3510,7 +3975,27 @@ FROM | |||
| WHERE data->>'kind' = 'Role'", | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: None, | |||
| ontology: Some(Ontology { | |||
| entity_name: "role_member", | |||
| ontology: Some(Ontology { | ||
| entity_name: "operator", | ||
| description: "A built-in SQL operator", | ||
| links: &[], |
There was a problem hiding this comment.
Could we add return_type_id as a link, like on mz_functions?
| description: "An array type with its element type", | ||
| links: &[ | ||
| OntologyLink { | ||
| name: "is_subtype_of", |
There was a problem hiding this comment.
This is not is_subtype_of. I'm not sure what would be a good term here.
There was a problem hiding this comment.
And the same issue for list and map.
There was a problem hiding this comment.
I renamed to detail_of
| name: "granted_by", | ||
| target: "role", | ||
| properties_json: r#"{"kind": "foreign_key", "source_column": "grantor", "target_column": "id", "cardinality": "many_to_one"}"#, | ||
| }, |
There was a problem hiding this comment.
member_of_role and has_member here might have the same issue as object dependencies, mentioned above: it sounds as if a role membership would have a has_member, when it's actually a role that has a member.
(granted_by might be ok, though.)
| entity_name: "role_parameter", | ||
| description: "A session parameter default set for a role", | ||
| links: &[OntologyLink { | ||
| name: "parameter_of", |
There was a problem hiding this comment.
Maybe something like default_parameter_setting_of?
d7191b0 to ff78461
Add the structural foundation for built-in ontology views that describe the Materialize system catalog. This includes:
- `Ontology` and `OntologyLink` structs on builtin definitions
- `semantic_types` field on `RelationDesc` with `with_semantic_type()` builder
- View generation code in `ontology.rs` that produces 4 views: `mz_ontology_entity_types`, `mz_ontology_semantic_types`, `mz_ontology_properties`, `mz_ontology_link_types`
- OID constants for the new views
- `ontology: None` on all existing builtins (no annotations yet)

The views are generated at startup by enumerating builtins that have `ontology: Some(...)` annotations. This commit only adds the infrastructure; annotations are added in the next commit.
Add ontology annotations to builtin catalog objects and semantic type annotations. This populates the ontology views introduced in the previous commit with:
- Entity types: databases, schemas, roles, clusters, replicas, tables, sources, views, MVs, indexes, sinks, connections, secrets, types, functions, and ~100 internal/introspection objects
- Semantic types: CatalogItemId, GlobalId, ClusterId, ReplicaId, etc.
- Link types: relationships (owned_by, in_schema, runs_on_cluster, depends_on, details_of, etc.)
- Column-level semantic type annotations via with_semantic_type()
Add documentation for the ontology module covering the four built-in views, their schema, link type properties, and an LLM usage guide.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sistent
- Replace SemanticType &'static str with a typed enum; update all call sites in builtin.rs and healthcheck.rs
- Rename OntologyLink names to noun phrases throughout: dependent_object, referenced_object, transitively_dependent_object/referenced_object, detail_of, group_role, member_role, default_parameter_setting_of
- Add missing FK links on source-table detail tables and mz_operators (returns_type), plus a test for ontology_consistency
- Improve SEMANTIC_TYPE_DEFS descriptions with examples and clearer wording (MzTimestamp, ObjectType, ConnectionType, SourceType)
5beda2b to 3a5fc04
…ties Nineteen annotated builtin relations had empty links columns that point to other annotated entities
3a5fc04 to ab6fccc
antiguru
left a comment
There was a problem hiding this comment.
Leaving some comments below. I think this is valuable to have, so mostly comments around the implementation.
When/how do we include more parts in the ontology? I think specifically the compute introspection could benefit from this.
| use mz_repr::{RelationDesc, SemanticType, SqlScalarType}; | ||
| use mz_sql::catalog::NameReference; | ||
|
|
||
| use super::{Builtin, BuiltinView, Ontology, PUBLIC_SELECT}; |
| struct Info<'a> { | ||
| table_name: &'static str, | ||
| schema_name: &'static str, | ||
| entity_name: String, | ||
| desc: &'a RelationDesc, | ||
| ontology: &'a Ontology, | ||
| } |
There was a problem hiding this comment.
Please document struct and fields.
| fn leak(v: BuiltinView) -> &'static BuiltinView { | ||
| Box::leak(Box::new(v)) | ||
| } |
There was a problem hiding this comment.
This'll show up in tooling to detect leaks. Can we use a non-leaking approach here to avoid alarms in the future?
There was a problem hiding this comment.
Ok, I saw other uses of this in builtin, so thought this was the pattern, I'll fix
There was a problem hiding this comment.
Yeah, there are. Personal opinion is to not add more :)
| let keys = desc.typ().keys.first()?; | ||
| let cols: Vec<_> = keys | ||
| .iter() | ||
| .map(|&i| format!("\"{}\"", desc.get_name(i))) |
There was a problem hiding this comment.
What about columns that contain double quotes?
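To make the concern concrete, here is a minimal sketch of escaping column names before they are placed inside the JSON array. The helper name `json_string` and the slice-based `pk_json` signature are hypothetical and only for illustration; the PR's `pk_json` takes a `RelationDesc`, and a JSON library would do the same job.

```rust
/// Hypothetical helper: renders `s` as a JSON string literal, escaping
/// quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Simplified stand-in for the PR's `pk_json`: takes column names directly
/// instead of a `RelationDesc` (an assumption made to keep this self-contained).
fn pk_json(cols: &[&str]) -> String {
    let cols: Vec<String> = cols.iter().map(|c| json_string(c)).collect();
    format!("{{\"primary_key\": [{}]}}", cols.join(", "))
}
```

With this, `pk_json(&["id", "we\"ird"])` yields valid JSON even for a column name containing a double quote. The resulting JSON would still need SQL-level quoting wherever it is embedded in a string literal.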
| /// Optional semantic type annotations for columns. | ||
| /// Keyed by column index. Only populated for builtin catalog objects. | ||
| /// Excluded from Eq/Hash/serialization — it's ontology metadata, not schema. | ||
| #[serde(skip)] | ||
| semantic_types: BTreeMap<usize, SemanticType>, |
There was a problem hiding this comment.
Please do not add quirks around ignoring parts of the type for Eq, Hash, etc. From experience, this is a maintenance burden for our future selves.
There was a problem hiding this comment.
Ack I was probably too clever for my own good
| /// Annotates the most recently added column with a semantic type. | ||
| /// | ||
| /// Possible values are enumerated in [`SemanticType`]. | ||
| pub fn with_semantic_type(mut self, semantic_type: SemanticType) -> RelationDescBuilder { |
There was a problem hiding this comment.
I don't think this function is great as it is context-sensitive. Could we add this to a separate with_column_semantic_type?
There was a problem hiding this comment.
I agree. A specific danger is that someone editing unrelated stuff near a call site might not realize that this has to be directly after a column that it refers to, and e.g. accidentally add a new column between a with_semantic_type and its with_column.
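A rough sketch of the non-context-sensitive shape being asked for, using a toy builder rather than the real `RelationDescBuilder`. `DescBuilder`, its fields, and the two-variant `SemanticType` are stand-ins invented for this example, and the real `with_column` also takes a column type, which is omitted here.

```rust
// Toy stand-in for the real enum; only two variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemanticType {
    CatalogItemId,
    GlobalId,
}

// Toy builder: each column carries its optional semantic type directly,
// so there is no "most recently added column" state to get wrong.
#[derive(Debug, Default)]
struct DescBuilder {
    columns: Vec<(String, Option<SemanticType>)>,
}

impl DescBuilder {
    /// Adds a column without a semantic type.
    fn with_column(mut self, name: &str) -> Self {
        self.columns.push((name.to_string(), None));
        self
    }

    /// Adds a column and its semantic type in one call, so inserting an
    /// unrelated column nearby cannot detach the annotation.
    fn with_column_semantic_type(mut self, name: &str, st: SemanticType) -> Self {
        self.columns.push((name.to_string(), Some(st)));
        self
    }
}
```

Because the annotation travels with the column, reordering or inserting `with_column` calls at a call site cannot silently move it to a different column.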
| ontology: Some(Ontology { | ||
| entity_name: "kafka_sink", | ||
| description: "Kafka-specific sink configuration (topic)", | ||
| links: &[OntologyLink { | ||
| name: "details_of", | ||
| target: "sink", | ||
| properties_json: r#"{"kind": "foreign_key", "source_column": "id", "target_column": "id", "cardinality": "one_to_one"}"#, | ||
| }], | ||
| }), |
There was a problem hiding this comment.
How do we ensure that the ontology doesn't get out-of-sync with the rest of the system?
There was a problem hiding this comment.
I agree this is a challenge. I did add an SLT test and based on this feedback will add more unit tests.
The main invariants that I see are:
- Column semantic types: enforced at compile time via the with_column_semantic_type API: the columns need to exist (but this doesn't enforce anything about the types)
- Link target entity names: test_ontology_consistency asserts every OntologyLink::target references a known annotated entity
- source_column values in properties_json: currently only checked implicitly (the FK coverage test reads them). I'm adding an explicit check that each source_column value names an actual column in the entity's RelationDesc, so renames are caught.
I would totally be open to more ideas on how to make this more robust.
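The two link invariants above could be checked along these lines. This is only a sketch: `Entity`, `Link`, and `check_consistency` are simplified, hypothetical stand-ins for the PR's `Ontology`, `OntologyLink`, and `test_ontology_consistency`, with column names held as plain strings instead of a `RelationDesc`.

```rust
use std::collections::BTreeSet;

// Simplified stand-in for OntologyLink (assumption: source_column is a field).
struct Link {
    name: &'static str,
    target: &'static str,
    source_column: &'static str,
}

// Simplified stand-in for an annotated builtin and its columns.
struct Entity {
    name: &'static str,
    columns: &'static [&'static str],
    links: &'static [Link],
}

/// Returns one message per violated invariant; an empty vector means consistent.
fn check_consistency(entities: &[Entity]) -> Vec<String> {
    let known: BTreeSet<&str> = entities.iter().map(|e| e.name).collect();
    let mut errors = Vec::new();
    for e in entities {
        for l in e.links {
            // Invariant: every link target is a known annotated entity.
            if !known.contains(l.target) {
                errors.push(format!("{}.{}: unknown target {}", e.name, l.name, l.target));
            }
            // Invariant: source_column names an actual column of this entity.
            if !e.columns.contains(&l.source_column) {
                errors.push(format!(
                    "{}.{}: no column named {}",
                    e.name, l.name, l.source_column
                ));
            }
        }
    }
    errors
}
```

A unit test can then assert that the returned vector is empty for the full set of builtins, which would catch a renamed column or a removed entity.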
There was a problem hiding this comment.
This is not the right place for this file. doc/developer/generated shouldn't be touched by users. I think this is either developer documentation (not in generated), or a design doc.
Yeah, I intentionally left compute introspection out of this for now, but my feeling is that we can add the same type of annotations to BuiltinLog and update the generate_views function to handle them. This work has sort of spiraled bigger than I was anticipating, so I think extending it to compute introspection should be the next thing (and properly planned)
ggevay
left a comment
There was a problem hiding this comment.
Wrote some more comments. Will continue after lunch.
| description: "Kafka source table-level details", | ||
| links: &[OntologyLink { | ||
| name: "describes_source_table", | ||
| target: "object", |
There was a problem hiding this comment.
Could this be target: "table",?
| /// JSON object, e.g. `{"primary_key": ["id", "schema_id"]}`. Returns `None` | ||
| /// if the relation has no keys defined. | ||
| fn pk_json(desc: &RelationDesc) -> Option<String> { | ||
| let keys = desc.typ().keys.first()?; |
There was a problem hiding this comment.
Several builtins declare more than one key. If we take only the first key here, then which one this surfaces would depend on the declared key order. Is this intended?
Maybe we could make it do something like
{"primary_key": [...], "alternate_keys": [[...], ...]}
There was a problem hiding this comment.
Also, a nit: If it's just one key, could you name the let keys just let key? (I know it's a Vec, so the s at the end feels natural, but it's still just one key at this point, so key is more accurate. We had tons of these key variable naming issues also in the optimizer, and at some point we made a conscious effort to fix all of them.)
There was a problem hiding this comment.
Good point about the alternate keys, I'll implement that
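One possible shape for that output, as a sketch only. `keys_json` and its slice-based signature are hypothetical (the PR's `pk_json` reads keys from a `RelationDesc`), and column names are assumed not to need escaping here.

```rust
/// Renders all declared keys: the first as "primary_key", the rest as
/// "alternate_keys". Returns None if no keys are declared.
/// `keys` holds column indexes into `names` (hypothetical simplified inputs).
fn keys_json(keys: &[Vec<usize>], names: &[&str]) -> Option<String> {
    // Renders one key as a JSON array of column names.
    let render = |key: &Vec<usize>| -> String {
        let cols: Vec<String> = key.iter().map(|&i| format!("\"{}\"", names[i])).collect();
        format!("[{}]", cols.join(", "))
    };
    let (primary, alternates) = keys.split_first()?;
    let mut out = format!("{{\"primary_key\": {}", render(primary));
    if !alternates.is_empty() {
        let alts: Vec<String> = alternates.iter().map(|k| render(k)).collect();
        out.push_str(&format!(", \"alternate_keys\": [{}]", alts.join(", ")));
    }
    out.push('}');
    Some(out)
}
```

Which key counts as primary still depends on declaration order in this sketch, but no declared key is dropped from the output.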
| @@ -2323,18 +2469,37 @@ pub static MZ_COMPUTE_DEPENDENCIES: LazyLock<BuiltinSource> = LazyLock::new(|| B | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "compute_dependency", | |||
| description: "Dependency edges within compute dataflows", | |||
There was a problem hiding this comment.
Is it intended that edges is plural here? I think many (maybe most) other descriptions have analogous things in the singular.
There was a problem hiding this comment.
Typo: .id missing at the end.
| OntologyLink { | ||
| name: "dependent_compute_object", | ||
| target: "object", | ||
| properties_json: r#"{"source_id_type": "GlobalId", "requires_mapping": "mz_internal.mz_object_global_ids", "kind": "foreign_key", "source_column": "object_id", "target_column": "id", "cardinality": "many_to_one"}"#, |
There was a problem hiding this comment.
I'm wondering if it would maybe be better to lift this into a typed struct, something like
LinkProperties {
    kind: LinkKind,
    source_column: &'static str,
    target_column: &'static str,
    cardinality: Cardinality,
    source_id_type: Option<SemanticType>,
    requires_mapping: Option<&'static str>,
    nullable: bool,
}
I think it would make them more readable (there could be a helper function for common cases), and would also increase the chances of people getting them right when writing new ones. Although, it might also be over-engineering, I'm not sure.
There was a problem hiding this comment.
That is a good idea
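For illustration, a cut-down version of that struct with a helper for the common case might look like the sketch below. It is an assumption-laden reduction: `kind` is hard-coded to `foreign_key`, `source_id_type` is left out, and the `fk` constructor and `to_json` method are hypothetical names, not the PR's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cardinality {
    OneToOne,
    ManyToOne,
}

impl Cardinality {
    fn as_str(self) -> &'static str {
        match self {
            Cardinality::OneToOne => "one_to_one",
            Cardinality::ManyToOne => "many_to_one",
        }
    }
}

// Reduced form of the proposed struct (foreign-key links only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct LinkProperties {
    source_column: &'static str,
    target_column: &'static str,
    cardinality: Cardinality,
    requires_mapping: Option<&'static str>,
    nullable: bool,
}

impl LinkProperties {
    /// Hypothetical helper for the common many-to-one, non-nullable case.
    fn fk(source_column: &'static str, target_column: &'static str) -> Self {
        LinkProperties {
            source_column,
            target_column,
            cardinality: Cardinality::ManyToOne,
            requires_mapping: None,
            nullable: false,
        }
    }

    /// Renders the same key order as the hand-written JSON strings in the diff.
    fn to_json(&self) -> String {
        let mut out = format!(
            "{{\"kind\": \"foreign_key\", \"source_column\": \"{}\", \"target_column\": \"{}\", \"cardinality\": \"{}\"",
            self.source_column,
            self.target_column,
            self.cardinality.as_str()
        );
        if let Some(mapping) = self.requires_mapping {
            out.push_str(&format!(", \"requires_mapping\": \"{}\"", mapping));
        }
        if self.nullable {
            out.push_str(", \"nullable\": true");
        }
        out.push('}');
        out
    }
}
```

A call site then reads `LinkProperties::fk("owner_id", "id")`, and a misspelled cardinality or key name becomes a compile error instead of a silently wrong JSON string.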
ggevay
left a comment
There was a problem hiding this comment.
Posting some more comments, but also hitting approve, because these are just minor things. Looks great overall!
| @@ -2690,30 +2987,48 @@ pub static MZ_SSH_TUNNEL_CONNECTIONS: LazyLock<BuiltinTable> = LazyLock::new(|| | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "ssh_tunnel", | |||
| description: "SSH tunnel connection with public keys", | |||
There was a problem hiding this comment.
(comment from Claude)
entity_name: "ssh_tunnel",
description: "SSH tunnel connection with public keys",
Compare its peers:
- mz_kafka_connections → entity_name: "kafka_connection"
- mz_aws_privatelink_connections → entity_name: "aws_privatelink" (also drops the _connection!)
- mz_aws_connections → entity_name: "aws_connection"
So three different conventions for the four connection-detail tables: <x>_connection, <x> (dropping the suffix), and inconsistent ones in between. Pick one, almost certainly <x>_connection to match the <x>_source and <x>_source_table families. Specifically:
- aws_privatelink should be aws_privatelink_connection
- ssh_tunnel should be ssh_tunnel_connection
| /// Target entity name (e.g., "role", "schema"). | ||
| pub target: &'static str, | ||
| /// JSON for the `properties` JSONB column (kind, source_column, target_column, etc.). | ||
| pub properties_json: &'static str, |
There was a problem hiding this comment.
I think I raised this somewhere else too, but it would be nice to have more structure around properties_json, or at least more documentation. E.g., requires_mapping doesn't seem like a trivial thing, but its meaning doesn't seem to be documented anywhere.
| let key = "\"source_column\": \""; | ||
| let start = json.find(key)? + key.len(); | ||
| let end = json[start..].find('"')? + start; | ||
| Some(&json[start..end]) |
There was a problem hiding this comment.
Could we just run it through a json parser instead of string matching? As it is, I worry that someone in the future will
- write the json with slightly different whitespace, and then the test gives a false positive
- write invalid json, but the test doesn't catch it if it's in some other part of the json.
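Both failure modes can be reproduced with the string-matching code from the diff above, wrapped here in a function. The name `source_column` and the signature are assumptions, since the hunk only shows the body.

```rust
/// The string-matching extraction from the diff, wrapped in a function.
/// It only recognizes the exact spelling `"source_column": "` with one space.
fn source_column(json: &str) -> Option<&str> {
    let key = "\"source_column\": \"";
    let start = json.find(key)? + key.len();
    let end = json[start..].find('"')? + start;
    Some(&json[start..end])
}
```

`source_column(r#"{"source_column": "id"}"#)` returns `Some("id")`, but the equally valid `{"source_column":"id"}` (no space after the colon) returns `None`, so a check built on it would silently skip that link. Input that is not valid JSON at all, such as `{"source_column": "id", oops`, still returns `Some("id")`.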
There was a problem hiding this comment.
Eliminated as part of the strong typing
| Some(format!("{{\"primary_key\": [{}]}}", cols.join(", "))) | ||
| } | ||
|
|
||
| // ── View builders ──────────────────────────────────────────── |
There was a problem hiding this comment.
(comment from Claude)
The 4 view builders are 95% the same code
entity_types_view, properties_view, semantic_types_view, link_types_view all do:
- infos.iter().map(|i| format!("(...)", esc(...), ...)).collect()
- format!("SELECT * FROM (VALUES {}) AS t(col1, col2, ...)", vals.join(","))
- wrap in view(name, oid, &cols, &keys, sql)
This is begging for a single helper:
fn values_view(name, oid, cols, keys, rows: impl Iterator<Item = Vec<SqlLiteral>>) -> BuiltinView
…with a SqlLiteral enum for typed values (Str, Bool, Json, Null). That removes every esc(...) call site and centralizes the "this is going inside '...'" decision in one place, which would also have prevented the pk_json "-escaping bug in one stroke.
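A minimal sketch of the suggested helper, reduced to producing the SQL text. `values_sql` and the `render` method are hypothetical names for this example, the `::jsonb` cast is an assumption about how JSON columns would be typed, and building the surrounding `BuiltinView` is left out.

```rust
/// Typed SQL literal; all quoting decisions live in `render`.
enum SqlLiteral {
    Str(String),
    Bool(bool),
    Json(String),
    Null,
}

impl SqlLiteral {
    fn render(&self) -> String {
        match self {
            // Single quotes are doubled, as in standard SQL string literals.
            SqlLiteral::Str(s) => format!("'{}'", s.replace('\'', "''")),
            SqlLiteral::Bool(b) => b.to_string(),
            // Assumption: JSON values are emitted as a quoted string cast to jsonb.
            SqlLiteral::Json(j) => format!("'{}'::jsonb", j.replace('\'', "''")),
            SqlLiteral::Null => "NULL".to_string(),
        }
    }
}

/// Builds `SELECT * FROM (VALUES ...) AS t(cols...)`.
/// Callers must pass at least one row; an empty VALUES list is not valid SQL.
fn values_sql(cols: &[&str], rows: &[Vec<SqlLiteral>]) -> String {
    let rows: Vec<String> = rows
        .iter()
        .map(|row| {
            let vals: Vec<String> = row.iter().map(SqlLiteral::render).collect();
            format!("({})", vals.join(", "))
        })
        .collect();
    format!(
        "SELECT * FROM (VALUES {}) AS t({})",
        rows.join(", "),
        cols.join(", ")
    )
}
```

Each view builder would then only assemble rows of `SqlLiteral` values, and no call site decides on its own how a value is quoted.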
| #[derive(Clone, Hash, Debug, PartialEq, Eq)] | ||
| pub struct OntologyLink { | ||
| /// Relationship name (e.g., "owned_by", "in_schema"). | ||
| pub name: &'static str, |
There was a problem hiding this comment.
OntologyLink somehow seems to be a non-trivial concept, judging by how many times I had to correct my AI agent on this while reviewing the PR, and also how the PR's original version had some of these wrong (e.g., the dependency ones). I'm wondering if we could add more doc commenting here to make it clearer / more explicit what these mean.
There was a problem hiding this comment.
It's surprisingly tricky. After some back-and-forth with my AI agent, we arrived at two possibilities:
A. Allow active verbs. This one encompasses all the existing examples in the PR's current state, except for session_on_cluster.
/// A foreign-key relationship from this catalog object to another ontology
/// entity.
///
/// **Contract.** For each row, the value in `properties_json.source_column`
/// references a row of `target`'s primary table via
/// `properties_json.target_column`. `name` is a label for this relationship
/// and must be unique within an `Ontology`.
///
/// **Direction.** A link always points *from* this row's `source_column`
/// *to* the `target` entity's `target_column`. `name` is just a label for
/// that one outgoing edge — it never reverses direction, regardless of how
/// it reads in English. When in doubt, the columns define the direction;
/// the name is descriptive only.
///
/// **Naming.** Several name shapes are in use, each with its own natural
/// reading:
///
/// - **Noun role** (preferred when natural):
/// `dependent_object` on `object_dependency`, `element_type` on
/// `array_type`, `default_parameter_setting_of` on `role_parameter`.
/// Read as: *"the `<target>` that is the `<name>` of this row"* —
/// e.g. "the object that is the *dependent_object* of this dependency edge."
///
/// - **Passive verb / prepositional**:
/// `owned_by` on `database`, `in_schema` on `object`, `details_of` on
/// `kafka_source`, `granted_by` on `role_membership`.
/// Read as: *"the `<target>` this row is `<name>`"* —
/// e.g. "the role this database is *owned_by*", "the schema this object
/// is *in*".
///
/// - **Active verb** (use sparingly, see caveat below):
/// `depends_on`, `has_element_type`, `references_source`, `runs_on_cluster`.
/// Read as: *"this row `<name>` the `<target>`"* — with the **row** as
/// the verb's subject — e.g. "this array_type *has_element_type* a type",
/// "this index *runs_on_cluster* a cluster."
///
/// **Caveat about active verbs.** Active verbs admit more than one English
/// reading (the row, the source-column's referent, or the target can each
/// be read as the subject), and historically every direction bug in this
/// module's review history has been on an active-verb name. The contract
/// above pins direction regardless, but if a natural noun phrase or passive
/// verb exists, prefer it. In particular, **do not** use an active verb on
/// an *edge entity* (a row that itself represents a relationship — e.g.
/// `mz_object_dependencies`, `mz_role_members`); the row is not an actor,
/// so a verb-with-row-as-subject is a category error. Use noun-role
/// endpoint names there (`dependent_object` / `referenced_object`,
/// `member_role` / `group_role`).
B. Disallow active verbs. This would require slight changes in many of the current OntologyLinks.
/// A foreign-key relationship from this catalog object to another ontology
/// entity.
///
/// **Contract.** For each row, the value in `properties_json.source_column`
/// references a row of `target`'s primary table via
/// `properties_json.target_column`. `name` is a label for this relationship
/// and must be unique within an `Ontology`.
///
/// **Direction.** A link always points *from* this row's `source_column`
/// *to* `target`'s `target_column`. The columns define direction; `name`
/// is descriptive only and never reverses it.
///
/// **Naming convention.** `name` denotes the role the `<target>` plays
/// relative to this row. Pick `name` so the link reads as a noun phrase
/// under:
///
/// > *"the `<target>` that is the `<name>` of this row."*
///
/// Three name shapes fit this frame and are the only ones permitted:
///
/// - **Noun role** (preferred): `dependent_object` on `object_dependency`
/// → "the object that is the *dependent_object* of this dependency edge."
/// - **Passive verb**: `owned_by` on `database`
/// → "the role this database is *owned by*."
/// - **Prepositional**: `in_schema` on `object`
/// → "the schema this object is *in*."
///
/// **Active verbs are disallowed** (`depends_on`, `has_member`, `references`,
/// `uses_X`, `has_X`, `returns_X`, `describes_X`, `runs_on_X`, …). An active
/// verb has a subject, and the subject can be read as the row, the
/// source-column's referent, or the target — an interpretive axis on top of
/// direction that has produced every direction bug in this module's review
/// history. Rewrite as a noun role: `has_element_type` → `element_type`,
/// `returns_type` → `return_type`, `references_source` → `referenced_source`,
/// `uses_connection` → `connection` (or `used_connection`),
/// `runs_on_cluster` → `host_cluster`, `depends_on` →
/// `dependent_object` / `referenced_object`, `describes_source_table` →
/// `details_of`.
///
/// **Edge entities.** For catalog objects whose rows represent a relationship
/// between two other things (e.g. `mz_object_dependencies`,
/// `mz_role_members`, `mz_compute_dependencies`), the rule is strictest: the
/// row is not an actor, so a verb-with-row-as-subject (the row "depends on"
/// something, "has" a member) is a category error regardless of direction.
/// Use noun-role endpoint names — `dependent_object` / `referenced_object`,
/// `member_role` / `group_role` — naming each link for the role its
/// endpoint plays in the edge.
I'm not sure which one is better.
There was a problem hiding this comment.
I will update the docs, but I don't really want to be restrictive on the verbs that we use.
There was a problem hiding this comment.
My intuition is that some of this will be cleaned up by having strongly typed link properties, too
| /// Keyed by column index. Only populated for builtin catalog objects. | ||
| /// Excluded from Eq/Hash/serialization — it's ontology metadata, not schema. | ||
| #[serde(skip)] | ||
| semantic_types: BTreeMap<usize, SemanticType>, |
There was a problem hiding this comment.
Should this also use ColumnIndex, like the existing metadata?
| description: "Recent query activity with execution stats", | ||
| links: &[ | ||
| OntologyLink { | ||
| name: "session_on_cluster", |
There was a problem hiding this comment.
Also, could we add an OntologyLink to "session"?
| @@ -4285,6 +5068,15 @@ pub static MZ_SESSION_HISTORY: LazyLock<BuiltinSource> = LazyLock::new(|| Builti | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "session_history", | |||
There was a problem hiding this comment.
"session_history" feels a bit awkward here. How about changing it to simply "session", and changing the current "session" to "active_session"? And then anywhere where we have a link to either of these, we should actually have links to both of these (one nullable).
There was a problem hiding this comment.
Ah sorry, I think I misunderstood what "nullable": true means. For a moment I thought it means that if you do an outer join, then some rows will come back null. But actually, it probably means that the column is nullable on our side, right?
There was a problem hiding this comment.
So, "foreign key" traditionally means that it's ok to do an inner join, you won't lose stuff. But sometimes it can be also interesting to point out a link where you might lose stuff with an inner join, so you need an outer join. Do we have/want a way to express those links as well?
Edit: Or am I misunderstanding this, and "nullable": true can mean that a non-null in our column might not find a match?
There was a problem hiding this comment.
Btw. this also ties back to the problem mentioned elsewhere that properties_json is under-documented.
There was a problem hiding this comment.
If it's just that the column is nullable, is there an automated test that checks this? Or even better, why not derive it automatically? We could do that if properties_json would be a structured thing with smart constructors.
There was a problem hiding this comment.
Ah, on sink_status_history, it seems even non-null values on our side might not find a match on the other side!
There was a problem hiding this comment.
But actually, it probably means that the column is nullable on our side, right?
Yes that is what it means. I agree it would be better to derive it automatically, feels like something I could do in the future.
I also agree with what you are saying about adding more annotations around what the joins can see. I am kind of hesitant to add more though... so maybe we can defer?
…x RelationDesc Hash/Eq
| @@ -4673,6 +5492,11 @@ pub static MZ_STATEMENT_LIFECYCLE_HISTORY: LazyLock<BuiltinSource> = LazyLock::n | |||
| MONITOR_REDACTED_SELECT, | |||
| MONITOR_SELECT, | |||
| ], | |||
| ontology: Some(Ontology { | |||
| entity_name: "statement_lifecycle", | |||
| ontology: Some(Ontology { | ||
| entity_name: "statement_lifecycle", | ||
| description: "Statement lifecycle events (parse, bind, execute)", | ||
| links: &[], |
There was a problem hiding this comment.
Missing link to mz_recent_activity_log.execution_id.
| @@ -4891,6 +5727,22 @@ pub static MZ_SINK_STATUS_HISTORY: LazyLock<BuiltinSource> = LazyLock::new(|| Bu | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "sink_status_history", | |||
There was a problem hiding this comment.
"history" sounds like multiple events, but entity_name is supposed to describe one row of this relation, right? (Maybe this could be explicitly added to its doc comment.) So, maybe rename to sink_status_event?
And the corresponding OntologyLink status_history_of_sink has the same issue.
There was a problem hiding this comment.
And the same for all the history relations:
- sink_status_history
- source_status_history
- replica_status_history
- wallclock_lag_history
There was a problem hiding this comment.
I have updated... and added a note to the doc to clarify it names a single row
ggevay
left a comment
There was a problem hiding this comment.
meta-comment: I found the following issues by asking Claude to correlate the Ontology::description field values with what we have in our existing docs e.g. in mz_internal.md, mz_catalog.md. We should look into unifying these, e.g. sourcing the descriptions in our docs from Ontology::description. This could be also a follow-up PR. (But the below issues need to be fixed here.)
| links: &[OntologyLink { | ||
| name: "status_of_replica", | ||
| target: "replica", | ||
| properties_json: r#"{"source_id_type": "CatalogItemId", "kind": "foreign_key", "source_column": "replica_id", "target_column": "id", "cardinality": "one_to_one"}"#, |
There was a problem hiding this comment.
It's not one-to-one, because a replica can have multiple processes, and we have a row here for each process. The other fields are also wrong.
| @@ -6195,6 +7403,15 @@ pub static MZ_OBJECT_LIFETIMES: LazyLock<BuiltinView> = LazyLock::new(|| Builtin | |||
| FROM mz_catalog.mz_audit_events a | |||
| WHERE a.event_type = 'create' OR a.event_type = 'drop'", | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "object_lifetime", | |||
| description: "Computed lifetime span (created_at to dropped_at) for objects", | |||
There was a problem hiding this comment.
There are no created_at and dropped_at columns. Maybe object_lifetime is switched/confused with object_history?
| @@ -5280,6 +6198,15 @@ pub static MZ_FRONTIERS: LazyLock<BuiltinSource> = LazyLock::new(|| BuiltinSourc | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "frontier", | |||
| description: "Current read/write frontiers per object (source)", | |||
There was a problem hiding this comment.
Why mention "source"? The old docs say
the frontiers of each source, sink, table, materialized view, index, and subscription
| .with_column("redacted_sql", SqlScalarType::String.nullable(false)) | ||
| .with_key(vec![0, 1, 2]) | ||
| .finish(), | ||
| column_comments: BTreeMap::new(), | ||
| sql: "SELECT DISTINCT sql_hash, sql, redacted_sql FROM mz_internal.mz_sql_text WHERE prepared_day + INTERVAL '4 days' >= mz_now()", | ||
| access: vec![MONITOR_SELECT], | ||
| ontology: Some(Ontology { | ||
| entity_name: "recent_sql_text", | ||
| description: "Recent SQL text (indexed, last 3 days)", |
There was a problem hiding this comment.
Claude:
"last 3 days" understates the actual retention
Verified in builtin.rs 4990–5013:
- Ontology: "Recent SQL text (indexed, last 3 days)."
- SQL: WHERE prepared_day + INTERVAL '4 days' >= mz_now()
- Inline comment immediately above: "This should always be 1 day more than the interval in MZ_RECENT_THINNED_ACTIVITY_LOG, because prepared_day is rounded down to the nearest day. Thus something that actually happened three days ago could have a prepared_day anywhere from 3 to 4 days back."
So the actual retention is 3–4 days, with 4 days being the filter constant. Suggested: "(indexed, last ~3–4 days)", or just "recent". Mild but real — the description gives a tighter bound than the implementation guarantees.
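The 3 to 4 day range follows from the rounding, which can be worked through with a small model. This is a sketch under the assumption that time is measured in whole hours since some day boundary; the function names are invented for the example and do not exist in the codebase.

```rust
const HOURS_PER_DAY: u64 = 24;

/// prepared_day is the event time rounded down to the start of its day.
fn prepared_day(event_hour: u64) -> u64 {
    event_hour / HOURS_PER_DAY * HOURS_PER_DAY
}

/// Mirrors the filter `prepared_day + INTERVAL '4 days' >= mz_now()`.
fn is_retained(event_hour: u64, now_hour: u64) -> bool {
    prepared_day(event_hour) + 4 * HOURS_PER_DAY >= now_hour
}

/// Oldest age (in hours) at which the event still passes the filter.
fn max_retained_age(event_hour: u64) -> u64 {
    prepared_day(event_hour) + 4 * HOURS_PER_DAY - event_hour
}
```

An event at the very start of a day stays visible for 96 hours, while one in the last hour of a day stays visible for only 73 hours, so the effective retention lies between 3 and 4 days depending on when in the day the statement was prepared.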
| @@ -2323,18 +2469,37 @@ pub static MZ_COMPUTE_DEPENDENCIES: LazyLock<BuiltinSource> = LazyLock::new(|| B | |||
| ]), | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "compute_dependency", | |||
| description: "Dependency edges within compute dataflows", | |||
There was a problem hiding this comment.
Claude:
"within compute dataflows" mischaracterizes the rows:
one row = compute_object → input source edge, and mz_internal.md says the relation "describes the dependency structure between each compute object (index, materialized view, or subscription) and the sources of its data." "Within compute dataflows" reads as if the rows describe operator-to-operator edges inside one dataflow; they don't. Suggested: "A dependency edge from a compute object (index, materialized view, or subscription) to one of the sources of its data." (Combines naturally with the plural-vs-singular fix.)
| @@ -3733,6 +4395,22 @@ WHERE | |||
| mz_internal.parse_catalog_create_sql(data->'value'->'definition'->'V1'->>'create_sql')->>'type' = 'secret'", | |||
| is_retained_metrics_object: false, | |||
| access: vec![PUBLIC_SELECT], | |||
| ontology: Some(Ontology { | |||
| entity_name: "secret", | |||
| description: "An encrypted secret value used by connections", | |||
There was a problem hiding this comment.
The secret entity description here says "An encrypted secret value used by connections", but secrets aren't only used by connections — webhook sources also reference secrets directly via CHECK ... WITH (SECRET ...) (see WebhookValidationSecret in src/sql/src/plan.rs). Suggest broadening to something like:
"A user-defined secret containing sensitive configuration (e.g., credentials)."
or, if you want to enumerate consumers:
"A user-defined secret containing sensitive configuration (e.g., credentials), referenced by connections and webhook sources."
(This is also wrong in the old docs.)
… enum Replace properties_json (raw JSON blob) with a typed LinkProperties enum that has 5 variants (ForeignKey, Union, MapsTo, DependsOn, Measures), each with documented fields and serde::Serialize so the JSONB output is identical to the old hand-written strings.
Introduce Lit enum (Str/Json/Null), values_sql(), and values_view() helpers in ontology.rs so all SQL literal escaping is centralized in Lit::render(). The three static view builders (entity_types, semantic_types, link_types) and the two inline VALUES lists in properties_view now use Lit instead of direct esc() calls at each site.
4567b72 to 2700db4
2700db4 to 057397b
Add four built-in views in mz_internal that describe the structure and relationships of the Materialize system catalog. These views are designed to help LLMs, diagnostic tools, and developers discover the right tables, join paths, and ID types when writing catalog queries.
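Views: `mz_ontology_entity_types`, `mz_ontology_semantic_types`, `mz_ontology_properties`, `mz_ontology_link_types`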
The views are generated at startup from annotations on existing builtin definitions. Each builtin can carry an `Ontology` struct declaring its entity name, description, and FK relationships, plus per-column semantic type annotations via `RelationDesc::with_semantic_type()`.