[docs][lakehouse] add Apache Ozone storage docs and catalog examples#3421
xylaaaaa wants to merge 1 commit into apache:master
Pull request overview
Adds documentation for using Apache Ozone (via S3 Gateway) as a Lakehouse storage option, and extends Lakehouse catalog docs with Ozone examples/properties (EN/ZH). It also updates the docs sidebar navigation.
Changes:
- Add new Ozone storage docs (EN + zh-CN) and link Ozone as a supported storage in Hive/Iceberg/Paimon catalog docs.
- Update Iceberg/Paimon/Hive catalog docs with additional capability notes, examples, and formatting improvements.
- Modify `sidebars.ts` to include Ozone in Lakehouse Storages and restructure/expand multiple other sidebar sections.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 39 comments.
| File | Description |
|---|---|
| sidebars.ts | Adds Ozone to Lakehouse Storages, but also introduces many new sidebar entries and restructures catalog navigation. |
| docs/lakehouse/storages/ozone.md | New EN Ozone storage parameter/reference doc. |
| i18n/zh-CN/.../lakehouse/storages/ozone.md | New zh-CN Ozone storage parameter/reference doc. |
| docs/lakehouse/catalogs/hive-catalog.mdx | Adds Ozone support note/example and additional feature/parameter sections. |
| i18n/zh-CN/.../lakehouse/catalogs/hive-catalog.mdx | Same as EN, plus localized updates and an added Ozone example. |
| docs/lakehouse/catalogs/iceberg-catalog.mdx | Adds Ozone examples and expands Iceberg capabilities/actions docs (incl. JDBC references). |
| i18n/zh-CN/.../lakehouse/catalogs/iceberg-catalog.mdx | Same as EN, plus localized updates; JDBC examples added. |
| docs/lakehouse/catalogs/paimon-catalog.mdx | Adds Ozone support note/example and improves notes/formatting. |
| i18n/zh-CN/.../lakehouse/catalogs/paimon-catalog.mdx | Same as EN, plus localized updates and an Ozone example. |
| label: 'AI', | ||
| items: [ | ||
| 'ai/ai-overview', | ||
| 'ai/ai-function-overview', | ||
| { | ||
| type: 'category', | ||
| label: 'Text Search', | ||
| items: [ | ||
| 'ai/text-search/overview', | ||
| 'ai/text-search/search-operators', | ||
| 'ai/text-search/search-function', | ||
| 'ai/text-search/custom-analyzer', | ||
| 'ai/text-search/custom-normalizer', | ||
| 'ai/text-search/scoring', | ||
| ], | ||
| }, | ||
| { | ||
| type: 'category', | ||
| label: 'Vector Search', | ||
| items: [ | ||
| 'ai/vector-search/overview', | ||
| 'ai/vector-search/hnsw', | ||
| 'ai/vector-search/ivf', | ||
| 'ai/vector-search/index-management', | ||
| 'ai/vector-search/performance', | ||
| 'ai/vector-search/behind-index', |
The sidebar adds ai/ai-overview and ai/vector-search/performance / ai/vector-search/behind-index, but these docs are not present under docs/ai/ or docs/ai/vector-search/. Please add the pages or remove the sidebar entries to avoid unknown doc IDs.
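One way such unknown doc IDs could be caught before the build is to walk the sidebar tree and compare every referenced ID against the docs that exist. The sketch below is illustrative only: the `SidebarItem` type is a simplified, hypothetical shape (real Docusaurus sidebars allow more item kinds), and `collectDocIds`, `findMissingDocIds`, and the `existingDocs` set are names and data invented for this example, not part of the PR.

```typescript
// Hypothetical, simplified sidebar item: a doc ID string or a category with nested items.
// Real Docusaurus sidebars support more variants (links, autogenerated items, etc.).
type SidebarItem = string | { type: 'category'; label: string; items: SidebarItem[] };

// Recursively collect every doc ID referenced in a sidebar subtree.
function collectDocIds(items: SidebarItem[]): string[] {
  const ids: string[] = [];
  for (const item of items) {
    if (typeof item === 'string') {
      ids.push(item);
    } else {
      ids.push(...collectDocIds(item.items));
    }
  }
  return ids;
}

// Return the referenced IDs that have no matching doc.
// `existing` is assumed to be built elsewhere, e.g. by listing docs/ and stripping extensions.
function findMissingDocIds(items: SidebarItem[], existing: Set<string>): string[] {
  return collectDocIds(items).filter((id) => !existing.has(id));
}

// Example with IDs quoted in this review; the set of existing docs is made up for illustration.
const aiSidebar: SidebarItem[] = [
  'ai/ai-overview',
  'ai/ai-function-overview',
  {
    type: 'category',
    label: 'Vector Search',
    items: ['ai/vector-search/overview', 'ai/vector-search/performance'],
  },
];
const existingDocs = new Set(['ai/ai-function-overview', 'ai/vector-search/overview']);
const missingAiDocs = findMissingDocIds(aiSidebar, existingDocs);
console.log(missingAiDocs);
```

A check along these lines could run in CI so that a sidebar entry without a page fails early instead of at site build time.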
| > | ||
| > Doris 当前不支持带时区的 `Timestamp` 类型。所有 `timestamp` 和 `timestamptz` 会统一映射到 `datetime(N)` 类型上。但在读取和写入时,Doris 会根据实际源类型正确处理时区。如通过 `SET time_zone=<tz>` 指定时区后,会影响 `timestamptz` 列的读取和写入结果。 | ||
| > | ||
| > 可以在 `DESCRIBE table_name` 语句中的 Extra 列查看源类型是否带时区信息。如显示 `WITH_TIMEZONE`,则表示源类型是带时区的类型。(该功能自 3.1.0 版本支持)。 | ||
| > 可以在 `DESCRIBE table_name` 语句中的 Extra 列查看源类型是否带时区信息。如显示 `WITH_TIMEZONE`,则表示源类型是带时区的类型(该功能自 3.1.0 版本支持)。 | ||
| > | ||
| > 4.0.3 后开始支持,可以映射 `timestamptz (Timestamp with timezone)` 到 Doris 的 `timestamptz` 类型。 | ||
|
|
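The note states both that "Doris currently does not support the `Timestamp` type with time zone; all such types are mapped to `datetime(N)`" and that "since 4.0.3, mapping to Doris's `timestamptz` type is supported". These two statements conflict. Please state explicitly whether the mapping to `timestamptz` applies only when `enable.mapping.timestamp_tz` is enabled, and adjust the "not supported" sentence accordingly so readers are not misled.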
| > 注: | ||
| > | ||
| > Doris 当前不支持带时区的 `Timestamp` 类型。所有 `timestamp_without_time_zone` 和 `timestamp_with_local_time_zone` 会统一映射到 `datetime(N)` 类型上。但在读取时,Doris 会根据实际源类型正确处理时区。如通过 `SET time_zone=<tz>` 指定时区后,会影响 `timestamp_with_local_time_zone` 列的返回结果。 | ||
| > | ||
| > 可以在 `DESCRIBE table_name` 语句中的 Extra 列查看源类型是否带时区信息。如显示 `WITH_TIMEZONE`,则表示源类型是带时区的类型。(该功能自 3.0.8 版本支持)。 | ||
| > 可以在 `DESCRIBE table_name` 语句中的 Extra 列查看源类型是否带时区信息。如显示 `WITH_TIMEZONE`,则表示源类型是带时区的类型(该功能自 3.0.8 版本支持)。 | ||
| > | ||
| > 4.0.3 后开始支持,可以映射 `timestamp_with_local_time_zone` 到 Doris 的 `timestamptz` 类型。 | ||
|
|
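The note says "Doris currently does not support the `Timestamp` type with time zone; all such types are mapped to `datetime(N)`", but it also adds "since 4.0.3, mapping to Doris's `timestamptz` type is supported". Please add that this mapping is conditional (for example, `enable.mapping.timestamp_tz=true`) and adjust the "not supported / uniformly mapped" wording to avoid the contradiction.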
| 'lakehouse/best-practices/doris-lakekeeper', | ||
| 'lakehouse/best-practices/doris-nessie' |
The Iceberg Catalog sidebar category includes lakehouse/best-practices/doris-lakekeeper and lakehouse/best-practices/doris-nessie, but these docs are not present under docs/lakehouse/best-practices/. Please add the missing pages or remove/update these items to avoid unknown doc IDs.
| 'lakehouse/best-practices/doris-lakekeeper', | |
| 'lakehouse/best-practices/doris-nessie' |
| 'sql-manual/basic-element/sql-data-types/date-time/DATE', | ||
| 'sql-manual/basic-element/sql-data-types/date-time/TIME', | ||
| 'sql-manual/basic-element/sql-data-types/date-time/DATETIME', | ||
| 'sql-manual/basic-element/sql-data-types/date-time/TIMESTAMPTZ', |
The sidebar adds sql-manual/basic-element/sql-data-types/date-time/TIMESTAMPTZ, but there is no TIMESTAMPTZ.md in docs/sql-manual/basic-element/sql-data-types/date-time/ (only DATE/TIME/DATETIME exist). Please add the missing doc or remove the sidebar entry to prevent build failures.
| 'sql-manual/basic-element/sql-data-types/date-time/TIMESTAMPTZ', |
| 'sql-manual/sql-functions/table-valued-functions/local', | ||
| 'sql-manual/sql-functions/table-valued-functions/mv_infos', | ||
| 'sql-manual/sql-functions/table-valued-functions/numbers', | ||
| 'sql-manual/sql-functions/table-valued-functions/parquet-meta', |
The sidebar adds the TVF doc sql-manual/sql-functions/table-valued-functions/parquet-meta, but there is no parquet-meta.md under docs/sql-manual/sql-functions/table-valued-functions/. Please add the missing doc or remove/update the entry.
| 'sql-manual/sql-functions/table-valued-functions/parquet-meta', |
| - **Drop partition key** | ||
|
|
||
| ```sql | ||
| ALTER TABLE table_name DROP PARTITION KEY partition_transform|key_name; |
In the Partition Evolution example, ALTER TABLE table_name DROP PARTITION KEY partition_transform|key_name; uses a full-width vertical bar character (|). This can be hard to copy/paste and is not valid SQL syntax. Please replace it with a normal ASCII | (or rewrite the example to avoid using | for alternatives).
| ALTER TABLE table_name DROP PARTITION KEY partition_transform|key_name; | |
| ALTER TABLE table_name DROP PARTITION KEY partition_transform_or_key_name; |
| 'iceberg.jdbc.schema-version' = 'V1', | ||
| 'iceberg.jdbc.driver_class' = 'org.postgresql.Driver', | ||
| 'iceberg.jdbc.driver_url' = '<jdbc_driver_jar>' | ||
| 'warehouse' = 's3://bucket/warehouse', | ||
| 's3.access_key' = '<ak>', |
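In the Iceberg JDBC Catalog example, the `iceberg.jdbc.driver_url` line is missing a trailing comma, so the following `'warehouse' = ...` is concatenated onto the same property and the SQL example cannot be copied and run as is. Please add a comma at the end of the `driver_url` line (the same applies in all three places: PostgreSQL/MySQL/SQLite).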
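A small lint over the property lines of a catalog example could flag this kind of missing comma. The following is only a sketch under stated assumptions: `findMissingCommas` is a hypothetical helper, it assumes one `'key' = 'value'` pair per line, and it is not part of Doris or this PR.

```typescript
// Return the 1-based line numbers of property lines that lack a trailing comma
// even though another property line follows them.
// Assumes one 'key' = 'value' pair per line (an assumption of this sketch).
function findMissingCommas(propertyLines: string[]): number[] {
  const isProperty = (line: string): boolean =>
    /^\s*'[^']+'\s*=\s*'[^']*'\s*,?\s*$/.test(line);
  const flagged: number[] = [];
  for (let i = 0; i < propertyLines.length - 1; i++) {
    const current = propertyLines[i];
    const next = propertyLines[i + 1];
    if (isProperty(current) && isProperty(next) && !current.trim().endsWith(',')) {
      flagged.push(i + 1);
    }
  }
  return flagged;
}

// Property lines as quoted in the diff above.
const jdbcProperties = [
  "'iceberg.jdbc.schema-version' = 'V1',",
  "'iceberg.jdbc.driver_class' = 'org.postgresql.Driver',",
  "'iceberg.jdbc.driver_url' = '<jdbc_driver_jar>'",
  "'warehouse' = 's3://bucket/warehouse',",
  "'s3.access_key' = '<ak>',",
];
const linesMissingComma = findMissingCommas(jdbcProperties);
console.log(linesMissingComma);
```

The last property line is never flagged, since a final property before the closing parenthesis legitimately has no comma.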
| --- | ||
| { | ||
| "title": "Apache Ozone | Storages", | ||
| "language": "en", | ||
| "description": "Starting from version 4.0.4, Doris supports accessing Apache Ozone through the S3 Gateway.", | ||
| "sidebar_label": "Apache Ozone" | ||
| } |
Front matter title is set to "Apache Ozone | Storages", but other storage docs use just the storage name as the title (e.g. docs/lakehouse/storages/s3.md, azure-blob.md). Consider aligning this to "Apache Ozone" (and rely on sidebar/category for context) to keep titles consistent across storage pages.
| 'ecosystem/observability/loongcollector', | ||
| 'ecosystem/observability/langfuse', | ||
| 'ecosystem/observability/vector', |
The sidebar adds observability pages ecosystem/observability/loongcollector, langfuse, and vector, but these docs do not exist under docs/ecosystem/observability/. Please add the pages or remove the entries to avoid unknown doc IDs.
| 'ecosystem/observability/loongcollector', | |
| 'ecosystem/observability/langfuse', | |
| 'ecosystem/observability/vector', |
692842b to e4d7325