diff --git a/docs/lakehouse/catalogs/hive-catalog.mdx b/docs/lakehouse/catalogs/hive-catalog.mdx
index 190dd3bce7c7d..deccda9908be6 100644
--- a/docs/lakehouse/catalogs/hive-catalog.mdx
+++ b/docs/lakehouse/catalogs/hive-catalog.mdx
@@ -76,6 +76,52 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 The CommonProperties section is for entering common attributes. Please see the "Common Properties" section in the [Catalog Overview](../catalog-overview.md).
 
+## Metadata Cache (4.0.4+) {#meta-cache-404}
+
+Starting from Doris 4.0.4, Hive Catalog metadata caches are configured with the unified `meta.cache.*` properties.
+This section focuses on **how to use** and **how to observe** the Hive-related cache modules.
+
+For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).
+
+### Cache Modules {#meta-cache-404-modules}
+
+| Module | Property key prefix | Cached content (typical) |
+|---|---|---|
+| `partition-values` | `meta.cache.hive.partition-values.` | Partition values/names list used by partition pruning and partition enumeration. |
+| `partition` | `meta.cache.hive.partition.` | Partition properties (location, input format, storage descriptor, etc.). |
+| `file` | `meta.cache.hive.file.` | File listing under partition/table paths (reduces remote LIST overhead). |
+
+Example (disable file listing cache for freshness):
+
+```sql
+ALTER CATALOG hive_ctl SET PROPERTIES (
+  "meta.cache.hive.file.ttl-second" = "0"
+);
+```
+
+### Observability {#meta-cache-404-observability}
+
+Hive cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
+For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+The `cache_name` values for Hive modules are:
+
+| Module | cache_name |
+|---|---|
+| `partition-values` | `hive_partition_values_cache` |
+| `partition` | `hive_partition_cache` |
+| `file` | `hive_file_cache` |
+
+Example query (filter one catalog and Hive caches):
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'hive_ctl'
+  AND cache_name LIKE 'hive_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### Supported Hive Versions
 
 Supports Hive 1.x, 2.x, 3.x, and 4.x.
diff --git a/docs/lakehouse/catalogs/hudi-catalog.md b/docs/lakehouse/catalogs/hudi-catalog.md
index 22b8f227ac30d..1dcd347cc566a 100644
--- a/docs/lakehouse/catalogs/hudi-catalog.md
+++ b/docs/lakehouse/catalogs/hudi-catalog.md
@@ -51,6 +51,52 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 | ------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
 | `hudi.use_hive_sync_partition` | `use_hive_sync_partition` | Whether to use the partition information already synchronized by Hive Metastore. If true, partition information will be obtained directly from Hive Metastore. Otherwise, it will be obtained from the metadata file of the file system. Obtaining information from Hive Metastore is more efficient, but users need to ensure that the latest metadata has been synchronized to Hive Metastore. | false |
 
+## Metadata Cache (4.0.4+) {#meta-cache-404}
+
+Starting from Doris 4.0.4, Hudi-related metadata caches are configured with the unified `meta.cache.*` properties.
+This section focuses on **how to use** and **how to observe** the Hudi cache modules.
+
+For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).
+
+### Cache Modules {#meta-cache-404-modules}
+
+| Module | Property key prefix | Cached content (typical) |
+|---|---|---|
+| `partition` | `meta.cache.hudi.partition.` | Hudi partition-related metadata (used by partition discovery/pruning). |
+| `fs-view` | `meta.cache.hudi.fs-view.` | Hudi filesystem view related metadata. |
+| `meta-client` | `meta.cache.hudi.meta-client.` | Hudi meta client related metadata. |
+
+Example (reduce cache footprint by lowering capacity):
+
+```sql
+ALTER CATALOG hudi_ctl SET PROPERTIES (
+  "meta.cache.hudi.partition.capacity" = "2000"
+);
+```
+
+### Observability {#meta-cache-404-observability}
+
+Hudi cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
+For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+The `cache_name` values for Hudi modules are:
+
+| Module | cache_name |
+|---|---|
+| `partition` | `hudi_partition_cache` |
+| `fs-view` | `hudi_fs_view_cache` |
+| `meta-client` | `hudi_meta_client_cache` |
+
+Example query:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'hudi_ctl'
+  AND cache_name LIKE 'hudi_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### Supported Hudi Versions
 
 The current dependent Hudi version is 0.15. It is recommended to access Hudi data version 0.14 and above.
diff --git a/docs/lakehouse/catalogs/iceberg-catalog.mdx b/docs/lakehouse/catalogs/iceberg-catalog.mdx
index f8c2b78d05181..f5329b4fe61d6 100644
--- a/docs/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -85,6 +85,50 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 The CommonProperties section is for entering general properties. See the [Catalog Overview](../catalog-overview.md) for details on common properties.
 
+## Metadata Cache (4.0.4+) {#meta-cache-404}
+
+Starting from Doris 4.0.4, Iceberg Catalog metadata caches are configured with the unified `meta.cache.*` properties.
+This section focuses on **how to use** and **how to observe** the Iceberg-related cache modules.
+
+For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).
+
+### Cache Modules {#meta-cache-404-modules}
+
+| Module | Property key prefix | Cached content (typical) |
+|---|---|---|
+| `table` | `meta.cache.iceberg.table.` | Iceberg table metadata object (reduces catalog/metastore round trips). |
+| `manifest` | `meta.cache.iceberg.manifest.` | Manifest-related metadata (reduces repeated manifest access overhead). |
+
+Example (shorter TTL for manifest to prioritize freshness):
+
+```sql
+ALTER CATALOG iceberg_ctl SET PROPERTIES (
+  "meta.cache.iceberg.manifest.ttl-second" = "600"
+);
+```
+
+### Observability {#meta-cache-404-observability}
+
+Iceberg cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
+For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+The `cache_name` values for Iceberg modules are:
+
+| Module | cache_name |
+|---|---|
+| `table` | `iceberg_table_cache` |
+| `manifest` | `iceberg_manifest_cache` |
+
+Example query:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'iceberg_ctl'
+  AND cache_name LIKE 'iceberg_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### Supported Iceberg Versions
 
 | Doris Version | Iceberg SDK Version |
diff --git a/docs/lakehouse/catalogs/maxcompute-catalog.md b/docs/lakehouse/catalogs/maxcompute-catalog.md
index 1bea22a762869..e44cafd703b0d 100644
--- a/docs/lakehouse/catalogs/maxcompute-catalog.md
+++ b/docs/lakehouse/catalogs/maxcompute-catalog.md
@@ -68,6 +68,49 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 The CommonProperties section is used to fill in common properties. Please refer to the "Common Properties" section in [Catalog Overview](../catalog-overview.md).
 
+## Metadata Cache (4.0.4+) {#meta-cache-404}
+
+Starting from Doris 4.0.4, MaxCompute Catalog metadata caches are configured with the unified `meta.cache.*` properties.
+This section focuses on **how to use** and **how to observe** the MaxCompute-related cache module.
+
+For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).
+
+### Cache Modules {#meta-cache-404-modules}
+
+| Module | Property key prefix | Cached content (typical) |
+|---|---|---|
+| `partition-values` | `meta.cache.maxcompute.partition-values.` | Partition values list (reduces repeated remote listing overhead). |
+
+Example:
+
+```sql
+ALTER CATALOG mc_ctl SET PROPERTIES (
+  "meta.cache.maxcompute.partition-values.ttl-second" = "3600",
+  "meta.cache.maxcompute.partition-values.capacity" = "5000"
+);
+```
+
+### Observability {#meta-cache-404-observability}
+
+MaxCompute cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
+For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+The `cache_name` value for the MaxCompute module is:
+
+| Module | cache_name |
+|---|---|
+| `partition-values` | `maxcompute_partition_values_cache` |
+
+Example query:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'mc_ctl'
+  AND cache_name LIKE 'maxcompute_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### Supported MaxCompute Versions
 
 Only the public cloud version of MaxCompute is supported. For private cloud version support, please contact Doris community support.
diff --git a/docs/lakehouse/catalogs/paimon-catalog.mdx b/docs/lakehouse/catalogs/paimon-catalog.mdx
index 9a07d2ed16268..8d7fbab012fa2 100644
--- a/docs/lakehouse/catalogs/paimon-catalog.mdx
+++ b/docs/lakehouse/catalogs/paimon-catalog.mdx
@@ -88,6 +88,48 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 The CommonProperties section is used to fill in common properties. Please refer to the [Catalog Overview](../catalog-overview.md) section on [Common Properties].
 
+## Metadata Cache (4.0.4+) {#meta-cache-404}
+
+Starting from Doris 4.0.4, Paimon Catalog metadata caches are configured with the unified `meta.cache.*` properties.
+This section focuses on **how to use** and **how to observe** the Paimon-related cache modules.
+
+For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).
+
+### Cache Modules {#meta-cache-404-modules}
+
+| Module | Property key prefix | Cached content (typical) |
+|---|---|---|
+| `table` | `meta.cache.paimon.table.` | Paimon table metadata used for query planning (schema/snapshot/partition related metadata, depending on workload). |
+
+Example (disable module cache and always load on demand):
+
+```sql
+ALTER CATALOG paimon_ctl SET PROPERTIES (
+  "meta.cache.paimon.table.ttl-second" = "0"
+);
+```
+
+### Observability {#meta-cache-404-observability}
+
+Paimon cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
+For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+The `cache_name` value for the Paimon module is:
+
+| Module | cache_name |
+|---|---|
+| `table` | `paimon_table_cache` |
+
+Example query:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'paimon_ctl'
+  AND cache_name LIKE 'paimon_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### Supported Paimon Versions
 
 The currently dependent Paimon version is 1.0.0.
diff --git a/docs/lakehouse/meta-cache.md b/docs/lakehouse/meta-cache.md
index 16985562ea399..def91a4f57826 100644
--- a/docs/lakehouse/meta-cache.md
+++ b/docs/lakehouse/meta-cache.md
@@ -18,6 +18,11 @@ For **data cache**, refer to the [data cache documentation](./data-cache.md).
 This document applies to versions after 2.1.6.
 :::
 
+:::note
+For Doris 4.0.4 and later, external meta cache has been refactored with unified configuration keys `meta.cache.*`.
+See [Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
+:::
+
 ## Cache Strategies
 
 Most caches have the following three strategy indicators:
@@ -321,6 +326,12 @@ This section mainly introduces the cache behavior that users may be concerned ab
 
 For all types of External Catalogs, if you want to see the latest Table Schema in real time, you can disable the Schema Cache:
 
+:::note
+Starting from Doris 4.0.4, the legacy catalog-level cache property `schema.cache.ttl-second` is deprecated.
+For 4.0.4+, keep using the FE config method below, and refer to:
+[Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
+:::
+
 - Disable globally
 
     ```text
@@ -341,6 +352,13 @@ After setting, Doris will see the latest Table Schema in real time. However, thi
 
 For Hive Catalog, if you want to disable the cache to query real-time updated data, you can configure the following parameters:
 
+:::note
+Starting from Doris 4.0.4, the legacy catalog-level properties `file.meta.cache.ttl-second` and `partition.cache.ttl-second`
+are deprecated. Use unified `meta.cache.hive.*` properties instead. See:
+[Hive Catalog](./catalogs/hive-catalog.mdx#meta-cache-404) and
+[Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
+:::
+
 - Disable globally
 
     ```text
@@ -363,4 +381,3 @@ After setting the above parameters:
 - Changes in partition data files can be queried in real time.
 
 But this will increase the access pressure on external data sources (such as Hive Metastore and HDFS), which may cause unstable metadata access latency and other phenomena.
-
diff --git a/docs/lakehouse/meta-cache/unified-meta-cache.md b/docs/lakehouse/meta-cache/unified-meta-cache.md
new file mode 100644
index 0000000000000..32a7579f539f2
--- /dev/null
+++ b/docs/lakehouse/meta-cache/unified-meta-cache.md
@@ -0,0 +1,84 @@
+---
+{
+    "title": "Unified External Meta Cache (4.0.4+)",
+    "language": "en",
+    "description": "User guide for unified external metadata cache: unified meta.cache.* properties, what is cached, and where to configure per catalog."
+}
+---
+
+Starting from **Doris 4.0.4**, external metadata caching is unified for major External Catalog engines. As a user, you only need to know:
+
+| You want to know | Where in docs |
+|---|---|
+| Where to configure | Catalog `PROPERTIES` with `meta.cache.*` keys (see the catalog pages linked below). |
+| What it affects | Depends on catalog engine (partitions, file listing, table metadata, manifests, etc.). |
+| How to observe | `information_schema.catalog_meta_cache_statistics` (see the observability section below). |
+
+:::tip
+Applies to Doris 4.0.4 and later.
+:::
+
+## Unified Property Model
+
+All engine cache modules share the same property key pattern:
+
+`meta.cache.<engine>.<module>.{enable,ttl-second,capacity}`
+
+The following table describes the property semantics:
+
+| Property | Example | Meaning |
+|---|---|---|
+| `enable` | `true/false` | Whether this cache module is enabled. |
+| `ttl-second` | `600`, `0`, `-1` | `0` disables the module; `-1` means no expiration; otherwise expire after access by TTL. |
+| `capacity` | `10000` | Max entry count (count-based). `0` disables the module. |
+
+Example (edit catalog properties):
+
+```sql
+ALTER CATALOG hive_ctl SET PROPERTIES (
+  "meta.cache.hive.file.ttl-second" = "0"
+);
+```
+
+## What External Meta Cache Includes
+
+External meta cache covers different kinds of metadata. Some are configured by unified catalog properties, and some are controlled by FE configs:
+
+| Category | Examples | How to configure |
+|---|---|---|
+| Engine module caches | Hive partitions/files, Iceberg manifests, Paimon table metadata, etc. | Catalog `PROPERTIES`: `meta.cache.<engine>.<module>.*` |
+| Schema cache | Table schema, isolated by schema version token | FE configs (for example: `max_external_schema_cache_num`) |
+
+## Catalog-Specific Configuration (Links)
+
+For each catalog engine, the supported cache modules and the recommended properties are documented in its catalog page:
+
+| Catalog engine | Where to configure module caches |
+|---|---|
+| Hive | [Hive Catalog](../catalogs/hive-catalog.mdx#meta-cache-404) |
+| Iceberg | [Iceberg Catalog](../catalogs/iceberg-catalog.mdx#meta-cache-404) |
+| Paimon | [Paimon Catalog](../catalogs/paimon-catalog.mdx#meta-cache-404) |
+| Hudi | [Hudi Catalog](../catalogs/hudi-catalog.md#meta-cache-404) |
+| MaxCompute | [MaxCompute Catalog](../catalogs/maxcompute-catalog.md#meta-cache-404) |
+
+## Observability
+
+Use the system table to observe cache metrics:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+ORDER BY catalog_name, cache_name, metric_name;
+```
+
+This table is documented at: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).
+
+Naming convention:
+
+| Field | Convention |
+|---|---|
+| `cache_name` | `<engine>_<module>_cache` (module `-` is converted to `_`) |
+
+## Migration Note (Legacy Properties)
+
+Starting from Doris 4.0.4, legacy catalog cache properties (for example, `schema.cache.ttl-second`, `file.meta.cache.ttl-second`) are deprecated. Use `meta.cache.*` properties instead and follow the catalog-specific pages above.
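
Reading aid (not part of the patch): the property key pattern and the `cache_name` convention described in `unified-meta-cache.md` can be cross-checked against the per-catalog module tables with a few lines of Python. This is only a minimal sketch. The helper names `property_key` and `cache_name` are hypothetical and are not a Doris API, and the rules are inferred from the tables in this diff rather than from the Doris source.

```python
# Hypothetical helpers (not a Doris API): they only mirror the naming rules
# stated in the documentation added by this diff.
ALLOWED_PROPERTIES = ("enable", "ttl-second", "capacity")


def property_key(engine: str, module: str, prop: str) -> str:
    """Build a catalog property key of the form meta.cache.<engine>.<module>.<property>."""
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"unsupported cache property: {prop!r}")
    return f"meta.cache.{engine}.{module}.{prop}"


def cache_name(engine: str, module: str) -> str:
    """Derive the cache_name for a module; '-' in the module name becomes '_'."""
    return f"{engine}_{module.replace('-', '_')}_cache"


if __name__ == "__main__":
    print(property_key("hive", "file", "ttl-second"))
    print(cache_name("hudi", "fs-view"))
```

The derived names agree with the documented rows, for example `hive_partition_values_cache` for the Hive `partition-values` module and `hudi_fs_view_cache` for the Hudi `fs-view` module.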
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hive-catalog.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hive-catalog.mdx
index 34ec561730e8d..06feca6dba542 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hive-catalog.mdx
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hive-catalog.mdx
@@ -78,6 +78,51 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 CommonProperties 部分用于填写通用属性。请参阅[ 数据目录概述 ](../catalog-overview.md)中【通用属性】部分。
 
+## 元数据缓存(4.0.4+) {#meta-cache-404}
+
+从 Doris 4.0.4 开始,Hive Catalog 的外表元数据缓存使用统一键 `meta.cache.*` 进行配置。本节只介绍**如何使用**与**如何观测**。
+
+统一属性语义可参阅:[统一外表元数据缓存(4.0.4+)](../meta-cache/unified-meta-cache.md)。
+
+### 缓存模块 {#meta-cache-404-modules}
+
+| 模块 | 属性键前缀 | 典型缓存内容 |
+|---|---|---|
+| `partition-values` | `meta.cache.hive.partition-values.` | 分区值/分区名称列表(常用于分区剪枝与分区枚举)。 |
+| `partition` | `meta.cache.hive.partition.` | 分区属性(location、输入格式、存储描述等)。 |
+| `file` | `meta.cache.hive.file.` | 分区/表路径下的文件列表(减少远端 LIST 开销)。 |
+
+示例(为保证新鲜度,关闭文件列表缓存):
+
+```sql
+ALTER CATALOG hive_ctl SET PROPERTIES (
+  "meta.cache.hive.file.ttl-second" = "0"
+);
+```
+
+### 可观测性 {#meta-cache-404-observability}
+
+Hive 缓存指标可通过 `information_schema.catalog_meta_cache_statistics` 查询。
+系统表字段与指标说明见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+Hive 各模块对应的 `cache_name` 如下:
+
+| 模块 | cache_name |
+|---|---|
+| `partition-values` | `hive_partition_values_cache` |
+| `partition` | `hive_partition_cache` |
+| `file` | `hive_file_cache` |
+
+示例(只看某个 catalog 的 Hive 缓存):
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'hive_ctl'
+  AND cache_name LIKE 'hive_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### 支持的 Hive 版本
 
 支持 Hive 1.x,2.x,3.x,4.x。
@@ -1101,4 +1146,3 @@ DROP DATABASE [IF EXISTS] hive_ctl.hive_db;
 | -------- | ------------------------------------ |
 | 2.1.6 | 支持 Hive 表数据写回 |
 | 3.0.4 | 支持 JsonSerDe 格式的 Hive 表。支持 Hive4 的事务表。 |
-
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hudi-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hudi-catalog.md
index cb4ac7cc702bc..0e52dfcdca59b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hudi-catalog.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/hudi-catalog.md
@@ -51,6 +51,51 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 | ------------------------------- | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
 | `hudi.use_hive_sync_partition` | `use_hive_sync_partition` | 是否使用 Hive Metastore 已同步的分区信息。如果为 true,则会直接从 Hive Metastore 中获取分区信息。否则,会从文件系统的元数据文件中获取分区信息。通过 Hive Metastore 获取信息性能更好,但需要用户保证最新的元数据已经同步到了 Hive Metastore。 | false |
 
+## 元数据缓存(4.0.4+) {#meta-cache-404}
+
+从 Doris 4.0.4 开始,Hudi 相关外表元数据缓存使用统一键 `meta.cache.*` 进行配置。本节只介绍**如何使用**与**如何观测**。
+
+统一属性语义可参阅:[统一外表元数据缓存(4.0.4+)](../meta-cache/unified-meta-cache.md)。
+
+### 缓存模块 {#meta-cache-404-modules}
+
+| 模块 | 属性键前缀 | 典型缓存内容 |
+|---|---|---|
+| `partition` | `meta.cache.hudi.partition.` | Hudi 分区相关元数据(用于分区发现/剪枝等)。 |
+| `fs-view` | `meta.cache.hudi.fs-view.` | Hudi FS View 相关元数据。 |
+| `meta-client` | `meta.cache.hudi.meta-client.` | Hudi Meta Client 相关元数据。 |
+
+示例(通过降低 capacity 控制缓存规模):
+
+```sql
+ALTER CATALOG hudi_ctl SET PROPERTIES (
+  "meta.cache.hudi.partition.capacity" = "2000"
+);
+```
+
+### 可观测性 {#meta-cache-404-observability}
+
+Hudi 缓存指标可通过 `information_schema.catalog_meta_cache_statistics` 查询。
+系统表字段与指标说明见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+Hudi 各模块对应的 `cache_name` 如下:
+
+| 模块 | cache_name |
+|---|---|
+| `partition` | `hudi_partition_cache` |
+| `fs-view` | `hudi_fs_view_cache` |
+| `meta-client` | `hudi_meta_client_cache` |
+
+示例:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'hudi_ctl'
+  AND cache_name LIKE 'hudi_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### 支持的 Hudi 版本
 
 当前依赖的 Hudi 版本为 0.15。推荐访问 0.14 版本以上的 Hudi 数据。
@@ -226,4 +271,3 @@ SELECT * from hudi_table@incr('beginTime'='xxx', ['endTime'='xxx'], ['hoodie.rea
 | Doris 版本 | 功能支持 |
 | ----------- | ----------------------------------------- |
 | 2.1.8/3.0.4 | Hudi 依赖升级到 0.15。新增 Hadoop Hudi JNI Scanner。 |
-
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
index b41d97363498f..26a9ece00470a 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
@@ -87,6 +87,49 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 CommonProperties 部分用于填写通用属性。请参阅[数据目录概述](../catalog-overview.md)中【通用属性】部分。
 
+## 元数据缓存(4.0.4+) {#meta-cache-404}
+
+从 Doris 4.0.4 开始,Iceberg Catalog 的外表元数据缓存使用统一键 `meta.cache.*` 进行配置。本节只介绍**如何使用**与**如何观测**。
+
+统一属性语义可参阅:[统一外表元数据缓存(4.0.4+)](../meta-cache/unified-meta-cache.md)。
+
+### 缓存模块 {#meta-cache-404-modules}
+
+| 模块 | 属性键前缀 | 典型缓存内容 |
+|---|---|---|
+| `table` | `meta.cache.iceberg.table.` | Iceberg 表元数据对象(减少 catalog/metastore 往返)。 |
+| `manifest` | `meta.cache.iceberg.manifest.` | manifest 相关元数据(减少重复读取 manifest 的开销)。 |
+
+示例(缩短 manifest TTL,优先新鲜度):
+
+```sql
+ALTER CATALOG iceberg_ctl SET PROPERTIES (
+  "meta.cache.iceberg.manifest.ttl-second" = "600"
+);
+```
+
+### 可观测性 {#meta-cache-404-observability}
+
+Iceberg 缓存指标可通过 `information_schema.catalog_meta_cache_statistics` 查询。
+系统表字段与指标说明见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+Iceberg 各模块对应的 `cache_name` 如下:
+
+| 模块 | cache_name |
+|---|---|
+| `table` | `iceberg_table_cache` |
+| `manifest` | `iceberg_manifest_cache` |
+
+示例:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'iceberg_ctl'
+  AND cache_name LIKE 'iceberg_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### 支持的 Iceberg 版本
 
 | Doris 版本 | Iceberg SDK 版本 |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/maxcompute-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/maxcompute-catalog.md
index 856633f99447e..91af178aaa838 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/maxcompute-catalog.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/maxcompute-catalog.md
@@ -68,6 +68,48 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 CommonProperties 部分用于填写通用属性。请参阅[数据目录概述](../catalog-overview.md)中「通用属性」部分。
 
+## 元数据缓存(4.0.4+) {#meta-cache-404}
+
+从 Doris 4.0.4 开始,MaxCompute Catalog 的外表元数据缓存使用统一键 `meta.cache.*` 进行配置。本节只介绍**如何使用**与**如何观测**。
+
+统一属性语义可参阅:[统一外表元数据缓存(4.0.4+)](../meta-cache/unified-meta-cache.md)。
+
+### 缓存模块 {#meta-cache-404-modules}
+
+| 模块 | 属性键前缀 | 典型缓存内容 |
+|---|---|---|
+| `partition-values` | `meta.cache.maxcompute.partition-values.` | 分区值列表(减少重复的远端枚举开销)。 |
+
+示例:
+
+```sql
+ALTER CATALOG mc_ctl SET PROPERTIES (
+  "meta.cache.maxcompute.partition-values.ttl-second" = "3600",
+  "meta.cache.maxcompute.partition-values.capacity" = "5000"
+);
+```
+
+### 可观测性 {#meta-cache-404-observability}
+
+MaxCompute 缓存指标可通过 `information_schema.catalog_meta_cache_statistics` 查询。
+系统表字段与指标说明见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+MaxCompute 模块对应的 `cache_name` 如下:
+
+| 模块 | cache_name |
+|---|---|
+| `partition-values` | `maxcompute_partition_values_cache` |
+
+示例:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'mc_ctl'
+  AND cache_name LIKE 'maxcompute_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### 支持的 MaxCompute 版本
 
 仅支持公有云版本的 MaxCompute。私有云版本支持请联系 Doris 社区支持。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.mdx
index 3f209206b15f6..669bf0def98f6 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.mdx
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.mdx
@@ -88,6 +88,47 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
 
 CommonProperties 部分用于填写通用属性。请参阅[数据目录概述](../catalog-overview.md)中【通用属性】部分。
 
+## 元数据缓存(4.0.4+) {#meta-cache-404}
+
+从 Doris 4.0.4 开始,Paimon Catalog 的外表元数据缓存使用统一键 `meta.cache.*` 进行配置。本节只介绍**如何使用**与**如何观测**。
+
+统一属性语义可参阅:[统一外表元数据缓存(4.0.4+)](../meta-cache/unified-meta-cache.md)。
+
+### 缓存模块 {#meta-cache-404-modules}
+
+| 模块 | 属性键前缀 | 典型缓存内容 |
+|---|---|---|
+| `table` | `meta.cache.paimon.table.` | Paimon 表元数据(用于查询规划,实际涉及 schema/snapshot/partition 等元数据加载)。 |
+
+示例(关闭 module 缓存,按需实时加载):
+
+```sql
+ALTER CATALOG paimon_ctl SET PROPERTIES (
+  "meta.cache.paimon.table.ttl-second" = "0"
+);
+```
+
+### 可观测性 {#meta-cache-404-observability}
+
+Paimon 缓存指标可通过 `information_schema.catalog_meta_cache_statistics` 查询。
+系统表字段与指标说明见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+Paimon 模块对应的 `cache_name` 如下:
+
+| 模块 | cache_name |
+|---|---|
+| `table` | `paimon_table_cache` |
+
+示例:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+WHERE catalog_name = 'paimon_ctl'
+  AND cache_name LIKE 'paimon_%'
+ORDER BY cache_name, metric_name;
+```
+
 ### 支持的 Paimon 版本
 
 当前依赖的 Paimon 版本为 1.0.0。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache.md
index 1c35945043b6d..0cb9d3664ce65 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache.md
@@ -18,6 +18,11 @@
 该文档适用于 2.1.6 之后的版本。
 :::
 
+:::note
+对于 Doris 4.0.4 及之后版本,外表元数据缓存已重构并使用统一配置键 `meta.cache.*`。
+请参阅[统一外表元数据缓存(4.0.4+)](./meta-cache/unified-meta-cache.md)。
+:::
+
 ## 缓存策略
 
 大多数缓存都有如下三个策略指标:
@@ -321,6 +326,12 @@ CREATE CATALOG hive PROPERTIES (
 
 对于所有类型的 External Catalog,如果希望实时可见最新的 Table Schema,可以关闭 Schema Cache:
 
+:::note
+从 Doris 4.0.4 开始,旧的 catalog 级缓存参数 `schema.cache.ttl-second` 已不再推荐使用。
+对于 4.0.4+,仍可使用下面的 FE 配置方式进行全局控制,并参考:
+[统一外表元数据缓存(4.0.4+)](./meta-cache/unified-meta-cache.md)。
+:::
+
 - 全局关闭
 
     ```text
@@ -341,6 +352,13 @@ CREATE CATALOG hive PROPERTIES (
 
 针对 Hive Catalog,如果想关闭缓存来查询到实时更新的数据,可以配置以下参数:
 
+:::note
+从 Doris 4.0.4 开始,旧的 catalog 级参数 `file.meta.cache.ttl-second` 和 `partition.cache.ttl-second`
+已不再推荐使用。请改用统一键 `meta.cache.hive.*`,并参考:
+[Hive Catalog](./catalogs/hive-catalog.mdx#meta-cache-404) 与
+[统一外表元数据缓存(4.0.4+)](./meta-cache/unified-meta-cache.md)。
+:::
+
 - 全局关闭
 
     ```text
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache/unified-meta-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache/unified-meta-cache.md
new file mode 100644
index 0000000000000..e47a53f213732
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/meta-cache/unified-meta-cache.md
@@ -0,0 +1,85 @@
+---
+{
+    "title": "统一外表元数据缓存(4.0.4+)",
+    "language": "zh-CN",
+    "description": "面向用户的统一外表元数据缓存使用说明:统一配置键 meta.cache.*、缓存覆盖范围、以及各类 Catalog 的配置入口。"
+}
+---
+
+从 **Doris 4.0.4** 开始,External Catalog 的外表元数据缓存能力进行了统一化重构。对用户来说,主要关注三件事:
+
+| 你需要关心的问题 | 对应入口 |
+|---|---|
+| 在哪里配置 | 在 Catalog `PROPERTIES` 里使用统一键 `meta.cache.*`(具体 module 见下方各 catalog 文档)。 |
+| 影响哪些内容 | 取决于不同 catalog 引擎(分区信息、文件列表、表元数据、manifest 等)。 |
+| 如何观测 | 通过 `information_schema.catalog_meta_cache_statistics` 查看指标(见本文观测章节)。 |
+
+:::tip
+适用于 Doris 4.0.4 及之后版本。
+:::
+
+## 统一属性模型
+
+各引擎缓存 module 使用统一的配置键格式:
+
+`meta.cache.<engine>.<module>.{enable,ttl-second,capacity}`
+
+下表说明属性语义:
+
+| 属性 | 示例 | 含义 |
+|---|---|---|
+| `enable` | `true/false` | 是否启用该缓存 module。 |
+| `ttl-second` | `600`、`0`、`-1` | `0` 表示关闭;`-1` 表示永不过期;其他值表示按访问时间计算 TTL。 |
+| `capacity` | `10000` | 最大缓存条目数(按条目数量计)。`0` 表示关闭。 |
+
+示例(修改 catalog properties):
+
+```sql
+ALTER CATALOG hive_ctl SET PROPERTIES (
+  "meta.cache.hive.file.ttl-second" = "0"
+);
+```
+
+## 外表 Meta Cache 覆盖范围
+
+外表元数据缓存覆盖多种元数据类型。其中一部分由统一 `meta.cache.*` 键配置,另一部分由 FE 配置控制:
+
+| 类别 | 示例 | 配置方式 |
+|---|---|---|
+| 引擎 module 缓存 | Hive 分区/文件、Iceberg manifest、Paimon 表元数据等 | Catalog `PROPERTIES`:`meta.cache.<engine>.<module>.*` |
+| Schema cache | 表 schema(按版本 token 隔离) | FE 配置(例如:`max_external_schema_cache_num`) |
+
+## 各类 Catalog 的配置入口(链接)
+
+不同 Catalog 引擎支持的缓存 module 不同,具体 module、推荐配置与可观测性请参考对应 Catalog 文档:
+
+| Catalog 引擎 | module 缓存配置与可观测性 |
+|---|---|
+| Hive | [Hive Catalog](../catalogs/hive-catalog.mdx#meta-cache-404) |
+| Iceberg | [Iceberg Catalog](../catalogs/iceberg-catalog.mdx#meta-cache-404) |
+| Paimon | [Paimon Catalog](../catalogs/paimon-catalog.mdx#meta-cache-404) |
+| Hudi | [Hudi Catalog](../catalogs/hudi-catalog.md#meta-cache-404) |
+| MaxCompute | [MaxCompute Catalog](../catalogs/maxcompute-catalog.md#meta-cache-404) |
+
+## 观测方式
+
+通过系统表统一观测缓存指标:
+
+```sql
+SELECT *
+FROM information_schema.catalog_meta_cache_statistics
+ORDER BY catalog_name, cache_name, metric_name;
+```
+
+该系统表文档见:[catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md)。
+
+约定与常见指标:
+
+| 内容 | 说明 |
+|---|---|
+| `cache_name` | `<engine>_<module>_cache`(module 中的 `-` 会被替换为 `_`) |
+| 常见指标 | `hit_ratio`、`hit_count`、`read_count`、`eviction_count`、`average_load_penalty`、`estimated_size` |
+
+## 旧参数迁移说明
+
+从 Doris 4.0.4 开始,旧版 catalog cache 参数(例如 `schema.cache.ttl-second`、`file.meta.cache.ttl-second`)已不再推荐使用。请改用 `meta.cache.*` 统一键,并参考上文对应的 catalog 文档。
diff --git a/sidebars.ts b/sidebars.ts
index 31f7044a02d8c..e3d8d39afffea 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -548,6 +548,7 @@ const sidebars: SidebarsConfig = {
     },
     'lakehouse/data-cache',
     'lakehouse/meta-cache',
+    'lakehouse/meta-cache/unified-meta-cache',
     'lakehouse/compute-node',
     'lakehouse/statistics',
     {