46 changes: 46 additions & 0 deletions docs/lakehouse/catalogs/hive-catalog.mdx
@@ -76,6 +76,52 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering common attributes. Please see the "Common Properties" section in the [Catalog Overview](../catalog-overview.md).

## Metadata Cache (4.0.4+) {#meta-cache-404}

Starting from Doris 4.0.4, Hive Catalog metadata caches are configured with the unified `meta.cache.*` properties.
This section focuses on **how to use** and **how to observe** the Hive-related cache modules.

For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).

### Cache Modules {#meta-cache-404-modules}

| Module | Property key prefix | Cached content (typical) |
|---|---|---|
| `partition-values` | `meta.cache.hive.partition-values.` | Partition values/names list used by partition pruning and partition enumeration. |
| `partition` | `meta.cache.hive.partition.` | Partition properties (location, input format, storage descriptor, etc.). |
| `file` | `meta.cache.hive.file.` | File listing under partition/table paths (reduces remote LIST overhead). |

Example (disable file listing cache for freshness):

```sql
ALTER CATALOG hive_ctl SET PROPERTIES (
    "meta.cache.hive.file.ttl-second" = "0"
);
```
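The same key pattern applies to the other two Hive modules. As a minimal sketch, the statement below bounds the `partition-values` and `partition` caches. The catalog name `hive_ctl` follows the example above, and the values `600` and `10000` are illustrative assumptions, not recommended settings.

```sql
-- Illustrative values only; tune them for your own workload.
ALTER CATALOG hive_ctl SET PROPERTIES (
    -- TTL of 600 seconds for cached partition value lists.
    "meta.cache.hive.partition-values.ttl-second" = "600",
    -- Keep at most 10000 partition property entries.
    "meta.cache.hive.partition.capacity" = "10000"
);
```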

### Observability {#meta-cache-404-observability}

Hive cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

The `cache_name` values for Hive modules are:

| Module | cache_name |
|---|---|
| `partition-values` | `hive_partition_values_cache` |
| `partition` | `hive_partition_cache` |
| `file` | `hive_file_cache` |

Example query (filter one catalog and Hive caches):

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hive_ctl'
  AND cache_name LIKE 'hive_%'
ORDER BY cache_name, metric_name;
```
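To look at a single module, filter on its exact `cache_name` from the table above. A minimal sketch, again assuming a catalog named `hive_ctl`:

```sql
-- Show only the file listing cache of one Hive catalog.
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hive_ctl'
  AND cache_name = 'hive_file_cache'
ORDER BY metric_name;
```

Running it before and after an `ALTER CATALOG` change is one way to compare the module's metrics.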

### Supported Hive Versions

Supports Hive 1.x, 2.x, 3.x, and 4.x.
46 changes: 46 additions & 0 deletions docs/lakehouse/catalogs/hudi-catalog.md
@@ -51,6 +51,52 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
| ------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| `hudi.use_hive_sync_partition` | `use_hive_sync_partition` | Whether to use the partition information already synchronized by Hive Metastore. If true, partition information will be obtained directly from Hive Metastore. Otherwise, it will be obtained from the metadata file of the file system. Obtaining information from Hive Metastore is more efficient, but users need to ensure that the latest metadata has been synchronized to Hive Metastore. | false |

## Metadata Cache (4.0.4+) {#meta-cache-404}

Starting from Doris 4.0.4, Hudi-related metadata caches are configured with the unified `meta.cache.*` properties.
This section focuses on **how to use** and **how to observe** the Hudi cache modules.

For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).

### Cache Modules {#meta-cache-404-modules}

| Module | Property key prefix | Cached content (typical) |
|---|---|---|
| `partition` | `meta.cache.hudi.partition.` | Hudi partition-related metadata (used by partition discovery/pruning). |
| `fs-view` | `meta.cache.hudi.fs-view.` | Hudi filesystem view related metadata. |
| `meta-client` | `meta.cache.hudi.meta-client.` | Hudi meta client related metadata. |

Example (reduce cache footprint by lowering capacity):

```sql
ALTER CATALOG hudi_ctl SET PROPERTIES (
    "meta.cache.hudi.partition.capacity" = "2000"
);
```

### Observability {#meta-cache-404-observability}

Hudi cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

The `cache_name` values for Hudi modules are:

| Module | cache_name |
|---|---|
| `partition` | `hudi_partition_cache` |
| `fs-view` | `hudi_fs_view_cache` |
| `meta-client` | `hudi_meta_client_cache` |

Example query:

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hudi_ctl'
  AND cache_name LIKE 'hudi_%'
ORDER BY cache_name, metric_name;
```

### Supported Hudi Versions

Doris currently depends on Hudi 0.15. Accessing Hudi data of version 0.14 or later is recommended.
44 changes: 44 additions & 0 deletions docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -85,6 +85,50 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering general properties. See the [Catalog Overview](../catalog-overview.md) for details on common properties.

## Metadata Cache (4.0.4+) {#meta-cache-404}

Starting from Doris 4.0.4, Iceberg Catalog metadata caches are configured with the unified `meta.cache.*` properties.
This section focuses on **how to use** and **how to observe** the Iceberg-related cache modules.

For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).

### Cache Modules {#meta-cache-404-modules}

| Module | Property key prefix | Cached content (typical) |
|---|---|---|
| `table` | `meta.cache.iceberg.table.` | Iceberg table metadata object (reduces catalog/metastore round trips). |
| `manifest` | `meta.cache.iceberg.manifest.` | Manifest-related metadata (reduces repeated manifest access overhead). |

Example (shorter TTL for manifest to prioritize freshness):

```sql
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    "meta.cache.iceberg.manifest.ttl-second" = "600"
);
```
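Conversely, when table metadata rarely changes, the `table` module can be kept without time-based expiration and bounded by entry count instead. The sketch below assumes the same `iceberg_ctl` catalog. `-1` is the "no expiration" TTL value described in the unified property model, and the capacity of `1000` is an arbitrary placeholder.

```sql
-- Placeholder capacity; choose a value that fits the number of tables you query.
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    -- -1 means entries do not expire by time.
    "meta.cache.iceberg.table.ttl-second" = "-1",
    -- Bound the module by entry count instead.
    "meta.cache.iceberg.table.capacity" = "1000"
);
```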

### Observability {#meta-cache-404-observability}

Iceberg cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

The `cache_name` values for Iceberg modules are:

| Module | cache_name |
|---|---|
| `table` | `iceberg_table_cache` |
| `manifest` | `iceberg_manifest_cache` |

Example query:

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'iceberg_ctl'
  AND cache_name LIKE 'iceberg_%'
ORDER BY cache_name, metric_name;
```

### Supported Iceberg Versions

| Doris Version | Iceberg SDK Version |
43 changes: 43 additions & 0 deletions docs/lakehouse/catalogs/maxcompute-catalog.md
@@ -68,6 +68,49 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is used to fill in common properties. Please refer to the "Common Properties" section in [Catalog Overview](../catalog-overview.md).

## Metadata Cache (4.0.4+) {#meta-cache-404}

Starting from Doris 4.0.4, MaxCompute Catalog metadata caches are configured with the unified `meta.cache.*` properties.
This section focuses on **how to use** and **how to observe** the MaxCompute-related cache module.

For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).

### Cache Modules {#meta-cache-404-modules}

| Module | Property key prefix | Cached content (typical) |
|---|---|---|
| `partition-values` | `meta.cache.maxcompute.partition-values.` | Partition values list (reduces repeated remote listing overhead). |

Example:

```sql
ALTER CATALOG mc_ctl SET PROPERTIES (
    "meta.cache.maxcompute.partition-values.ttl-second" = "3600",
    "meta.cache.maxcompute.partition-values.capacity" = "5000"
);
```

### Observability {#meta-cache-404-observability}

MaxCompute cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

The `cache_name` value for the MaxCompute module is:

| Module | cache_name |
|---|---|
| `partition-values` | `maxcompute_partition_values_cache` |

Example query:

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'mc_ctl'
  AND cache_name LIKE 'maxcompute_%'
ORDER BY cache_name, metric_name;
```

### Supported MaxCompute Versions

Only the public cloud version of MaxCompute is supported. For private cloud version support, please contact Doris community support.
42 changes: 42 additions & 0 deletions docs/lakehouse/catalogs/paimon-catalog.mdx
@@ -88,6 +88,48 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is used to fill in common properties. Please refer to the "Common Properties" section in [Catalog Overview](../catalog-overview.md).

## Metadata Cache (4.0.4+) {#meta-cache-404}

Starting from Doris 4.0.4, Paimon Catalog metadata caches are configured with the unified `meta.cache.*` properties.
This section focuses on **how to use** and **how to observe** the Paimon-related cache modules.

For the unified property semantics, see: [Unified External Meta Cache (4.0.4+)](../meta-cache/unified-meta-cache.md).

### Cache Modules {#meta-cache-404-modules}

| Module | Property key prefix | Cached content (typical) |
|---|---|---|
| `table` | `meta.cache.paimon.table.` | Paimon table metadata used for query planning (schema/snapshot/partition related metadata, depending on workload). |

Example (disable module cache and always load on demand):

```sql
ALTER CATALOG paimon_ctl SET PROPERTIES (
    "meta.cache.paimon.table.ttl-second" = "0"
);
```
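To turn caching back on later, a positive TTL can be set again. This is a sketch under the assumption that the module's `enable` property has not been set to `false`; the value `3600` is an arbitrary placeholder.

```sql
-- Placeholder TTL; any positive number of seconds re-enables time-based caching.
ALTER CATALOG paimon_ctl SET PROPERTIES (
    "meta.cache.paimon.table.ttl-second" = "3600"
);
```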

### Observability {#meta-cache-404-observability}

Paimon cache metrics are available in `information_schema.catalog_meta_cache_statistics`.
For the table definition and metric meanings, see: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

The `cache_name` value for the Paimon module is:

| Module | cache_name |
|---|---|
| `table` | `paimon_table_cache` |

Example query:

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'paimon_ctl'
  AND cache_name LIKE 'paimon_%'
ORDER BY cache_name, metric_name;
```

### Supported Paimon Versions

Doris currently depends on Paimon 1.0.0.
19 changes: 18 additions & 1 deletion docs/lakehouse/meta-cache.md
@@ -18,6 +18,11 @@ For **data cache**, refer to the [data cache documentation](./data-cache.md).
This document applies to versions after 2.1.6.
:::

:::note
For Doris 4.0.4 and later, external meta cache has been refactored with unified configuration keys `meta.cache.*`.
See [Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
:::

## Cache Strategies

Most caches have the following three strategy indicators:
@@ -321,6 +326,12 @@ This section mainly introduces the cache behavior that users may be concerned ab

For all types of External Catalogs, if you want to see the latest Table Schema in real time, you can disable the Schema Cache:

:::note
Starting from Doris 4.0.4, the legacy catalog-level cache property `schema.cache.ttl-second` is deprecated.
For 4.0.4+, keep using the FE config method below, and refer to:
[Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
:::

- Disable globally

```text
@@ -341,6 +352,13 @@ After setting, Doris will see the latest Table Schema in real time. However, thi

For Hive Catalog, if you want to disable the cache to query real-time updated data, you can configure the following parameters:

:::note
Starting from Doris 4.0.4, the legacy catalog-level properties `file.meta.cache.ttl-second` and `partition.cache.ttl-second`
are deprecated. Use unified `meta.cache.hive.*` properties instead. See:
[Hive Catalog](./catalogs/hive-catalog.mdx#meta-cache-404) and
[Unified External Meta Cache (4.0.4+)](./meta-cache/unified-meta-cache.md).
:::

- Disable globally

```text
@@ -363,4 +381,3 @@ After setting the above parameters:
- Changes in partition data files can be queried in real time.

However, this increases the load on external data sources (such as Hive Metastore and HDFS) and may cause unstable metadata access latency.

84 changes: 84 additions & 0 deletions docs/lakehouse/meta-cache/unified-meta-cache.md
@@ -0,0 +1,84 @@
---
{
"title": "Unified External Meta Cache (4.0.4+)",
"language": "en",
"description": "User guide for unified external metadata cache: unified meta.cache.* properties, what is cached, and where to configure per catalog."
}
---

Starting from **Doris 4.0.4**, external metadata caching is unified for major External Catalog engines. As a user, you only need to know:

| What you want to know | Answer |
|---|---|
| Where to configure | Catalog `PROPERTIES` with `meta.cache.*` keys (see the catalog pages linked below). |
| What it affects | Depends on catalog engine (partitions, file listing, table metadata, manifests, etc.). |
| How to observe | `information_schema.catalog_meta_cache_statistics` (see the observability section below). |

:::tip
Applies to Doris 4.0.4 and later.
:::

## Unified Property Model

All engine cache modules share the same property key pattern:

`meta.cache.<engine>.<module>.{enable,ttl-second,capacity}`

The following table describes the property semantics:

| Property | Example | Meaning |
|---|---|---|
| `enable` | `true/false` | Whether this cache module is enabled. |
| `ttl-second` | `600`, `0`, `-1` | `0` disables the module; `-1` means no expiration; any other value is a TTL in seconds, and entries expire once that time has elapsed since their last access. |
| `capacity` | `10000` | Max entry count (count-based). `0` disables the module. |

Example (edit catalog properties):

```sql
ALTER CATALOG hive_ctl SET PROPERTIES (
    "meta.cache.hive.file.ttl-second" = "0"
);
```
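The three suffixes can also be set together on one module. The following is a sketch only: it assumes the same `hive_ctl` catalog, and the TTL and capacity values are placeholders rather than recommendations.

```sql
-- All three properties of the Hive `partition` module in one statement.
ALTER CATALOG hive_ctl SET PROPERTIES (
    "meta.cache.hive.partition.enable" = "true",
    "meta.cache.hive.partition.ttl-second" = "600",
    "meta.cache.hive.partition.capacity" = "10000"
);
```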

## What External Meta Cache Includes

External meta cache covers different kinds of metadata. Some are configured by unified catalog properties, and some are controlled by FE configs:

| Category | Examples | How to configure |
|---|---|---|
| Engine module caches | Hive partitions/files, Iceberg manifests, Paimon table metadata, etc. | Catalog `PROPERTIES`: `meta.cache.<engine>.<module>.*` |
| Schema cache | Table schema, isolated by schema version token | FE configs (for example: `max_external_schema_cache_num`) |

## Catalog-Specific Configuration (Links)

For each catalog engine, the supported cache modules and the recommended properties are documented in its catalog page:

| Catalog engine | Where to configure module caches |
|---|---|
| Hive | [Hive Catalog](../catalogs/hive-catalog.mdx#meta-cache-404) |
| Iceberg | [Iceberg Catalog](../catalogs/iceberg-catalog.mdx#meta-cache-404) |
| Paimon | [Paimon Catalog](../catalogs/paimon-catalog.mdx#meta-cache-404) |
| Hudi | [Hudi Catalog](../catalogs/hudi-catalog.md#meta-cache-404) |
| MaxCompute | [MaxCompute Catalog](../catalogs/maxcompute-catalog.md#meta-cache-404) |

## Observability

Use the system table to observe cache metrics:

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
ORDER BY catalog_name, cache_name, metric_name;
```

This table is documented at: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

Naming convention:

| Field | Convention |
|---|---|
| `cache_name` | `<engine>_<module>_cache` (module `-` is converted to `_`) |
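
Applying the convention, the `partition-values` module of the `hive` engine is reported as `hive_partition_values_cache`. As a sketch, the query below selects a few modules across engines by their derived names; the names are the ones listed on the catalog pages linked above.

```sql
SELECT *
FROM information_schema.catalog_meta_cache_statistics
WHERE cache_name IN (
    'hive_partition_values_cache',       -- hive + partition-values
    'iceberg_manifest_cache',            -- iceberg + manifest
    'maxcompute_partition_values_cache'  -- maxcompute + partition-values
)
ORDER BY catalog_name, cache_name, metric_name;
```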

## Migration Note (Legacy Properties)

Starting from Doris 4.0.4, legacy catalog cache properties (for example, `schema.cache.ttl-second`, `file.meta.cache.ttl-second`) are deprecated. Use `meta.cache.*` properties instead and follow the catalog-specific pages above.
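
As an illustration, disabling the Hive file listing cache might change as shown below. Both keys appear in this documentation, but the one-to-one mapping between them is an assumption made for this sketch; check the Hive Catalog page for the module that corresponds to each legacy property.

```sql
-- Before 4.0.4 (legacy key, now deprecated):
-- ALTER CATALOG hive_ctl SET PROPERTIES ("file.meta.cache.ttl-second" = "0");

-- 4.0.4 and later (unified key):
ALTER CATALOG hive_ctl SET PROPERTIES (
    "meta.cache.hive.file.ttl-second" = "0"
);
```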