Contributor

@taroface taroface commented Jan 8, 2026

DOC-14809
DOC-11635
DOC-10969
DOC-10915
DOC-12760

Improve our partial statistics docs as follows:

  • Document partial statistics on the optimizer page
  • Add partial stats examples to CREATE STATISTICS page
  • Document missing session settings and table storage parameters
  • Reorganize how table stats (full, partial, forecasted) are presented on the optimizer page

These updates apply to v23.2 through v26.1, according to the following feature timeline (please call out anything that is incorrect):

| Version | Feature / Milestone | Default Behavior | Updated Settings |
| --- | --- | --- | --- |
| ≤ v24.2 | Manual partial statistics available but disabled by default | Partial stats must be explicitly enabled; optimizer has no merged partial stats support | `enable_create_stats_using_extremes = off` (session) |
| v24.3 | Manual partial statistics via `USING EXTREMES` enabled by default | Optimizer can use partial stats (off by default) | `enable_create_stats_using_extremes = on` (session); `optimizer_use_merged_partial_statistics = off` (session) |
| v25.1 | Automatic partial statistics collection introduced | Automatic partial stats collection on by default | `sql.stats.automatic_partial_collection.enabled = true`; `sql.stats.automatic_partial_collection.min_stale_rows = 100`; `sql.stats.automatic_partial_collection.fraction_stale_rows = 0.05`; table-level equivalents |
| v25.2 | Optimizer uses merged partial statistics by default; independent full/partial control | Optimizer merges partial stats by default; full and partial auto-collection independently configurable | `optimizer_use_merged_partial_statistics = on`; `sql.stats.automatic_partial_collection.enabled = true`; `sql.stats.automatic_full_collection.enabled = true`; table-level equivalents |
| v25.4 | Predicate-based partial statistics (`WHERE`) | Partial statistics can be manually collected using `WHERE` | |
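
For reviewers who want to reproduce a given version's behavior by hand, the settings in the timeline above could be applied roughly as follows. This is only a sketch: the setting names and values are copied from the timeline, and it assumes a SQL session with permission to change cluster settings. Table-level equivalents are not shown.

```sql
-- Session settings (names taken from the timeline above).
SET enable_create_stats_using_extremes = on;
SET optimizer_use_merged_partial_statistics = on;

-- Cluster settings for automatic partial statistics collection,
-- set here to the defaults listed in the timeline.
SET CLUSTER SETTING sql.stats.automatic_partial_collection.enabled = true;
SET CLUSTER SETTING sql.stats.automatic_partial_collection.min_stale_rows = 100;
SET CLUSTER SETTING sql.stats.automatic_partial_collection.fraction_stale_rows = 0.05;
```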

Docs to review (v26.1; please use the version dropdown menu to change versions):

netlify bot commented Jan 8, 2026

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit 2fbb646
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/69614763f2634d00086e65e3

netlify bot commented Jan 8, 2026

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit 2fbb646
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/696147636e39740008486e43

github-actions bot commented Jan 8, 2026

Files changed:

netlify bot commented Jan 8, 2026

Deploy Preview for cockroachdb-docs failed.

Name Link
🔨 Latest commit 38dbdab
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/695fd90114fc6100089138af

netlify bot commented Jan 8, 2026

Netlify Preview

Name Link
🔨 Latest commit 2fbb646
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/69614763cbf483000880aae5
😎 Deploy Preview https://deploy-preview-22087--cockroachdb-docs.netlify.app

@taroface taroface requested a review from rytaft January 9, 2026 19:01
Contributor

@rytaft rytaft left a comment

Looks great! Thanks for all this work! Just a few comments.


### Full statistics

By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated.
Contributor

nit: during schema changes -> after schema changes

- There have been at least 3 historical statistics collections.
- The historical statistics closely fit a linear pattern.

By default, the optimizer uses forecasts that closely match the historical statistics.
Contributor

"By default, the optimizer uses forecasts that closely match the historical statistics."

I'm not completely sure what this means....

Contributor

Maybe this is a comment on the goodness of fit required for the optimizer to use forecasts?


This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan.

You can also create extremes statistics on specific columns:
Contributor

Maybe specify that this is possible if there is an index with the specified column as the first key column?
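
To illustrate the index requirement, a minimal sketch might look like the following. It assumes the `rides` table has a `start_time` column and reuses the `CREATE INDEX` statement from the docs example; the statistic name `rides_start_time_extremes` is hypothetical.

```sql
-- start_time is the first (and only) key column of this index,
-- so extremes statistics can be collected on it.
CREATE INDEX ON rides (start_time);

-- Hypothetical statistic name; scans only the highest and lowest index values.
CREATE STATISTICS rides_start_time_extremes ON start_time FROM rides USING EXTREMES;
```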


*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried.

Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics).
Contributor

"when the number of stale rows reaches a threshold"

This is also true of full stats. Maybe just mention that the threshold is lower for partial stats.

CREATE INDEX ON rides (start_time);
~~~

Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data:
Contributor

With the WHERE clause, you can specify arbitrary predicates, so the "recency" argument isn't needed here (unlike for USING EXTREMES). Any range of values that was recently updated (even in the middle of the index) can have partial stats collected here.
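
As a rough sketch of the point about arbitrary predicates, a statement along these lines would target a range in the middle of the index rather than its most recent end. The statistic name, the date range, and the exact placement of the `WHERE` clause are assumptions for illustration and should be checked against the v25.4 `CREATE STATISTICS` syntax.

```sql
-- Hypothetical statistic name and predicate; clause placement is assumed.
-- Collects partial statistics only for rows whose start_time falls in a
-- mid-index range that was recently updated.
CREATE STATISTICS rides_start_time_mid_range
  ON start_time
  FROM rides
  WHERE start_time BETWEEN '2018-12-10' AND '2018-12-20';
```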

2 participants