Partial statistics docs and settings #22087
base: main
✅ Deploy Preview for cockroachdb-api-docs canceled.
✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.
Force-pushed from 38dbdab to f0b9359
❌ Deploy Preview for cockroachdb-docs failed.
Force-pushed from dacebde to 8cf6c99
Force-pushed from 8cf6c99 to c477e0e
rytaft left a comment:
Looks great! Thanks for all this work! Just a few comments.

### Full statistics

By default, CockroachDB automatically generates full statistics when tables are [created]({% link {{ page.version.version }}/create-table.md %}) and during [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}). Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated.
nit: during schema changes -> after schema changes

- There have been at least 3 historical statistics collections.
- The historical statistics closely fit a linear pattern.

By default, the optimizer uses forecasts that closely match the historical statistics.
"By default, the optimizer uses forecasts that closely match the historical statistics."
I'm not completely sure what this means...
Maybe this is a comment on the goodness of fit required for the optimizer to use forecasts?
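One way to read "closely match" is as a goodness-of-fit threshold on a linear trend. The sketch below models that reading only. It assumes R² on a single row-count series and a 0.95 threshold; the minimum of 3 observations comes from the bullet list above. CockroachDB is assumed here to expose these knobs as cluster settings such as `sql.stats.forecasts.min_observations` and `sql.stats.forecasts.min_goodness_of_fit`, but confirm names and defaults in the settings reference. `StatsCollection`, `r_squared`, and `forecast_allowed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatsCollection:
    # Hypothetical record of one historical statistics collection.
    days_since_first: float  # when the collection ran (x-axis)
    row_count: float         # observed row count (y-axis)


def r_squared(xs: List[float], ys: List[float]) -> float:
    """Goodness of fit (R^2) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        return 1.0  # a constant series is fit exactly by a flat line
    return 1.0 - ss_res / ss_tot


def forecast_allowed(history: List[StatsCollection],
                     min_observations: int = 3,
                     min_goodness_of_fit: float = 0.95) -> bool:
    """Simplified eligibility check: enough history and a close linear fit."""
    if len(history) < min_observations:
        return False
    xs = [c.days_since_first for c in history]
    ys = [c.row_count for c in history]
    if len(set(xs)) < 2:
        return False  # all collections at the same time: no trend to fit
    return r_squared(xs, ys) >= min_goodness_of_fit
```

A steadily growing table (1,000, 2,000, 3,000 rows on days 0, 1, 2) fits a line exactly and qualifies. A table that oscillates between 1,000 and 5,000 rows has an R² of about 0.2 and would fall back to the most recent collected statistics.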

This creates partial statistics on all single column prefixes of forward indexes in the `rides` table by scanning only the highest and lowest index values, providing updated statistics without performing a full table scan.

You can also create extremes statistics on specific columns:
Maybe specify that this is possible if there is an index with the specified column as the first key column?
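The reviewer's condition, together with the diff's phrase "single column prefixes of forward indexes", can be sketched as a column-eligibility check. This is a simplified model. It assumes that a column qualifies only when it is the first key column of some non-inverted index. The `Index` type, the index names, the `(city, id)` primary key, and the inverted `tags` index are all hypothetical, and the real rules (for example, for partial or hash-sharded indexes) may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Index:
    # Hypothetical, simplified index description.
    name: str
    key_columns: List[str]
    inverted: bool = False


def extremes_candidate_columns(indexes: List[Index]) -> List[str]:
    """Leading key column of every forward (non-inverted) index, deduplicated."""
    columns: List[str] = []
    for idx in indexes:
        if idx.inverted or not idx.key_columns:
            continue
        leading = idx.key_columns[0]
        if leading not in columns:
            columns.append(leading)
    return columns


def can_collect_extremes_on(column: str, indexes: List[Index]) -> bool:
    """True if some forward index has `column` as its first key column."""
    return column in extremes_candidate_columns(indexes)


# Assumed schema for illustration: rides keyed on (city, id), plus the
# start_time index from the diff and a hypothetical inverted index.
rides_indexes = [
    Index("rides_pkey", ["city", "id"]),
    Index("rides_start_time_idx", ["start_time"]),
    Index("rides_tags_idx", ["tags"], inverted=True),
]
```

Under these assumptions, `start_time` qualifies because it leads its own index. `id` does not qualify: it is indexed, but only as the second key column of the primary key.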

*Partial statistics* are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried.

Whereas [full statistics](#full-statistics) refresh infrequently and can allow stale rows to accumulate, partial statistics [automatically refresh](#automatically-collect-partial-statistics) when the number of stale rows reaches a threshold. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be [collected manually](#manually-collect-partial-statistics).
"when the number of stale rows reaches a threshold"
This is also true of full stats. Maybe just mention that the threshold is lower for partial stats.
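Here is a minimal sketch of the reviewer's point that both kinds of statistics use a stale-row threshold, with a lower one for partial statistics. It assumes the trigger is `min_stale_rows + fraction_stale_rows × row_count`, applied deterministically. That is a simplification, and CockroachDB's actual trigger logic may differ (for example, it may be probabilistic). The partial-statistics values (100 rows, 0.05) are the defaults listed in this PR's description. The full-statistics values (500 rows, 0.2) are assumptions chosen to match "approximately 20%". `RefreshPolicy` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshPolicy:
    min_stale_rows: int
    fraction_stale_rows: float

    def threshold(self, row_count: int) -> float:
        """Stale rows needed before a refresh (simplified, deterministic)."""
        return self.min_stale_rows + self.fraction_stale_rows * row_count

    def should_refresh(self, stale_rows: int, row_count: int) -> bool:
        return stale_rows >= self.threshold(row_count)


# Partial values come from the PR description; full values are assumed.
FULL = RefreshPolicy(min_stale_rows=500, fraction_stale_rows=0.2)
PARTIAL = RefreshPolicy(min_stale_rows=100, fraction_stale_rows=0.05)

row_count = 1_000_000
stale_rows = 60_000
print("full:", FULL.threshold(row_count), FULL.should_refresh(stale_rows, row_count))
print("partial:", PARTIAL.threshold(row_count), PARTIAL.should_refresh(stale_rows, row_count))
```

With 1,000,000 rows and 60,000 stale rows, the model crosses the partial threshold (50,100) but not the full threshold (200,500). This is the kind of gap in which partial statistics keep recently changed ranges fresh between full refreshes.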
~~~ sql
CREATE INDEX ON rides (start_time);
~~~

Partial statistics are particularly valuable for timestamp columns where workloads commonly access the most recent data:
With the WHERE clause, you can specify arbitrary predicates, so the "recency" argument isn't needed here (unlike for USING EXTREMES). Any range of values that was recently updated (even in the middle of the index) can have partial stats collected here.
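The reviewer's distinction between `USING EXTREMES` and `WHERE` can be illustrated with a toy model over integer index values. The sketch assumes that `USING EXTREMES` scans only values below the minimum or above the maximum seen by the previous full statistics, and that `WHERE` scans whatever range its predicate selects. The function names and data are hypothetical and are not CockroachDB internals.

```python
from typing import Callable, List


def using_extremes(values: List[int], prev_min: int, prev_max: int) -> List[int]:
    """Index values below/above the bounds of the previous full statistics."""
    return [v for v in values if v < prev_min or v > prev_max]


def using_where(values: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """Index values matching an arbitrary predicate, anywhere in the range."""
    return [v for v in values if predicate(v)]


# Hypothetical index values: the last full collection saw 10..90; new rows
# were appended at 91..100, and rows 40..50 were recently updated in place.
values = list(range(10, 101))
prev_min, prev_max = 10, 90

extremes = using_extremes(values, prev_min, prev_max)
mid_range = using_where(values, lambda v: 40 <= v <= 50)
print(extremes)
print(mid_range)
```

In this model, the extremes scan covers only the appended tail (91 to 100) and never touches the updated middle range (40 to 50). A `WHERE` predicate can target that middle range directly, so the "recency" framing fits `USING EXTREMES` but is not required for `WHERE`.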
DOC-14809
DOC-11635
DOC-10969
DOC-10915
DOC-12760
Improve our partial statistics docs as follows:
- `CREATE STATISTICS` page

These updates apply to 23.2-26.1, according to the following feature timeline (please call out if incorrect):

- `enable_create_stats_using_extremes = off` (session)
- `USING EXTREMES` enabled by default: `enable_create_stats_using_extremes = on` (session); `optimizer_use_merged_partial_statistics = off` (session)
- `sql.stats.automatic_partial_collection.enabled = true`; `sql.stats.automatic_partial_collection.min_stale_rows = 100`; `sql.stats.automatic_partial_collection.fraction_stale_rows = 0.05`; table-level equivalents
- `optimizer_use_merged_partial_statistics = on`; `sql.stats.automatic_partial_collection.enabled = true`; `sql.stats.automatic_full_collection.enabled = true`; table-level equivalents
- `WHERE`)
- `WHERE`

Docs to review (v26.1; please use the version dropdown menu to change versions):

- `CREATE STATISTICS`