Details
Type: Task
Status: Stalled
Priority: Major
Resolution: Unresolved
Description
We need to test the precision of histograms (both DOUBLE_PREC_HB and JSON_HB) with different analyze_sample_percentage settings.
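For reference, a minimal sketch of the session variables involved (values are only examples; histogram_size is interpreted as bytes for the *_PREC_HB types and as the number of buckets for JSON_HB):
{code:sql}
-- Settings under test (example values, not proposed defaults):
SET SESSION histogram_type = 'DOUBLE_PREC_HB';   -- or 'JSON_HB'
SET SESSION histogram_size = 254;                 -- histogram size / number of buckets
SET SESSION analyze_sample_percentage = 100;      -- 0 = let the server choose the sample size
{code}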
Goals:
- Catch possible bugs, or just undesirable estimates.
- Validate analyze_sample_percentage=0 before making it the default.
Means:
Running full benchmarks is not a good way to achieve this:
- It is difficult to tell whether changes in query plans are caused by better or worse selectivity estimates.
- Benchmark queries consult histogram data only a few times.
We reuse the approach from the histogram-test script [1]:
- Get some dataset somewhere
- Collect histogram
- Run small test queries and see what estimates the optimizer gets from the histogram code for various ranges (a SQL sketch of such a check follows this list).
- ask for common/uncommon values
- ask for wide/narrow intervals
- etc
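A minimal sketch of one such check, assuming a hypothetical table t1 with a column col1. The estimate is read from the "filtered" column of EXPLAIN EXTENDED, or compared against actually-read rows (r_rows) via ANALYZE FORMAT=JSON:
{code:sql}
-- Collect engine-independent statistics (incl. histogram) for the hypothetical table t1.
SET SESSION analyze_sample_percentage = 0;        -- the setting we want to validate
SET SESSION histogram_type = 'JSON_HB';
ANALYZE TABLE t1 PERSISTENT FOR ALL;

-- Make sure the optimizer actually reads the collected stats and histograms.
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;

-- Narrow interval: compare the 'filtered' estimate with the real fraction of matching rows.
EXPLAIN EXTENDED SELECT * FROM t1 WHERE col1 BETWEEN 100 AND 110;
SELECT COUNT(*) FROM t1 WHERE col1 BETWEEN 100 AND 110;

-- ANALYZE FORMAT=JSON shows estimated rows vs. actually read rows (r_rows) in one statement.
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE col1 BETWEEN 100 AND 110;
{code}
The same pattern can be repeated for common vs. uncommon values and wide vs. narrow intervals, and re-run with different analyze_sample_percentage and histogram_type settings.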
Things to check for
- Height-balanced histograms should not produce errors that are larger than bucket_size (right?)
- But for DOUBLE_PREC_HB there is also an error due to imprecise storage of bucket endpoints. It is not clear what the bounds on that are.
- Using sampling instead of the full dataset should not reduce the precision much.
- What happens to the n_distinct estimate?
- Monitor disk space during ANALYZE. Does it grow less with analyze_sample_percentage=0? If so, by how much? (A sketch for inspecting the stored statistics follows this list.)
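A minimal sketch for inspecting what was actually stored, again assuming the hypothetical table t1. avg_frequency in mysql.column_stats is the basis for the n_distinct-style estimate, and cardinality in mysql.table_stats is the value MDEV-32580 is about:
{code:sql}
-- Collected column statistics, including histogram type and size.
SELECT column_name, min_value, max_value, nulls_ratio,
       avg_frequency,                       -- avg rows per distinct value; n_distinct ~ cardinality / avg_frequency
       hist_type, hist_size,
       LENGTH(histogram) AS histogram_bytes
FROM mysql.column_stats
WHERE db_name = DATABASE() AND table_name = 't1';

-- Estimated table cardinality vs. the real row count (cf. MDEV-32580).
SELECT cardinality FROM mysql.table_stats
WHERE db_name = DATABASE() AND table_name = 't1';
SELECT COUNT(*) FROM t1;
{code}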
Issue Links
- relates to MDEV-32580 The value of cardinality in mysql.table_stats may exceed by more than 10% the actual number of records in the table (Open)
- relates to MDEV-6529 optimize VARCHAR temp storage during EITS ANALYZE (Stalled)