The histograms as defined in MDEV-21130 follow the approach used by SINGLE_PREC_HB/DOUBLE_PREC_HB histograms and put the bucket bound exactly after
a certain fraction of rows.
For example, consider a table with these values (each character is one row):
A histogram with 8 buckets will have these buckets:
The estimation function will be able to see that 'a' is a popular value, but it can determine its frequency only with bucket_size precision.
However, a better approach would be to put all 'a' values in one bucket:
This will give a more precise number for the value 'a'.
It will also provide more buckets for other values.
- MySQL does something like this.
- PostgreSQL collects MCVs (Most Common Values) and removes them from consideration before building the histogram.
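To make the difference concrete, here is a minimal Python sketch of the two bucketing strategies. It is not the MariaDB, MySQL or PostgreSQL implementation. The function names, the "popular" threshold (a value whose row count reaches one bucket_size) and the sample data are all assumptions made for illustration. The sketch also assumes there are at least as many rows as buckets, and the second function may return somewhat more or fewer buckets than requested.

```python
from collections import Counter


def equi_height_bounds(values, n_buckets):
    """Upper bound of each bucket when a bound is placed after every
    len(values)/n_buckets rows, whatever value happens to sit there."""
    rows = sorted(values)
    n = len(rows)
    return [rows[(i + 1) * n // n_buckets - 1] for i in range(n_buckets)]


def popular_aware_buckets(values, n_buckets):
    """Give each popular value a bucket of its own, then spread the
    remaining rows over the remaining buckets.

    Returns a list of (lower_bound, upper_bound, row_count) tuples.
    """
    counts = sorted(Counter(values).items())
    bucket_size = len(values) / n_buckets

    # Assumption: "popular" means the value alone fills at least one bucket.
    popular = {v for v, c in counts if c >= bucket_size}
    rest_rows = sum(c for v, c in counts if v not in popular)
    rest_buckets = max(1, n_buckets - len(popular))
    target = rest_rows / rest_buckets

    buckets, cur = [], None
    for value, cnt in counts:
        if value in popular:
            if cur is not None:          # close the bucket in progress
                buckets.append(cur)
                cur = None
            buckets.append((value, value, cnt))
            continue
        if cur is None:
            cur = (value, value, cnt)
        else:
            cur = (cur[0], value, cur[2] + cnt)
        if cur[2] >= target:             # bucket is full
            buckets.append(cur)
            cur = None
    if cur is not None:
        buckets.append(cur)
    return buckets


# Hypothetical data: ten rows of 'a' followed by one row each of 'b'..'g'.
rows = list("aaaaaaaaaabcdefg")
print(equi_height_bounds(rows, 8))
print(popular_aware_buckets(rows, 8))
```

With this sample data, the first function reports 'a' as the bound of five consecutive buckets, so its frequency is known only to within one bucket. The second function records the exact row count for 'a' in a single bucket and leaves the other buckets free for the remaining values.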
Again, following MySQL here: store the number of distinct values in each bucket.
Store only a 40-character prefix.
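A possible shape for such a bucket record is sketched below in Python. The field names and the `eq_rows_estimate` helper are hypothetical. The sketch assumes that the 40-character prefix applies to string bucket bounds, and that the per-bucket distinct count is used to estimate rows per value by assuming a uniform distribution inside the bucket.

```python
from dataclasses import dataclass

PREFIX_LEN = 40  # from the text: keep only a 40-character prefix


@dataclass
class Bucket:
    upper_bound: str  # truncated to PREFIX_LEN characters
    rows: int         # number of rows that fell into the bucket
    ndv: int          # number of distinct values in the bucket


def make_bucket(sorted_values):
    """Build a bucket record from the (sorted, non-empty) values it covers."""
    return Bucket(
        upper_bound=sorted_values[-1][:PREFIX_LEN],
        rows=len(sorted_values),
        ndv=len(set(sorted_values)),
    )


def eq_rows_estimate(bucket):
    """Estimated rows for `col = const` when const falls in this bucket,
    assuming rows are spread evenly over the bucket's distinct values."""
    return bucket.rows / bucket.ndv


bucket = make_bucket(sorted(["b", "b", "c", "x" * 50]))
print(bucket, eq_rows_estimate(bucket))
```

Truncating the bound keeps the histogram small, but two long values that share their first 40 characters become indistinguishable as bounds.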