[MDEV-26849] JSON Histograms: point selectivity estimates are off for non-existent values Created: 2021-10-18 Updated: 2022-01-19 Resolved: 2021-10-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.7 |
| Fix Version/s: | 10.7.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Description |
|
Suppose the number of values in the dataset is less than the number of buckets in the histogram. Which value should be returned when producing estimates for col='non-existant-value' ? Currently, MariaDB's code produces the average. It uses "inclusion assumption" and assumes that the value that is looked up exists in the database. This is not what other databases do. The deficiency comes from the fact that when one looks at MariaDB's histograms as they were specified in (Technically, if one walks through the whole histogram and only sees buckets with ndv=1, this will guarantee that the histogram bucket endpoints are all the values that were encountered... but I won't count this as "obvious").
Then run
and for various kinds of WHERE clause we get this:
Apparently, MariaDB's assumption differs from that of other databases' and is worse... |
| Comments |
| Comment by Sergei Petrunia [ 2021-10-20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The effect can be seen when the number of values in the table is greater than the number of buckets in the histogram, too.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-10-22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
With the patch:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-10-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This has already been pushed:
|