[MDEV-26886] Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#3) Created: 2021-10-22 Updated: 2022-01-19 Resolved: 2021-11-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | N/A |
| Fix Version/s: | 10.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Elena Stepanova | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
So, the actual result set contains 33 rows out of 100, DOUBLE_PREC gives filtered=32.8 which is very close, JSON histogram gives filtered=1.47 which is far off. Reproducible with InnoDB and MyISAM alike. |
| Comments |
| Comment by Elena Stepanova [ 2021-10-22 ] |
|
Raised to critical for procedural reasons. Feel free to demote and remove from "must-do" scope. |
| Comment by Sergei Petrunia [ 2021-11-26 ] |
|
Histogram_size=68.
The issue is that a condition
produces a very small estimate that doesn't include the first bucket. |
| Comment by Sergei Petrunia [ 2021-11-26 ] |
|
The computations in Histogram_json_hb::range_selectivity go as follows:
Correct so far. Then, we get here:
In get_end_value(idx), it takes this path
and returns "\001". (ISSUE-1) Then, position_in_interval() is called with
which returns 0.0 (this is correct). With sel=0.0, this formula
returns 0.0 also. |
| Comment by Sergei Petrunia [ 2021-11-26 ] |
|
... the problem is marked with ISSUE-1. The code assumes that the values in the first bucket are "spread" between 0 and 1. It ignores the fact that the bucket has ndv=1. |