[MDEV-27243] Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#7) upon TIME comparison Created: 2021-12-12  Updated: 2022-01-19  Resolved: 2022-01-08

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Temporal Types
Affects Version/s: N/A
Fix Version/s: 10.8.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-26519 JSON Histograms: improve histogram co... Closed

 Description   

Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size.

--source include/have_sequence.inc
 
CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;
 
SET histogram_type= JSON_HB;
ANALYZE TABLE t1 PERSISTENT FOR ALL;
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';
 
SET histogram_type= DOUBLE_PREC_HB;
ANALYZE TABLE t1 PERSISTENT FOR ALL;
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';
 
DROP TABLE t1;

JSON_HB:

preview-10.8-MDEV-26519-json-histograms da3231a8

        "table": {
          "table_name": "t1",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 1000,
          "r_rows": 1000,
          "r_table_time_ms": 0.037146099,
          "r_other_time_ms": 0.077310006,
          "filtered": 99.75002289,
          "r_filtered": 50,
          "attached_condition": "t1.f > '00:01:00'"
        }

DOUBLE_PREC_HB:

        "table": {
          "table_name": "t1",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 1000,
          "r_rows": 1000,
          "r_table_time_ms": 0.033997955,
          "r_other_time_ms": 0.07762081,
          "filtered": 50,
          "r_filtered": 50,
          "attached_condition": "t1.f > '00:01:00'"
        }



 Comments   
Comment by Sergei Petrunia [ 2021-12-26 ]

  "histogram_hb": [
    {
      "start": "00:00:00",
      "size": 0.5,
      "ndv": 1
    },
    {
      "start": "02:00:02",
      "size": 0.004,
      "ndv": 4
    },

ANALYZE SELECT * FROM t1 WHERE f > '00:01:00';
filtered=99.7
r_filtered=50

in Histogram_json_hb::range_selectivity():

exclusive_endp=true
equal=false
idx=0

We hit the first bucket...
and then we compute sel= position_in_interval()= 0.0049

min= 0.0049 * (bucket_size=0.5) = 0.002
max=1.0

which gives filtered=99%.

Things go wrong when we consider the singleton bucket to be a general bucket (which spans from 00:00 to 02:00) and call position_in_interval(00:01). This returns some value (which doesn't look perfect btw), but the issue here is that for the singleton bucket we can see that min_endp > bucket_start which means sel=1.0.

Comment by Sergei Petrunia [ 2022-01-08 ]

Fixed by fix for MDEV-27229. Added a testcase.

Generated at Thu Feb 08 09:51:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.