[MDEV-27492] DOUBLE_PREC_HB histogram has poor estimates for BIT columns Created: 2022-01-13  Updated: 2023-12-07

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.11

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-26519 JSON Histograms: improve histogram co... Closed
Relates
relates to MDEV-27493 Inconsistent warnings upon using EITS... Open

 Description   

EDIT : The issue is actually not what is described in the report. See the comments!

CREATE TABLE t (a varchar(1), b bit(64)) ENGINE=MyISAM;
 
INSERT INTO t VALUES
  ('n',x'30'),('v',x'45'),('n',x'01'),('k',x'BE562B1A99001918'),
  ('t',x'30'),('n',x'30'),('n',x'30'),('f',x'21'),('y',x'10'),
  ('n',x'01'),('n',x'02'),('q',x'11'),('i',x'30'),('d',x'E2'),
  ('t',x'61'),('r',x'30'),('q',x'81'),('c',x'02'),('o',x'30'),
  ('v',x'30'),('n',x'02'),('c',x'01'),('g',x'13'),('g',x'22'),
  ('q',x'81'),('n',x'30'),('x',x'6C'),('e',x'15'),('n',x'23'),
  ('z',x'30'),('n',x'01'),('d',x'61'),('y',x'43'),('o',x'F9');
 
set histogram_type= DOUBLE_PREC_HB;
analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';
 
set histogram_type= JSON_HB;
analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';
 
# Cleanup
DROP TABLE t;

DOUBLE_PREC_HB preview-10.8-MDEV-26519-json-histograms 9a86900b

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 34,
          "r_rows": 34,
          "r_table_time_ms": 0.003460502,
          "r_other_time_ms": 0.006840761,
          "filtered": 5.555588245,
          "r_filtered": 5.882352941,
          "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
        }

JSON_HB

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 34,
          "r_rows": 34,
          "r_table_time_ms": 0.003239832,
          "r_other_time_ms": 0.004062328,
          "filtered": 70.58823395,
          "r_filtered": 5.882352941,
          "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
        }

The total count is 34 rows and the result set is 2 rows.

Also reproducible with InnoDB, Aria.



 Comments   
Comment by Elena Stepanova [ 2022-01-13 ]

See also MDEV-27493, it must be related in some way.

Comment by Sergei Petrunia [ 2022-01-14 ]

The first column, a varchar(1), seems to be completely irrelevant.

Comment by Elena Stepanova [ 2022-01-14 ]

It participates in the where clause. Maybe it can be modified to eliminate it, I didn't succeed.

Comment by Sergei Petrunia [ 2022-01-14 ]

Interesting, when I remove conditions on a from the WHERE clause, I get it the other way around: JSON_HB is precise while DOUBLE_PREC_HB is not :

set histogram_type= DOUBLE_PREC_HB;
analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT b FROM t WHERE  b < b'1000000';
 
set histogram_type= JSON_HB;
analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT b FROM t WHERE  b < b'1000000';

DOUBLE_PREC_HB:

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 34,
          "r_rows": 34,
          "r_table_time_ms": 0.206056495,
          "r_other_time_ms": 0.266735552,
          "filtered": 5.555588245,
          "r_filtered": 73.52941176,
          "attached_condition": "t.b < 0x40"
        }

JSON_HB:

      {
        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 34,
          "r_rows": 34,
          "r_table_time_ms": 0.192904935,
          "r_other_time_ms": 0.238820538,
          "filtered": 70.58823395,
          "r_filtered": 73.52941176,
          "attached_condition": "t.b < 0x40"
        }

Comment by Sergei Petrunia [ 2022-01-14 ]

Running the original query:

set histogram_type= DOUBLE_PREC_HB;
analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 34,
          "r_rows": 34,
          "r_table_time_ms": 0.214689414,
          "r_other_time_ms": 0.188034214,
          "filtered": 5.555588245,
          "r_filtered": 5.882352941,
          "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
        }

Optimizer trace shows:

            "rows_estimation": [
              {
                "selectivity_for_indexes": [],
                "selectivity_for_columns": [
                  {
                    "column_name": "b",
                    "ranges": ["NULL < b < 64"],
                    "selectivity_from_histogram": 0.055555882
                  }
                ],
                "cond_selectivity": 0.055555882
              },
              {
                "table": "t",
                "table_scan": {
                  "rows": 34,
                  "cost": 2.166015625
                }
              }
            ]

Note that it has no estimate for the

'w' not between t.a and 'y'

condition. That condition has selectivity of 11.76%.

Comment by Sergei Petrunia [ 2022-01-14 ]

An idea about why DOUBLE_PREC_HB selectivity is wrong:

MariaDB [j10]>  select min_value,max_value from mysql.column_stats where table_name='t' and db_name=database();
+----------------------+-----------+
| min_value            | max_value |
+----------------------+-----------+
| c                    | z         |
| 13715197108439488792 | 249       |
+----------------------+-----------+

13715197108439488792 is BE562B1A99001918.
249 is F9.

min_value is wrong. Actual ordering:

select hex(b) from t order by b;
+------------------+
| hex(b)           |
+------------------+
| 1                |
| 1                |
| 1                |
| 1                |
| 2                |
| 2                |
| 2                |
| 10               |
| 11               |
| 13               |
| 15               |
| 21               |
| 22               |
| 23               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 30               |
| 43               |
| 45               |
| 61               |
| 61               |
| 6C               |
| 81               |
| 81               |
| E2               |
| F9               |
| BE562B1A99001918 |
+------------------+

Comment by Sergei Petrunia [ 2022-01-14 ]

... so, mysql.min_value and max_value are incorrect, and since DOUBLE_PREC_HB histogram uses them for computing estimates, the estimates are incorrect, too.

Generated at Thu Feb 08 09:53:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.