Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27492

DOUBLE_PREC_HB histogram has poor estimates for BIT columns

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Minor
    • Resolution: Unresolved
    • N/A
    • 10.11
    • Optimizer
    • None

    Description

      EDIT : The issue is actually not what is described in the report. See the comments!

      CREATE TABLE t (a varchar(1), b bit(64)) ENGINE=MyISAM;
       
      INSERT INTO t VALUES
        ('n',x'30'),('v',x'45'),('n',x'01'),('k',x'BE562B1A99001918'),
        ('t',x'30'),('n',x'30'),('n',x'30'),('f',x'21'),('y',x'10'),
        ('n',x'01'),('n',x'02'),('q',x'11'),('i',x'30'),('d',x'E2'),
        ('t',x'61'),('r',x'30'),('q',x'81'),('c',x'02'),('o',x'30'),
        ('v',x'30'),('n',x'02'),('c',x'01'),('g',x'13'),('g',x'22'),
        ('q',x'81'),('n',x'30'),('x',x'6C'),('e',x'15'),('n',x'23'),
        ('z',x'30'),('n',x'01'),('d',x'61'),('y',x'43'),('o',x'F9');
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
      ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';
       
      set histogram_type= JSON_HB;
      analyze table t persistent for all;
      ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';
       
      # Cleanup
      DROP TABLE t;
      

      DOUBLE_PREC_HB preview-10.8-MDEV-26519-json-histograms 9a86900b

              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 34,
                "r_rows": 34,
                "r_table_time_ms": 0.003460502,
                "r_other_time_ms": 0.006840761,
                "filtered": 5.555588245,
                "r_filtered": 5.882352941,
                "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
              }
      

      JSON_HB

              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 34,
                "r_rows": 34,
                "r_table_time_ms": 0.003239832,
                "r_other_time_ms": 0.004062328,
                "filtered": 70.58823395,
                "r_filtered": 5.882352941,
                "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
              }
      

      The total count is 34 rows and the result set is 2 rows.

      Also reproducible with InnoDB, Aria.

      Attachments

        Issue Links

          Activity

            See also MDEV-27493, it must be related in some way.

            elenst Elena Stepanova added a comment - See also MDEV-27493 , it must be related in some way.

            The first column, a varchar(1), seems to be completely irrelevant.

            psergei Sergei Petrunia added a comment - The first column, a varchar(1) , seems to be completely irrelevant.
            elenst Elena Stepanova added a comment - - edited

            It participates in the where clause. Maybe it can be modified to eliminate it, I didn't succeed.

            elenst Elena Stepanova added a comment - - edited It participates in the where clause. Maybe it can be modified to eliminate it, I didn't succeed.

            Interesting, when I remove conditions on a from the WHERE clause, I get it the other way around: JSON_HB is precise while DOUBLE_PREC_HB is not :

            set histogram_type= DOUBLE_PREC_HB;
            analyze table t persistent for all;
            ANALYZE FORMAT=JSON SELECT b FROM t WHERE  b < b'1000000';
             
            set histogram_type= JSON_HB;
            analyze table t persistent for all;
            ANALYZE FORMAT=JSON SELECT b FROM t WHERE  b < b'1000000';
            

            DOUBLE_PREC_HB:

                    "table": {
                      "table_name": "t",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 34,
                      "r_rows": 34,
                      "r_table_time_ms": 0.206056495,
                      "r_other_time_ms": 0.266735552,
                      "filtered": 5.555588245,
                      "r_filtered": 73.52941176,
                      "attached_condition": "t.b < 0x40"
                    }
            

            JSON_HB:

                  {
                    "table": {
                      "table_name": "t",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 34,
                      "r_rows": 34,
                      "r_table_time_ms": 0.192904935,
                      "r_other_time_ms": 0.238820538,
                      "filtered": 70.58823395,
                      "r_filtered": 73.52941176,
                      "attached_condition": "t.b < 0x40"
                    }
            

            psergei Sergei Petrunia added a comment - Interesting, when I remove conditions on a from the WHERE clause, I get it the other way around: JSON_HB is precise while DOUBLE_PREC_HB is not : set histogram_type= DOUBLE_PREC_HB; analyze table t persistent for all ; ANALYZE FORMAT=JSON SELECT b FROM t WHERE b < b '1000000' ; set histogram_type= JSON_HB; analyze table t persistent for all ; ANALYZE FORMAT=JSON SELECT b FROM t WHERE b < b '1000000' ; DOUBLE_PREC_HB: "table": { "table_name": "t", "access_type": "ALL", "r_loops": 1, "rows": 34, "r_rows": 34, "r_table_time_ms": 0.206056495, "r_other_time_ms": 0.266735552, "filtered": 5.555588245, "r_filtered": 73.52941176, "attached_condition": "t.b < 0x40" } JSON_HB: { "table": { "table_name": "t", "access_type": "ALL", "r_loops": 1, "rows": 34, "r_rows": 34, "r_table_time_ms": 0.192904935, "r_other_time_ms": 0.238820538, "filtered": 70.58823395, "r_filtered": 73.52941176, "attached_condition": "t.b < 0x40" }

            Running the original query:

            set histogram_type= DOUBLE_PREC_HB;
            analyze table t persistent for all;
            ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b'1000000';
            

                    "table": {
                      "table_name": "t",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 34,
                      "r_rows": 34,
                      "r_table_time_ms": 0.214689414,
                      "r_other_time_ms": 0.188034214,
                      "filtered": 5.555588245,
                      "r_filtered": 5.882352941,
                      "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40"
                    }
            

            Optimizer trace shows:

                        "rows_estimation": [
                          {
                            "selectivity_for_indexes": [],
                            "selectivity_for_columns": [
                              {
                                "column_name": "b",
                                "ranges": ["NULL < b < 64"],
                                "selectivity_from_histogram": 0.055555882
                              }
                            ],
                            "cond_selectivity": 0.055555882
                          },
                          {
                            "table": "t",
                            "table_scan": {
                              "rows": 34,
                              "cost": 2.166015625
                            }
                          }
                        ]
            

            Note that it has no estimate for the

            'w' not between t.a and 'y'
            

            condition. That condition has selectivity of 11.76%.

            psergei Sergei Petrunia added a comment - Running the original query: set histogram_type= DOUBLE_PREC_HB; analyze table t persistent for all ; ANALYZE FORMAT=JSON SELECT b FROM t WHERE 'w' NOT BETWEEN a AND 'y' AND b < b '1000000' ; "table": { "table_name": "t", "access_type": "ALL", "r_loops": 1, "rows": 34, "r_rows": 34, "r_table_time_ms": 0.214689414, "r_other_time_ms": 0.188034214, "filtered": 5.555588245, "r_filtered": 5.882352941, "attached_condition": "'w' not between t.a and 'y' and t.b < 0x40" } Optimizer trace shows: "rows_estimation": [ { "selectivity_for_indexes": [], "selectivity_for_columns": [ { "column_name": "b", "ranges": ["NULL < b < 64"], "selectivity_from_histogram": 0.055555882 } ], "cond_selectivity": 0.055555882 }, { "table": "t", "table_scan": { "rows": 34, "cost": 2.166015625 } } ] Note that it has no estimate for the 'w' not between t.a and 'y' condition. That condition has selectivity of 11.76%.

            An idea about why DOUBLE_PREC_HB selectivity is wrong:

            MariaDB [j10]>  select min_value,max_value from mysql.column_stats where table_name='t' and db_name=database();
            +----------------------+-----------+
            | min_value            | max_value |
            +----------------------+-----------+
            | c                    | z         |
            | 13715197108439488792 | 249       |
            +----------------------+-----------+
            

            13715197108439488792 is BE562B1A99001918.
            249 is F9.

            min_value is wrong. Actual ordering:

            select hex(b) from t order by b;
            +------------------+
            | hex(b)           |
            +------------------+
            | 1                |
            | 1                |
            | 1                |
            | 1                |
            | 2                |
            | 2                |
            | 2                |
            | 10               |
            | 11               |
            | 13               |
            | 15               |
            | 21               |
            | 22               |
            | 23               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 30               |
            | 43               |
            | 45               |
            | 61               |
            | 61               |
            | 6C               |
            | 81               |
            | 81               |
            | E2               |
            | F9               |
            | BE562B1A99001918 |
            +------------------+
            

            psergei Sergei Petrunia added a comment - An idea about why DOUBLE_PREC_HB selectivity is wrong: MariaDB [j10]> select min_value,max_value from mysql.column_stats where table_name='t' and db_name=database(); +----------------------+-----------+ | min_value | max_value | +----------------------+-----------+ | c | z | | 13715197108439488792 | 249 | +----------------------+-----------+ 13715197108439488792 is BE562B1A99001918. 249 is F9. min_value is wrong. Actual ordering: select hex(b) from t order by b; +------------------+ | hex(b) | +------------------+ | 1 | | 1 | | 1 | | 1 | | 2 | | 2 | | 2 | | 10 | | 11 | | 13 | | 15 | | 21 | | 22 | | 23 | | 30 | | 30 | | 30 | | 30 | | 30 | | 30 | | 30 | | 30 | | 30 | | 30 | | 43 | | 45 | | 61 | | 61 | | 6C | | 81 | | 81 | | E2 | | F9 | | BE562B1A99001918 | +------------------+

            ... so, mysql.min_value and max_value are incorrect, and since DOUBLE_PREC_HB histogram uses them for computing estimates, the estimates are incorrect, too.

            psergei Sergei Petrunia added a comment - ... so, mysql.min_value and max_value are incorrect, and since DOUBLE_PREC_HB histogram uses them for computing estimates, the estimates are incorrect, too.

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.