Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27497

DOUBLE_PREC_HB histogram produces wrong estimates for COMPRESSED columns

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Minor
    • Resolution: Unresolved
    • 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL)
    • 10.5, 10.6
    • Optimizer
    • None

    Description

      --source include/have_sequence.inc
       
      CREATE TABLE t (a VARCHAR(16) COMPRESSED COLLATE ucs2_sinhala_ci);
      INSERT INTO t SELECT REPEAT(CHR(seq%26+97),16) FROM seq_1_to_100;
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
      ANALYZE FORMAT=JSON SELECT a FROM t WHERE 'c' BETWEEN 'l' AND a ;
       
      set histogram_type= JSON_HB;
      analyze table t persistent for all;
      ANALYZE FORMAT=JSON SELECT a FROM t WHERE 'c' BETWEEN 'l' AND a ;
       
      # Cleanup
      DROP TABLE t;
      

      DOUBLE_PREC_HB preview-10.8-MDEV-26519-json-histograms 21a81025

              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 100,
                "r_rows": 100,
                "r_table_time_ms": 0.006661524,
                "r_other_time_ms": 0.006350519,
                "filtered": 3.846199989,
                "r_filtered": 0,
                "attached_condition": "'c' between 'l' and t.a"
              }
      

      JSON_HB

              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 100,
                "r_rows": 100,
                "r_table_time_ms": 0.005778671,
                "r_other_time_ms": 0.005598088,
                "filtered": 93,
                "r_filtered": 0,
                "attached_condition": "'c' between 'l' and t.a"
              }
      

      The actual result set is naturally empty. So, DOUBLE_PREC is close, JSON is far away.

      Reproducible with at least MyISAM, InnoDB, Aria.

      Attachments

        Issue Links

          Activity

            Summarizing the report data:

            DOUBLE_PREC_HB
                      "filtered": 3.846199989,
                      "r_filtered": 0,
             
            JSON
                      "filtered": 93,
                      "r_filtered": 0,
            

            If I try to remove the COMPRESSED attribute, I get this:

            DOUBLE_PREC_HB
                      "filtered": 92.96875,
                      "r_filtered": 0,
             
            JSON_HB
                      "filtered": 93,
                      "r_filtered": 0,
            

            DOUBLE_PREC_HB becomes just as bad as JSON_HB. It looks like the good estimate in the original report has been an accident.

            psergei Sergei Petrunia added a comment - Summarizing the report data: DOUBLE_PREC_HB "filtered": 3.846199989, "r_filtered": 0,   JSON "filtered": 93, "r_filtered": 0, If I try to remove the COMPRESSED attribute, I get this: DOUBLE_PREC_HB "filtered": 92.96875, "r_filtered": 0,   JSON_HB "filtered": 93, "r_filtered": 0, DOUBLE_PREC_HB becomes just as bad as JSON_HB. It looks like the good estimate in the original report has been an accident.

            The column is actually too short to be compressed.
            Status counters confirm this: no [de]compressions happen during the test run:

            MariaDB [test]> show status like '%compressions';
            +-----------------------+-------+
            | Variable_name         | Value |
            +-----------------------+-------+
            | Column_compressions   | 0     |
            | Column_decompressions | 0     |
            +-----------------------+-------+
            2 rows in set (0.003 sec)
            

            psergei Sergei Petrunia added a comment - The column is actually too short to be compressed. Status counters confirm this: no [de] compressions happen during the test run: MariaDB [test]> show status like '%compressions'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Column_compressions | 0 | | Column_decompressions | 0 | +-----------------------+-------+ 2 rows in set (0.003 sec)

            I compare the histograms that were produced when

            • the testcase used COMPRESSED column
            • the testcase didn't use COMPRESSED column.

            The JSON_HB histograms are the same. The DOUBLE_PREC_HB histograms are different.
            Compressed:
            (I've done some manual reformatting to make the differences more apparent)

            MariaDB [test]> select decode_histogram(hist_type,histogram) from save10;
            0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.00000,0.04001,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.04001,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.04001,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.04001,
            0.00000,0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.03999,
            0.00000,0.00000,0.00000,0.00000,0.04001,
            0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000 |
            

            Non-compressed:

            MariaDB [test]> select decode_histogram(hist_type,histogram) from save12;
            0.00000,0.00000,0.00000,0.04056,
            0.00000,0.00000,0.00000,0.00000,0.03880,
            0.00000,0.00000,0.00000,0.00000,0.00000,0.02292,
            0.00000,0.00000,0.00000,0.00000,0.05292,
            0.00000,0.00000,0.00000,0.00000,0.08112,
            0.00000,0.00000,0.00000,0.00000,0.01411,
            0.00000,0.00000,0.00000,0.00000,0.05644,
            0.00000,0.00000,0.00000,0.00000,0.04585,
            0.00000,0.00000,0.00000,0.00000,0.03703,
            0.00000,0.00000,0.00000,0.00000,0.02998,
            0.00000,0.00000,0.00000,0.00000,0.00000, 0.02293,
            0.00000,0.00000,0.00000,0.00000,0.07936,
            0.00000,0.00000,0.00000,0.00000,0.01587,
            0.00000,0.00000,0.00000,0.00000,0.05290,
            0.00000,0.00000,0.00000,0.00000,0.06526,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000,0.00000,0.07407,0.00000,0.00000,0.00000,0.00000,0.04233,0.00000,0.00000,0.00000,0.00000,0.00000,0.05115,0.00000,0.00000,0.00000,0.00000,0.06525,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.01587,0.00000,0.00000,0.00000,0.00706,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000 |
            

            psergei Sergei Petrunia added a comment - I compare the histograms that were produced when the testcase used COMPRESSED column the testcase didn't use COMPRESSED column. The JSON_HB histograms are the same. The DOUBLE_PREC_HB histograms are different. Compressed: (I've done some manual reformatting to make the differences more apparent) MariaDB [test]> select decode_histogram(hist_type,histogram) from save10; 0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.00000,0.04001, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.04001, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.04001, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.04001, 0.00000,0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.03999, 0.00000,0.00000,0.00000,0.00000,0.04001, 0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000 | Non-compressed: MariaDB [test]> select decode_histogram(hist_type,histogram) from save12; 0.00000,0.00000,0.00000,0.04056, 0.00000,0.00000,0.00000,0.00000,0.03880, 0.00000,0.00000,0.00000,0.00000,0.00000,0.02292, 0.00000,0.00000,0.00000,0.00000,0.05292, 0.00000,0.00000,0.00000,0.00000,0.08112, 0.00000,0.00000,0.00000,0.00000,0.01411, 0.00000,0.00000,0.00000,0.00000,0.05644, 0.00000,0.00000,0.00000,0.00000,0.04585, 0.00000,0.00000,0.00000,0.00000,0.03703, 0.00000,0.00000,0.00000,0.00000,0.02998, 0.00000,0.00000,0.00000,0.00000,0.00000, 0.02293, 0.00000,0.00000,0.00000,0.00000,0.07936, 0.00000,0.00000,0.00000,0.00000,0.01587, 0.00000,0.00000,0.00000,0.00000,0.05290, 0.00000,0.00000,0.00000,0.00000,0.06526,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000,0.00000,0.07407,0.00000,0.00000,0.00000,0.00000,0.04233,0.00000,0.00000,0.00000,0.00000,0.00000,0.05115,0.00000,0.00000,0.00000,0.00000,0.06525,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.01587,0.00000,0.00000,0.00000,0.00706,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000 |

            The condition is impossible to satisfy but the optimizer doesn't recognize that.

            The ranges it is estimating and the estimates:

            DOUBLE_PREC_HB, non-COMPRESSED:

                        "rows_estimation": [
                          {
                            "selectivity_for_indexes": [],
                            "selectivity_for_columns": [
                              {
                                "column_name": "a",
                                "ranges": ["c <= a"],
                                "selectivity_from_histogram": 0.9296875
                              }
                            ],
                            "cond_selectivity": 0.9296875
                          },
            

            DOUBLE_PREC_HB, COMPRESSED:

                        "rows_estimation": [
                          {
                            "selectivity_for_indexes": [],
                            "selectivity_for_columns": [
                              {
                                "column_name": "a",
                                "ranges": ["c <= a"],
                                "selectivity_from_histogram": 0.038462
                              }
                            ],
                            "cond_selectivity": 0.038462
                          },
            

            JSON:

                            "selectivity_for_columns": [
                              {
                                "column_name": "a",
                                "ranges": ["c <= a"],
                                "selectivity_from_histogram": 0.93
                              }
                            ],
                            "cond_selectivity": 0.93
                          },
            

            what would be the right selectivity for t.a >= 'c'? The two values that are less are aaaaa and bbbbb.
            sel = (26 - 2)/26.= 0.92, which agrees with the JSON_HB selectivity.

            psergei Sergei Petrunia added a comment - The condition is impossible to satisfy but the optimizer doesn't recognize that. The ranges it is estimating and the estimates: DOUBLE_PREC_HB, non-COMPRESSED: "rows_estimation": [ { "selectivity_for_indexes": [], "selectivity_for_columns": [ { "column_name": "a", "ranges": ["c <= a"], "selectivity_from_histogram": 0.9296875 } ], "cond_selectivity": 0.9296875 }, DOUBLE_PREC_HB, COMPRESSED: "rows_estimation": [ { "selectivity_for_indexes": [], "selectivity_for_columns": [ { "column_name": "a", "ranges": ["c <= a"], "selectivity_from_histogram": 0.038462 } ], "cond_selectivity": 0.038462 }, JSON: "selectivity_for_columns": [ { "column_name": "a", "ranges": ["c <= a"], "selectivity_from_histogram": 0.93 } ], "cond_selectivity": 0.93 }, what would be the right selectivity for t.a >= 'c' ? The two values that are less are aaaaa and bbbbb. sel = (26 - 2)/26.= 0.92, which agrees with the JSON_HB selectivity.

            Automated message:
            ----------------------------
            Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            julien.fritsch Julien Fritsch added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.
            JIraAutomate JiraAutomate added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            JIraAutomate JiraAutomate added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.