Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27243

Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#7) upon TIME comparison

Details

    Description

      Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size.

      --source include/have_sequence.inc
       
      CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
      INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;
       
      SET histogram_type= JSON_HB;
      ANALYZE TABLE t1 PERSISTENT FOR ALL;
      ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';
       
      SET histogram_type= DOUBLE_PREC_HB;
      ANALYZE TABLE t1 PERSISTENT FOR ALL;
      ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';
       
      DROP TABLE t1;
      

      JSON_HB:

      preview-10.8-MDEV-26519-json-histograms da3231a8

              "table": {
                "table_name": "t1",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 1000,
                "r_rows": 1000,
                "r_table_time_ms": 0.037146099,
                "r_other_time_ms": 0.077310006,
                "filtered": 99.75002289,
                "r_filtered": 50,
                "attached_condition": "t1.f > '00:01:00'"
              }
      

      DOUBLE_PREC_HB:

              "table": {
                "table_name": "t1",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 1000,
                "r_rows": 1000,
                "r_table_time_ms": 0.033997955,
                "r_other_time_ms": 0.07762081,
                "filtered": 50,
                "r_filtered": 50,
                "attached_condition": "t1.f > '00:01:00'"
              }
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            elenst Elena Stepanova made changes -
            Description _Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size._

            {code:sql}
            --source include/have_sequence.inc

            CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
            INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;

            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= JSON_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= DOUBLE_PREC_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            DROP TABLE t1;
            {code}

            JSON_HB:
            {code:json|title=preview-10.8-MDEV-26519-json-histograms da3231a8}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.037146099,
                      "r_other_time_ms": 0.077310006,
                      "filtered": 99.75002289,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            {code:json}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.033997955,
                      "r_other_time_ms": 0.07762081,
                      "filtered": 50,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            _Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size._

            {code:sql}
            --source include/have_sequence.inc

            CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
            INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;

            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= JSON_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= DOUBLE_PREC_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            DROP TABLE t1;
            {code}

            JSON_HB:
            {code:json|title=preview-10.8-MDEV-26519-json-histograms da3231a8}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.037146099,
                      "r_other_time_ms": 0.077310006,
                      "filtered": 99.75002289,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            DOUBLE_PREC_HB:
            {code:json}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.033997955,
                      "r_other_time_ms": 0.07762081,
                      "filtered": 50,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            elenst Elena Stepanova made changes -
            Description _Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size._

            {code:sql}
            --source include/have_sequence.inc

            CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
            INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;

            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= JSON_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= DOUBLE_PREC_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            DROP TABLE t1;
            {code}

            JSON_HB:
            {code:json|title=preview-10.8-MDEV-26519-json-histograms da3231a8}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.037146099,
                      "r_other_time_ms": 0.077310006,
                      "filtered": 99.75002289,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            DOUBLE_PREC_HB:
            {code:json}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.033997955,
                      "r_other_time_ms": 0.07762081,
                      "filtered": 50,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            _Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size._

            {code:sql}
            --source include/have_sequence.inc

            CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
            INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;

            SET histogram_type= JSON_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            SET histogram_type= DOUBLE_PREC_HB;
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

            DROP TABLE t1;
            {code}

            JSON_HB:
            {code:json|title=preview-10.8-MDEV-26519-json-histograms da3231a8}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.037146099,
                      "r_other_time_ms": 0.077310006,
                      "filtered": 99.75002289,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}
            DOUBLE_PREC_HB:
            {code:json}
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "r_loops": 1,
                      "rows": 1000,
                      "r_rows": 1000,
                      "r_table_time_ms": 0.033997955,
                      "r_other_time_ms": 0.07762081,
                      "filtered": 50,
                      "r_filtered": 50,
                      "attached_condition": "t1.f > '00:01:00'"
                    }
            {code}

              "histogram_hb": [
                {
                  "start": "00:00:00",
                  "size": 0.5,
                  "ndv": 1
                },
                {
                  "start": "02:00:02",
                  "size": 0.004,
                  "ndv": 4
                },
            

            ANALYZE SELECT * FROM t1 WHERE f > '00:01:00';
            filtered=99.7
            r_filtered=50

            in Histogram_json_hb::range_selectivity():

            exclusive_endp=true
            equal=false
            idx=0
            

            We hit the first bucket...
            and then we compute sel= position_in_interval()= 0.0049

            min= 0.0049 * (bucket_size=0.5) = 0.002
            max=1.0
            

            which gives filtered=99%.

            Things go wrong when we consider the singleton bucket to be a general bucket (which spans from 00:00 to 02:00) and call position_in_interval(00:01). This returns some value (which doesn't look perfect btw), but the issue here is that for the singleton bucket we can see that min_endp > bucket_start which means sel=1.0.

            psergei Sergei Petrunia added a comment - "histogram_hb": [ { "start": "00:00:00", "size": 0.5, "ndv": 1 }, { "start": "02:00:02", "size": 0.004, "ndv": 4 }, ANALYZE SELECT * FROM t1 WHERE f > '00:01:00'; filtered=99.7 r_filtered=50 in Histogram_json_hb::range_selectivity(): exclusive_endp=true equal=false idx=0 We hit the first bucket... and then we compute sel= position_in_interval()= 0.0049 min= 0.0049 * (bucket_size=0.5) = 0.002 max=1.0 which gives filtered=99%. Things go wrong when we consider the singleton bucket to be a general bucket (which spans from 00:00 to 02:00) and call position_in_interval(00:01). This returns some value (which doesn't look perfect btw), but the issue here is that for the singleton bucket we can see that min_endp > bucket_start which means sel=1.0.

            Fixed by fix for MDEV-27229. Added a testcase.

            psergei Sergei Petrunia added a comment - Fixed by fix for MDEV-27229 . Added a testcase.
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.8.1 [ 26815 ]
            Fix Version/s 10.8 [ 26121 ]
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Closed [ 6 ]

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.