MDEV-27229 (MariaDB Server)

Estimation for filtered rows less precise with JSON histogram compared to DOUBLE_PREC (#5)

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • N/A
    • 10.8.1
    • Optimizer
    • None

    Description

      --source include/have_sequence.inc
       
      create table t1 (id int, a varchar(8)) engine=MyISAM;
      insert into t1 select seq, 'bar' from seq_1_to_100;
      insert into t1 select id, 'qux' from t1;
       
      set histogram_type=JSON_HB;
      analyze table t1 persistent for all;
       
      explain format=json select COUNT(*) FROM t1 WHERE a > 'foo';
       
      set histogram_type=DOUBLE_PREC_HB;
      analyze table t1 persistent for all;
       
      explain format=json select COUNT(*) FROM t1 WHERE a > 'foo';
       
      # Cleanup
      drop table t1;
      

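      The condition filters out half of the rows: 100 of the 200 rows hold 'bar', which sorts before 'foo', and the other 100 hold 'qux', which sorts after it.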
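As a sanity check, the expected fraction can be reproduced outside the server. The following is only a minimal Python sketch that mimics the two INSERT statements; it assumes that plain Python string comparison orders these lowercase ASCII values the same way as the column's collation does.

```python
# Rebuild the contents of column `a` as the two INSERT statements create them.
rows = ["bar"] * 100          # insert into t1 select seq, 'bar' from seq_1_to_100
rows += ["qux"] * len(rows)   # insert into t1 select id, 'qux' from t1

# Count the rows that satisfy the condition a > 'foo'.
matching = sum(1 for a in rows if a > "foo")
true_filtered = 100.0 * matching / len(rows)

print(len(rows), matching, true_filtered)  # prints: 200 100 50.0
```

So an exact estimate would report "filtered": 50 for 200 rows.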

      With JSON_HB the estimate is far off (filtered is about 86.55 instead of 50):

      preview-10.8-MDEV-26519-json-histograms 98cb4351

        "query_block": {
          "select_id": 1,
          "nested_loop": [
            {
              "table": {
                "table_name": "t1",
                "access_type": "ALL",
                "rows": 200,
                "filtered": 86.55463409,
                "attached_condition": "t1.a > 'foo'"
              }
            }
          ]
        }
      

      With DOUBLE_PREC_HB the estimate is exact in this case:

        "query_block": {
          "select_id": 1,
          "nested_loop": [
            {
              "table": {
                "table_name": "t1",
                "access_type": "ALL",
                "rows": 200,
                "filtered": 50,
                "attached_condition": "t1.a > 'foo'"
              }
            }
          ]
        }
      

          Activity

            elenst Elena Stepanova added a comment - Raised to critical for procedural reasons.

            psergei Sergei Petrunia added a comment - Investigation:
            The histogram is:

            {
              "target_histogram_size": 254,
              "histogram_hb": [
                {
                  "start": "bar",
                  "size": 0.5,
                  "ndv": 1
                },
                {
                  "start": "qux",
                  "end": "qux",
                  "size": 0.5,
                  "ndv": 1
                }
              ]
            }
            

            We search for

            a > 'foo'

            'foo' is between 'bar' and 'qux'.
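The bucket lookup for the constant can be illustrated with a short Python sketch. This is not the server's code: the helper name `find_bucket` is hypothetical, and plain Python string comparison stands in for the column's collation, which is an assumption that holds for these lowercase ASCII values.

```python
import json
from bisect import bisect_right

# The histogram quoted in the comment above.
histogram = json.loads("""
{
  "target_histogram_size": 254,
  "histogram_hb": [
    {"start": "bar", "size": 0.5, "ndv": 1},
    {"start": "qux", "end": "qux", "size": 0.5, "ndv": 1}
  ]
}
""")

buckets = histogram["histogram_hb"]
starts = [b["start"] for b in buckets]

def find_bucket(value):
    """Return the index of the last bucket whose start is <= value, or -1."""
    return bisect_right(starts, value) - 1

idx = find_bucket("foo")
print(idx, buckets[idx]["start"], buckets[idx + 1]["start"])  # prints: 0 bar qux
```

The constant lands in the first bucket, whose range is taken to extend up to the start of the next bucket because the bucket itself has no "end".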

            Histogram_json_hb::range_selectivity finds that 'foo' lies after buckets[0].start_value and before buckets[1].start_value, so it falls into buckets[0].

            It assumes that the values in buckets[0] are uniformly distributed between buckets[0].start_value and buckets[1].start_value.

            It computes the position of 'foo' between 'bar' and 'qux', which is about 0.269, and then returns the selectivity 1 - (0.5 * 0.269) ≈ 0.8655, i.e. the filtered value of about 86.55 reported above.
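The arithmetic of this interpolation can be reproduced with a small Python sketch. The helpers `prefix_to_int`, `position_in_interval` and `selectivity_greater_than` are hypothetical names, and treating each string as a big-endian integer built from its first bytes is an assumption about how the position is derived; the server's own string-position code may differ in detail, although this simplification arrives at nearly the same number.

```python
def prefix_to_int(s, width=8):
    """Map the first `width` bytes of an ASCII string to a big-endian integer."""
    raw = s.encode("ascii")[:width].ljust(width, b"\x00")
    return int.from_bytes(raw, "big")

def position_in_interval(value, low, high):
    """Relative position of `value` inside [low, high], from 0.0 to 1.0."""
    lo, hi, v = prefix_to_int(low), prefix_to_int(high), prefix_to_int(value)
    return (v - lo) / (hi - lo)

def selectivity_greater_than(value, bucket_start, next_start,
                             bucket_size, size_before=0.0):
    """Estimate the fraction of rows with column > value, assuming the bucket's
    rows are spread uniformly between bucket_start and next_start."""
    pos = position_in_interval(value, bucket_start, next_start)
    return 1.0 - (size_before + bucket_size * pos)

pos = position_in_interval("foo", "bar", "qux")
sel = selectivity_greater_than("foo", "bar", "qux", bucket_size=0.5)
print(f"{pos:.4f} {100 * sel:.2f}")  # prints: 0.2689 86.55
```

Under these assumptions the sketch yields roughly 86.55, close to the "filtered": 86.55463409 in the JSON_HB plan, whereas the true fraction is 50.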

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova