Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26901

Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#4)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • N/A
    • 10.8.1
    • Optimizer
    • None

    Description

      create or replace table t (f int);
      insert into t values
        (7),(5),(0),(5),(112),(9),(9),(7),(5),(9),
        (1),(7),(0),(6),(6),(2),(1),(6),(169),(7);
       
      select f from t where f in (77, 1, 144, 73, 14, 12);
       
      set histogram_type= JSON_HB;
      analyze table t persistent for all;
       
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
       
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
       
      # Cleanup
      drop table t;
      

      The result set is 2 rows (out of 20):

      preview-10.7-MDEV-26519-json-histograms 7fa0542dc6

      select f from t where f in (77, 1, 144, 73, 14, 12);
      f
      1
      1
      

      With DOUBLE_PREC_HB, filtered value is 10.98, which is quite good:

      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
      Table	Op	Msg_type	Msg_text
      test.t	analyze	status	Engine-independent statistics collected
      test.t	analyze	status	OK
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
      EXPLAIN
      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "ALL",
            "rows": 20,
            "filtered": 10.98332787,
            "attached_condition": "t.f in (77,1,144,73,14,12)"
          }
        }
      }
      

      With JSON_HB, it's 35, which isn't so good:

      set histogram_type= JSON_HB;
      analyze table t persistent for all;
      Table	Op	Msg_type	Msg_text
      test.t	analyze	status	Engine-independent statistics collected
      test.t	analyze	status	OK
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
      EXPLAIN
      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "ALL",
            "rows": 20,
            "filtered": 35,
            "attached_condition": "t.f in (77,1,144,73,14,12)"
          }
        }
      }
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value

            Raised to critical on procedural reasons. Feel free to demote and remove from "must-do" scope.

            elenst Elena Stepanova added a comment - Raised to critical on procedural reasons. Feel free to demote and remove from "must-do" scope.

            Debugging...

            analyze select f from t where f in (77, 1, 144, 73, 14, 12);
            

            Here, only the value "1" matches a histogram bucket:

                {
                  "start": "1",
                  "size": 0.1,
                  "ndv": 1
                },
            

            The rest of the values do not match a bucket and use this estimate:

                /*
                  The bucket has a single value and it doesn't match! Return a very
                  small value.
                */
                sel= 1.0 / total_rows;
            

            psergei Sergei Petrunia added a comment - Debugging... analyze select f from t where f in (77, 1, 144, 73, 14, 12); Here, only the value "1" matches a histogram bucket: { "start": "1", "size": 0.1, "ndv": 1 }, The rest of the values do not match a bucket and use this estimate: /* The bucket has a single value and it doesn't match! Return a very small value. */ sel= 1.0 / total_rows;

            that is, it seems to be working as intended...

            psergei Sergei Petrunia added a comment - that is, it seems to be working as intended...
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 126624 ] MariaDB v4 [ 143279 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.7 [ 24805 ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.8.1 [ 26815 ]
            Fix Version/s 10.8 [ 26121 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]

            ... but it doesn't mean we should keep it this way. Adjusted the code to avoid this problem

            psergei Sergei Petrunia added a comment - ... but it doesn't mean we should keep it this way. Adjusted the code to avoid this problem

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.