Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26901

Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#4)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • N/A
    • 10.8.1
    • Optimizer
    • None

    Description

      create or replace table t (f int);
      insert into t values
        (7),(5),(0),(5),(112),(9),(9),(7),(5),(9),
        (1),(7),(0),(6),(6),(2),(1),(6),(169),(7);
       
      select f from t where f in (77, 1, 144, 73, 14, 12);
       
      set histogram_type= JSON_HB;
      analyze table t persistent for all;
       
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
       
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
       
      # Cleanup
      drop table t;
      

      The result set is 2 rows (out of 20):

      preview-10.7-MDEV-26519-json-histograms 7fa0542dc6

      select f from t where f in (77, 1, 144, 73, 14, 12);
      f
      1
      1
      

      With DOUBLE_PREC_HB, filtered value is 10.98, which is quite good:

      set histogram_type= DOUBLE_PREC_HB;
      analyze table t persistent for all;
      Table	Op	Msg_type	Msg_text
      test.t	analyze	status	Engine-independent statistics collected
      test.t	analyze	status	OK
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
      EXPLAIN
      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "ALL",
            "rows": 20,
            "filtered": 10.98332787,
            "attached_condition": "t.f in (77,1,144,73,14,12)"
          }
        }
      }
      

      With JSON_HB, it's 35, which isn't so good:

      set histogram_type= JSON_HB;
      analyze table t persistent for all;
      Table	Op	Msg_type	Msg_text
      test.t	analyze	status	Engine-independent statistics collected
      test.t	analyze	status	OK
      explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
      EXPLAIN
      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "ALL",
            "rows": 20,
            "filtered": 35,
            "attached_condition": "t.f in (77,1,144,73,14,12)"
          }
        }
      }
      

      Attachments

        Issue Links

          Activity

            Raised to critical on procedural reasons. Feel free to demote and remove from "must-do" scope.

            elenst Elena Stepanova added a comment - Raised to critical on procedural reasons. Feel free to demote and remove from "must-do" scope.

            Debugging...

            analyze select f from t where f in (77, 1, 144, 73, 14, 12);
            

            Here, only the value "1" matches a histogram bucket:

                {
                  "start": "1",
                  "size": 0.1,
                  "ndv": 1
                },
            

            The rest of the values do not match a bucket and use this estimate:

                /*
                  The bucket has a single value and it doesn't match! Return a very
                  small value.
                */
                sel= 1.0 / total_rows;
            

            psergei Sergei Petrunia added a comment - Debugging... analyze select f from t where f in (77, 1, 144, 73, 14, 12); Here, only the value "1" matches a histogram bucket: { "start": "1", "size": 0.1, "ndv": 1 }, The rest of the values do not match a bucket and use this estimate: /* The bucket has a single value and it doesn't match! Return a very small value. */ sel= 1.0 / total_rows;

            that is, it seems to be working as intended...

            psergei Sergei Petrunia added a comment - that is, it seems to be working as intended...

            ... but it doesn't mean we should keep it this way. Adjusted the code to avoid this problem

            psergei Sergei Petrunia added a comment - ... but it doesn't mean we should keep it this way. Adjusted the code to avoid this problem

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.