Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22583

Selectivity for BIT columns in filtered column for EXPLAIN is incorrect

Details

    Description

      SET optimizer_use_condition_selectivity=4;
      SET histogram_size=255;
      SET use_stat_tables='preferably';
      CREATE TABLE t1 (a BIT(32), b INT);
      INSERT INTO t1 VALUES (80, 80), (81, 81), (82, 82);
      ANALYZE TABLE t1 PERSISTENT FOR ALL;
      

      MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.a >= 81;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 |    99.61 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

      So filtered here shows 99.61 % which is incorrect, selectivity should be ~66%

      Lets try with INT instead of BIT(32)

      MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.b >= 81;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 |    66.41 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

      Filtered here is ~66%, so this looks correct.

      Attachments

        Issue Links

          Activity

            varun Varun Gupta (Inactive) created issue -
            varun Varun Gupta (Inactive) made changes -
            Field Original Value New Value
            Description {code:sql}
            SET optimizer_use_condition_selectivity=4;
            SET histogram_size=255;
            SET use_stat_tables='preferably';
            CREATE TABLE t1 (a BIT(32), b INT);
            INSERT INTO t1 VALUES (80, 80), (81, 81), (82, 82);
            ANALYZE TABLE t1 PERSISTENT FOR ALL;
            {code}


            {noformat}
            MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.a >= 81;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 3 | 99.61 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            1 row in set, 1 warning (0.00 sec)
            {noformat}

            So filtered here shows 99.61 % which is incorrect, selectivity should be ~66%

            Lets try with INT instead of BIT(32)

            {noformat}
            MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.b >= 81;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 3 | 66.41 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            1 row in set, 1 warning (0.00 sec)
            {noformat}
            Filtered here is ~66%, so this looks correct.

            The value for BIT fields in the statistical fields are stored as strings.

            To get the value from the statistical field we enter the function
            Field_varstring::val_str

            (lldb) p val.ptr()
            (const char *) $5 = 0x000061e000031b6d "80"
            (lldb) p val.length()
            (uint32) $6 = 2
            

            So the statistical field has value 80, stored in string representation

            In the function Field_bit::store we have:

            (lldb) p ptr[6]
            (uchar) $18 = '8'
            (lldb) p ptr[7]
            (uchar) $19 = '0'
            

            So the value 80 is stored as 2 bytes in the byte field which is not correct,
            the value used was 80 so that can fit it one byte itself.
            For BIT fields while getting the values from the statistical fields we should
            get the integral value from the text representation.
            This would make it consistent with the values being stored into the stat tables
            and retrieving the values from the stat tables.

            varun Varun Gupta (Inactive) added a comment - The value for BIT fields in the statistical fields are stored as strings. To get the value from the statistical field we enter the function Field_varstring::val_str (lldb) p val.ptr() ( const char *) $5 = 0x000061e000031b6d "80" (lldb) p val.length() (uint32) $6 = 2 So the statistical field has value 80, stored in string representation In the function Field_bit::store we have: (lldb) p ptr[6] (uchar) $18 = '8' (lldb) p ptr[7] (uchar) $19 = '0' So the value 80 is stored as 2 bytes in the byte field which is not correct, the value used was 80 so that can fit it one byte itself. For BIT fields while getting the values from the statistical fields we should get the integral value from the text representation. This would make it consistent with the values being stored into the stat tables and retrieving the values from the stat tables.
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            varun Varun Gupta (Inactive) added a comment - Patch https://github.com/MariaDB/server/commit/fc3da78de15adf5724f4c46873cb9f398d3b38d2
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Sergei Petrunia [ psergey ] Igor Babaev [ igor ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Michael Widenius [ monty ]

            I have created a new patch where Varun's idea us used, but done with virtual functions instead of 'if (field_type == Field_bit).

            If Varun accepts this, it's ok to push. However, please fix commit message so that the first row is not split into two rows. Better to have the full MDEV message on the first row

            monty Michael Widenius added a comment - I have created a new patch where Varun's idea us used, but done with virtual functions instead of 'if (field_type == Field_bit). If Varun accepts this, it's ok to push. However, please fix commit message so that the first row is not split into two rows. Better to have the full MDEV message on the first row

            The reason bit fields are different is that they are basically a string type but used by statistics as a numerical type.
            The fix in my patch handles that by creating a specialized Field_bit version of
            Field_bit::store_to_statistical_minmax_field(Field *field, String *val)
            int Field_bit::store_from_statistical_minmax_field(Field *stat_field, String *str)
            that takes care of this.

            monty Michael Widenius added a comment - The reason bit fields are different is that they are basically a string type but used by statistics as a numerical type. The fix in my patch handles that by creating a specialized Field_bit version of Field_bit::store_to_statistical_minmax_field(Field *field, String *val) int Field_bit::store_from_statistical_minmax_field(Field *stat_field, String *str) that takes care of this.
            varun Varun Gupta (Inactive) made changes -
            Assignee Michael Widenius [ monty ] Varun Gupta [ varun ]
            varun Varun Gupta (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            varun Varun Gupta (Inactive) made changes -
            Fix Version/s 10.2.37 [ 25112 ]
            Fix Version/s 10.3.28 [ 25111 ]
            Fix Version/s 10.4.18 [ 25110 ]
            Fix Version/s 10.5.9 [ 25109 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            Have pushed my patch to 10.2. Will use monty patch for 10.5 as there will be conflicts when this code is merged upwards.

            varun Varun Gupta (Inactive) added a comment - Have pushed my patch to 10.2. Will use monty patch for 10.5 as there will be conflicts when this code is merged upwards.

            Diff for that is:

            diff --git a/sql/field.cc b/sql/field.cc
            index 45b3a3c703e..7bae5fabc3b 100644
            --- a/sql/field.cc
            +++ b/sql/field.cc
            @@ -9802,6 +9802,22 @@ int Field_bit::store(const char *from, size_t length, CHARSET_INFO *cs)
             }
             
             
            +/* Statistics tables stores bit fields as numbers */
            +
            +int Field_bit::store_to_statistical_minmax_field(Field *field, String *val)
            +{
            +  longlong value= Field_bit::val_int();
            +  return field->store(value, true);
            +}
            +
            +
            +int Field_bit::store_from_statistical_minmax_field(Field *stat_field,
            +                                                   String *str)
            +{
            +  return Field_bit::store(stat_field->val_int(), true);
            +}
            +
            +
             int Field_bit::store(double nr)
             {
               return Field_bit::store((longlong) nr, FALSE);
            diff --git a/sql/field.h b/sql/field.h
            index dfc02149f9d..a63b4b4f5f7 100644
            --- a/sql/field.h
            +++ b/sql/field.h
            @@ -4899,6 +4899,8 @@ class Field_bit :public Field {
               int store(double nr) override;
               int store(longlong nr, bool unsigned_val) override;
               int store_decimal(const my_decimal *) override;
            +  int store_to_statistical_minmax_field(Field *field, String *val) override;
            +  int store_from_statistical_minmax_field(Field *field, String *val) override;
               double val_real() override;
               longlong val_int() override;
               String *val_str(String*, String *) override;
            diff --git a/sql/sql_statistics.cc b/sql/sql_statistics.cc
            index 7b600bd45c4..917e2ffa734 100644
            --- a/sql/sql_statistics.cc
            +++ b/sql/sql_statistics.cc
            @@ -1037,28 +1037,14 @@ class Column_stat: public Stat_table
                     switch (i) {
                     case COLUMN_STAT_MIN_VALUE:
                     {
            -          /*
            -            TODO varun: After MDEV-22583 is fixed, add a function in Field_bit
            -            and move this implementation there
            -          */
            -          if (table_field->type() == MYSQL_TYPE_BIT)
            -            stat_field->store(table_field->collected_stats->min_value->val_int(),true);
            -          else
            -          {
            -            Field *field= table_field->collected_stats->min_value;
            -            field->store_to_statistical_minmax_field(stat_field, &val);
            -          }
            +          Field *field= table_field->collected_stats->min_value;
            +          field->store_to_statistical_minmax_field(stat_field, &val);
                       break;
                     }
                     case COLUMN_STAT_MAX_VALUE:
                     {
            -          if (table_field->type() == MYSQL_TYPE_BIT)
            -            stat_field->store(table_field->collected_stats->max_value->val_int(),true);
            -          else
            -          {
            -            Field *field= table_field->collected_stats->max_value;
            -            field->store_to_statistical_minmax_field(stat_field, &val);
            -          }
            +          Field *field= table_field->collected_stats->max_value;
            +          field->store_to_statistical_minmax_field(stat_field, &val);
                       break;
                     }
            
            

            varun Varun Gupta (Inactive) added a comment - Diff for that is: diff --git a/sql/field.cc b/sql/field.cc index 45b3a3c703e..7bae5fabc3b 100644 --- a/sql/field.cc +++ b/sql/field.cc @@ -9802,6 +9802,22 @@ int Field_bit::store(const char *from, size_t length, CHARSET_INFO *cs) } +/* Statistics tables stores bit fields as numbers */ + +int Field_bit::store_to_statistical_minmax_field(Field *field, String *val) +{ + longlong value= Field_bit::val_int(); + return field->store(value, true); +} + + +int Field_bit::store_from_statistical_minmax_field(Field *stat_field, + String *str) +{ + return Field_bit::store(stat_field->val_int(), true); +} + + int Field_bit::store(double nr) { return Field_bit::store((longlong) nr, FALSE); diff --git a/sql/field.h b/sql/field.h index dfc02149f9d..a63b4b4f5f7 100644 --- a/sql/field.h +++ b/sql/field.h @@ -4899,6 +4899,8 @@ class Field_bit :public Field { int store(double nr) override; int store(longlong nr, bool unsigned_val) override; int store_decimal(const my_decimal *) override; + int store_to_statistical_minmax_field(Field *field, String *val) override; + int store_from_statistical_minmax_field(Field *field, String *val) override; double val_real() override; longlong val_int() override; String *val_str(String*, String *) override; diff --git a/sql/sql_statistics.cc b/sql/sql_statistics.cc index 7b600bd45c4..917e2ffa734 100644 --- a/sql/sql_statistics.cc +++ b/sql/sql_statistics.cc @@ -1037,28 +1037,14 @@ class Column_stat: public Stat_table switch (i) { case COLUMN_STAT_MIN_VALUE: { - /* - TODO varun: After MDEV-22583 is fixed, add a function in Field_bit - and move this implementation there - */ - if (table_field->type() == MYSQL_TYPE_BIT) - stat_field->store(table_field->collected_stats->min_value->val_int(),true); - else - { - Field *field= table_field->collected_stats->min_value; - field->store_to_statistical_minmax_field(stat_field, &val); - } + Field *field= table_field->collected_stats->min_value; + field->store_to_statistical_minmax_field(stat_field, &val); break; } case COLUMN_STAT_MAX_VALUE: { - if (table_field->type() == MYSQL_TYPE_BIT) - stat_field->store(table_field->collected_stats->max_value->val_int(),true); - else - { - Field *field= table_field->collected_stats->max_value; - field->store_to_statistical_minmax_field(stat_field, &val); - } + Field *field= table_field->collected_stats->max_value; + field->store_to_statistical_minmax_field(stat_field, &val); break; }
            dbart Daniel Bartholomew made changes -
            Fix Version/s 10.2.38 [ 25207 ]
            Fix Version/s 10.2.37 [ 25112 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.5.10 [ 25204 ]
            Fix Version/s 10.4.19 [ 25205 ]
            Fix Version/s 10.3.29 [ 25206 ]
            Fix Version/s 10.5.9 [ 25109 ]
            Fix Version/s 10.4.18 [ 25110 ]
            Fix Version/s 10.2.38 [ 25207 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.2.38 [ 25207 ]
            Fix Version/s 10.3.29 [ 25206 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 108778 ] MariaDB v4 [ 157783 ]

            People

              varun Varun Gupta (Inactive)
              varun Varun Gupta (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.