Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22583

Selectivity for BIT columns in filtered column for EXPLAIN is incorrect

Details

    Description

      SET optimizer_use_condition_selectivity=4;
      SET histogram_size=255;
      SET use_stat_tables='preferably';
      CREATE TABLE t1 (a BIT(32), b INT);
      INSERT INTO t1 VALUES (80, 80), (81, 81), (82, 82);
      ANALYZE TABLE t1 PERSISTENT FOR ALL;
      

      MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.a >= 81;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 |    99.61 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

      So filtered here shows 99.61 % which is incorrect, selectivity should be ~66%

      Lets try with INT instead of BIT(32)

      MariaDB [test]> EXPLAIN EXTENDED SELECT * from t1 where t1.b >= 81;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 |    66.41 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

      Filtered here is ~66%, so this looks correct.

      Attachments

        Issue Links

          Activity

            The value for BIT fields in the statistical fields are stored as strings.

            To get the value from the statistical field we enter the function
            Field_varstring::val_str

            (lldb) p val.ptr()
            (const char *) $5 = 0x000061e000031b6d "80"
            (lldb) p val.length()
            (uint32) $6 = 2
            

            So the statistical field has value 80, stored in string representation

            In the function Field_bit::store we have:

            (lldb) p ptr[6]
            (uchar) $18 = '8'
            (lldb) p ptr[7]
            (uchar) $19 = '0'
            

            So the value 80 is stored as 2 bytes in the byte field which is not correct,
            the value used was 80 so that can fit it one byte itself.
            For BIT fields while getting the values from the statistical fields we should
            get the integral value from the text representation.
            This would make it consistent with the values being stored into the stat tables
            and retrieving the values from the stat tables.

            varun Varun Gupta (Inactive) added a comment - The value for BIT fields in the statistical fields are stored as strings. To get the value from the statistical field we enter the function Field_varstring::val_str (lldb) p val.ptr() ( const char *) $5 = 0x000061e000031b6d "80" (lldb) p val.length() (uint32) $6 = 2 So the statistical field has value 80, stored in string representation In the function Field_bit::store we have: (lldb) p ptr[6] (uchar) $18 = '8' (lldb) p ptr[7] (uchar) $19 = '0' So the value 80 is stored as 2 bytes in the byte field which is not correct, the value used was 80 so that can fit it one byte itself. For BIT fields while getting the values from the statistical fields we should get the integral value from the text representation. This would make it consistent with the values being stored into the stat tables and retrieving the values from the stat tables.
            varun Varun Gupta (Inactive) added a comment - Patch https://github.com/MariaDB/server/commit/fc3da78de15adf5724f4c46873cb9f398d3b38d2

            I have created a new patch where Varun's idea us used, but done with virtual functions instead of 'if (field_type == Field_bit).

            If Varun accepts this, it's ok to push. However, please fix commit message so that the first row is not split into two rows. Better to have the full MDEV message on the first row

            monty Michael Widenius added a comment - I have created a new patch where Varun's idea us used, but done with virtual functions instead of 'if (field_type == Field_bit). If Varun accepts this, it's ok to push. However, please fix commit message so that the first row is not split into two rows. Better to have the full MDEV message on the first row

            The reason bit fields are different is that they are basically a string type but used by statistics as a numerical type.
            The fix in my patch handles that by creating a specialized Field_bit version of
            Field_bit::store_to_statistical_minmax_field(Field *field, String *val)
            int Field_bit::store_from_statistical_minmax_field(Field *stat_field, String *str)
            that takes care of this.

            monty Michael Widenius added a comment - The reason bit fields are different is that they are basically a string type but used by statistics as a numerical type. The fix in my patch handles that by creating a specialized Field_bit version of Field_bit::store_to_statistical_minmax_field(Field *field, String *val) int Field_bit::store_from_statistical_minmax_field(Field *stat_field, String *str) that takes care of this.

            Have pushed my patch to 10.2. Will use monty patch for 10.5 as there will be conflicts when this code is merged upwards.

            varun Varun Gupta (Inactive) added a comment - Have pushed my patch to 10.2. Will use monty patch for 10.5 as there will be conflicts when this code is merged upwards.

            Diff for that is:

            diff --git a/sql/field.cc b/sql/field.cc
            index 45b3a3c703e..7bae5fabc3b 100644
            --- a/sql/field.cc
            +++ b/sql/field.cc
            @@ -9802,6 +9802,22 @@ int Field_bit::store(const char *from, size_t length, CHARSET_INFO *cs)
             }
             
             
            +/* Statistics tables stores bit fields as numbers */
            +
            +int Field_bit::store_to_statistical_minmax_field(Field *field, String *val)
            +{
            +  longlong value= Field_bit::val_int();
            +  return field->store(value, true);
            +}
            +
            +
            +int Field_bit::store_from_statistical_minmax_field(Field *stat_field,
            +                                                   String *str)
            +{
            +  return Field_bit::store(stat_field->val_int(), true);
            +}
            +
            +
             int Field_bit::store(double nr)
             {
               return Field_bit::store((longlong) nr, FALSE);
            diff --git a/sql/field.h b/sql/field.h
            index dfc02149f9d..a63b4b4f5f7 100644
            --- a/sql/field.h
            +++ b/sql/field.h
            @@ -4899,6 +4899,8 @@ class Field_bit :public Field {
               int store(double nr) override;
               int store(longlong nr, bool unsigned_val) override;
               int store_decimal(const my_decimal *) override;
            +  int store_to_statistical_minmax_field(Field *field, String *val) override;
            +  int store_from_statistical_minmax_field(Field *field, String *val) override;
               double val_real() override;
               longlong val_int() override;
               String *val_str(String*, String *) override;
            diff --git a/sql/sql_statistics.cc b/sql/sql_statistics.cc
            index 7b600bd45c4..917e2ffa734 100644
            --- a/sql/sql_statistics.cc
            +++ b/sql/sql_statistics.cc
            @@ -1037,28 +1037,14 @@ class Column_stat: public Stat_table
                     switch (i) {
                     case COLUMN_STAT_MIN_VALUE:
                     {
            -          /*
            -            TODO varun: After MDEV-22583 is fixed, add a function in Field_bit
            -            and move this implementation there
            -          */
            -          if (table_field->type() == MYSQL_TYPE_BIT)
            -            stat_field->store(table_field->collected_stats->min_value->val_int(),true);
            -          else
            -          {
            -            Field *field= table_field->collected_stats->min_value;
            -            field->store_to_statistical_minmax_field(stat_field, &val);
            -          }
            +          Field *field= table_field->collected_stats->min_value;
            +          field->store_to_statistical_minmax_field(stat_field, &val);
                       break;
                     }
                     case COLUMN_STAT_MAX_VALUE:
                     {
            -          if (table_field->type() == MYSQL_TYPE_BIT)
            -            stat_field->store(table_field->collected_stats->max_value->val_int(),true);
            -          else
            -          {
            -            Field *field= table_field->collected_stats->max_value;
            -            field->store_to_statistical_minmax_field(stat_field, &val);
            -          }
            +          Field *field= table_field->collected_stats->max_value;
            +          field->store_to_statistical_minmax_field(stat_field, &val);
                       break;
                     }
            
            

            varun Varun Gupta (Inactive) added a comment - Diff for that is: diff --git a/sql/field.cc b/sql/field.cc index 45b3a3c703e..7bae5fabc3b 100644 --- a/sql/field.cc +++ b/sql/field.cc @@ -9802,6 +9802,22 @@ int Field_bit::store(const char *from, size_t length, CHARSET_INFO *cs) } +/* Statistics tables stores bit fields as numbers */ + +int Field_bit::store_to_statistical_minmax_field(Field *field, String *val) +{ + longlong value= Field_bit::val_int(); + return field->store(value, true); +} + + +int Field_bit::store_from_statistical_minmax_field(Field *stat_field, + String *str) +{ + return Field_bit::store(stat_field->val_int(), true); +} + + int Field_bit::store(double nr) { return Field_bit::store((longlong) nr, FALSE); diff --git a/sql/field.h b/sql/field.h index dfc02149f9d..a63b4b4f5f7 100644 --- a/sql/field.h +++ b/sql/field.h @@ -4899,6 +4899,8 @@ class Field_bit :public Field { int store(double nr) override; int store(longlong nr, bool unsigned_val) override; int store_decimal(const my_decimal *) override; + int store_to_statistical_minmax_field(Field *field, String *val) override; + int store_from_statistical_minmax_field(Field *field, String *val) override; double val_real() override; longlong val_int() override; String *val_str(String*, String *) override; diff --git a/sql/sql_statistics.cc b/sql/sql_statistics.cc index 7b600bd45c4..917e2ffa734 100644 --- a/sql/sql_statistics.cc +++ b/sql/sql_statistics.cc @@ -1037,28 +1037,14 @@ class Column_stat: public Stat_table switch (i) { case COLUMN_STAT_MIN_VALUE: { - /* - TODO varun: After MDEV-22583 is fixed, add a function in Field_bit - and move this implementation there - */ - if (table_field->type() == MYSQL_TYPE_BIT) - stat_field->store(table_field->collected_stats->min_value->val_int(),true); - else - { - Field *field= table_field->collected_stats->min_value; - field->store_to_statistical_minmax_field(stat_field, &val); - } + Field *field= table_field->collected_stats->min_value; + field->store_to_statistical_minmax_field(stat_field, &val); break; } case COLUMN_STAT_MAX_VALUE: { - if (table_field->type() == MYSQL_TYPE_BIT) - stat_field->store(table_field->collected_stats->max_value->val_int(),true); - else - { - Field *field= table_field->collected_stats->max_value; - field->store_to_statistical_minmax_field(stat_field, &val); - } + Field *field= table_field->collected_stats->max_value; + field->store_to_statistical_minmax_field(stat_field, &val); break; }

            People

              varun Varun Gupta (Inactive)
              varun Varun Gupta (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.