Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22851

Engine independent index statistics are incorrect for large tables on Windows

Details

    Description

      On Windows systems, the engine independend index statistics are incorrect for large tables.
      The problem can be simulated as follows:

      1. install a new MariaDB 10.4.13 server on Windows.

      2. connect to the server and use the "test" schema.

      3. switch on engine independent statistics.

         SET use_stat_tables='preferably';
      

      4. create a table with about 60000 rows.

        create table mydata as select * from information_schema.columns;
        insert into mydata select * from mydata;  
        insert into mydata select * from mydata;
        insert into mydata select * from mydata;
        insert into mydata select * from mydata;
        insert into mydata select * from mydata;
      

      5. create index on TABLE_CATALOG.

         CREATE INDEX AAA ON mydata(TABLE_CATALOG);
      

      6.

      ANALYZE TABLE mydata;
      

      Since the TABLE_CATALOG column always contains the same value, the expectation is that avg_frequency equals rowcount (about 60000).
      This query returns about 60000.

          SELECT AVG(frq) FROM ( 
            SELECT TABLE_CATALOG,COUNT(*) AS frq FROM mydata GROUP BY TABLE_CATALOG
          )a;
      

      7.

      SELECT * FROM mysql.index_stats; 
      

      However, the table mysql.index_stats shows only about 16000 as avg_frequency for the TABLE_CATALOG column.

      On Linux, the values are as expected.
      The wrong values lead to very bad execution plans for larger tables.
      Furthermore my expectation is that Windows and Linux will behave the same.

      Attachments

        Activity

          Thanks for the report. Reproducible as described.

          create table mydata as select * from information_schema.columns;
          insert into mydata select * from mydata;  
          insert into mydata select * from mydata;
          insert into mydata select * from mydata;
          insert into mydata select * from mydata;
          insert into mydata select * from mydata;
          CREATE INDEX AAA ON mydata(TABLE_CATALOG);
          ANALYZE TABLE mydata PERSISTENT FOR ALL;
          SELECT AVG(frq) FROM (SELECT TABLE_CATALOG,COUNT(*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a;
          SELECT * FROM mysql.index_stats;
          

          10.2.32 Windows x86_64

          MariaDB [test]> SELECT AVG(frq) FROM (SELECT TABLE_CATALOG,COUNT(*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a;
          +------------+
          | AVG(frq)   |
          +------------+
          | 59552.0000 |
          +------------+
          1 row in set (0.016 sec)
           
          MariaDB [test]> SELECT * FROM mysql.index_stats;
          +---------+------------+------------+--------------+---------------+
          | db_name | table_name | index_name | prefix_arity | avg_frequency |
          +---------+------------+------------+--------------+---------------+
          | test    | mydata     | AAA        |            1 |    16602.3270 |
          +---------+------------+------------+--------------+---------------+
          1 row in set (0.000 sec)
          

          10.2.32 Linux

          MariaDB [test]> SELECT AVG(frq) FROM (SELECT TABLE_CATALOG,COUNT(*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a;
          +------------+
          | AVG(frq)   |
          +------------+
          | 59552.0000 |
          +------------+
          1 row in set (0.013 sec)
           
          MariaDB [test]> SELECT * FROM mysql.index_stats;
          +---------+------------+------------+--------------+---------------+
          | db_name | table_name | index_name | prefix_arity | avg_frequency |
          +---------+------------+------------+--------------+---------------+
          | test    | mydata     | AAA        |            1 |    59552.0000 |
          +---------+------------+------------+--------------+---------------+
          1 row in set (0.000 sec)
          

          elenst Elena Stepanova added a comment - Thanks for the report. Reproducible as described. create table mydata as select * from information_schema.columns; insert into mydata select * from mydata; insert into mydata select * from mydata; insert into mydata select * from mydata; insert into mydata select * from mydata; insert into mydata select * from mydata; CREATE INDEX AAA ON mydata(TABLE_CATALOG); ANALYZE TABLE mydata PERSISTENT FOR ALL ; SELECT AVG (frq) FROM ( SELECT TABLE_CATALOG, COUNT (*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a; SELECT * FROM mysql.index_stats; 10.2.32 Windows x86_64 MariaDB [test]> SELECT AVG (frq) FROM ( SELECT TABLE_CATALOG, COUNT (*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a; + ------------+ | AVG (frq) | + ------------+ | 59552.0000 | + ------------+ 1 row in set (0.016 sec)   MariaDB [test]> SELECT * FROM mysql.index_stats; + ---------+------------+------------+--------------+---------------+ | db_name | table_name | index_name | prefix_arity | avg_frequency | + ---------+------------+------------+--------------+---------------+ | test | mydata | AAA | 1 | 16602.3270 | + ---------+------------+------------+--------------+---------------+ 1 row in set (0.000 sec) 10.2.32 Linux MariaDB [test]> SELECT AVG (frq) FROM ( SELECT TABLE_CATALOG, COUNT (*) AS frq FROM mydata GROUP BY TABLE_CATALOG)a; + ------------+ | AVG (frq) | + ------------+ | 59552.0000 | + ------------+ 1 row in set (0.013 sec)   MariaDB [test]> SELECT * FROM mysql.index_stats; + ---------+------------+------------+--------------+---------------+ | db_name | table_name | index_name | prefix_arity | avg_frequency | + ---------+------------+------------+--------------+---------------+ | test | mydata | AAA | 1 | 59552.0000 | + ---------+------------+------------+--------------+---------------+ 1 row in set (0.000 sec)

          Varun, please investigate

          psergei Sergei Petrunia added a comment - Varun, please investigate
          varun Varun Gupta (Inactive) added a comment - - edited

          The issue here:

          The average frequency is defined as
          cardinality(non-null values) / cardinality(distinct values)

          The average frequency for the index (AAA) on table t1:
          cardinality(non-null values) = 59552 (All rows in the table are NOT NULL)
          cardinality(distinct values) = 1 (All values are the same)

          So average_frequency : 59552
          The problem is when we store the average frequency in the internal structures, it is done in this way

            void set_avg_frequency(uint i, double val)
            {
              avg_frequency[i]= (ulong) (val * Scale_factor_avg_frequency);
            }
          

          The Scale_factor_avg_frequency is defined as 100000. So the multiplication would lead to an overflow as ulong is defined as 4 bytes on windows.

          https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/32862b84-f6e6-40f9-85ca-c4faf985b822.

          The solution is to switch avg_frequency to ulonglong so that 8 bytes is used and we don't overflow.

          varun Varun Gupta (Inactive) added a comment - - edited The issue here: The average frequency is defined as cardinality(non-null values) / cardinality(distinct values) The average frequency for the index (AAA) on table t1: cardinality(non-null values) = 59552 (All rows in the table are NOT NULL) cardinality(distinct values) = 1 (All values are the same) So average_frequency : 59552 The problem is when we store the average frequency in the internal structures, it is done in this way void set_avg_frequency(uint i, double val) { avg_frequency[i]= (ulong) (val * Scale_factor_avg_frequency); } The Scale_factor_avg_frequency is defined as 100000. So the multiplication would lead to an overflow as ulong is defined as 4 bytes on windows. https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/32862b84-f6e6-40f9-85ca-c4faf985b822 . The solution is to switch avg_frequency to ulonglong so that 8 bytes is used and we don't overflow.
          varun Varun Gupta (Inactive) added a comment - http://lists.askmonty.org/pipermail/commits/2020-July/014284.html

          Ok to push.

          psergei Sergei Petrunia added a comment - Ok to push.

          People

            varun Varun Gupta (Inactive)
            lars84 Lars Hansen
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.