Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25922

Statistics for varchar(N) column of Aria table in information_schema.statistics is wrong with EITS

    XMLWordPrintable

    Details

      Description

      information_schema.statistics table is supposed to reflect current statistics used by the optiumizer, so DBA can check the values there and compare to real data to find out if ANALYZE is needed, and use the data fro troubleshooting slow queries/bad plans.But it semes that engine-independent table statistics for VARCHAR(N) column of Aria tables (at least) is not properly reflected there and is misleading.

      Consider the following primitive test case:

      MariaDB [test]> select @@use_stat_tables, version();
      +------------------------+-----------------+
      | @@use_stat_tables      | version()       |
      +------------------------+-----------------+
      | PREFERABLY_FOR_QUERIES | 10.5.11-MariaDB |
      +------------------------+-----------------+
      1 row in set (0.000 sec)
      
      

      MariaDB [test]> create table ta(id int auto_increment primary key, c1 int, c2 int, key(c1,c2)) engine=Aria;
      Query OK, 0 rows affected (0.122 sec)
       
      MariaDB [test]> insert into ta(c1,c2) values(1,2), (1,3), (1,4);
      Query OK, 3 rows affected (0.063 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select table_name, column_name, cardinality from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |        NULL |
      | ta         | c2          |        NULL |
      +------------+-------------+-------------+
      3 rows in set (0.000 sec)
       
      MariaDB [test]> analyze table ta;
      +---------+---------+----------+----------+one
      | Table   | Op      | Msg_type | Msg_text |
      +---------+---------+----------+----------+
      | test.ta | analyze | status   | OK       |
      +---------+---------+----------+----------+
      1 row in set (0.117 sec)
      

       
      MariaDB [test]> select table_name, column_name, cardinality from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |           1 |
      | ta         | c2          |           3 |
      +------------+-------------+-------------+
      3 rows in set (0.001 sec)
      

      All good so far. We see proper cardinalities for INT columns. Now let's add VARCHAR(8) one:

      MariaDB [test]> alter table ta add column `system` varchar(8) null;
      Query OK, 3 rows affected (0.427 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> update ta set `system` = 'a';
      Query OK, 3 rows affected (0.045 sec)
      Rows matched: 3  Changed: 3  Warnings: 0
       
      MariaDB [test]> select table_name, column_name, cardinality 
      from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |        NULL |
      | ta         | c2          |        NULL |
      +------------+-------------+-------------+
      3 rows in set (0.000 sec)
       
      MariaDB [test]> alter table ta drop key c1;
      Query OK, 3 rows affected (0.443 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select table_name, column_name, cardinality from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      +------------+-------------+-------------+
      1 row in set (0.001 sec)
      
      

      MariaDB [test]> alter table ta add key c1(c1,c2,`system`);
      Query OK, 3 rows affected (0.425 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select table_name, column_name, cardinality 
      from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |        NULL |
      | ta         | c2          |        NULL |
      | ta         | system      |        NULL |
      +------------+-------------+-------------+
      4 rows in set (0.001 sec)
       
      MariaDB [test]> analyze table ta;
      +---------+---------+----------+----------+one
      | Table   | Op      | Msg_type | Msg_text |
      +---------+---------+----------+----------+
      | test.ta | analyze | status   | OK       |
      +---------+---------+----------+----------+
      1 row in set (0.012 sec)
       
      MariaDB [test]> select table_name, column_name, cardinality from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |           1 |
      | ta         | c2          |           3 |
      | ta         | system      |           3 |
      +------------+-------------+-------------+
      4 rows in set (0.001 sec)
       
      MariaDB [test]> select count(distinct `system`) from ta;
      +--------------------------+
      | count(distinct `system`) |
      +--------------------------+
      |                        1 |
      +--------------------------+
      1 row in set (0.025 sec)
      

      The cardinality value is wrong for the VARCHAR column only! Now, can we fix that if we collect engine-independent table statistics? Let's try:

      MariaDB [test]> analyze table ta persistent for all;
      +---------+---------+----------+-----------------------------------------+
      | Table   | Op      | Msg_type | Msg_text                                |
      +---------+---------+----------+-----------------------------------------+
      | test.ta | analyze | status   | Engine-independent statistics collected |
      | test.ta | analyze | status   | Table is already up to date             |
      +---------+---------+----------+-----------------------------------------+
      2 rows in set (0.033 sec)
       
      MariaDB [test]> select table_name, column_name, cardinality
       from information_schema.statistics where table_schema='test' and table_name='ta';
      +------------+-------------+-------------+
      | table_name | column_name | cardinality |
      +------------+-------------+-------------+
      | ta         | id          |           3 |
      | ta         | c1          |           1 |
      | ta         | c2          |           3 |
      | ta         | system      |           3 |
      +------------+-------------+-------------+
      4 rows in set (0.001 sec)
      

      Still wrong? But in mysql.column_stats it seems correct, at least average frequency for our problematic column is correct:

      MariaDB [test]> select table_name, column_name, avg_frequency 
      from mysql.column_stats where db_name='test' and table_name='ta';
      +------------+-------------+---------------+
      | table_name | column_name | avg_frequency |
      +------------+-------------+---------------+
      | ta         | c1          |        3.0000 |
      | ta         | c2          |        1.0000 |
      | ta         | id          |        1.0000 |
      | ta         | system      |        3.0000 |
      +------------+-------------+---------------+
      4 rows in set (0.000 sec)
      

      So, I think the bug is in what information_schema.statistcis shows. Correct me if I am wrong.

        Attachments

          Activity

            People

            Assignee:
            psergei Sergei Petrunia
            Reporter:
            valerii Valerii Kravchuk
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.