Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35443

During an online table rebuild of an InnoDB statistics table, opt_search_plan_for_table() may degrade to full table scan

Details

    Description

      We've see a server crash due to long semaphore wait. The innodb stats update thread updates stats(dict_stats_save_index_stat()) on a table using InnoDB internal parser, holding dict_operation_lock and dict_sys->mutext, but the plan is full table scan(innodb_index_stats) other than clust index scan. There are more than 200 thousands records in stats table, and there are multiple udpates for the table stats. So it takes more than 900s to finish.

      The reason why stats updates use full table scan is that there is a running SQL: ALTER TABLE mysql.innodb_index_stats ENGINE=InnoDB STATS_PERSISTENT=0. The online DDL set cluster index of innodb_index_stats to ONLINE_INDEX_CREATION.
      callstack:
      prepare_inplace_alter_table_dict
      --row_log_allocate
      ----dict_index_set_online_status

      In function opt_calc_index_goodness():
      343 /* At least for now we don't support using FTS indexes for queries
      344 done through InnoDB's own SQL parser. */
      345 if (dict_index_is_online_ddl(index) || (index->type & DICT_FTS))

      { 346 return(0); 347 }

      In this case we should use the online index.

      Attachments

        Issue Links

          Activity

            It turns out that I had introduced this condition back in MySQL 5.6 when implementing WL#5526 Online ADD INDEX. The condition was almost correct at that point of time, but I forgot to revise it later that year when implementing WL#6255 online table rebuild.

            I believe that the condition dict_index_is_online_ddl(index) needs to be replaced with index->is_committed(). That would also fix a corner case where a secondary index has been fully built, but the ALTER TABLE or CREATE INDEX operation has not been committed yet, possibly because an upgrade of a metadata lock (MDL) to exclusive is in progress. In particular, if this happened during CREATE UNIQUE INDEX, it is possible that the DDL operation will need to be rolled back because some DML operation would introduce a duplicate entry in the not-yet-committed unique index.

            Correcting the condition to index->is_committed() also has the benefit that the condition will never hold on the clustered index of a table, that is, the clustered index will never be disqualified from use in the InnoDB internal SQL interpreter.

            Also the condition on index->type is inaccurate. index->is_btree() would properly disqualify SPATIAL INDEX, which were introduced for InnoDB in MySQL 5.7 and MariaDB Server 10.2.

            marko Marko Mäkelä added a comment - It turns out that I had introduced this condition back in MySQL 5.6 when implementing WL#5526 Online ADD INDEX . The condition was almost correct at that point of time, but I forgot to revise it later that year when implementing WL#6255 online table rebuild. I believe that the condition dict_index_is_online_ddl(index) needs to be replaced with index->is_committed() . That would also fix a corner case where a secondary index has been fully built, but the ALTER TABLE or CREATE INDEX operation has not been committed yet, possibly because an upgrade of a metadata lock (MDL) to exclusive is in progress. In particular, if this happened during CREATE UNIQUE INDEX , it is possible that the DDL operation will need to be rolled back because some DML operation would introduce a duplicate entry in the not-yet-committed unique index. Correcting the condition to index->is_committed() also has the benefit that the condition will never hold on the clustered index of a table, that is, the clustered index will never be disqualified from use in the InnoDB internal SQL interpreter. Also the condition on index->type is inaccurate. index->is_btree() would properly disqualify SPATIAL INDEX , which were introduced for InnoDB in MySQL 5.7 and MariaDB Server 10.2.

            I was curious about the claim about an online table rebuild. I ran the following test with ./mtr --rr:

            --source include/have_innodb.inc
            ALTER TABLE mysql.innodb_index_stats STATS_PERSISTENT=1;
            ALTER TABLE mysql.innodb_index_stats ENGINE=InnoDB STATS_PERSISTENT=0;
            ALTER TABLE mysql.innodb_index_stats STATS_PERSISTENT=0 STATS_AUTO_RECALC=0;
            

            It turns out that for the second ALTER TABLE, with the seemingly redundant ENGINE=InnoDB, the ha_alter_info->handler_flags will include the flag ALTER_RECREATE_TABLE. Before the FORCE keyword was introduced for this purpose, it was customary to write ALTER TABLE tablename ENGINE=InnoDB instead of OPTIMIZE TABLE to have an InnoDB table rebuilt. For all these ALTER TABLE statements, also the flag ALTER_OPTIONS will be set.

            I think that opt_calc_index_goodness() had better also disqualify any indexes that are built on any VIRTUAL columns, because the InnoDB internal SQL parser was not in any way adapted for the introduction of indexed virtual columns (MDEV-5800).

            marko Marko Mäkelä added a comment - I was curious about the claim about an online table rebuild. I ran the following test with ./mtr --rr : --source include/have_innodb.inc ALTER TABLE mysql.innodb_index_stats STATS_PERSISTENT=1; ALTER TABLE mysql.innodb_index_stats ENGINE=InnoDB STATS_PERSISTENT=0; ALTER TABLE mysql.innodb_index_stats STATS_PERSISTENT=0 STATS_AUTO_RECALC=0; It turns out that for the second ALTER TABLE , with the seemingly redundant ENGINE=InnoDB , the ha_alter_info->handler_flags will include the flag ALTER_RECREATE_TABLE . Before the FORCE keyword was introduced for this purpose, it was customary to write ALTER TABLE tablename ENGINE=InnoDB instead of OPTIMIZE TABLE to have an InnoDB table rebuilt. For all these ALTER TABLE statements, also the flag ALTER_OPTIONS will be set. I think that opt_calc_index_goodness() had better also disqualify any indexes that are built on any VIRTUAL columns, because the InnoDB internal SQL parser was not in any way adapted for the introduction of indexed virtual columns ( MDEV-5800 ).

            As far as I can tell, in any version of MariaDB Server, there is no secondary index defined on the tables mysql.innodb_index_stats or mysql.innodb_table_stats. Also in MySQL 5.6 and 5.7, only a PRIMARY KEY had been defined on these tables. I wrote a DEBUG_SYNC test case that would exercise this code in MariaDB Server 10.5:

            diff --git a/mysql-test/suite/innodb/t/stats_persistent.test b/mysql-test/suite/innodb/t/stats_persistent.test
            index 8561298c4d3..870808993f4 100644
            --- a/mysql-test/suite/innodb/t/stats_persistent.test
            +++ b/mysql-test/suite/innodb/t/stats_persistent.test
            @@ -14,14 +14,25 @@ SET DEBUG_SYNC='dict_stats_update_persistent SIGNAL stop WAIT_FOR go';
             --connect(con1, localhost, root)
             SET DEBUG_SYNC='now WAIT_FOR stop';
             
            +--connect(con2, localhost, root)
            +SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL astop WAIT_FOR ago';
            +send ALTER TABLE mysql.innodb_index_stats FORCE;
            +
            +--connection con1
             --replace_column 1 SUM
             SELECT SUM(DATA_LENGTH+INDEX_LENGTH) FROM information_schema.TABLES WHERE ENGINE='InnoDB';
             
            +SET DEBUG_SYNC='now WAIT_FOR astop';
             SET DEBUG_SYNC='now SIGNAL go';
             --disconnect con1
             
             --connection default
             --reap
            +SET DEBUG_SYNC='now SIGNAL ago';
            +--connection con2
            +reap;
            +--disconnect con2
            +--connection default
             SET DEBUG_SYNC= 'RESET';
             DROP TABLE t1;
             
            

            With the following I can observe with the following that the "goodness" 0 was returned:

            ./mtr --rr innodb.stats_persistent
            rr replay var/log/mysqld.1.rr/latest-trace
            

            break dict_stats_update_persistent
            continue
            break opt_calc_index_goodness thread 2
            continue
            continue
            finish
            next
            next
            next
            

            The penultimate breakpoint would be for the other statistics table mysql.innodb_table_stats.

            Because there only is a clustered index defined for these tables, only one index can be chosen. What makes the difference is the following:

            opt_search_plan_for_table() in storage/innobase/pars/pars0opt.cc

            	n_fields = opt_calc_n_fields_from_goodness(best_goodness);
             
            	if (n_fields == 0) {
            		plan->tuple = NULL;
            		plan->n_exact_match = 0;
            	} else {
            		plan->tuple = dtuple_create(pars_sym_tab_global->heap,
            					    n_fields);
            

            That is, when the incorrect condition is present, a table scan will be executed, instead of searching for a key (database name, table name, index name). Therefore, I can confirm this bug.

            marko Marko Mäkelä added a comment - As far as I can tell, in any version of MariaDB Server, there is no secondary index defined on the tables mysql.innodb_index_stats or mysql.innodb_table_stats . Also in MySQL 5.6 and 5.7, only a PRIMARY KEY had been defined on these tables. I wrote a DEBUG_SYNC test case that would exercise this code in MariaDB Server 10.5: diff --git a/mysql-test/suite/innodb/t/stats_persistent.test b/mysql-test/suite/innodb/t/stats_persistent.test index 8561298c4d3..870808993f4 100644 --- a/mysql-test/suite/innodb/t/stats_persistent.test +++ b/mysql-test/suite/innodb/t/stats_persistent.test @@ -14,14 +14,25 @@ SET DEBUG_SYNC='dict_stats_update_persistent SIGNAL stop WAIT_FOR go'; --connect(con1, localhost, root) SET DEBUG_SYNC='now WAIT_FOR stop'; +--connect(con2, localhost, root) +SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL astop WAIT_FOR ago'; +send ALTER TABLE mysql.innodb_index_stats FORCE; + +--connection con1 --replace_column 1 SUM SELECT SUM(DATA_LENGTH+INDEX_LENGTH) FROM information_schema.TABLES WHERE ENGINE='InnoDB'; +SET DEBUG_SYNC='now WAIT_FOR astop'; SET DEBUG_SYNC='now SIGNAL go'; --disconnect con1 --connection default --reap +SET DEBUG_SYNC='now SIGNAL ago'; +--connection con2 +reap; +--disconnect con2 +--connection default SET DEBUG_SYNC= 'RESET'; DROP TABLE t1; With the following I can observe with the following that the "goodness" 0 was returned: ./mtr --rr innodb.stats_persistent rr replay var/log/mysqld.1.rr/latest-trace break dict_stats_update_persistent continue break opt_calc_index_goodness thread 2 continue continue finish next next next The penultimate breakpoint would be for the other statistics table mysql.innodb_table_stats . Because there only is a clustered index defined for these tables, only one index can be chosen. What makes the difference is the following: opt_search_plan_for_table() in storage/innobase/pars/pars0opt.cc n_fields = opt_calc_n_fields_from_goodness(best_goodness);   if (n_fields == 0) { plan->tuple = NULL; plan->n_exact_match = 0; } else { plan->tuple = dtuple_create(pars_sym_tab_global->heap, n_fields); That is, when the incorrect condition is present, a table scan will be executed, instead of searching for a key (database name, table name, index name). Therefore, I can confirm this bug.

            Looks to me. Thank you marko for explaining the way to reproduce and debug the issue in rr.

            vlad.lesin Vladislav Lesin added a comment - Looks to me. Thank you marko for explaining the way to reproduce and debug the issue in rr.

            It turns out that in MDEV-33462 in MariaDB Server 10.6.18, the online table rebuild for InnoDB statistics tables was disabled. Therefore, this bug should not affect more recent MariaDB releases.

            In MariaDB Server 10.6, the MariaDB Server 10.5 test case for this bug would trigger the bug MDEV-33575 during the execution of ALTER TABLE mysql.innodb_index_stats.

            marko Marko Mäkelä added a comment - It turns out that in MDEV-33462 in MariaDB Server 10.6.18, the online table rebuild for InnoDB statistics tables was disabled. Therefore, this bug should not affect more recent MariaDB releases. In MariaDB Server 10.6, the MariaDB Server 10.5 test case for this bug would trigger the bug MDEV-33575 during the execution of ALTER TABLE mysql.innodb_index_stats .

            People

              marko Marko Mäkelä
              tiandiwonder Shaohua Wang
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.