Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29250

InnoDB: Failing assertion: table->get_ref_count() == 0

Details

    Description

      The fix of MDEV-27783 appears to have been incomplete. I just got the following crash:

      bb-10.6-release c9803504389e12b79253e12f1d7d78fbdf92395c

      CURRENT_TEST: main.alter_table
      mysqltest: At line 1636: query 'ALTER TABLE ti1 ADD PRIMARY KEY(a)' failed: <Unknown> (2013): Lost connection to server during query
      ...
      2022-08-05 11:17:44 0x7fd40d683640  InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/dict/dict0dict.cc line 1844
      InnoDB: Failing assertion: table->get_ref_count() == 0
      ...
      #6  0x0000557143293726 in ut_dbg_assertion_failed (
          expr=expr@entry=0x5571437ef3c1 "table->get_ref_count() == 0", 
          file=file@entry=0x5571437ef958 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=1844) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
      #7  0x000055714331011f in dict_sys_t::remove (this=0x557143e9fc40 <dict_sys>, 
          table=table@entry=0x7fd36c11a338, lru=lru@entry=false, 
          keep=keep@entry=false)
          at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1844
      #8  0x0000557143330cb4 in trx_t::commit (this=this@entry=0x7fd40e967680, 
      ...
      Thread 55 (Thread 0x7fd32ffff640 (LWP 2881184)):
      ...
      #8  0x0000557143120b44 in dict_sys_t::lock (this=this@entry=0x557143e9fc40 <dict_sys>, file=file@entry=0x5571437ef958 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=225) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1532
      #9  0x000055714330d54f in dict_table_close (table=table@entry=0x7fd36c11a338) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:225
      #10 0x000055714330d5a6 in dict_table_close (table=0x7fd36c11a338, dict_locked=dict_locked@entry=false, thd=0x557145271ac8, mdl=0x7fd31c0016c0) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:253
      #11 0x0000557143235002 in purge_node_t::close_table (this=this@entry=0x557144f2d6f8) at /mariadb/10.6/storage/innobase/include/row0purge.h:229
      #12 0x000055714323509b in purge_node_t::retain_mdl (this=this@entry=0x557144f2d6f8, table_id=57) at /mariadb/10.6/storage/innobase/include/row0purge.h:249
      #13 0x0000557143232392 in row_purge_parse_undo_rec (node=node@entry=0x557144f2d6f8, undo_rec=undo_rec@entry=0x557144f3dc18 "", thr=thr@entry=0x557144f2d508, updated_extern=updated_extern@entry=0x7fd32fffea57) at /mariadb/10.6/storage/innobase/row/row0purge.cc:932
      #14 0x0000557143234c7c in row_purge (node=node@entry=0x557144f2d6f8, undo_rec=undo_rec@entry=0x557144f3dc18 "", thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/row/row0purge.cc:1120
      #15 0x0000557143234d1b in row_purge_step (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/row/row0purge.cc:1172
      #16 0x00005571431da5e1 in que_thr_step (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:653
      #17 0x00005571431da7c2 in que_run_threads_low (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:709
      #18 0x00005571431da87f in que_run_threads (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:729
      #19 0x00005571432610ac in srv_task_execute () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1656
      #20 0x0000557143261794 in purge_worker_callback () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1899
      

      I think that a similar problem (crash of a non-debug server) may affect DROP TABLE as well as any table rebuild operation (including OPTIMIZE TABLE and TRUNCATE TABLE) on tables that contain FULLTEXT INDEX.

      Attachments

        Issue Links

          Activity

            So far, I failed to reproduce the crash with the following test, which is based on the failing section of the main.alter_table test:

            --source include/have_innodb.inc
             
            CREATE TABLE ti1(a INT NOT NULL, b INT, c INT) engine=InnoDB;
            ALTER TABLE ti1 ADD COLUMN d VARCHAR(200);
            ALTER TABLE ti1 ADD COLUMN d2 VARCHAR(200);
            ALTER TABLE ti1 ADD COLUMN e ENUM('a', 'b') FIRST;
            ALTER TABLE ti1 ADD COLUMN f INT AFTER a;
            ALTER TABLE ti1 ADD FULLTEXT INDEX ii3 (d), ALGORITHM=INPLACE;
            ALTER TABLE ti1 ADD FULLTEXT INDEX ii4 (d2);
            ALTER TABLE ti1 ADD PRIMARY KEY(a);
            DROP TABLE ti1;
            

            I think that purge is running on the table because of the hidden metadata records for instant ADD COLUMN. I believe that the following test should be equivalent, but also it fails to fail for me:

            --source include/have_innodb.inc
             
            CREATE TABLE ti1(a SERIAL, d VARCHAR(200), FULLTEXT INDEX(d)) ENGINE=InnoDB;
             
            let $N=10;
            while ($N) {
            INSERT INTO ti1 SET a=NULL;
            dec $N;
            }
            let $N=10;
            while ($N) {
            DELETE FROM ti1 LIMIT 1;
            dec $N;
            }
            ALTER TABLE ti1 FORCE, ALGORITHM=COPY;
            DROP TABLE ti1;
            

            marko Marko Mäkelä added a comment - So far, I failed to reproduce the crash with the following test, which is based on the failing section of the main.alter_table test: --source include/have_innodb.inc   CREATE TABLE ti1(a INT NOT NULL , b INT , c INT ) engine=InnoDB; ALTER TABLE ti1 ADD COLUMN d VARCHAR (200); ALTER TABLE ti1 ADD COLUMN d2 VARCHAR (200); ALTER TABLE ti1 ADD COLUMN e ENUM( 'a' , 'b' ) FIRST ; ALTER TABLE ti1 ADD COLUMN f INT AFTER a; ALTER TABLE ti1 ADD FULLTEXT INDEX ii3 (d), ALGORITHM=INPLACE; ALTER TABLE ti1 ADD FULLTEXT INDEX ii4 (d2); ALTER TABLE ti1 ADD PRIMARY KEY (a); DROP TABLE ti1; I think that purge is running on the table because of the hidden metadata records for instant ADD COLUMN . I believe that the following test should be equivalent, but also it fails to fail for me: --source include/have_innodb.inc   CREATE TABLE ti1(a SERIAL, d VARCHAR (200), FULLTEXT INDEX (d)) ENGINE=InnoDB;   let $N=10; while ($N) { INSERT INTO ti1 SET a= NULL ; dec $N; } let $N=10; while ($N) { DELETE FROM ti1 LIMIT 1; dec $N; } ALTER TABLE ti1 FORCE , ALGORITHM=COPY; DROP TABLE ti1;

            With the following code patch, I finally had some success, on the 38th round when running 115 concurrent tests and --innodb-purge-threads=20. This was with the last test variant (CREATE, INSERT, DELETE, ALTER, DROP):

            mysqltest: At line 16: query 'DROP TABLE ti1' failed: <Unknown> (2013): Lost connection to server during query
            

            I think that it could have failed in a similar way during the ALTER TABLE, but I may be mistaken.
            The patch introduces a sleep and disables a condition that could prevent the problem from reproducing with the reduced test.

            diff --git a/storage/innobase/include/row0purge.h b/storage/innobase/include/row0purge.h
            index 34af658cb12..8681b64cee9 100644
            --- a/storage/innobase/include/row0purge.h
            +++ b/storage/innobase/include/row0purge.h
            @@ -238,7 +238,8 @@ struct purge_node_t{
               bool retain_mdl(table_id_t table_id)
               {
                 ut_ad(table_id);
            -    if (last_table_id == table_id && mdl_hold_recs < 100)
            +    std::this_thread::sleep_for(std::chrono::milliseconds(1));
            +    if (0 && last_table_id == table_id && mdl_hold_recs < 100)
                 {
                   ut_ad(table);
                   mdl_hold_recs++;
            

            marko Marko Mäkelä added a comment - With the following code patch, I finally had some success, on the 38th round when running 115 concurrent tests and --innodb-purge-threads=20 . This was with the last test variant ( CREATE , INSERT , DELETE , ALTER , DROP ): mysqltest: At line 16: query 'DROP TABLE ti1' failed: <Unknown> (2013): Lost connection to server during query I think that it could have failed in a similar way during the ALTER TABLE , but I may be mistaken. The patch introduces a sleep and disables a condition that could prevent the problem from reproducing with the reduced test. diff --git a/storage/innobase/include/row0purge.h b/storage/innobase/include/row0purge.h index 34af658cb12..8681b64cee9 100644 --- a/storage/innobase/include/row0purge.h +++ b/storage/innobase/include/row0purge.h @@ -238,7 +238,8 @@ struct purge_node_t{ bool retain_mdl(table_id_t table_id) { ut_ad(table_id); - if (last_table_id == table_id && mdl_hold_recs < 100) + std::this_thread::sleep_for(std::chrono::milliseconds(1)); + if (0 && last_table_id == table_id && mdl_hold_recs < 100) { ut_ad(table); mdl_hold_recs++;

            Patch is in bb-10.6-MDEV-29250

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.6- MDEV-29250

            The revised patch looks correct to me. OK to push after addressing my comments, and after this has passed stress testing.

            marko Marko Mäkelä added a comment - The revised patch looks correct to me. OK to push after addressing my comments, and after this has passed stress testing.

            origin/bb-10.6-MDEV-29250 d3d8e91f51ed289d43ac91eb1d8ba9d11c904cb5 2022-08-18T16:24:26+05:30
            performed quite well in RQG testing.
            Especially remarkable is the clear smaller amount of failing crash recovery tests compared to
            origin/10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f 2022-08-16T21:33:45+05:30
            In case the first time seen problem
                 mysqld: /data/Server/bb-10.6-MDEV-29250/storage/innobase/include/dict0mem.h:1348: void dict_index_t::init_change_cols(unsigned int): Assertion `n_fields > n_cols' failed.
            is not caused by the modifications for MDEV-29250 I vote for pushing MDEV-29250.
            

            mleich Matthias Leich added a comment - origin/bb-10.6-MDEV-29250 d3d8e91f51ed289d43ac91eb1d8ba9d11c904cb5 2022-08-18T16:24:26+05:30 performed quite well in RQG testing. Especially remarkable is the clear smaller amount of failing crash recovery tests compared to origin/10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f 2022-08-16T21:33:45+05:30 In case the first time seen problem mysqld: /data/Server/bb-10.6-MDEV-29250/storage/innobase/include/dict0mem.h:1348: void dict_index_t::init_change_cols(unsigned int): Assertion `n_fields > n_cols' failed. is not caused by the modifications for MDEV-29250 I vote for pushing MDEV-29250.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.