MariaDB Server / MDEV-17540

Server crashes in dict_table_get_first_index after TRUNCATE TABLE

    Description

      Note: Run the test case with --repeat=N if it doesn't fail right away. It currently fails for me within 1-2 attempts.

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (a BIT(14)) ENGINE=InnoDB;
      INSERT INTO t1 VALUES
          (b'01110110101011'),(b'01100111111000'),(b'00001011110100'),
          (b'01110110111010'),(b'10001010101011'),(b'01100111001111');
       
      CREATE TABLE t2 (
          pk INT DEFAULT 1,
          b YEAR,
          c BIT(14),
          d YEAR AS (b),
          e BIT(14) AS (c),
          UNIQUE(pk),
          KEY(e)
      ) ENGINE=InnoDB;
       
      REPLACE INTO t2 (c) SELECT a FROM t1;
      TRUNCATE TABLE t2;
       
      DROP TABLE t1, t2;
      

      10.3 e8dd18a4

      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x000056012a8c38ec in dict_table_get_first_index (table=0x0) at /data/src/10.3/storage/innobase/include/dict0dict.ic:210
      210		ut_ad(table->magic_n == DICT_TABLE_MAGIC_N);
      [Current thread is 1 (Thread 0x7fd4237fe700 (LWP 17470))]
       
      Thread 1 (Thread 0x7fd4237fe700 (LWP 17470)):
      #0  0x000056012a8c38ec in dict_table_get_first_index (table=0x0) at /data/src/10.3/storage/innobase/include/dict0dict.ic:210
      #1  0x000056012a8c7afb in row_purge_upd_exist_or_extern_func (thr=0x56012d5351a0, node=0x56012d535628, undo_rec=0x56012d5361f8 "") at /data/src/10.3/storage/innobase/row/row0purge.cc:904
      #2  0x000056012a8c8841 in row_purge_record_func (node=0x56012d535628, undo_rec=0x56012d5361f8 "", thr=0x56012d5351a0, updated_extern=false) at /data/src/10.3/storage/innobase/row/row0purge.cc:1206
      #3  0x000056012a8c89db in row_purge (node=0x56012d535628, undo_rec=0x56012d5361f8 "", thr=0x56012d5351a0) at /data/src/10.3/storage/innobase/row/row0purge.cc:1250
      #4  0x000056012a8c8bd9 in row_purge_step (thr=0x56012d5351a0) at /data/src/10.3/storage/innobase/row/row0purge.cc:1309
      #5  0x000056012a84cf6d in que_thr_step (thr=0x56012d5351a0) at /data/src/10.3/storage/innobase/que/que0que.cc:1042
      #6  0x000056012a84d1a3 in que_run_threads_low (thr=0x56012d5351a0) at /data/src/10.3/storage/innobase/que/que0que.cc:1104
      #7  0x000056012a84d395 in que_run_threads (thr=0x56012d5351a0) at /data/src/10.3/storage/innobase/que/que0que.cc:1144
      #8  0x000056012a9116ba in srv_task_execute () at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2469
      #9  0x000056012a91185f in srv_worker_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2517
      #10 0x00007fd4498b14a4 in start_thread (arg=0x7fd4237fe700) at pthread_create.c:456
      #11 0x00007fd447df9d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

      An ASAN build also fails, either with SEGV or with heap-use-after-free.
      A non-debug build crashes differently:

      10.3 e8dd18a4 non-debug

      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x000055d9b755e3a8 in mem_heap_free (heap=0x7f8908007130) at /data/src/10.3/storage/innobase/include/mem0mem.ic:426
      426		while (block != NULL) {
      [Current thread is 1 (Thread 0x7f891affd700 (LWP 17593))]
       
      #0  0x000055d9b755e3a8 in mem_heap_free (heap=0x7f8908007130) at /data/src/10.3/storage/innobase/include/mem0mem.ic:426
      #1  row_purge_upd_exist_or_extern_func (undo_rec=0x55d9b93dc5a0 "", node=0x55d9b93dba70) at /data/src/10.3/storage/innobase/row/row0purge.cc:900
      #2  row_purge_record_func (node=node@entry=0x55d9b93dba70, undo_rec=undo_rec@entry=0x55d9b93dc5a0 "", thr=thr@entry=0x55d9b93db938, updated_extern=<optimized out>) at /data/src/10.3/storage/innobase/row/row0purge.cc:1206
      #3  0x000055d9b755eba4 in row_purge (thr=0x55d9b93db938, undo_rec=0x55d9b93dc5a0 "", node=0x55d9b93dba70) at /data/src/10.3/storage/innobase/row/row0purge.cc:1250
      #4  row_purge_step (thr=thr@entry=0x55d9b93db938) at /data/src/10.3/storage/innobase/row/row0purge.cc:1309
      #5  0x000055d9b751ed47 in que_thr_step (thr=0x55d9b93db938) at /data/src/10.3/storage/innobase/que/que0que.cc:1042
      #6  que_run_threads_low (thr=0x55d9b93db938) at /data/src/10.3/storage/innobase/que/que0que.cc:1104
      #7  que_run_threads (thr=thr@entry=0x55d9b93db938) at /data/src/10.3/storage/innobase/que/que0que.cc:1144
      #8  0x000055d9b7584c68 in srv_task_execute () at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2469
      #9  srv_worker_thread (arg=<optimized out>) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2517
      #10 0x00007f89450344a4 in start_thread (arg=0x7f891affd700) at pthread_create.c:456
      #11 0x00007f894357cd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

      Not reproducible on 10.2.

      Note: Until recently, the same test case caused an assertion failure on a debug build:

      10.3 ab7b9cf9 (10.3.14 debug build)

      mysqld: /data/src/10.3-bug/storage/innobase/include/dict0dict.ic:219: dict_index_t* dict_table_get_first_index(const dict_table_t*): Assertion `table' failed.
      190509  0:49:07 [ERROR] mysqld got signal 6 ;
       
      #6  0x00007f43d361be67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5628480c2331 "table", file=file@entry=0x5628480c22c0 "/data/src/10.3-bug/storage/innobase/include/dict0dict.ic", line=line@entry=219, function=function@entry=0x5628480c3580 <_ZZL26dict_table_get_first_indexPK12dict_table_tE19__PRETTY_FUNCTION__> "dict_index_t* dict_table_get_first_index(const dict_table_t*)") at assert.c:92
      #7  0x00007f43d361bf12 in __GI___assert_fail (assertion=0x5628480c2331 "table", file=0x5628480c22c0 "/data/src/10.3-bug/storage/innobase/include/dict0dict.ic", line=219, function=0x5628480c3580 <_ZZL26dict_table_get_first_indexPK12dict_table_tE19__PRETTY_FUNCTION__> "dict_index_t* dict_table_get_first_index(const dict_table_t*)") at assert.c:101
      #8  0x000056284799b9b3 in dict_table_get_first_index (table=0x0) at /data/src/10.3-bug/storage/innobase/include/dict0dict.ic:219
      #9  0x000056284799fe1d in row_purge_upd_exist_or_extern_func (thr=0x56284ad05dc8, node=0x56284ad05e80, undo_rec=0x56284ad06348 "") at /data/src/10.3-bug/storage/innobase/row/row0purge.cc:904
      #10 0x00005628479a0b69 in row_purge_record_func (node=0x56284ad05e80, undo_rec=0x56284ad06348 "", thr=0x56284ad05dc8, updated_extern=false) at /data/src/10.3-bug/storage/innobase/row/row0purge.cc:1206
      #11 0x00005628479a0d03 in row_purge (node=0x56284ad05e80, undo_rec=0x56284ad06348 "", thr=0x56284ad05dc8) at /data/src/10.3-bug/storage/innobase/row/row0purge.cc:1250
      #12 0x00005628479a0f37 in row_purge_step (thr=0x56284ad05dc8) at /data/src/10.3-bug/storage/innobase/row/row0purge.cc:1311
      #13 0x0000562847921b38 in que_thr_step (thr=0x56284ad05dc8) at /data/src/10.3-bug/storage/innobase/que/que0que.cc:1042
      #14 0x0000562847921d70 in que_run_threads_low (thr=0x56284ad05dc8) at /data/src/10.3-bug/storage/innobase/que/que0que.cc:1104
      #15 0x0000562847921f64 in que_run_threads (thr=0x56284ad05dc8) at /data/src/10.3-bug/storage/innobase/que/que0que.cc:1144
      #16 0x00005628479ec75c in srv_task_execute () at /data/src/10.3-bug/storage/innobase/srv/srv0srv.cc:2449
      #17 0x00005628479ec901 in srv_worker_thread (arg=0x0) at /data/src/10.3-bug/storage/innobase/srv/srv0srv.cc:2497
      #18 0x00007f43d51904a4 in start_thread (arg=0x7f43b3fff700) at pthread_create.c:456
      #19 0x00007f43d36d8d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

          Activity

            Marko Mäkelä added a comment:

            I was able to reproduce the crash on the first try, and I’ll test further with a loop around the ‘interesting part’ of the test:

            --disable_query_log
            let $n=100;
            while ($n) {
            REPLACE INTO t2 (c) SELECT a FROM t1;
            TRUNCATE TABLE t2;
            dec $n;
            }
            --enable_query_log
            

            The cause appears to be that row_purge_poss_sec() assigns node->table=NULL: the table was closed and could not be reopened, because the TRUNCATE operation had changed the InnoDB-internal table name. In this case, I think our best option is to skip the purge for the remaining indexes:

            diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
            index 2fc465e7726..91b01fda105 100644
            --- a/storage/innobase/row/row0purge.cc
            +++ b/storage/innobase/row/row0purge.cc
            @@ -892,6 +892,9 @@ row_purge_upd_exist_or_extern_func(
             				heap, ROW_BUILD_FOR_PURGE);
             			row_purge_remove_sec_if_poss(node, node->index, entry);
             			mem_heap_empty(heap);
            +			if (!node->table) {
            +				return;
            +			}
             		}
             
             		node->index = dict_table_get_next_index(node->index);
            

            With this patch, the test passes. The patch should also be applicable to 10.2, although the simple test case presumably depends on MDEV-12288 and therefore should not reproduce the crash on 10.2. I will try to develop a better test case.
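The control flow that the patch changes can be modelled in a self-contained form. This is a minimal sketch under stated assumptions, not InnoDB code: the types Index, Table and PurgeNode, the table_vanishes switch and all functions below are hypothetical stand-ins for dict_index_t, dict_table_t, purge_node_t, dict_table_get_first_index(), row_purge_remove_sec_if_poss() and row_purge_upd_exist_or_extern_func(). It shows why returning as soon as node->table becomes NULL avoids the later first-index lookup on a NULL table that appears in the backtraces.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for InnoDB's dict_index_t,
// dict_table_t and purge_node_t. The real structures differ.
struct Index {
    bool ord_field_changed;  // did the update change a field of this index?
    Index* next;             // next index in the table's index list
};

struct Table {
    Index* first;  // clustered index, followed by the secondary indexes
};

struct PurgeNode {
    Table* table;
    // Hypothetical switch: simulate the table being closed (and not reopened)
    // while a secondary index is purged, e.g. after a concurrent TRUNCATE.
    bool table_vanishes;
};

// Models the debug assertion `table' in dict_table_get_first_index().
const Index* table_first_index(const Table* table) {
    assert(table != nullptr);
    return table->first;
}

// Stand-in for row_purge_remove_sec_if_poss(): per the analysis above, the
// secondary-index check may leave node->table == nullptr.
void remove_sec_if_poss(PurgeNode* node, const Index* /* index */) {
    if (node->table_vanishes) {
        node->table = nullptr;
    }
}

// Sketch of the loop over secondary indexes with the proposed early return.
// Returns how many secondary indexes were processed; *extern_done reports
// whether the post-loop step that needs the table again was reached.
int purge_upd_exist_or_extern(PurgeNode* node, bool* extern_done) {
    *extern_done = false;
    int processed = 0;
    // Start at the first secondary index (the one after the clustered index).
    for (const Index* index = table_first_index(node->table)->next;
         index != nullptr; index = index->next) {
        if (index->ord_field_changed) {
            remove_sec_if_poss(node, index);
            ++processed;
            if (!node->table) {
                return processed;  // the fix: skip all remaining work
            }
        }
    }
    // Post-loop step that needs the table again. Without the early return,
    // this is the NULL first-index lookup seen in the backtraces.
    const Index* clust = table_first_index(node->table);
    (void) clust;
    *extern_done = true;
    return processed;
}
```

With table_vanishes set, the model processes one secondary index and returns without ever evaluating table_first_index(nullptr); without the early return it would hit the assertion, mirroring the 10.3.14 debug build failure.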


            Marko Mäkelä added a comment:

            I believe this is not limited to TRUNCATE TABLE. The purge crash could also occur after a table-rebuilding ALTER TABLE (or OPTIMIZE TABLE). The prerequisite is that, at the time the DDL operation was executed, the table contained indexed virtual columns and the history of some committed transactions on the table had not yet been purged.
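The stated prerequisites can be restated as a single predicate. This is a hypothetical illustration only: the Ddl enum and purge_crash_possible() are invented names, and the sketch assumes, as the comment implies but does not state outright, that an ALTER TABLE which does not rebuild the table is not affected. It describes when the crash could occur, not that it will; the report notes the crash is timing-dependent.

```cpp
#include <cassert>

// Hypothetical classification of DDL statements, for illustration only.
enum class Ddl {
    Truncate,       // TRUNCATE TABLE: changes the InnoDB-internal table name
    AlterRebuild,   // ALTER TABLE that rebuilds the table
    Optimize,       // OPTIMIZE TABLE: treated as table-rebuilding, per the comment
    AlterNoRebuild  // ALTER TABLE that does not rebuild (assumed unaffected)
};

// Hypothetical predicate: all three prerequisites must hold at the time the
// DDL operation is executed for the purge crash to be possible.
bool purge_crash_possible(Ddl ddl, bool has_indexed_virtual_columns,
                          bool has_unpurged_history) {
    const bool rebuilds_table = ddl != Ddl::AlterNoRebuild;
    return rebuilds_table && has_indexed_virtual_columns && has_unpurged_history;
}
```

For the test case above, t2 has KEY(e) on the virtual column e, and the REPLACE leaves history that may not yet be purged when TRUNCATE TABLE runs, so purge_crash_possible(Ddl::Truncate, true, true) holds.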


            People

              Marko Mäkelä (marko)
              Elena Stepanova (elenst)
