Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22770

trx_undo_report_rename() fails to release page latches

Details

    Description

      USE test;
      SET SQL_MODE='';
      SET @@GLOBAL.innodb_trx_rseg_n_slots_debug=1;
      CREATE TABLE t (b VARCHAR(10) NOT NULL UNIQUE) ENGINE=InnoDB;
      INSERT INTO t VALUES (12662),(54592);
      SET GLOBAL innodb_monitor_enable='buffer_flush_batches';
      INSERT INTO t VALUES (2822.75);
      CREATE TABLE t2(a INT NOT NULL PRIMARY KEY, b INT) ENGINE=InnoDB SELECT * FROM t LOCK IN SHARE MODE;
      INSERT INTO t VALUES (25215);
      CREATE TEMPORARY TABLE m(a INT) ENGINE=INNODB;
      SELECT SLEEP(5);
      SELECT SLEEP(5);
      # Then exit CLI, and shutdown using mysqladmin shutdown
      

      Leads to:

      10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24

      2020-06-02 10:09:01 0 [Note] /test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/bin/mysqld: ready for connections.
      Version: '10.5.4-MariaDB-debug'  socket: '/test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/socket.sock'  port: 13236  MariaDB Server
      2020-06-02 10:09:16 4 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
      2020-06-02 10:09:16 4 [Warning] InnoDB: Unable to delete statistics for table test.t2: Too many concurrent transactions. They can be deleted later using DELETE FROM mysql.innodb_index_stats WHERE database_name = 'test' AND table_name = 't2'; DELETE FROM mysql.innodb_table_stats WHERE database_name = 'test' AND table_name = 't2';
      2020-06-02 10:09:16 4 [Note] InnoDB: Deferring DROP TABLE `test`.`t2`; renaming to test/#sql-ib21
      2020-06-02 10:09:16 4 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
      ########################################
      DEADLOCK of threads detected!
      rw-lock 0x1553ab400ff8 --Thread 23449040844544 has waited at trx0rseg.ic line 44 for 0.00 seconds the semaphore:
      X-lock on RW-latch at 0x1553ab400ff8 created in file buf0buf.cc line 1325
      a writer (thread id 23449576269568) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file trx0rseg.ic line 44
      Locked: thread 23449576269568 file trx0rseg.ic line 44  X-LOCK
      Locked: thread 23449576269568 file trx0rseg.ic line 44  X-LOCK
      2020-06-02 10:09:26 4 [Note] InnoDB: Mutex 0x562fdc4be6c0 owned by thread 23449040844544 file /test/10.5_dbg/storage/innobase/dict/dict0stats.cc line 2434
      --Thread 23449576269568 has waited at dict0dict.cc line 316 for 0.00 seconds the semaphore:
      Mutex at 0x562fdc4be6c0, Mutex DICT_SYS created dict0dict.cc:1016, lock var 2
      Last time reserved in file /test/10.5_dbg/storage/innobase/dict/dict0stats.cc line 2434
      2020-06-02 10:09:26 4 [ERROR] [FATAL] InnoDB: ######################################## Deadlock Detected!
      200602 10:09:26 [ERROR] mysqld got signal 6 ;
      

      10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24

      Core was generated by `/test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      [Current thread is 1 (Thread 0x154acf08f700 (LWP 76783))]
      (gdb) bt
      (gdb) (gdb) #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      #1  0x0000558a24bbcd7a in my_write_core (sig=sig@entry=6) at /test/10.5_dbg/mysys/stacktrace.c:518
      #2  0x0000558a24362385 in handle_fatal_signal (sig=6) at /test/10.5_dbg/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #5  0x0000154acd7d3801 in __GI_abort () at abort.c:79
      #6  0x0000558a249795f6 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.5_dbg/storage/innobase/ut/ut0ut.cc:601
      #7  0x0000558a249182e0 in sync_array_wait_event (arr=arr@entry=0x154acced3218, cell=@0x154acf08e738: 0x154accf810d8) at /test/10.5_dbg/storage/innobase/sync/sync0arr.cc:463
      #8  0x0000558a2473edb8 in TTASEventMutex<GenericPolicy>::enter (line=<optimized out>, filename=<optimized out>, max_delay=<optimized out>, max_spins=60, this=<optimized out>) at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:477
      #9  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x558a2588a6c0 <dict_sys>, n_spins=30, n_delay=4, name=0x558a2506b0b0 "/test/10.5_dbg/storage/innobase/dict/dict0dict.cc", line=316) at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:594
      #10 0x0000558a24a4b007 in dict_table_close (table=0x154aac057098, dict_locked=dict_locked@entry=false, try_drop=try_drop@entry=false, thd=thd@entry=0x0, mdl=mdl@entry=0x0) at /test/10.5_dbg/storage/innobase/dict/dict0dict.cc:316
      #11 0x0000558a24899ac0 in row_prebuilt_free (prebuilt=0x154aac17c098, dict_locked=dict_locked@entry=0) at /test/10.5_dbg/storage/innobase/row/row0mysql.cc:1041
      #12 0x0000558a24711ad5 in ha_innobase::close (this=0x154aac1550a0) at /test/10.5_dbg/storage/innobase/handler/ha_innodb.cc:6171
      #13 0x0000558a243695d8 in handler::ha_close (this=0x154aac1550a0) at /test/10.5_dbg/sql/handler.cc:2975
      #14 0x0000558a241b957d in closefrm (table=table@entry=0x154aac173088) at /test/10.5_dbg/sql/table.cc:4271
      #15 0x0000558a242c806d in THD::close_temporary_table (this=this@entry=0x154aac015088, table=table@entry=0x154aac173088) at /test/10.5_dbg/sql/temporary_tables.cc:1240
      #16 0x0000558a242c8ca0 in THD::free_temporary_table (this=this@entry=0x154aac015088, table=table@entry=0x154aac173088) at /test/10.5_dbg/sql/temporary_tables.cc:1490
      #17 0x0000558a242c8e00 in THD::close_temporary_tables (this=this@entry=0x154aac015088) at /test/10.5_dbg/sql/temporary_tables.cc:531
      #18 0x0000558a2405603e in THD::cleanup (this=this@entry=0x154aac015088) at /test/10.5_dbg/sql/sql_class.cc:1543
      #19 0x0000558a23fb4c6b in unlink_thd (thd=0x154aac015088) at /test/10.5_dbg/sql/mysqld.cc:2600
      #20 0x0000558a2420ac98 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x154aadc453a8, put_in_cache=put_in_cache@entry=true) at /test/10.5_dbg/sql/sql_connect.cc:1422
      #21 0x0000558a2420ae5b in handle_one_connection (arg=arg@entry=0x154aadc453a8) at /test/10.5_dbg/sql/sql_connect.cc:1313
      #22 0x0000558a2466a14e in pfs_spawn_thread (arg=0x154accc45888) at /test/10.5_dbg/storage/perfschema/pfs.cc:2201
      #23 0x0000154ace4b66db in start_thread (arg=0x154acf08f700) at pthread_create.c:463
      #24 0x0000154acd8b488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Bug confirmed present in:
      MariaDB: 10.5.4 (dbg)

      Bug confirmed not present in:
      MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (dbg), 10.2.33 (opt), 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.4.14 (opt), 10.5.4 (opt)
      MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

      MariaDB 10.4.14 (dbg) does not crash in any way;

      10.4.14 ea7830eef48333e28f98a9b91f05a95735b465a3

      2020-06-02 10:15:25 0 [Note] /test/MD260520-mariadb-10.4.14-linux-x86_64-dbg/bin/mysqld: ready for connections.
      Version: '10.4.14-MariaDB-debug'  socket: '/test/MD260520-mariadb-10.4.14-linux-x86_64-dbg/socket.sock'  port: 14979  MariaDB Server
      2020-06-02 10:15:26 9 [Note] InnoDB: Deferring DROP TABLE `test`.`t2`; renaming to test/#sql-ib21
      2020-06-02 10:16:10 0 [Note] /test/MD260520-mariadb-10.4.14-linux-x86_64-dbg/bin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
      2020-06-02 10:16:10 0 [Note] InnoDB: FTS optimize thread exiting.
      2020-06-02 10:16:10 0 [Note] Event Scheduler: Purging the queue. 0 events
      2020-06-02 10:16:11 0 [Note] InnoDB: Starting shutdown...
      2020-06-02 10:16:11 0 [Note] InnoDB: Dumping buffer pool(s) to /test/MD260520-mariadb-10.4.14-linux-x86_64-dbg/data/ib_buffer_pool
      2020-06-02 10:16:11 0 [Note] InnoDB: Buffer pool(s) dump completed at 200602 10:16:11
      2020-06-02 10:16:12 0 [Note] InnoDB: Shutdown completed; log sequence number 76351; transaction id 64
      2020-06-02 10:16:12 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2020-06-02 10:16:12 0 [Note] /test/MD260520-mariadb-10.4.14-linux-x86_64-dbg/bin/mysqld: Shutdown complete
      

      Attachments

        Issue Links

          Activity

            I can repeat this with both settings of innodb_stats_persistent. I debugged it with rr replay as follows:

            1. logs_empty_and_mark_files_at_shutdown() was blocked by buf_page_cleaner_is_active
            2. buf_flush_page() was blocked by a page latch on the TRX_SYS page (0:6)
            3. watch -l block.debug_list.count revealed the problem:

            diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc
            index 940c90ee3e3..b282bb177af 100644
            --- a/storage/innobase/trx/trx0rec.cc
            +++ b/storage/innobase/trx/trx0rec.cc
            @@ -1934,10 +1934,9 @@ dberr_t trx_undo_report_rename(trx_t* trx, const dict_table_t* table)
             				}
             			}
             		}
            -
            -		mtr.commit();
             	}
             
            +	mtr.commit();
             	return err;
             }
            

            The bug was introduced to the logging of RENAME operations (MDEV-14717) by MDEV-15090. I did not test 10.3 or 10.4 (and did not investigate why the test did not fail there), but I verified the above fix on 10.5.

            marko Marko Mäkelä added a comment - I can repeat this with both settings of innodb_stats_persistent . I debugged it with rr replay as follows: logs_empty_and_mark_files_at_shutdown() was blocked by buf_page_cleaner_is_active buf_flush_page() was blocked by a page latch on the TRX_SYS page (0:6) watch -l block.debug_list.count revealed the problem: diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc index 940c90ee3e3..b282bb177af 100644 --- a/storage/innobase/trx/trx0rec.cc +++ b/storage/innobase/trx/trx0rec.cc @@ -1934,10 +1934,9 @@ dberr_t trx_undo_report_rename(trx_t* trx, const dict_table_t* table) } } } - - mtr.commit(); } + mtr.commit(); return err; } The bug was introduced to the logging of RENAME operations ( MDEV-14717 ) by MDEV-15090 . I did not test 10.3 or 10.4 (and did not investigate why the test did not fail there), but I verified the above fix on 10.5.

            Double checked 10.3.24 (dbg) and 10.4.14 (dbg) manually, and confirmed they do not crash and that shutdown completes cleanly.

            Roel Roel Van de Paar added a comment - Double checked 10.3.24 (dbg) and 10.4.14 (dbg) manually, and confirmed they do not crash and that shutdown completes cleanly.

            Roel, in your test, when the duplicate key error handling for the CREATE TABLE t2…SELECT invokes ha_innobase::delete_table(), the transaction is still open and the work-around mentioned in MDEV-21602 should kick in.

            A more reliable version of the test might invoke RENAME TABLE directly while innodb_trx_rseg_n_slots_debug=1 is in effect.

            A possible explanation why the test might not be effective before 10.5 might be MDEV-16678.

            marko Marko Mäkelä added a comment - Roel , in your test, when the duplicate key error handling for the CREATE TABLE t2…SELECT invokes ha_innobase::delete_table() , the transaction is still open and the work-around mentioned in MDEV-21602 should kick in. A more reliable version of the test might invoke RENAME TABLE directly while innodb_trx_rseg_n_slots_debug=1 is in effect. A possible explanation why the test might not be effective before 10.5 might be MDEV-16678 .

            People

              marko Marko Mäkelä
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.