Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8147

Assertion `m_lock_type == 2' failed in handler::ha_close() during parallel replication

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0(EOL), 10.1(EOL)
    • 10.0.20, 10.1.5
    • Replication
    • None

    Description

      Note: Unfortunately I can't check if the problem is reproducible on 10.0, because the binlog uses syntax introduced in 10.1 with table encryption.

      Stack trace is from 10.1 commit 8c54182e + patch http://lists.askmonty.org/pipermail/commits/2015-May/007819.html ; the binlog was produced on 10.1 commit 2f25c653 + the same patch + patch for MDEV-8045.

      sql/handler.cc:2579: int handler::ha_close(): Assertion `m_lock_type == 2' failed.
      150512 17:14:41 [ERROR] mysqld got signal 6 ;
       
      #6  0x00007f0d3fb15311 in *__GI___assert_fail (assertion=0x7f0d42e723ce "m_lock_type == 2", file=<optimized out>, line=2579, function=0x7f0d42e76170 "int handler::ha_close()") at assert.c:81
      #7  0x00007f0d4267c967 in handler::ha_close (this=0x7f0d14cae888) at 10.1-patched/sql/handler.cc:2579
      #8  0x00007f0d425344e4 in closefrm (table=0x7f0d0e82c070, free_share=false) at 10.1-patched/sql/table.cc:3012
      #9  0x00007f0d423e08b0 in close_temporary (table=0x7f0d0e82c070, free_share=true, delete_table=false) at 10.1-patched/sql/sql_base.cc:1812
      #10 0x00007f0d42566527 in Relay_log_info::close_temporary_tables (this=0x7f0d2d488c40) at 10.1-patched/sql/rpl_rli.cc:1069
      #11 0x00007f0d4276d5fa in Start_log_event_v3::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:4728
      #12 0x00007f0d4276e217 in Format_description_log_event::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:5103
      #13 0x00007f0d423b3a3a in Log_event::apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.h:1347
      #14 0x00007f0d423a95af in apply_event_and_update_pos (ev=0x7f0d1409c1b0, thd=0x7f0d14049070, rgi=0x7f0d14020800, rpt=0x0) at 10.1-patched/sql/slave.cc:3274
      #15 0x00007f0d425e0843 in rpt_handle_event (qev=0x7f0d146d6570, rpt=0x0) at 10.1-patched/sql/rpl_parallel.cc:49
      #16 0x00007f0d425e635c in rpl_parallel::do_event (this=0x7f0d2d48bf40, serial_rgi=0x7f0d14020800, ev=0x7f0d1409c1b0, event_size=244) at 10.1-patched/sql/rpl_parallel.cc:2357
      #17 0x00007f0d423a9bbf in exec_relay_log_event (thd=0x7f0d14049070, rli=0x7f0d2d488c40, serial_rgi=0x7f0d14020800) at 10.1-patched/sql/slave.cc:3525
      #18 0x00007f0d423ad0d0 in handle_slave_sql (arg=0x7f0d2d487000) at 10.1-patched/sql/slave.cc:4677
      #19 0x00007f0d41b30b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #20 0x00007f0d3fbc595d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

      To reproduce:

      • start master with all default options + --log-bin=mysql-bin --server-id=1, with the attached mysql-bin.* files in the datadir;
      • start slave with all default options + --server-id=2 --plugin-load-add=file_key_management.so --file_key_management_filename=<your basedir>/mysql-test/std_data/keys.txt --slave_parallel_threads=8;
      • set up replication (either old-fashioned or GTID, doesn't matter);
      • wait.

      Attachments

        1. mysql-bin.000001
          349 kB
        2. mysql-bin.index
          0.0 kB
        3. mysql-bin.state
          0.0 kB

        Activity

          knielsen Kristian Nielsen added a comment - - edited

          Hm, unfortunately it did not reproduce for me in like 5 or 10 runs (the slave just replicated to completion without errors).
          EDIT: Ok, it reproduces fine for me. I was doing START SLAVE UNTIL <end of binlog>, but the assertion happens only on the first event in the following binlog

          knielsen Kristian Nielsen added a comment - - edited Hm, unfortunately it did not reproduce for me in like 5 or 10 runs (the slave just replicated to completion without errors). EDIT: Ok, it reproduces fine for me. I was doing START SLAVE UNTIL <end of binlog>, but the assertion happens only on the first event in the following binlog

          Here is an MTR test case. It triggers the assertion reliably for me by
          adding the below sleep in the code.

          It looks like the wait_for_workers_idle() function returns too early. It
          does wait_for_prior_commit(), at which point all transactions are committed
          (at least written into binlog), but table locks may not yet be
          released. This allows the close_temporary_tables() for a restart FD event to
          run ahead and touch not yet closed temporary tables.

          --source include/have_partition.inc
          --source include/have_innodb.inc
          --source include/master-slave.inc
           
          --connection server_2
          SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
          --source include/stop_slave.inc
          SET GLOBAL slave_parallel_threads=8;
           
          --connection server_1
          CREATE TABLE E (
          			pk INTEGER AUTO_INCREMENT,
          			col_int_nokey INTEGER /*! NULL */,
          			col_int_key INTEGER /*! NULL */,
           
          			col_date_key DATE /*! NULL */,
          			col_date_nokey DATE /*! NULL */,
           
          			col_time_key TIME /*! NULL */,
          			col_time_nokey TIME /*! NULL */,
           
          			col_datetime_key DATETIME /*! NULL */,
          			col_datetime_nokey DATETIME /*! NULL */,
           
          			col_varchar_key VARCHAR(1) /*! NULL */,
          			col_varchar_nokey VARCHAR(1) /*! NULL */,
           
          			PRIMARY KEY (pk),
          			KEY (col_int_key),
          			KEY (col_date_key),
          			KEY (col_time_key),
          			KEY (col_datetime_key),
          			KEY (col_varchar_key, col_int_key)
          		)  ENGINE=InnoDB;
           
          ALTER TABLE `E` PARTITION BY KEY() PARTITIONS 5;
          ALTER TABLE `E` REMOVE PARTITIONING;
          --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
          wait
          EOF
          --shutdown_server 30
          --source include/wait_until_disconnected.inc
          --connection default
          --source include/wait_until_disconnected.inc
          --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
          restart:
          EOF
          --enable_reconnect
          --source include/wait_until_connected_again.inc
          --connection server_1
          --enable_reconnect
          --source include/wait_until_connected_again.inc
          CREATE TABLE t1 (a INT PRIMARY KEY);
          --save_master_pos
           
          --connection server_2
          --source include/start_slave.inc
          --sync_with_master
           
           
          # Clean up.
          --connection server_2
          --source include/stop_slave.inc
          SET GLOBAL slave_parallel_threads=@old_parallel_threads;
          --source include/start_slave.inc
           
          --connection server_1
          DROP TABLE `E`;
          DROP TABLE t1;
          --source include/rpl_end.inc

          diff --git a/sql/sql_class.cc b/sql/sql_class.cc
          index 7de565d..fbb2716 100644
          --- a/sql/sql_class.cc
          +++ b/sql/sql_class.cc
          @@ -6994,6 +6994,7 @@ wait_for_commit::wakeup_subsequent_commits2(int wakeup_error)
               a mutex), so no extra explicit barrier is needed here.
             */
             wakeup_subsequent_commits_running= false;
          +my_sleep(21000);
           }
           
           

          knielsen Kristian Nielsen added a comment - Here is an MTR test case. It triggers the assertion reliably for me by adding the below sleep in the code. It looks like the wait_for_workers_idle() function returns too early. It does wait_for_prior_commit(), at which point all transactions are committed (at least written into binlog), but table locks may not yet be released. This allows the close_temporary_tables() for a restart FD event to run ahead and touch not yet closed temporary tables. --source include/have_partition.inc --source include/have_innodb.inc --source include/master-slave.inc   --connection server_2 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads; --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=8;   --connection server_1 CREATE TABLE E ( pk INTEGER AUTO_INCREMENT, col_int_nokey INTEGER /*! NULL */, col_int_key INTEGER /*! NULL */,   col_date_key DATE /*! NULL */, col_date_nokey DATE /*! NULL */,   col_time_key TIME /*! NULL */, col_time_nokey TIME /*! NULL */,   col_datetime_key DATETIME /*! NULL */, col_datetime_nokey DATETIME /*! NULL */,   col_varchar_key VARCHAR(1) /*! NULL */, col_varchar_nokey VARCHAR(1) /*! NULL */,   PRIMARY KEY (pk), KEY (col_int_key), KEY (col_date_key), KEY (col_time_key), KEY (col_datetime_key), KEY (col_varchar_key, col_int_key) ) ENGINE=InnoDB;   ALTER TABLE `E` PARTITION BY KEY() PARTITIONS 5; ALTER TABLE `E` REMOVE PARTITIONING; --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect wait EOF --shutdown_server 30 --source include/wait_until_disconnected.inc --connection default --source include/wait_until_disconnected.inc --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect restart: EOF --enable_reconnect --source include/wait_until_connected_again.inc --connection server_1 --enable_reconnect --source include/wait_until_connected_again.inc CREATE TABLE t1 (a INT PRIMARY KEY); --save_master_pos   --connection server_2 --source include/start_slave.inc --sync_with_master     # Clean up. --connection server_2 --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=@old_parallel_threads; --source include/start_slave.inc   --connection server_1 DROP TABLE `E`; DROP TABLE t1; --source include/rpl_end.inc diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 7de565d..fbb2716 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -6994,6 +6994,7 @@ wait_for_commit::wakeup_subsequent_commits2(int wakeup_error) a mutex), so no extra explicit barrier is needed here. */ wakeup_subsequent_commits_running= false; +my_sleep(21000); }
          knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2015-May/007945.html

          Pushed to 10.0 and 10.1

          knielsen Kristian Nielsen added a comment - Pushed to 10.0 and 10.1

          People

            knielsen Kristian Nielsen
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.