Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22216

Assertion `xid_seqno > wsrep_seqno' failed in trx_rseg_update_wsrep_checkpoint upon concurrent DML/DDL

Details

    Description

      Note: Even with the injection below, the test is far from deterministic, the injection only makes the failure more probable. There is still a lot of race involved. On a rotating disk it fails fairly quickly for me (within a minute, usually less), but doesn't fail either on SSD or in shm. I suppose really fast rotating disks can also make reproducing problematic. It seems to be a very good case for debugging through an rr profile.

      10.4 edc3899d9 with provided injection

      mysqld: /data/src/10.4-bug/storage/innobase/trx/trx0rseg.cc:102: void trx_rseg_update_wsrep_checkpoint(trx_rsegf_t*, const XID*, mtr_t*): Assertion `xid_seqno > wsrep_seqno' failed.
      200410 22:01:26 [ERROR] mysqld got signal 6 ;
       
      #15 0x00007f86ceea6102 in __GI___assert_fail (assertion=0x55fcc5daf66c "xid_seqno > wsrep_seqno", file=0x55fcc5daf5a0 "/data/src/10.4-bug/storage/innobase/trx/trx0rseg.cc", line=102, function=0x55fcc5dafde0 <trx_rseg_update_wsrep_checkpoint(unsigned char*, xid_t const*, mtr_t*)::__PRETTY_FUNCTION__> "void trx_rseg_update_wsrep_checkpoint(trx_rsegf_t*, const XID*, mtr_t*)") at assert.c:101
      #16 0x000055fcc55ebc6f in trx_rseg_update_wsrep_checkpoint (rseg_header=0x7f868c664026 "", xid=0x55fcc7b7cfd0, mtr=0x7f86880ad5d0) at /data/src/10.4-bug/storage/innobase/trx/trx0rseg.cc:102
      #17 0x000055fcc55cd60c in trx_purge_add_undo_to_history (trx=0x7f8698bf8390, undo=@0x7f8698bf93b0: 0x7f86142bcb30, mtr=0x7f86880ad5d0) at /data/src/10.4-bug/storage/innobase/trx/trx0purge.cc:281
      #18 0x000055fcc55f7be7 in trx_write_serialisation_history (trx=0x7f8698bf8390, mtr=0x7f86880ad5d0) at /data/src/10.4-bug/storage/innobase/trx/trx0trx.cc:1135
      #19 0x000055fcc55fa015 in trx_commit_low (trx=0x7f8698bf8390, mtr=0x7f86880ad5d0) at /data/src/10.4-bug/storage/innobase/trx/trx0trx.cc:1574
      #20 0x000055fcc55fa175 in trx_commit (trx=0x7f8698bf8390) at /data/src/10.4-bug/storage/innobase/trx/trx0trx.cc:1642
      #21 0x000055fcc55fa65a in trx_commit_for_mysql (trx=0x7f8698bf8390) at /data/src/10.4-bug/storage/innobase/trx/trx0trx.cc:1788
      #22 0x000055fcc53843bc in innobase_commit_low (trx=0x7f8698bf8390) at /data/src/10.4-bug/storage/innobase/handler/ha_innodb.cc:4401
      #23 0x000055fcc53847e6 in innobase_commit_ordered_2 (trx=0x7f8698bf8390, thd=0x7f8614000d50) at /data/src/10.4-bug/storage/innobase/handler/ha_innodb.cc:4523
      #24 0x000055fcc5384ab2 in innobase_commit_ordered (hton=0x55fcc7b57f30, thd=0x7f8614000d50, all=false) at /data/src/10.4-bug/storage/innobase/handler/ha_innodb.cc:4583
      #25 0x000055fcc527a356 in TC_LOG::run_commit_ordered (this=0x55fcc65b4e00 <mysql_bin_log>, thd=0x7f8614000d50, all=false) at /data/src/10.4-bug/sql/log.cc:8944
      #26 0x000055fcc5277ff2 in MYSQL_BIN_LOG::trx_group_commit_leader (this=0x55fcc65b4e00 <mysql_bin_log>, leader=0x7f86880adfe0) at /data/src/10.4-bug/sql/log.cc:8178
      #27 0x000055fcc5276995 in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=0x55fcc65b4e00 <mysql_bin_log>, entry=0x7f86880adfe0) at /data/src/10.4-bug/sql/log.cc:7767
      #28 0x000055fcc5275e81 in MYSQL_BIN_LOG::write_transaction_to_binlog (this=0x55fcc65b4e00 <mysql_bin_log>, thd=0x7f8614000d50, cache_mngr=0x7f86141f9320, end_ev=0x7f86880ae180, all=false, using_stmt_cache=true, using_trx_cache=true) at /data/src/10.4-bug/sql/log.cc:7415
      #29 0x000055fcc5264def in binlog_flush_cache (thd=0x7f8614000d50, cache_mngr=0x7f86141f9320, end_ev=0x7f86880ae180, all=false, using_stmt=true, using_trx=true) at /data/src/10.4-bug/sql/log.cc:1786
      #30 0x000055fcc5265472 in binlog_commit_flush_xid_caches (thd=0x7f8614000d50, cache_mngr=0x7f86141f9320, all=false, xid=42) at /data/src/10.4-bug/sql/log.cc:1894
      #31 0x000055fcc527c8f9 in MYSQL_BIN_LOG::log_and_order (this=0x55fcc65b4e00 <mysql_bin_log>, thd=0x7f8614000d50, xid=42, all=false, need_prepare_ordered=false, need_commit_ordered=true) at /data/src/10.4-bug/sql/log.cc:9727
      #32 0x000055fcc514292f in ha_commit_trans (thd=0x7f8614000d50, all=false) at /data/src/10.4-bug/sql/handler.cc:1624
      #33 0x000055fcc4f7faf3 in trans_commit_stmt (thd=0x7f8614000d50) at /data/src/10.4-bug/sql/transaction.cc:436
      #34 0x000055fcc4df10ce in mysql_execute_command (thd=0x7f8614000d50) at /data/src/10.4-bug/sql/sql_parse.cc:6158
      #35 0x000055fcc4df6135 in mysql_parse (thd=0x7f8614000d50, rawbuf=0x7f8614014288 "INSERT /* QNO 21 CON_ID 17 */ INTO t1 (k) VALUES (0)", length=52, parser_state=0x7f86880b0190, is_com_multi=false, is_next_command=false) at /data/src/10.4-bug/sql/sql_parse.cc:7900
      #36 0x000055fcc4df5735 in wsrep_mysql_parse (thd=0x7f8614000d50, rawbuf=0x7f8614014288 "INSERT /* QNO 21 CON_ID 17 */ INTO t1 (k) VALUES (0)", length=52, parser_state=0x7f86880b0190, is_com_multi=false, is_next_command=false) at /data/src/10.4-bug/sql/sql_parse.cc:7704
      #37 0x000055fcc4de12fe in dispatch_command (command=COM_QUERY, thd=0x7f8614000d50, packet=0x7f8614008751 "INSERT /* QNO 21 CON_ID 17 */ INTO t1 (k) VALUES (0)", packet_length=52, is_com_multi=false, is_next_command=false) at /data/src/10.4-bug/sql/sql_parse.cc:1827
      #38 0x000055fcc4ddfad3 in do_command (thd=0x7f8614000d50) at /data/src/10.4-bug/sql/sql_parse.cc:1359
      #39 0x000055fcc4f66a9c in do_handle_one_connection (connect=0x55fcc8823e90) at /data/src/10.4-bug/sql/sql_connect.cc:1412
      #40 0x000055fcc4f667eb in handle_one_connection (arg=0x55fcc8823e90) at /data/src/10.4-bug/sql/sql_connect.cc:1316
      #41 0x00007f86cf720fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #42 0x00007f86cef6f4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      All threads' stack traces are attached.

      injection

      index 39035dc545e..c0d53381be0 100644
      --- a/storage/innobase/trx/trx0purge.cc
      +++ b/storage/innobase/trx/trx0purge.cc
      @@ -277,6 +277,7 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr)
       
       #ifdef WITH_WSREP
              if (wsrep_is_wsrep_xid(trx->xid)) {
      +        DBUG_EXECUTE_IF("sleep_before_update_checkpoint",my_sleep(100000););
                      trx_rseg_update_wsrep_checkpoint(rseg_header, trx->xid, mtr);
              }
       #endif
      

      To reproduce:

      • apply the debug injection above and build with -DCMAKE_BUILD_TYPE=Debug;
      • Run the RQG test (put real paths in the command line instead of placeholders):

        git clone https://github.com/MariaDB/randgen --branch mdev22216 rqg-mdev22216
        cd rqg-mdev22216
        . ./mdev22216.cmd --basedir=<basedir> --vardir=<vardir on the slowest disk that you have> --mysqld=--wsrep-provider=<location of Galera-4 library>
        

      The grammar is already in the RQG branch and in command-line parameters. For the record, the contents of the grammar:

      thread1_init:
          CREATE DATABASE IF NOT EXISTS db
          ; CREATE TABLE IF NOT EXISTS t1 (
              id INT NOT NULL auto_increment,
              k INT,
              PRIMARY KEY (id)
          ) ENGINE=InnoDB
          ; SET debug_dbug= '+d,sleep_before_update_checkpoint'
      ;
       
      thread1:
          INSERT INTO t1 (k) VALUES (0);
       
      thread2:
          CREATE TABLE db.t SELECT id FROM t1 LIMIT 0
          ; DROP TABLE IF EXISTS db.t
      ;
       
      thread3:
          CREATE DATABASE IF NOT EXISTS db
          ; DROP TABLE IF EXISTS x
      ;
      

      Reproducible on at least 10.4 and 10.5. I don't have information about earlier versions or about the effect on non-debug builds.

      rr-packed profile created on Debian 10 Buster: ftp://ftp.askmonty.org/public/rr-mdev22216-debian10-x86_64.tar.gz

      Attachments

        Activity

          Also see MDEV-21026

          Roel Roel Van de Paar added a comment - Also see MDEV-21026

          Could be related, on some reason I didn't find MDEV-21026 before. There are some noticeable differences though. Judging by the stack trace, MDEV-21026 happens upon executing replicated events (or rather, rolling back after an attempt to execute replicated events). This one happens upon writing to the binlog – in fact, replication isn't involved at all, it is a one-node setup. I'm not sure whether the root cause is the same. Maybe it is, or maybe it's one of those generic assertions which tend to fail on various unrelated occasions.

          elenst Elena Stepanova added a comment - Could be related, on some reason I didn't find MDEV-21026 before. There are some noticeable differences though. Judging by the stack trace, MDEV-21026 happens upon executing replicated events (or rather, rolling back after an attempt to execute replicated events). This one happens upon writing to the binlog – in fact, replication isn't involved at all, it is a one-node setup. I'm not sure whether the root cause is the same. Maybe it is, or maybe it's one of those generic assertions which tend to fail on various unrelated occasions.

          I could not reproduce with my external SSD using USB-C. In my understanding this is fixed in MDEV-24327.

          jplindst Jan Lindström (Inactive) added a comment - I could not reproduce with my external SSD using USB-C. In my understanding this is fixed in MDEV-24327 .

          The reasonable way to make sure whether it's the fix or the environment that made it disappear would be to try reproducing it in the same environment on a revision before the fix.

          elenst Elena Stepanova added a comment - The reasonable way to make sure whether it's the fix or the environment that made it disappear would be to try reproducing it in the same environment on a revision before the fix.

          People

            jplindst Jan Lindström (Inactive)
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.