Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32257

Assertion `thd->is_killed()' failed in XID_cache_element::acquired_to_recovered (opt) or acquire_xid (dbg) on XA ROLLBACK or XA COMMIT

Details

    Description

      When the following SQL is executed in a binlogging server:

      SET pseudo_slave_mode=1;
      XA START 'a';
      XA END 'a';
      XA PREPARE 'a';
      XA ROLLBACK 'a';
      

      The server will log into the binary log only the XA ROLLBACK part:

      | master-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.6.16-MariaDB-debug-log, Binlog ver: 4 |
      | master-bin.000001 | 256 | Start_encryption  |         1 |         296 |                                                      |
      | master-bin.000001 | 296 | Gtid_list         |         1 |         325 | []                                                   |
      | master-bin.000001 | 325 | Binlog_checkpoint |         1 |         369 | master-bin.000001                                    |
      | master-bin.000001 | 369 | Gtid              |         1 |         412 | GTID 0-1-1                                           |
      | master-bin.000001 | 412 | Query             |         1 |         502 | XA ROLLBACK X'61',X'',1   
      

      Which should not been done. This type of read-only XA should not produce any events, as well as its even prepared XID should not be recovered after reconnect or the server crash-recovery.

      Previous info: this also leads to a crash on the slave in a standard master/slave setup:

      10.6.14-9-enterprise-MDEV-30165-30619-31949 57b39cb8e1e4b374e0e138b286a0f9cb0669e401 (Optimized)

      2023-09-26 13:17:23 6 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './relaylog.000001' position: 4; GTID position ''
      2023-09-26 13:18:01 69 [ERROR] XA COMMIT of GTID 0-1-%could not complete after having been logged into binary log
      

      10.6.14-9-enterprise-MDEV-30165-30619-31949 57b39cb8e1e4b374e0e138b286a0f9cb0669e401 (Optimized)

      Core was generated by `/test/PATCH_EMD260923-mariadb-10.6.14-9-linux-x86_64-opt/bin/mariadbd --no-defa'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  std::__atomic_base<int>::fetch_or (__m=std::memory_order_relaxed,
          __i=536870912, this=0x0) at /usr/include/c++/11/bits/atomic_base.h:646
      646         fetch_or(__int_type __i,
      [Current thread is 1 (Thread 0x154f3d4b1640 (LWP 1079862))]
      (gdb) bt
      #0  std::__atomic_base<int>::fetch_or (__m=std::memory_order_relaxed, __i=536870912, this=0x0) at /usr/include/c++/11/bits/atomic_base.h:646
      #1  XID_cache_element::acquired_to_recovered (this=0x0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/xa.cc:119
      #2  trans_xa_rollback (thd=thd@entry=0x154e08000c68) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/xa.cc:1081
      #3  0x00005577e38265e4 in mysql_execute_command (thd=0x154e08000c68, is_called_from_prepared_stmt=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/sql_parse.cc:5897
      #4  0x00005577e3829156 in mysql_parse (thd=0x154e08000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x154f3d4b08f0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/sql_parse.cc:8063
      #5  0x00005577e3bacac3 in Query_log_event::do_apply_event (this=0x154f0005e648, rgi=0x154f0005d1d0, query_arg=0x154f0005e7b3 "XA ROLLBACK X'61',X'',1", q_len_arg=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/sql_class.h:242
      #6  0x00005577e375e086 in Log_event::apply_event (rgi=0x154f0005d1d0, this=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/log_event.h:1499
      #7  apply_event_and_update_pos_apply (ev=ev@entry=0x154f0005e648, thd=thd@entry=0x154e08000c68, rgi=rgi@entry=0x154f0005d1d0, reason=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/slave.cc:3892
      #8  0x00005577e3764962 in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x154f0005e648, thd=thd@entry=0x154e08000c68, rgi=rgi@entry=0x154f0005d1d0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/slave.cc:4078
      #9  0x00005577e39b4148 in rpt_handle_event (qev=0x154f0005cd78, rpt=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/rpl_parallel.cc:134
      #10 0x00005577e39b874c in handle_rpl_parallel_thread (arg=arg@entry=0x154f000233b8) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/sql/rpl_parallel.cc:1476
      #11 0x00005577e3cf5e91 in pfs_spawn_thread (arg=0x154f000349d8) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_opt/storage/perfschema/pfs.cc:2201
      #12 0x0000154f63094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #13 0x0000154f63126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

      Debug:

      10.6.14-9-enterprise-MDEV-30165-30619-31949 57b39cb8e1e4b374e0e138b286a0f9cb0669e401 (Debug)

      2023-09-26 13:11:31 6 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './relaylog.000001' position: 4; GTID position ''
      mariadbd: /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log.cc:2043: bool acquire_xid(THD*): Assertion `thd->is_killed()' failed.
      

      10.6.14-9-enterprise-MDEV-30165-30619-31949 57b39cb8e1e4b374e0e138b286a0f9cb0669e401 (Debug)

      Core was generated by `/test/PATCH_EMD260923-mariadb-10.6.14-9-linux-x86_64-dbg/bin/mariadbd --no-defa'.
      Program terminated with signal SIGABRT, Aborted.
      #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22889389626944)
          at ./nptl/pthread_kill.c:44
      [Current thread is 1 (Thread 0x14d159f01640 (LWP 3999773))]
      (gdb) bt
      #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22889389626944) at ./nptl/pthread_kill.c:44
      #1  __pthread_kill_internal (signo=6, threadid=22889389626944) at ./nptl/pthread_kill.c:78
      #2  __GI___pthread_kill (threadid=22889389626944, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
      #3  0x000014d17f442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
      #4  0x000014d17f4287f3 in __GI_abort () at ./stdlib/abort.c:79
      #5  0x000014d17f42871b in __assert_fail_base (fmt=0x14d17f5dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d688113b8d "thd->is_killed()", file=0x55d688114498 "/test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log.cc", line=2043, function=<optimized out>) at ./assert/assert.c:92
      #6  0x000014d17f439e96 in __GI___assert_fail (assertion=0x55d688113b8d "thd->is_killed()", file=0x55d688114498 "/test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log.cc", line=2043, function=0x55d688113b9e "bool acquire_xid(THD*)") at ./assert/assert.c:101
      #7  0x000055d6877845ac in acquire_xid (thd=thd@entry=0x14cff8000f98) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log.cc:2043
      #8  0x000055d687798a03 in binlog_rollback_by_xid (hton=0x55d6893c0988, xid=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log.cc:2130
      #9  0x000055d68764a3e9 in ha_commit_or_rollback_by_xid (xid=<optimized out>, commit=commit@entry=false) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/handler.cc:2355
      #10 0x000055d6875b28a5 in wsrep_ha_commit_or_rollback_by_xid (thd=thd@entry=0x14cff8000f98, commit=commit@entry=false) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/xa.cc:1445
      #11 0x000055d6875b5e54 in trans_xa_rollback (thd=thd@entry=0x14cff8000f98) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/xa.cc:1063
      #12 0x000055d68737c134 in mysql_execute_command (thd=thd@entry=0x14cff8000f98, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/sql_parse.cc:5897
      #13 0x000055d68737e1f3 in mysql_parse (thd=0x14cff8000f98, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14d159f007f0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/sql_parse.cc:8063
      #14 0x000055d6877b50b6 in Query_log_event::do_apply_event (this=0x14d10006bc98, rgi=0x14d100066ab0, query_arg=0x14d10006be03 "XA ROLLBACK X'61',X'',1", q_len_arg=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/sql_class.h:242
      #15 0x000055d6877b5a0c in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log_event_server.cc:1605
      #16 0x000055d6872a0714 in Log_event::apply_event (rgi=0x14d100066ab0, this=0x14d10006bc98) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/log_event.h:1499
      #17 apply_event_and_update_pos_apply (ev=ev@entry=0x14d10006bc98, thd=thd@entry=0x14cff8000f98, rgi=rgi@entry=0x14d100066ab0, reason=reason@entry=0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/slave.cc:3892
      #18 0x000055d6872aa7d9 in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x14d10006bc98, thd=thd@entry=0x14cff8000f98, rgi=rgi@entry=0x14d100066ab0) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/slave.cc:4078
      #19 0x000055d687555383 in rpt_handle_event (qev=qev@entry=0x14d100066658, rpt=rpt@entry=0x14d100027e50) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/rpl_parallel.cc:134
      #20 0x000055d687559b08 in handle_rpl_parallel_thread (arg=arg@entry=0x14d100027e50) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/sql/rpl_parallel.cc:1476
      #21 0x000055d687919e72 in pfs_spawn_thread (arg=0x14d100039578) at /test/git-bisect/10.6.14-9-enterprise-MDEV-30165-30619-31949_dbg/storage/perfschema/pfs.cc:2201
      #22 0x000014d17f494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #23 0x000014d17f526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

      Attachments

        Issue Links

          Activity

            The patch cb8f96142 looks good to me.

            bnestere Brandon Nesterenko added a comment - The patch cb8f96142 looks good to me.

            I see various commits relating to this ticket in bb-10.6-MDEV-31949. Is this ticket (all crashes and potentially the Unknown XID Error_code) deemed fully resolved in that branch? Thank you.

            Roel Roel Van de Paar added a comment - I see various commits relating to this ticket in bb-10.6-MDEV- 31949 . Is this ticket (all crashes and potentially the Unknown XID Error_code) deemed fully resolved in that branch? Thank you.
            Elkin Andrei Elkin added a comment -

            Roel, first I agree MDEV-31949 exposed this bug in its own and valuable way. I did not mean to dismiss your original collected evidence of crashes, just they were expected (implied to myself at least) in MDEV-31949.
            The bug is fixed in 10.6. The changes in MDEV-31949 can (and should) be viewed as pre-merge of the 10.6 commit.

            Thanks for the testing!

            Elkin Andrei Elkin added a comment - Roel , first I agree MDEV-31949 exposed this bug in its own and valuable way. I did not mean to dismiss your original collected evidence of crashes, just they were expected (implied to myself at least) in MDEV-31949 . The bug is fixed in 10.6. The changes in MDEV-31949 can (and should) be viewed as pre-merge of the 10.6 commit. Thanks for the testing!

            Elkin Understood, thank you.

            Not sure I understood the last line (The changes in...) - I would like to ensure that the bb-10.6-MDEV-31949 branch contains all fixes for this ticket MDEV-32257. Can you confirm the same yes/no? Thank you

            Roel Roel Van de Paar added a comment - Elkin Understood, thank you. Not sure I understood the last line ( The changes in... ) - I would like to ensure that the bb-10.6- MDEV-31949 branch contains all fixes for this ticket MDEV-32257 . Can you confirm the same yes/no? Thank you
            Elkin Andrei Elkin added a comment -

            Roel, to your question, the server segfault is tacked by the 31949. The impossible logging of the "orphan" XA-rollback - is done in the 10.6.
            In hindsight perhaps it'd be more proper to have two tickets reported, with the (bug) 31949 requiring the (bug) 10.6 (though the 31949 can be also reached by other path:s like the slave misconfiguration).

            Elkin Andrei Elkin added a comment - Roel , to your question, the server segfault is tacked by the 31949. The impossible logging of the "orphan" XA-rollback - is done in the 10.6. In hindsight perhaps it'd be more proper to have two tickets reported, with the (bug) 31949 requiring the (bug) 10.6 (though the 31949 can be also reached by other path:s like the slave misconfiguration).

            People

              Elkin Andrei Elkin
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.