Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25114

Crash: WSREP: invalid state ROLLED_BACK (FATAL)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Blocker
    • Resolution: Fixed
    • 10.3.28, 10.2(EOL), 10.4(EOL), 10.5
    • 10.2.41, 10.3.32, 10.4.22, 10.5.13
    • Galera

    Description

      About 29 hours after updating a previously very stable 3 server MariaDB Galera cluster from 10.3.27 to 10.3.28, one of the nodes crashed with the following message:

      2021-03-10 18:22:48 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
               at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
      2021-03-10 18:22:48 0 [ERROR] WSREP: cancel commit bad exit: 7 33792346039
      210310 18:22:48 [ERROR] mysqld got signal 6 ;
      

      Attached is a bit more log (coincidentally had been running with conflict logging enabled), but I don't think there's much more I can provide right now. I'm still logging this since before the update to 10.3.28 the cluster had been running very stable for months, and this could be somehow related to MDEV-25111, which we also encountered first time right after updating to 10.3.28.

      Attachments

        Issue Links

          Activity

            emaijala Ere Maijala added a comment -

            This wasn't an isolated occurrence. Another server crashed a couple of hours later:

            2021-03-10 21:01:10 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
            at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
            2021-03-10 21:01:10 0 [ERROR] WSREP: cancel commit bad exit: 7 34182537224
            210310 21:01:10 [ERROR] mysqld got signal 6 ;

            emaijala Ere Maijala added a comment - This wasn't an isolated occurrence. Another server crashed a couple of hours later: 2021-03-10 21:01:10 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL) at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735 2021-03-10 21:01:10 0 [ERROR] WSREP: cancel commit bad exit: 7 34182537224 210310 21:01:10 [ERROR] mysqld got signal 6 ;
            Peter2121 Peter2121 added a comment -

            Same issue on FreeBSD 12.2 (ZFS, jailed). Downgraded back to 10.3.27.

            Peter2121 Peter2121 added a comment - Same issue on FreeBSD 12.2 (ZFS, jailed). Downgraded back to 10.3.27.
            festr Martin Vit added a comment - - edited

            same issue here -

            2021-03-23 11:11:32 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
            at galera/src/replicator_smm.cpp:abort_trx():727
            2021-03-23 11:11:32 0 [ERROR] WSREP: cancel commit bad exit: 7 21859814393
            210323 11:11:32 [ERROR] mysqld got signal 6 ;

            it happened when doing ALTER table - adding partitions to a table (with intensive INSERTs)

            festr Martin Vit added a comment - - edited same issue here - 2021-03-23 11:11:32 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL) at galera/src/replicator_smm.cpp:abort_trx():727 2021-03-23 11:11:32 0 [ERROR] WSREP: cancel commit bad exit: 7 21859814393 210323 11:11:32 [ERROR] mysqld got signal 6 ; it happened when doing ALTER table - adding partitions to a table (with intensive INSERTs)
            stemid Stefan Midjich added a comment - - edited

            I just did an upgrade from 10.2 to 10.3 and by chance hit this release 10.3.28 and immediately the nodes started crashing. 3 node galera cluster, in total they've crashed over 10 times in the 15 hours since the upgrade finished.

            I'm afraid it's urgent for us to downgrade to 10.3.27 so I have no more diagnostics info to give you.

            OS: CentOS 7

            Log is same as previous reporters.

            As far as I know we do not have any ALTER TABLE use.

            Very heavy write use and a very large DB, 330GB.

            stemid Stefan Midjich added a comment - - edited I just did an upgrade from 10.2 to 10.3 and by chance hit this release 10.3.28 and immediately the nodes started crashing. 3 node galera cluster, in total they've crashed over 10 times in the 15 hours since the upgrade finished. I'm afraid it's urgent for us to downgrade to 10.3.27 so I have no more diagnostics info to give you. OS: CentOS 7 Log is same as previous reporters. As far as I know we do not have any ALTER TABLE use. Very heavy write use and a very large DB, 330GB.
            seppo Seppo Jaakola added a comment -

            I can reproduce similar crash with mtr test suite, the smoking gun seems to be the new bg_wsrep_kill_trx () background killer execution, debug log messages before assert give:

            2021-04-21 11:40:37 2 [Note] WSREP: Winning thread: 
               THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 1500
               SQL: INSERT INTO c VALUES (1, 0)^E<E5>^?`^S^B
            2021-04-21 11:40:37 2 [Note] WSREP: Victim thread: 
               THD: 259, mode: local, state: executing, conflict: cert failure, seqno: -1
               SQL: COMMIT
            2021-04-21 11:40:37 0 [Note] WSREP: BF kill (1, seqno: 1500), victim: (259) trx: 3743
            2021-04-21 11:40:37 0 [Note] WSREP: Aborting query: COMMIT conf 7 trx: -1
            2021-04-21 11:40:37 0 [Note] WSREP: victim 3743 in state 7
            2021-04-21 11:40:37 259 [Note] WSREP: cleanup transaction for LOCAL_STATE: COMMIT
            2021-04-21 11:40:37 259 [Note] WSREP: COMMIT failed, MDL released: 259
            2021-04-21 11:40:37 259 [Note] WSREP: cert failure, thd: 259 is_AC: 0, retry: 0 - 1 SQL: COMMIT
            2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB:  fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1502
            2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB:  fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1503
            2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB:  fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1504
            2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB:  fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1505
            2021-04-21 11:40:37 259 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1
            2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=d,apply_monitor_slave_enter_sync
            2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=
            2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=d,local_monitor_enter_sync
            2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: signal=apply_monitor_slave_enter_sync
            2021-04-21 11:40:37 2 [Note] WSREP: cluster conflict due to high priority abort for threads:
            2021-04-21 11:40:37 2 [Note] WSREP: Winning thread: 
               THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 1509
               SQL: UPDATE c SET f2 = 1 WHERE f1 = 1^E<E5>^?`^S^B
            2021-04-21 11:40:37 2 [Note] WSREP: Victim thread: 
               THD: 259, mode: local, state: committing, conflict: no conflict, seqno: 1510
               SQL: COMMIT
            2021-04-21 11:40:37 0 [Note] WSREP: BF kill (1, seqno: 1509), victim: (259) trx: 3765
            2021-04-21 11:40:37 0 [Note] WSREP: Aborting query: COMMIT conf 0 trx: 3765
            2021-04-21 11:40:37 259 [Note] WSREP: commit failed for reason: 3 conf 0
            2021-04-21 11:40:37 0 [Note] WSREP: kill query for: 259
            2021-04-21 11:40:37 0 [Note] WSREP: kill trx QUERY_COMMITTING for 3765
            2021-04-21 11:40:37 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
            

            seppo Seppo Jaakola added a comment - I can reproduce similar crash with mtr test suite, the smoking gun seems to be the new bg_wsrep_kill_trx () background killer execution, debug log messages before assert give: 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Winning thread: THD: 2 , mode: applier, state: executing, conflict: no conflict, seqno: 1500 SQL: INSERT INTO c VALUES ( 1 , 0 )^E<E5>^?`^S^B 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Victim thread: THD: 259 , mode: local, state: executing, conflict: cert failure, seqno: - 1 SQL: COMMIT 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: BF kill ( 1 , seqno: 1500 ), victim: ( 259 ) trx: 3743 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: Aborting query: COMMIT conf 7 trx: - 1 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: victim 3743 in state 7 2021 - 04 - 21 11 : 40 : 37 259 [Note] WSREP: cleanup transaction for LOCAL_STATE: COMMIT 2021 - 04 - 21 11 : 40 : 37 259 [Note] WSREP: COMMIT failed, MDL released: 259 2021 - 04 - 21 11 : 40 : 37 259 [Note] WSREP: cert failure, thd: 259 is_AC: 0 , retry: 0 - 1 SQL: COMMIT 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab: 1502 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab: 1503 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab: 1504 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab: 1505 2021 - 04 - 21 11 : 40 : 37 259 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15 , mask = 1 2021 - 04 - 21 11 : 40 : 37 260 [Note] WSREP: refresh_provider_options: dbug=d,apply_monitor_slave_enter_sync 2021 - 04 - 21 11 : 40 : 37 260 [Note] WSREP: refresh_provider_options: dbug= 2021 - 04 - 21 11 : 40 : 37 260 [Note] WSREP: refresh_provider_options: dbug=d,local_monitor_enter_sync 2021 - 04 - 21 11 : 40 : 37 260 [Note] WSREP: refresh_provider_options: signal=apply_monitor_slave_enter_sync 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: cluster conflict due to high priority abort for threads: 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Winning thread: THD: 2 , mode: applier, state: executing, conflict: no conflict, seqno: 1509 SQL: UPDATE c SET f2 = 1 WHERE f1 = 1 ^E<E5>^?`^S^B 2021 - 04 - 21 11 : 40 : 37 2 [Note] WSREP: Victim thread: THD: 259 , mode: local, state: committing, conflict: no conflict, seqno: 1510 SQL: COMMIT 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: BF kill ( 1 , seqno: 1509 ), victim: ( 259 ) trx: 3765 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: Aborting query: COMMIT conf 0 trx: 3765 2021 - 04 - 21 11 : 40 : 37 259 [Note] WSREP: commit failed for reason: 3 conf 0 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: kill query for : 259 2021 - 04 - 21 11 : 40 : 37 0 [Note] WSREP: kill trx QUERY_COMMITTING for 3765 2021 - 04 - 21 11 : 40 : 37 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
            seppo Seppo Jaakola added a comment - - edited

            This is probably a regression of MDEV-23328 , which brought in the background killer thread. Galera suite mtr tests can reproduce this issue, e.g.: galera.MW-369

            stack trace of crashing thread:

            #0  __pthread_kill (threadid=<optimized out>, signo=6)
                at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
            #1  0x000055b6444b5731 in my_write_core (sig=6)
                at /home/seppo/work/wsrep/mariadb-server/mysys/stacktrace.c:386
            #2  0x000055b643c1e72f in handle_fatal_signal (sig=6)
                at /home/seppo/work/wsrep/mariadb-server/sql/signal_handler.cc:355
            #3  <signal handler called>
            #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #5  0x00007f7e5fc8a859 in __GI_abort () at abort.c:79
            #6  0x000055b643e7833e in bg_wsrep_kill_trx (void_arg=0x7f7e3c0c9360)
                at /home/seppo/work/wsrep/mariadb-server/storage/innobase/handler/ha_innodb.cc:18978
            #7  0x000055b6438ec1c4 in handle_manager (arg=0x0)
                at /home/seppo/work/wsrep/mariadb-server/sql/sql_manager.cc:112
            #8  0x000055b644443337 in pfs_spawn_thread (arg=0x55b6479e4b20)
                at /home/seppo/work/wsrep/mariadb-server/storage/perfschema/pfs.cc:1869
            #9  0x00007f7e5fe62609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #10 0x00007f7e5fd87293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            
            

            seppo Seppo Jaakola added a comment - - edited This is probably a regression of MDEV-23328 , which brought in the background killer thread. Galera suite mtr tests can reproduce this issue, e.g.: galera.MW-369 stack trace of crashing thread: # 0 __pthread_kill (threadid=<optimized out>, signo= 6 ) at ../sysdeps/unix/sysv/linux/pthread_kill.c: 56 # 1 0x000055b6444b5731 in my_write_core (sig= 6 ) at /home/seppo/work/wsrep/mariadb-server/mysys/stacktrace.c: 386 # 2 0x000055b643c1e72f in handle_fatal_signal (sig= 6 ) at /home/seppo/work/wsrep/mariadb-server/sql/signal_handler.cc: 355 # 3 <signal handler called> # 4 __GI_raise (sig=sig @entry = 6 ) at ../sysdeps/unix/sysv/linux/raise.c: 50 # 5 0x00007f7e5fc8a859 in __GI_abort () at abort.c: 79 # 6 0x000055b643e7833e in bg_wsrep_kill_trx (void_arg= 0x7f7e3c0c9360 ) at /home/seppo/work/wsrep/mariadb-server/storage/innobase/handler/ha_innodb.cc: 18978 # 7 0x000055b6438ec1c4 in handle_manager (arg= 0x0 ) at /home/seppo/work/wsrep/mariadb-server/sql/sql_manager.cc: 112 # 8 0x000055b644443337 in pfs_spawn_thread (arg= 0x55b6479e4b20 ) at /home/seppo/work/wsrep/mariadb-server/storage/perfschema/pfs.cc: 1869 # 9 0x00007f7e5fe62609 in start_thread (arg=<optimized out>) at pthread_create.c: 477 # 10 0x00007f7e5fd87293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S: 95
            ralf.gebhardt Ralf Gebhardt added a comment - - edited

            serg, this task has been discussed in a call with seppo. This seems to be a regression related to changes done for "kill query", and it is hard for Galera to deal with a kill query as such. So the question came up if kill query can be supported at all for Galera. What is your opinion, should we disallow kill query for Galera?

            ralf.gebhardt Ralf Gebhardt added a comment - - edited serg , this task has been discussed in a call with seppo . This seems to be a regression related to changes done for "kill query", and it is hard for Galera to deal with a kill query as such. So the question came up if kill query can be supported at all for Galera. What is your opinion, should we disallow kill query for Galera?

            How can KILL be disallowed for galera? Simply if a server is part of a cluster, then KILL is not allowed? That seems to be a bit too drastic, don't you think?

            serg Sergei Golubchik added a comment - How can KILL be disallowed for galera? Simply if a server is part of a cluster, then KILL is not allowed? That seems to be a bit too drastic, don't you think?

            Unfortunately, PR https://github.com/MariaDB/server/pull/1843 is not correct based on pquery QA executed by ramesh. There is at least two bugs found on this version.

            • Bug 1:

              (rr) where
              #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
              #1  0x0000145a0820e859 in __GI_abort () at abort.c:79
              #2  0x0000145a0820e729 in __assert_fail_base (fmt=0x145a083a4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
                  assertion=0x55c620df5f79 "!is_owned()", 
                  file=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic", line=64, 
                  function=<optimized out>) at assert.c:92
              #3  0x0000145a0821ff36 in __GI___assert_fail (assertion=assertion@entry=0x55c620df5f79 "!is_owned()", 
                  file=file@entry=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic", 
                  line=line@entry=64, 
                  function=function@entry=0x55c620df5fc8 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
              #4  0x000055c6205aa5c1 in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55c6214cf510 <lock_sys+80>, 
                  mutex=0x55c6214cf500 <lock_sys+64>, 
                  name=name@entry=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", 
                  line=line@entry=5186) at /test/mtest/10.3_dbg/storage/innobase/include/sync0types.h:979
              #5  0x000055c620599e94 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (line=5186, 
                  filename=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", mutex=..., 
                  this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/sync0policy.h:339
              #6  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (line=5186, 
                  name=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", n_delay=4, n_spins=30, 
                  this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/ib0mutex.h:590
              #7  innobase_kill_query (thd=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:5186
              #8  0x000055c620358b27 in kill_handlerton (thd=thd@entry=0x7d1080001fe0, plugin=<optimized out>, 
                  level=level@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/handler.cc:841
              #9  0x000055c6200dc552 in plugin_foreach_with_mask (thd=thd@entry=0x7d1080001fe0, 
                  func=func@entry=0x55c620358aae <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, 
                  state_mask=state_mask@entry=8, arg=arg@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/sql_plugin.cc:2435
              #10 0x000055c620359f34 in ha_kill_query (thd=thd@entry=0x7d1080001fe0, level=<optimized out>)
              --Type <RET> for more, q to quit, c to continue without paging-- 
                  at /test/mtest/10.3_dbg/sql/handler.cc:849
              #11 0x000055c62006d27e in THD::awake_no_mutex (this=this@entry=0x7d1080001fe0, 
                  state_to_set=state_to_set@entry=KILL_QUERY) at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717
              #12 0x000055c6202a3022 in THD::awake (state_to_set=KILL_QUERY, this=0x7d1080001fe0)
                  at /test/mtest/10.3_dbg/sql/sql_class.h:3332
              #13 wsrep_thd_awake (thd=thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4)
                  at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2750
              #14 0x000055c620596fd7 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x145a18001250, 
                  bf_trx=bf_trx@entry=0x49602d628380, victim_trx=victim_trx@entry=0x49602d60d0d0, signal=signal@entry=1, 
                  kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19071
              #15 0x000055c62059cea0 in wsrep_abort_transaction (bf_thd=0x145a18001250, victim_thd=<optimized out>, 
                  signal=1 '\001', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170
              #16 0x000055c62036956d in ha_abort_transaction (bf_thd=bf_thd@entry=0x145a18001250, 
                  victim_thd=victim_thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4)
                  at /test/mtest/10.3_dbg/sql/handler.cc:6769
              #17 0x000055c6202b0945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x145a18001250, 
                  victim_thd_ptr=victim_thd_ptr@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', 
                  kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828
              #18 0x000055c6202a0ccf in wsrep_grant_mdl_exception (requestor_ctx=requestor_ctx@entry=0x145a18001370, 
                  ticket=ticket@entry=0x7d10800083d0, key=key@entry=0x7d1080068630)
                  at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2049
              #19 0x000055c620206939 in MDL_lock::can_grant_lock (this=this@entry=0x7d1080068630, type_arg=MDL_EXCLUSIVE, 
                  requestor_ctx=requestor_ctx@entry=0x145a18001370, ignore_lock_priority=ignore_lock_priority@entry=false)
                  at /test/mtest/10.3_dbg/sql/mdl.cc:1594
              #20 0x000055c620207f66 in MDL_context::try_acquire_lock_impl (this=this@entry=0x145a18001370, 
                  mdl_request=mdl_request@entry=0x145a18012188, out_ticket=out_ticket@entry=0x2a1c4ce292d8)
                  at /test/mtest/10.3_dbg/sql/mdl.cc:1925
              #21 0x000055c620208179 in MDL_context::acquire_lock (this=this@entry=0x145a18001370, mdl_request=0x145a18012188, 
              --Type <RET> for more, q to quit, c to continue without paging--
                  lock_wait_timeout=lock_wait_timeout@entry=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2069
              #22 0x000055c620208ce8 in MDL_context::acquire_locks (this=this@entry=0x145a18001370, 
                  mdl_requests=mdl_requests@entry=0x2a1c4ce29430, lock_wait_timeout=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2268
              #23 0x000055c62004a15b in lock_table_names (thd=thd@entry=0x145a18001250, options=..., 
                  tables_start=0x145a18011d20, tables_end=<optimized out>, lock_wait_timeout=<optimized out>, 
                  flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923
              #24 0x000055c62004cd3e in open_tables (thd=thd@entry=0x145a18001250, options=..., 
                  start=start@entry=0x2a1c4ce297c8, counter=counter@entry=0x2a1c4ce297dc, flags=flags@entry=0, 
                  prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585
              #25 0x000055c62004e279 in open_and_lock_tables (thd=thd@entry=0x145a18001250, options=..., tables=<optimized out>, 
                  tables@entry=0x145a18011d20, derived=derived@entry=false, flags=flags@entry=0, 
                  prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109
              #26 0x000055c6201886cc in open_and_lock_tables (flags=0, derived=false, tables=0x145a18011d20, options=..., 
                  thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_base.h:495
              #27 mysql_create_table (thd=thd@entry=0x145a18001250, create_table=create_table@entry=0x145a18011d20, 
                  create_info=create_info@entry=0x2a1c4ce29a00, alter_info=alter_info@entry=0x2a1c4ce29940)
                  at /test/mtest/10.3_dbg/sql/sql_table.cc:5282
              #28 0x000055c62018a0c5 in Sql_cmd_create_table_like::execute (this=0x145a18011d00, thd=0x145a18001250)
                  at /test/mtest/10.3_dbg/sql/sql_table.cc:11378
              #29 0x000055c6200ce17c in mysql_execute_command (thd=thd@entry=0x145a18001250)
                  at /test/mtest/10.3_dbg/sql/sql_parse.cc:6075
              #30 0x000055c6200d017f in mysql_parse (thd=thd@entry=0x145a18001250, 
                  rawbuf=rawbuf@entry=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)", 
                  length=length@entry=186, parser_state=parser_state@entry=0x2a1c4ce2b530, 
                  is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                  at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870
              #31 0x000055c6200d183e in wsrep_mysql_parse (thd=thd@entry=0x145a18001250, 
              --Type <RET> for more, q to quit, c to continue without paging--
                  rawbuf=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)", length=186, 
                  parser_state=parser_state@entry=0x2a1c4ce2b530, is_com_multi=is_com_multi@entry=false, 
                  is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662
              #32 0x000055c6200d2959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x145a18001250, 
                  packet=packet@entry=0x145a18009301 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592);", 
                  packet_length=packet_length@entry=187, is_com_multi=is_com_multi@entry=false, 
                  is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152
              #33 0x000055c6200d5ace in do_command (thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398
              #34 0x000055c6201fdd54 in do_handle_one_connection (connect=connect@entry=0x55c6231056e0)
                  at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403
              #35 0x000055c6201fdf8b in handle_one_connection (arg=0x55c6231056e0)
                  at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308
              #36 0x000055c621e06609 in start_thread (arg=<optimized out>) at pthread_create.c:477
              #37 0x0000145a0830b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
              

            • Bug 2:

              (rr) p file
              $2 = 0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc"
              (rr) p line
              $3 = 2664
              (rr) p mp->file
              $4 = 0x562c3f0aea20 "/test/mtest/10.3_dbg/sql/sql_class.cc"
              (rr) p mp->line
              $5 = 1980
              (rr) where
              #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
              #1  0x00001fd34b9dd859 in __GI_abort () at abort.c:79
              #2  0x0000562c3f0158be in safe_mutex_lock (mp=mp@entry=0x1d60100028b8, my_flags=my_flags@entry=0, 
                  file=file@entry=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", line=line@entry=2664)
                  at /test/mtest/10.3_dbg/mysys/thr_mutex.c:265
              #3  0x0000562c3e70c51f in inline_mysql_mutex_lock (src_line=2664, 
                  src_file=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", that=0x1d60100028b8)
                  at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717
              #4  wsrep_thd_LOCK (thd=thd@entry=0x1d6010000d90) at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2664
              #5  0x0000562c3ea029c1 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x1455c8000d90, 
                  bf_trx=bf_trx@entry=0x30e64a067468, victim_trx=victim_trx@entry=0x30e64a064240, signal=signal@entry=0, 
                  kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:18937
              #6  0x0000562c3ea08ea0 in wsrep_abort_transaction (bf_thd=0x1455c8000d90, victim_thd=<optimized out>, 
                  signal=0 '\000', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170
              #7  0x0000562c3e7d556d in ha_abort_transaction (bf_thd=bf_thd@entry=0x1455c8000d90, 
                  victim_thd=victim_thd@entry=0x1d6010000d90, signal=signal@entry=0 '\000', kill_signal=kill_signal@entry=4)
                  at /test/mtest/10.3_dbg/sql/handler.cc:6769
              #8  0x0000562c3e71c945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x1455c8000d90, 
                  victim_thd_ptr=victim_thd_ptr@entry=0x1d6010000d90, signal=signal@entry=0 '\000', 
                  kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828
              #9  0x0000562c3e4ce83c in THD::notify_shared_lock (this=0x1455c8000d90, ctx_in_use=<optimized out>, 
                  needs_thr_lock_abort=<optimized out>) at /test/mtest/10.3_dbg/sql/sql_class.cc:2002
              #10 0x0000562c3e674325 in MDL_lock::notify_conflicting_locks (ctx=0x1455c8000eb0, this=0xde604090870)
                  at /test/mtest/10.3_dbg/sql/mdl.h:848
              #11 MDL_context::acquire_lock (this=this@entry=0x1455c8000eb0, mdl_request=0x1455c8011c18, 
                  lock_wait_timeout=lock_wait_timeout@entry=86400) at /test/mtest/10.3_dbg/sql/mdl.cc:2114
              #12 0x0000562c3e674ce8 in MDL_context::acquire_locks (this=this@entry=0x1455c8000eb0, 
                  mdl_requests=mdl_requests@entry=0x77463330c5b0, lock_wait_timeout=86400)
              --Type <RET> for more, q to quit, c to continue without paging--
                  at /test/mtest/10.3_dbg/sql/mdl.cc:2268
              #13 0x0000562c3e4b615b in lock_table_names (thd=thd@entry=0x1455c8000d90, options=..., 
                  tables_start=0x1455c80117b0, tables_end=<optimized out>, lock_wait_timeout=<optimized out>, 
                  flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923
              #14 0x0000562c3e4b8d3e in open_tables (thd=thd@entry=0x1455c8000d90, options=..., 
                  start=start@entry=0x77463330c948, counter=counter@entry=0x77463330c95c, flags=flags@entry=0, 
                  prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585
              #15 0x0000562c3e4ba279 in open_and_lock_tables (thd=thd@entry=0x1455c8000d90, options=..., tables=<optimized out>, 
                  tables@entry=0x1455c80117b0, derived=derived@entry=false, flags=flags@entry=0, 
                  prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109
              #16 0x0000562c3e4ba418 in open_and_lock_tables (prelocking_strategy=0x77463330ca90, flags=0, derived=false, 
                  tables=0x1455c80117b0, thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:263
              #17 open_n_lock_single_table (thd=thd@entry=0x1455c8000d90, table_l=table_l@entry=0x1455c80117b0, 
                  lock_type=lock_type@entry=TL_READ_NO_INSERT, flags=flags@entry=0, 
                  prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:4952
              #18 0x0000562c3e602e23 in open_n_lock_single_table (flags=0, lock_type=TL_READ_NO_INSERT, table_l=0x1455c80117b0, 
                  thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:482
              #19 mysql_create_or_drop_trigger (thd=thd@entry=0x1455c8000d90, tables=<optimized out>, create=create@entry=true)
                  at /test/mtest/10.3_dbg/sql/sql_trigger.cc:558
              #20 0x0000562c3e539b22 in mysql_execute_command (thd=thd@entry=0x1455c8000d90)
                  at /test/mtest/10.3_dbg/sql/sql_parse.cc:5913
              #21 0x0000562c3e53c17f in mysql_parse (thd=thd@entry=0x1455c8000d90, 
                  rawbuf=rawbuf@entry=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col", length=length@entry=80, parser_state=parser_state@entry=0x77463330e530, 
                  is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                  at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870
              #22 0x0000562c3e53d83e in wsrep_mysql_parse (thd=thd@entry=0x1455c8000d90, 
                  rawbuf=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col", 
              --Type <RET> for more, q to quit, c to continue without paging--
                  length=80, parser_state=parser_state@entry=0x77463330e530, is_com_multi=is_com_multi@entry=false, 
                  is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662
              #23 0x0000562c3e53e959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x1455c8000d90, 
                  packet=packet@entry=0x1455c8008e41 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col;", packet_length=packet_length@entry=81, is_com_multi=is_com_multi@entry=false, 
                  is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152
              #24 0x0000562c3e541ace in do_command (thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398
              #25 0x0000562c3e669d54 in do_handle_one_connection (connect=connect@entry=0x562c42177370)
                  at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403
              #26 0x0000562c3e669f8b in handle_one_connection (arg=0x562c42177370)
                  at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308
              #27 0x0000562c403d1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
              #28 0x00001fd34bada293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
              

            From these Bug 2 seems to be easy to fix. However, bug 1 looks complicated.

            jplindst Jan Lindström (Inactive) added a comment - Unfortunately, PR https://github.com/MariaDB/server/pull/1843 is not correct based on pquery QA executed by ramesh . There is at least two bugs found on this version. Bug 1: (rr) where #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x0000145a0820e859 in __GI_abort () at abort.c:79 #2 0x0000145a0820e729 in __assert_fail_base (fmt=0x145a083a4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55c620df5f79 "!is_owned()", file=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic", line=64, function=<optimized out>) at assert.c:92 #3 0x0000145a0821ff36 in __GI___assert_fail (assertion=assertion@entry=0x55c620df5f79 "!is_owned()", file=file@entry=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic", line=line@entry=64, function=function@entry=0x55c620df5fc8 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101 #4 0x000055c6205aa5c1 in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55c6214cf510 <lock_sys+80>, mutex=0x55c6214cf500 <lock_sys+64>, name=name@entry=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", line=line@entry=5186) at /test/mtest/10.3_dbg/storage/innobase/include/sync0types.h:979 #5 0x000055c620599e94 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (line=5186, filename=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", mutex=..., this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/sync0policy.h:339 #6 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (line=5186, name=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", n_delay=4, n_spins=30, this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/ib0mutex.h:590 #7 innobase_kill_query (thd=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:5186 #8 0x000055c620358b27 in kill_handlerton (thd=thd@entry=0x7d1080001fe0, plugin=<optimized out>, level=level@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/handler.cc:841 #9 0x000055c6200dc552 in plugin_foreach_with_mask (thd=thd@entry=0x7d1080001fe0, func=func@entry=0x55c620358aae <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/sql_plugin.cc:2435 #10 0x000055c620359f34 in ha_kill_query (thd=thd@entry=0x7d1080001fe0, level=<optimized out>) --Type <RET> for more, q to quit, c to continue without paging-- at /test/mtest/10.3_dbg/sql/handler.cc:849 #11 0x000055c62006d27e in THD::awake_no_mutex (this=this@entry=0x7d1080001fe0, state_to_set=state_to_set@entry=KILL_QUERY) at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717 #12 0x000055c6202a3022 in THD::awake (state_to_set=KILL_QUERY, this=0x7d1080001fe0) at /test/mtest/10.3_dbg/sql/sql_class.h:3332 #13 wsrep_thd_awake (thd=thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2750 #14 0x000055c620596fd7 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x145a18001250, bf_trx=bf_trx@entry=0x49602d628380, victim_trx=victim_trx@entry=0x49602d60d0d0, signal=signal@entry=1, kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19071 #15 0x000055c62059cea0 in wsrep_abort_transaction (bf_thd=0x145a18001250, victim_thd=<optimized out>, signal=1 '\001', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170 #16 0x000055c62036956d in ha_abort_transaction (bf_thd=bf_thd@entry=0x145a18001250, victim_thd=victim_thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/sql/handler.cc:6769 #17 0x000055c6202b0945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x145a18001250, victim_thd_ptr=victim_thd_ptr@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828 #18 0x000055c6202a0ccf in wsrep_grant_mdl_exception (requestor_ctx=requestor_ctx@entry=0x145a18001370, ticket=ticket@entry=0x7d10800083d0, key=key@entry=0x7d1080068630) at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2049 #19 0x000055c620206939 in MDL_lock::can_grant_lock (this=this@entry=0x7d1080068630, type_arg=MDL_EXCLUSIVE, requestor_ctx=requestor_ctx@entry=0x145a18001370, ignore_lock_priority=ignore_lock_priority@entry=false) at /test/mtest/10.3_dbg/sql/mdl.cc:1594 #20 0x000055c620207f66 in MDL_context::try_acquire_lock_impl (this=this@entry=0x145a18001370, mdl_request=mdl_request@entry=0x145a18012188, out_ticket=out_ticket@entry=0x2a1c4ce292d8) at /test/mtest/10.3_dbg/sql/mdl.cc:1925 #21 0x000055c620208179 in MDL_context::acquire_lock (this=this@entry=0x145a18001370, mdl_request=0x145a18012188, --Type <RET> for more, q to quit, c to continue without paging-- lock_wait_timeout=lock_wait_timeout@entry=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2069 #22 0x000055c620208ce8 in MDL_context::acquire_locks (this=this@entry=0x145a18001370, mdl_requests=mdl_requests@entry=0x2a1c4ce29430, lock_wait_timeout=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2268 #23 0x000055c62004a15b in lock_table_names (thd=thd@entry=0x145a18001250, options=..., tables_start=0x145a18011d20, tables_end=<optimized out>, lock_wait_timeout=<optimized out>, flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923 #24 0x000055c62004cd3e in open_tables (thd=thd@entry=0x145a18001250, options=..., start=start@entry=0x2a1c4ce297c8, counter=counter@entry=0x2a1c4ce297dc, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585 #25 0x000055c62004e279 in open_and_lock_tables (thd=thd@entry=0x145a18001250, options=..., tables=<optimized out>, tables@entry=0x145a18011d20, derived=derived@entry=false, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109 #26 0x000055c6201886cc in open_and_lock_tables (flags=0, derived=false, tables=0x145a18011d20, options=..., thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_base.h:495 #27 mysql_create_table (thd=thd@entry=0x145a18001250, create_table=create_table@entry=0x145a18011d20, create_info=create_info@entry=0x2a1c4ce29a00, alter_info=alter_info@entry=0x2a1c4ce29940) at /test/mtest/10.3_dbg/sql/sql_table.cc:5282 #28 0x000055c62018a0c5 in Sql_cmd_create_table_like::execute (this=0x145a18011d00, thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_table.cc:11378 #29 0x000055c6200ce17c in mysql_execute_command (thd=thd@entry=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_parse.cc:6075 #30 0x000055c6200d017f in mysql_parse (thd=thd@entry=0x145a18001250, rawbuf=rawbuf@entry=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)", length=length@entry=186, parser_state=parser_state@entry=0x2a1c4ce2b530, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870 #31 0x000055c6200d183e in wsrep_mysql_parse (thd=thd@entry=0x145a18001250, --Type <RET> for more, q to quit, c to continue without paging-- rawbuf=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)", length=186, parser_state=parser_state@entry=0x2a1c4ce2b530, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662 #32 0x000055c6200d2959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x145a18001250, packet=packet@entry=0x145a18009301 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592);", packet_length=packet_length@entry=187, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152 #33 0x000055c6200d5ace in do_command (thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398 #34 0x000055c6201fdd54 in do_handle_one_connection (connect=connect@entry=0x55c6231056e0) at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403 #35 0x000055c6201fdf8b in handle_one_connection (arg=0x55c6231056e0) at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308 #36 0x000055c621e06609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #37 0x0000145a0830b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Bug 2: (rr) p file $2 = 0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc" (rr) p line $3 = 2664 (rr) p mp->file $4 = 0x562c3f0aea20 "/test/mtest/10.3_dbg/sql/sql_class.cc" (rr) p mp->line $5 = 1980 (rr) where #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00001fd34b9dd859 in __GI_abort () at abort.c:79 #2 0x0000562c3f0158be in safe_mutex_lock (mp=mp@entry=0x1d60100028b8, my_flags=my_flags@entry=0, file=file@entry=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", line=line@entry=2664) at /test/mtest/10.3_dbg/mysys/thr_mutex.c:265 #3 0x0000562c3e70c51f in inline_mysql_mutex_lock (src_line=2664, src_file=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", that=0x1d60100028b8) at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717 #4 wsrep_thd_LOCK (thd=thd@entry=0x1d6010000d90) at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2664 #5 0x0000562c3ea029c1 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x1455c8000d90, bf_trx=bf_trx@entry=0x30e64a067468, victim_trx=victim_trx@entry=0x30e64a064240, signal=signal@entry=0, kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:18937 #6 0x0000562c3ea08ea0 in wsrep_abort_transaction (bf_thd=0x1455c8000d90, victim_thd=<optimized out>, signal=0 '\000', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170 #7 0x0000562c3e7d556d in ha_abort_transaction (bf_thd=bf_thd@entry=0x1455c8000d90, victim_thd=victim_thd@entry=0x1d6010000d90, signal=signal@entry=0 '\000', kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/sql/handler.cc:6769 #8 0x0000562c3e71c945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x1455c8000d90, victim_thd_ptr=victim_thd_ptr@entry=0x1d6010000d90, signal=signal@entry=0 '\000', kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828 #9 0x0000562c3e4ce83c in THD::notify_shared_lock (this=0x1455c8000d90, ctx_in_use=<optimized out>, needs_thr_lock_abort=<optimized out>) at /test/mtest/10.3_dbg/sql/sql_class.cc:2002 #10 0x0000562c3e674325 in MDL_lock::notify_conflicting_locks (ctx=0x1455c8000eb0, this=0xde604090870) at /test/mtest/10.3_dbg/sql/mdl.h:848 #11 MDL_context::acquire_lock (this=this@entry=0x1455c8000eb0, mdl_request=0x1455c8011c18, lock_wait_timeout=lock_wait_timeout@entry=86400) at /test/mtest/10.3_dbg/sql/mdl.cc:2114 #12 0x0000562c3e674ce8 in MDL_context::acquire_locks (this=this@entry=0x1455c8000eb0, mdl_requests=mdl_requests@entry=0x77463330c5b0, lock_wait_timeout=86400) --Type <RET> for more, q to quit, c to continue without paging-- at /test/mtest/10.3_dbg/sql/mdl.cc:2268 #13 0x0000562c3e4b615b in lock_table_names (thd=thd@entry=0x1455c8000d90, options=..., tables_start=0x1455c80117b0, tables_end=<optimized out>, lock_wait_timeout=<optimized out>, flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923 #14 0x0000562c3e4b8d3e in open_tables (thd=thd@entry=0x1455c8000d90, options=..., start=start@entry=0x77463330c948, counter=counter@entry=0x77463330c95c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585 #15 0x0000562c3e4ba279 in open_and_lock_tables (thd=thd@entry=0x1455c8000d90, options=..., tables=<optimized out>, tables@entry=0x1455c80117b0, derived=derived@entry=false, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109 #16 0x0000562c3e4ba418 in open_and_lock_tables (prelocking_strategy=0x77463330ca90, flags=0, derived=false, tables=0x1455c80117b0, thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:263 #17 open_n_lock_single_table (thd=thd@entry=0x1455c8000d90, table_l=table_l@entry=0x1455c80117b0, lock_type=lock_type@entry=TL_READ_NO_INSERT, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:4952 #18 0x0000562c3e602e23 in open_n_lock_single_table (flags=0, lock_type=TL_READ_NO_INSERT, table_l=0x1455c80117b0, thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:482 #19 mysql_create_or_drop_trigger (thd=thd@entry=0x1455c8000d90, tables=<optimized out>, create=create@entry=true) at /test/mtest/10.3_dbg/sql/sql_trigger.cc:558 #20 0x0000562c3e539b22 in mysql_execute_command (thd=thd@entry=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_parse.cc:5913 #21 0x0000562c3e53c17f in mysql_parse (thd=thd@entry=0x1455c8000d90, rawbuf=rawbuf@entry=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col", length=length@entry=80, parser_state=parser_state@entry=0x77463330e530, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870 #22 0x0000562c3e53d83e in wsrep_mysql_parse (thd=thd@entry=0x1455c8000d90, rawbuf=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col", --Type <RET> for more, q to quit, c to continue without paging-- length=80, parser_state=parser_state@entry=0x77463330e530, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662 #23 0x0000562c3e53e959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x1455c8000d90, packet=packet@entry=0x1455c8008e41 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col;", packet_length=packet_length@entry=81, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152 #24 0x0000562c3e541ace in do_command (thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398 #25 0x0000562c3e669d54 in do_handle_one_connection (connect=connect@entry=0x562c42177370) at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403 #26 0x0000562c3e669f8b in handle_one_connection (arg=0x562c42177370) at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308 #27 0x0000562c403d1609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #28 0x00001fd34bada293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 From these Bug 2 seems to be easy to fix. However, bug 1 looks complicated.
            seppo Seppo Jaakola added a comment -

            jplindst both these crashes happen when resolving MDL conflicts. The PR is not changing MDL conflict resolving execution, other than passing the kill signal as new argument, and from the stack traces above, it seems that MDL conflict protesting high priority transaction is passing the correct KILL_QUERY signal.

            The test carried out here, therefore, is probably surfacing new issue, which you would encounter also with 10.3 HEAD version.
            As you seem to have rr traces there, please dig out the BF victim thread's status. Here is only shown the BF thread trace, which is only half of the picture.

            seppo Seppo Jaakola added a comment - jplindst both these crashes happen when resolving MDL conflicts. The PR is not changing MDL conflict resolving execution, other than passing the kill signal as new argument, and from the stack traces above, it seems that MDL conflict protesting high priority transaction is passing the correct KILL_QUERY signal. The test carried out here, therefore, is probably surfacing new issue, which you would encounter also with 10.3 HEAD version. As you seem to have rr traces there, please dig out the BF victim thread's status. Here is only shown the BF thread trace, which is only half of the picture.

            For bug2 I agree this could be repeatable with 10.3 HEAD and in there victim has following information

            (rr) p thd->wsrep_query_state
            $1 = QUERY_EXEC
            (rr) p thd->wsrep_conflict_state
            $2 = NO_CONFLICT
            (rr) p thd->wsrep_exec_mode
            $3 = LOCAL_STATE
            (rr) p victim_trx->state
            $4 = TRX_STATE_ACTIVE
            (rr) p bf_trx
            $5 = (const trx_t * const) 0x30e64a067468
            (rr) p bf_trx->state
            $6 = TRX_STATE_NOT_STARTED
            (rr) p bf_thd
            $7 = (THD * const) 0x1455c8000d90
            (rr) p bf_thd->wsrep_query_state
            $8 = QUERY_EXEC
            (rr) p bf_thd->wsrep_conflict_state
            $9 = NO_CONFLICT
            (rr) p bf_thd->wsrep_exec_mode
            $10 = TOTAL_ORDER
            

            jplindst Jan Lindström (Inactive) added a comment - For bug2 I agree this could be repeatable with 10.3 HEAD and in there victim has following information (rr) p thd->wsrep_query_state $1 = QUERY_EXEC (rr) p thd->wsrep_conflict_state $2 = NO_CONFLICT (rr) p thd->wsrep_exec_mode $3 = LOCAL_STATE (rr) p victim_trx->state $4 = TRX_STATE_ACTIVE (rr) p bf_trx $5 = (const trx_t * const) 0x30e64a067468 (rr) p bf_trx->state $6 = TRX_STATE_NOT_STARTED (rr) p bf_thd $7 = (THD * const) 0x1455c8000d90 (rr) p bf_thd->wsrep_query_state $8 = QUERY_EXEC (rr) p bf_thd->wsrep_conflict_state $9 = NO_CONFLICT (rr) p bf_thd->wsrep_exec_mode $10 = TOTAL_ORDER

            For bug1 I do not agree as 10.3 HEAD does not contain wsrep_innobase_kill_one_trx function. In this case victim has following state:

            (rr) p trx->id
            $1 = 280
            (rr) p trx->state
            $2 = TRX_STATE_ACTIVE
            (rr) p thd->wsrep_query_state
            value has been optimized out
            (rr) p trx->mysql_thd->wsrep_query_state
            $3 = QUERY_EXEC
            (rr) p trx->mysql_thd->wsrep_exec_mode
            $4 = LOCAL_STATE
            (rr) p trx->mysql_thd->wsrep_conflict_state
            $5 = NO_CONFLICT
            (rr) p trx->lock.wait_lock
            $7 = (ib_lock_t *) 0x0
            

            jplindst Jan Lindström (Inactive) added a comment - For bug1 I do not agree as 10.3 HEAD does not contain wsrep_innobase_kill_one_trx function. In this case victim has following state: (rr) p trx->id $1 = 280 (rr) p trx->state $2 = TRX_STATE_ACTIVE (rr) p thd->wsrep_query_state value has been optimized out (rr) p trx->mysql_thd->wsrep_query_state $3 = QUERY_EXEC (rr) p trx->mysql_thd->wsrep_exec_mode $4 = LOCAL_STATE (rr) p trx->mysql_thd->wsrep_conflict_state $5 = NO_CONFLICT (rr) p trx->lock.wait_lock $7 = (ib_lock_t *) 0x0
            Mark Anstice Mark Anstice added a comment -

            One of my nodes from My 3-node 10.3.28 cluster crashed in this way yesterday, same messages logged in the syslog:
            Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 2021-06-06 21:23:44 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
            Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: #011 at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
            Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 2021-06-06 21:23:44 0 [ERROR] WSREP: cancel commit bad exit: 7 370256912
            Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 210606 21:23:44 [ERROR] mysqld got signal 6 ;

            We have been running 10.3.28 for just over 2 weeks and this is the first time we have seen this crash, we upgraded from 10.3.9 (which has been very stable for the last 3 years).

            Mark Anstice Mark Anstice added a comment - One of my nodes from My 3-node 10.3.28 cluster crashed in this way yesterday, same messages logged in the syslog: Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld [29212] : 2021-06-06 21:23:44 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL) Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld [29212] : #011 at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735 Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld [29212] : 2021-06-06 21:23:44 0 [ERROR] WSREP: cancel commit bad exit: 7 370256912 Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld [29212] : 210606 21:23:44 [ERROR] mysqld got signal 6 ; We have been running 10.3.28 for just over 2 weeks and this is the first time we have seen this crash, we upgraded from 10.3.9 (which has been very stable for the last 3 years).
            seppo Seppo Jaakola added a comment -

            Looks that the PR is already merged, and review done. However, assigned for review just to get the state of this issue correct in Jira

            seppo Seppo Jaakola added a comment - Looks that the PR is already merged, and review done. However, assigned for review just to get the state of this issue correct in Jira
            basos Vasilis G added a comment - - edited

            Hello,
            Just to comment that this affects 10.3.29 (running on Debian 10) that we recently upgraded on 2 galera clusters. Nodes keep crashing at random times with the exact error message noted in this issue.

            basos Vasilis G added a comment - - edited Hello, Just to comment that this affects 10.3.29 (running on Debian 10) that we recently upgraded on 2 galera clusters. Nodes keep crashing at random times with the exact error message noted in this issue.
            jplindst Jan Lindström (Inactive) added a comment - - edited 10.2 : https://github.com/MariaDB/server/commit/9a9f68c862f8045340c1dbc380a15b1eed68c38e 10.3 : https://github.com/MariaDB/server/commit/08edfb7edd8a6aa6c8bab6429f551eb39d03b0f5 10.4 : https://github.com/MariaDB/server/commit/7d26c225150e81fa7092d81aecf79ee381d8d77f 10.5 : https://github.com/MariaDB/server/commit/63e0087b86d6c38e6023d414bad569144334f0b4 10.6: Not needed

            I submitted some review comments for the 10.3 version (2 commits), identifying possible problems. I noticed that the 10.4 and 10.5 versions include some changes to row_upd_sec_index_entry() that are not mentioned in the commit message. It is not clear how they are related to these changes.

            Given that most changes are outside InnoDB and this is replacing MDEV-23328 with another implementation, I think that the final review must be conducted by serg, who authored that change. Ideally, the test galera.galera_bf_kill_debug would not be removed.

            marko Marko Mäkelä added a comment - I submitted some review comments for the 10.3 version (2 commits), identifying possible problems. I noticed that the 10.4 and 10.5 versions include some changes to row_upd_sec_index_entry() that are not mentioned in the commit message. It is not clear how they are related to these changes. Given that most changes are outside InnoDB and this is replacing MDEV-23328 with another implementation, I think that the final review must be conducted by serg , who authored that change. Ideally, the test galera.galera_bf_kill_debug would not be removed.

            serg Can you please review server changes

            jplindst Jan Lindström (Inactive) added a comment - serg Can you please review server changes

            Shall I review the PR? At the moment the last comment is yours “Unfortunately, this is not correct.” and the status is “Changes requested”

            Also, was it tested on 10.6 ? In 10.6 InnoDB mutexes are covered by server's deadlock detector. You might want to put this fix into 10.6 and run all galera tests on a debug build. To be sure it didn't introduce any deadlocks.

            serg Sergei Golubchik added a comment - Shall I review the PR? At the moment the last comment is yours “Unfortunately, this is not correct.” and the status is “Changes requested” Also, was it tested on 10.6 ? In 10.6 InnoDB mutexes are covered by server's deadlock detector. You might want to put this fix into 10.6 and run all galera tests on a debug build. To be sure it didn't introduce any deadlocks.

            serg Please review. I have applied review comments and other fixes on above commits. These fixes are tested by Ramesh using pquery. There is no need to apply this to 10.6 as this background killing is already removed there.

            jplindst Jan Lindström (Inactive) added a comment - serg Please review. I have applied review comments and other fixes on above commits. These fixes are tested by Ramesh using pquery. There is no need to apply this to 10.6 as this background killing is already removed there.
            jplindst Jan Lindström (Inactive) added a comment - - edited
            • branch : bb-10.2-galera
            • commit : 60bcd225cc39f6e9072ba30e3c79c4dbf4277766
            • https://github.com/MariaDB/server/commit/60bcd225cc39f6e9072ba30e3c79c4dbf4277766
            • http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera
            • Issues:
              • test case: rpl.rpl_parallel_conflicts
                • safe_mutex: Trying to unlock mutex LOCK_thd_data that wasn't locked at /home/jan/mysql/10.2/sql/slave.cc, line 323
                • Last used at /home/jan/mysql/10.2/sql/sql_class.h, line: 4144
              • test case : galera.MW-286 (BAD)

                 
                Thread 1 (Thread 0x7f92c8afe640 (LWP 1675562)):
                #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
                #1  0x00005647f29ba3fb in my_write_core (sig=6) at /home/jan/mysql/10.2/mysys/stacktrace.c:382
                #2  0x00005647f2260fe2 in handle_fatal_signal (sig=6) at /home/jan/mysql/10.2/sql/signal_handler.cc:355
                #3  <signal handler called>
                #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
                #5  0x00007f92d8c8d864 in __GI_abort () at abort.c:79
                #6  0x00007f92d8c8d749 in __assert_fail_base (fmt=0x7f92d8e16f78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=<optimized out>) at assert.c:92
                #7  0x00007f92d8c9f3d6 in __GI___assert_fail (assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=0x5647f2b7ee58 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
                #8  0x00005647f24947dc in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=0x7f92c800f050, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic:64
                #9  0x00005647f24930f4 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=..., filename=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.h:346
                #10 0x00005647f249120f in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f050, n_spins=30, n_delay=4, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/ib0mutex.h:635
                #11 0x00005647f24691cb in innobase_kill_query (thd=0x7f9264000d90) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:5263
                #12 0x00005647f2263830 in kill_handlerton (thd=0x7f9264000d90, plugin=0x7f92b403ac50, level=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/handler.cc:843
                #13 0x00005647f1ff6bb3 in plugin_foreach_with_mask (thd=0x7f9264000d90, func=0x5647f22637aa <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/sql_plugin.cc:2411
                #14 0x00005647f226388e in ha_kill_query (thd=0x7f9264000d90, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.2/sql/handler.cc:850
                #15 0x00005647f1f904f2 in THD::awake (this=0x7f9264000d90, state_to_set=KILL_QUERY) at /home/jan/mysql/10.2/sql/sql_class.cc:1733
                #16 0x00005647f21c34b6 in wsrep_thd_awake (thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2764
                #17 0x00005647f248534b in wsrep_innobase_kill_one_trx (bf_thd=0x7f92b4001fe0, bf_trx=0x7f92abc00120, victim_trx=0x7f92abc01228, signal=1) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19722
                #18 0x00005647f2485795 in wsrep_abort_transaction (hton=0x5647f479e730, bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19797
                #19 0x00005647f22716fb in ha_abort_transaction (bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/handler.cc:6326
                #20 0x00005647f21d2ddc in wsrep_abort_thd (bf_thd_ptr=0x7f92b4001fe0, victim_thd_ptr=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_thd.cc:837
                #21 0x00005647f21c1679 in wsrep_grant_mdl_exception (requestor_ctx=0x7f92b4002100, ticket=0x7f926418abc0, key=0x7f9264158f80) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2057
                #22 0x00005647f213d493 in MDL_lock::can_grant_lock (this=0x7f9264158f80, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f92b4002100, ignore_lock_priority=false) at /home/jan/mysql/10.2/sql/mdl.cc:1592
                #23 0x00005647f213dd34 in MDL_context::try_acquire_lock_impl (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, out_ticket=0x7f92c8afc308) at /home/jan/mysql/10.2/sql/mdl.cc:1923
                #24 0x00005647f213e0c0 in MDL_context::acquire_lock (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, lock_wait_timeout=86400) at /home/jan/mysql/10.2/sql/mdl.cc:2067
                #25 0x00005647f1f664c5 in open_table_get_mdl_lock (thd=0x7f92b4001fe0, ot_ctx=0x7f92c8afc9d0, mdl_request=0x7f92b48d50c8, flags=0, mdl_ticket=0x7f92c8afc630) at /home/jan/mysql/10.2/sql/sql_base.cc:1461
                #26 0x00005647f1f66e90 in open_table (thd=0x7f92b4001fe0, table_list=0x7f92b48d4c70, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:1760
                #27 0x00005647f1f6a46c in open_and_process_table (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8, has_prelocking_list=false, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:3589
                #28 0x00005647f1f6b370 in open_tables (thd=0x7f92b4001fe0, options=..., start=0x7f92c8afca48, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4062
                #29 0x00005647f1f6c9c0 in open_and_lock_tables (thd=0x7f92b4001fe0, options=..., tables=0x7f92b48d4c70, derived=false, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4861
                #30 0x00005647f1f3251b in open_and_lock_tables (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, derived=false, flags=0) at /home/jan/mysql/10.2/sql/sql_base.h:508
                #31 0x00005647f239a995 in Rows_log_event::do_apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.cc:10854
                #32 0x00005647f1f22f47 in Log_event::apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.h:1462
                #33 0x00005647f21cf5bf in wsrep_apply_events (thd=0x7f92b4001fe0, events_buf=0x7f92cbd57150, buf_len=493589) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:164
                #34 0x00005647f21cfb8b in wsrep_apply_cb (ctx=0x7f92b4001fe0, buf=0x7f92cbd57150, buf_len=502024, flags=1, meta=0x7f92c8afd350) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:267
                #35 0x00007f92d3cc99cc in galera::TrxHandle::apply (this=0x7f92b403e060, recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:316
                #36 0x00007f92d3cd08e3 in apply_trx_ws (recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, commit_cb=0x5647f21cfeb4 <wsrep_commit_cb(void*, uint32_t, wsrep_trx_meta_t const*, wsrep_bool_t*, bool)>, trx=..., meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:34
                #37 0x00007f92d3cd3b15 in galera::ReplicatorSMM::apply_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:455
                #38 0x00007f92d3cd82a6 in galera::ReplicatorSMM::process_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:1259
                #39 0x00007f92d3d13781 in galera::GcsActionSource::dispatch (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, act=..., exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:115
                #40 0x00007f92d3d13fb9 in galera::GcsActionSource::process (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:180
                #41 0x00007f92d3cd3141 in galera::ReplicatorSMM::async_recv (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:363
                #42 0x00007f92d3ca86b3 in galera_recv (gh=0x5647f47040f0, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:244
                #43 0x00005647f21d1df8 in wsrep_replication_process (thd=0x7f92b4001fe0) at /home/jan/mysql/10.2/sql/wsrep_thd.cc:486
                #44 0x00005647f21c1ba4 in start_wsrep_THD (arg=0x5647f476aaf0) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2169
                #45 0x00005647f295c085 in pfs_spawn_thread (arg=0x5647f47595f0) at /home/jan/mysql/10.2/storage/perfschema/pfs.cc:1869
                #46 0x00007f92d91de450 in start_thread (arg=0x7f92c8afe640) at pthread_create.c:473
                #47 0x00007f92d8d7ed53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
                

              • Problem found with pquery:

                (rr) where
                #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
                #1  0x000073fe62f61859 in __GI_abort () at abort.c:79
                #2  0x0000562f2a5ffabe in safe_mutex_lock (mp=mp@entry=0x146fb4002c28, my_flags=my_flags@entry=0, 
                    file=file@entry=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", line=line@entry=2689) at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
                #3  0x0000562f29e70f47 in inline_mysql_mutex_lock (src_line=2689, src_file=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", that=0x146fb4002c28)
                    at /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
                #4  wsrep_thd_LOCK (thd=thd@entry=0x146fb4001230) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2689
                #5  0x0000562f2a0e351b in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x146fb8001260, bf_trx=bf_trx@entry=0x1d1f69a22120, 
                    victim_trx=victim_trx@entry=0x1d1f69a3cae8, signal=signal@entry=0) at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19607
                #6  0x0000562f2a0f3b96 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=0x146fb8001260, victim_thd=<optimized out>, signal=0 '\000')
                    at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19797
                #7  0x0000562f29efaafb in ha_abort_transaction (bf_thd=bf_thd@entry=0x146fb8001260, victim_thd=victim_thd@entry=0x146fb4001230, signal=signal@entry=0 '\000')
                    at /test/mtest/10.2_dbg/sql/handler.cc:6326
                #8  0x0000562f29e80b87 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x146fb8001260, victim_thd_ptr=victim_thd_ptr@entry=0x146fb4001230, 
                    signal=signal@entry=0 '\000') at /test/mtest/10.2_dbg/sql/wsrep_thd.cc:837
                #9  0x0000562f29c9b55a in THD::notify_shared_lock (this=0x146fb8001260, ctx_in_use=<optimized out>, needs_thr_lock_abort=<optimized out>)
                    at /test/mtest/10.2_dbg/sql/sql_class.cc:1897
                #10 0x0000562f29dffd6b in MDL_lock::notify_conflicting_locks (ctx=0x146fb8001380, this=0x3cd3540acf30) at /test/mtest/10.2_dbg/sql/mdl.h:847
                #11 MDL_context::acquire_lock (this=this@entry=0x146fb8001380, mdl_request=0x146fb8011c30, lock_wait_timeout=lock_wait_timeout@entry=86400)
                    at /test/mtest/10.2_dbg/sql/mdl.cc:2104
                #12 0x0000562f29e00682 in MDL_context::acquire_locks (this=this@entry=0x146fb8001380, mdl_requests=mdl_requests@entry=0x4d1f42c0ea50, 
                    lock_wait_timeout=86400) at /test/mtest/10.2_dbg/sql/mdl.cc:2258
                #13 0x0000562f29c83b05 in lock_table_names (thd=thd@entry=0x146fb8001260, options=..., tables_start=tables_start@entry=0x146fb80117d8, 
                    tables_end=tables_end@entry=0x0, lock_wait_timeout=<optimized out>, flags=flags@entry=0) at /test/mtest/10.2_dbg/sql/sql_base.cc:3806
                #14 0x0000562f29d84b51 in lock_table_names (flags=0, lock_wait_timeout=<optimized out>, table_list_end=0x0, table_list=0x146fb80117d8, thd=0x146fb8001260)
                    at /test/mtest/10.2_dbg/sql/sql_base.h:238
                #15 mysql_rm_table (thd=thd@entry=0x146fb8001260, tables=tables@entry=0x146fb80117d8, if_exists=<optimized out>, drop_temporary=<optimized out>)
                    at /test/mtest/10.2_dbg/sql/sql_table.cc:2042
                #16 0x0000562f29ce3c6c in mysql_execute_command (thd=thd@entry=0x146fb8001260) at /test/mtest/10.2_dbg/sql/structs.h:530
                #17 0x0000562f29ce9809 in mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=rawbuf@entry=0x146fb8011718 "DROP TABLE t1", length=length@entry=13, 
                    parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                    at /test/mtest/10.2_dbg/sql/sql_parse.cc:7793
                #18 0x0000562f29ceae12 in wsrep_mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=0x146fb8011718 "DROP TABLE t1", length=13, 
                    parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                    at /test/mtest/10.2_dbg/sql/sql_parse.cc:7585
                #19 0x0000562f29cec36b in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x146fb8001260, 
                    packet=packet@entry=0x146fb8008f31 "DROP TABLE t1;", packet_length=packet_length@entry=14, is_com_multi=is_com_multi@entry=false, 
                    is_next_command=is_next_command@entry=false) at /test/mtest/10.2_dbg/sql/sql_class.h:1109
                #20 0x0000562f29cef14a in do_command (thd=0x146fb8001260) at /test/mtest/10.2_dbg/sql/sql_parse.cc:1381
                #21 0x0000562f29df51b1 in do_handle_one_connection (connect=connect@entry=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1336
                #22 0x0000562f29df53dc in handle_one_connection (arg=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1241
                #23 0x0000448464591609 in start_thread (arg=<optimized out>) at pthread_create.c:477
                #24 0x000073fe6305e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
                

            jplindst Jan Lindström (Inactive) added a comment - - edited branch : bb-10.2-galera commit : 60bcd225cc39f6e9072ba30e3c79c4dbf4277766 https://github.com/MariaDB/server/commit/60bcd225cc39f6e9072ba30e3c79c4dbf4277766 http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera Issues: test case: rpl.rpl_parallel_conflicts safe_mutex: Trying to unlock mutex LOCK_thd_data that wasn't locked at /home/jan/mysql/10.2/sql/slave.cc, line 323 Last used at /home/jan/mysql/10.2/sql/sql_class.h, line: 4144 test case : galera.MW-286 (BAD)   Thread 1 (Thread 0x7f92c8afe640 (LWP 1675562)): #0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54 #1 0x00005647f29ba3fb in my_write_core (sig=6) at /home/jan/mysql/10.2/mysys/stacktrace.c:382 #2 0x00005647f2260fe2 in handle_fatal_signal (sig=6) at /home/jan/mysql/10.2/sql/signal_handler.cc:355 #3 <signal handler called> #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 #5 0x00007f92d8c8d864 in __GI_abort () at abort.c:79 #6 0x00007f92d8c8d749 in __assert_fail_base (fmt=0x7f92d8e16f78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=<optimized out>) at assert.c:92 #7 0x00007f92d8c9f3d6 in __GI___assert_fail (assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=0x5647f2b7ee58 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101 #8 0x00005647f24947dc in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=0x7f92c800f050, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic:64 #9 0x00005647f24930f4 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=..., filename=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.h:346 #10 0x00005647f249120f in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f050, n_spins=30, n_delay=4, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/ib0mutex.h:635 #11 0x00005647f24691cb in innobase_kill_query (thd=0x7f9264000d90) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:5263 #12 0x00005647f2263830 in kill_handlerton (thd=0x7f9264000d90, plugin=0x7f92b403ac50, level=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/handler.cc:843 #13 0x00005647f1ff6bb3 in plugin_foreach_with_mask (thd=0x7f9264000d90, func=0x5647f22637aa <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/sql_plugin.cc:2411 #14 0x00005647f226388e in ha_kill_query (thd=0x7f9264000d90, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.2/sql/handler.cc:850 #15 0x00005647f1f904f2 in THD::awake (this=0x7f9264000d90, state_to_set=KILL_QUERY) at /home/jan/mysql/10.2/sql/sql_class.cc:1733 #16 0x00005647f21c34b6 in wsrep_thd_awake (thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2764 #17 0x00005647f248534b in wsrep_innobase_kill_one_trx (bf_thd=0x7f92b4001fe0, bf_trx=0x7f92abc00120, victim_trx=0x7f92abc01228, signal=1) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19722 #18 0x00005647f2485795 in wsrep_abort_transaction (hton=0x5647f479e730, bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19797 #19 0x00005647f22716fb in ha_abort_transaction (bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/handler.cc:6326 #20 0x00005647f21d2ddc in wsrep_abort_thd (bf_thd_ptr=0x7f92b4001fe0, victim_thd_ptr=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_thd.cc:837 #21 0x00005647f21c1679 in wsrep_grant_mdl_exception (requestor_ctx=0x7f92b4002100, ticket=0x7f926418abc0, key=0x7f9264158f80) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2057 #22 0x00005647f213d493 in MDL_lock::can_grant_lock (this=0x7f9264158f80, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f92b4002100, ignore_lock_priority=false) at /home/jan/mysql/10.2/sql/mdl.cc:1592 #23 0x00005647f213dd34 in MDL_context::try_acquire_lock_impl (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, out_ticket=0x7f92c8afc308) at /home/jan/mysql/10.2/sql/mdl.cc:1923 #24 0x00005647f213e0c0 in MDL_context::acquire_lock (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, lock_wait_timeout=86400) at /home/jan/mysql/10.2/sql/mdl.cc:2067 #25 0x00005647f1f664c5 in open_table_get_mdl_lock (thd=0x7f92b4001fe0, ot_ctx=0x7f92c8afc9d0, mdl_request=0x7f92b48d50c8, flags=0, mdl_ticket=0x7f92c8afc630) at /home/jan/mysql/10.2/sql/sql_base.cc:1461 #26 0x00005647f1f66e90 in open_table (thd=0x7f92b4001fe0, table_list=0x7f92b48d4c70, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:1760 #27 0x00005647f1f6a46c in open_and_process_table (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8, has_prelocking_list=false, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:3589 #28 0x00005647f1f6b370 in open_tables (thd=0x7f92b4001fe0, options=..., start=0x7f92c8afca48, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4062 #29 0x00005647f1f6c9c0 in open_and_lock_tables (thd=0x7f92b4001fe0, options=..., tables=0x7f92b48d4c70, derived=false, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4861 #30 0x00005647f1f3251b in open_and_lock_tables (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, derived=false, flags=0) at /home/jan/mysql/10.2/sql/sql_base.h:508 #31 0x00005647f239a995 in Rows_log_event::do_apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.cc:10854 #32 0x00005647f1f22f47 in Log_event::apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.h:1462 #33 0x00005647f21cf5bf in wsrep_apply_events (thd=0x7f92b4001fe0, events_buf=0x7f92cbd57150, buf_len=493589) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:164 #34 0x00005647f21cfb8b in wsrep_apply_cb (ctx=0x7f92b4001fe0, buf=0x7f92cbd57150, buf_len=502024, flags=1, meta=0x7f92c8afd350) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:267 #35 0x00007f92d3cc99cc in galera::TrxHandle::apply (this=0x7f92b403e060, recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:316 #36 0x00007f92d3cd08e3 in apply_trx_ws (recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, commit_cb=0x5647f21cfeb4 <wsrep_commit_cb(void*, uint32_t, wsrep_trx_meta_t const*, wsrep_bool_t*, bool)>, trx=..., meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:34 #37 0x00007f92d3cd3b15 in galera::ReplicatorSMM::apply_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:455 #38 0x00007f92d3cd82a6 in galera::ReplicatorSMM::process_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:1259 #39 0x00007f92d3d13781 in galera::GcsActionSource::dispatch (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, act=..., exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:115 #40 0x00007f92d3d13fb9 in galera::GcsActionSource::process (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:180 #41 0x00007f92d3cd3141 in galera::ReplicatorSMM::async_recv (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:363 #42 0x00007f92d3ca86b3 in galera_recv (gh=0x5647f47040f0, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:244 #43 0x00005647f21d1df8 in wsrep_replication_process (thd=0x7f92b4001fe0) at /home/jan/mysql/10.2/sql/wsrep_thd.cc:486 #44 0x00005647f21c1ba4 in start_wsrep_THD (arg=0x5647f476aaf0) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2169 #45 0x00005647f295c085 in pfs_spawn_thread (arg=0x5647f47595f0) at /home/jan/mysql/10.2/storage/perfschema/pfs.cc:1869 #46 0x00007f92d91de450 in start_thread (arg=0x7f92c8afe640) at pthread_create.c:473 #47 0x00007f92d8d7ed53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Problem found with pquery: (rr) where #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x000073fe62f61859 in __GI_abort () at abort.c:79 #2 0x0000562f2a5ffabe in safe_mutex_lock (mp=mp@entry=0x146fb4002c28, my_flags=my_flags@entry=0, file=file@entry=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", line=line@entry=2689) at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264 #3 0x0000562f29e70f47 in inline_mysql_mutex_lock (src_line=2689, src_file=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", that=0x146fb4002c28) at /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688 #4 wsrep_thd_LOCK (thd=thd@entry=0x146fb4001230) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2689 #5 0x0000562f2a0e351b in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x146fb8001260, bf_trx=bf_trx@entry=0x1d1f69a22120, victim_trx=victim_trx@entry=0x1d1f69a3cae8, signal=signal@entry=0) at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19607 #6 0x0000562f2a0f3b96 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=0x146fb8001260, victim_thd=<optimized out>, signal=0 '\000') at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19797 #7 0x0000562f29efaafb in ha_abort_transaction (bf_thd=bf_thd@entry=0x146fb8001260, victim_thd=victim_thd@entry=0x146fb4001230, signal=signal@entry=0 '\000') at /test/mtest/10.2_dbg/sql/handler.cc:6326 #8 0x0000562f29e80b87 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x146fb8001260, victim_thd_ptr=victim_thd_ptr@entry=0x146fb4001230, signal=signal@entry=0 '\000') at /test/mtest/10.2_dbg/sql/wsrep_thd.cc:837 #9 0x0000562f29c9b55a in THD::notify_shared_lock (this=0x146fb8001260, ctx_in_use=<optimized out>, needs_thr_lock_abort=<optimized out>) at /test/mtest/10.2_dbg/sql/sql_class.cc:1897 #10 0x0000562f29dffd6b in MDL_lock::notify_conflicting_locks (ctx=0x146fb8001380, this=0x3cd3540acf30) at /test/mtest/10.2_dbg/sql/mdl.h:847 #11 MDL_context::acquire_lock (this=this@entry=0x146fb8001380, mdl_request=0x146fb8011c30, lock_wait_timeout=lock_wait_timeout@entry=86400) at /test/mtest/10.2_dbg/sql/mdl.cc:2104 #12 0x0000562f29e00682 in MDL_context::acquire_locks (this=this@entry=0x146fb8001380, mdl_requests=mdl_requests@entry=0x4d1f42c0ea50, lock_wait_timeout=86400) at /test/mtest/10.2_dbg/sql/mdl.cc:2258 #13 0x0000562f29c83b05 in lock_table_names (thd=thd@entry=0x146fb8001260, options=..., tables_start=tables_start@entry=0x146fb80117d8, tables_end=tables_end@entry=0x0, lock_wait_timeout=<optimized out>, flags=flags@entry=0) at /test/mtest/10.2_dbg/sql/sql_base.cc:3806 #14 0x0000562f29d84b51 in lock_table_names (flags=0, lock_wait_timeout=<optimized out>, table_list_end=0x0, table_list=0x146fb80117d8, thd=0x146fb8001260) at /test/mtest/10.2_dbg/sql/sql_base.h:238 #15 mysql_rm_table (thd=thd@entry=0x146fb8001260, tables=tables@entry=0x146fb80117d8, if_exists=<optimized out>, drop_temporary=<optimized out>) at /test/mtest/10.2_dbg/sql/sql_table.cc:2042 #16 0x0000562f29ce3c6c in mysql_execute_command (thd=thd@entry=0x146fb8001260) at /test/mtest/10.2_dbg/sql/structs.h:530 #17 0x0000562f29ce9809 in mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=rawbuf@entry=0x146fb8011718 "DROP TABLE t1", length=length@entry=13, parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.2_dbg/sql/sql_parse.cc:7793 #18 0x0000562f29ceae12 in wsrep_mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=0x146fb8011718 "DROP TABLE t1", length=13, parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.2_dbg/sql/sql_parse.cc:7585 #19 0x0000562f29cec36b in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x146fb8001260, packet=packet@entry=0x146fb8008f31 "DROP TABLE t1;", packet_length=packet_length@entry=14, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/mtest/10.2_dbg/sql/sql_class.h:1109 #20 0x0000562f29cef14a in do_command (thd=0x146fb8001260) at /test/mtest/10.2_dbg/sql/sql_parse.cc:1381 #21 0x0000562f29df51b1 in do_handle_one_connection (connect=connect@entry=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1336 #22 0x0000562f29df53dc in handle_one_connection (arg=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1241 #23 0x0000448464591609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #24 0x000073fe6305e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

            Current fix is not correct.

            jplindst Jan Lindström (Inactive) added a comment - Current fix is not correct.

            As Debian 11 ships mariadb 10.5 it's impossible to get galera up and running there. Any ETA available?

            fbezdeka Florian Bezdeka added a comment - As Debian 11 ships mariadb 10.5 it's impossible to get galera up and running there. Any ETA available?
            jplindst Jan Lindström (Inactive) added a comment - - edited User KILL as a TOI https://github.com/MariaDB/server/pull/1894 10.2 http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera https://github.com/MariaDB/server/commit/5236b6995560b1f4d04c6e6d7c9188785b07bdb1 10.3 http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.3-MDEV-25114 https://github.com/MariaDB/server/commit/568b7b64cd427321dc3513b6b8d1db88f7e3d59f 10.4 (NOT CORRECT) https://github.com/MariaDB/server/commit/cd999b6f790531533bb92366fba976093aecc518

            Problem in 10.4 as follows

            • branch : bb-10.4-MDEV-25114
            • test case : galera.MW-286
            • stack trace:

              Thread 1 (Thread 0x7f5fd43e0640 (LWP 924119)):
              #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
              #1  0x000055bd674e2f2e in my_write_core (sig=6) at /home/jan/mysql/10.4/mysys/stacktrace.c:386
              #2  0x000055bd66c26a9d in handle_fatal_signal (sig=6) at /home/jan/mysql/10.4/sql/signal_handler.cc:356
              #3  <signal handler called>
              #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
              #5  0x00007f5fdbe81864 in __GI_abort () at abort.c:79
              #6  0x00007f5fdbe81749 in __assert_fail_base (fmt=0x7f5fdc00af78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=<optimized out>) at assert.c:92
              #7  0x00007f5fdbe933d6 in __GI___assert_fail (assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=0x55bd677dfa40 "void MutexDebug<Mutex>::enter(const Mutex&, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
              #8  0x000055bd66eb265a in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09d0 <lock_sys+80>, mutex=..., filename=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/sync0policy.h:121
              #9  0x000055bd66eb0599 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09c0 <lock_sys+64>, n_spins=30, n_delay=4, name=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/ib0mutex.h:592
              #10 0x000055bd66e843dc in innobase_kill_query (thd=0x7f5fa0002120) at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:5195
              #11 0x000055bd66c2b196 in kill_handlerton (thd=0x7f5fa0002120, plugin=0x7f5fb0034500, level=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/handler.cc:865
              #12 0x000055bd668d4eff in plugin_foreach_with_mask (thd=0x7f5fa0002120, func=0x55bd66c2b056 <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/sql_plugin.cc:2474
              #13 0x000055bd66c2b1f8 in ha_kill_query (thd=0x7f5fa0002120, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.4/sql/handler.cc:872
              #14 0x000055bd6683e273 in THD::awake_no_mutex (this=0x7f5fa0002120, state_to_set=KILL_QUERY) at /home/jan/mysql/10.4/sql/sql_class.cc:1899
              #15 0x000055bd66b4dcc9 in wsrep_thd_bf_abort (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/service_wsrep.cc:252
              #16 0x000055bd66ea1888 in wsrep_innobase_kill_one_trx (bf_thd=0x7f5fb0000d90, victim_trx=0x7f5fc4bd5218, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18725
              #17 0x000055bd66ea1c48 in wsrep_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18773
              #18 0x000055bd66c3df05 in ha_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/handler.cc:7129
              #19 0x000055bd66b30040 in wsrep_handle_mdl_conflict (requestor_ctx=0x7f5fb0000eb0, ticket=0x7f5fa0143f20, key=0x7f5fa00944d0) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2386
              #20 0x000055bd66a510ed in MDL_lock::can_grant_lock (this=0x7f5fa00944d0, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f5fb0000eb0, ignore_lock_priority=false) at /home/jan/mysql/10.4/sql/mdl.cc:1798
              #21 0x000055bd66a51a5e in MDL_context::try_acquire_lock_impl (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, out_ticket=0x7f5fd43dddd8) at /home/jan/mysql/10.4/sql/mdl.cc:2129
              #22 0x000055bd66a51e6e in MDL_context::acquire_lock (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, lock_wait_timeout=86400) at /home/jan/mysql/10.4/sql/mdl.cc:2276
              #23 0x000055bd6680e7cb in open_table_get_mdl_lock (thd=0x7f5fb0000d90, ot_ctx=0x7f5fd43de4a0, mdl_request=0x7f5fb002c6b0, flags=0, mdl_ticket=0x7f5fd43de100) at /home/jan/mysql/10.4/sql/sql_base.cc:1647
              #24 0x000055bd6680f15b in open_table (thd=0x7f5fb0000d90, table_list=0x7f5fb002c260, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:1915
              #25 0x000055bd66813523 in open_and_process_table (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8, has_prelocking_list=false, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:3890
              #26 0x000055bd668147f7 in open_tables (thd=0x7f5fb0000d90, options=..., start=0x7f5fd43de518, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:4371
              #27 0x000055bd6681676d in open_and_lock_tables (thd=0x7f5fb0000d90, options=..., tables=0x7f5fb002c260, derived=false, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:5317
              #28 0x000055bd667ce32f in open_and_lock_tables (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, derived=false, flags=0) at /home/jan/mysql/10.4/sql/sql_base.h:503
              #29 0x000055bd66d96de7 in Rows_log_event::do_apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.cc:11409
              #30 0x000055bd667b6b41 in Log_event::apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.h:1492
              #31 0x000055bd66b40a1b in wsrep_apply_events (thd=0x7f5fb0000d90, rli=0x7f5fb0016fd0, events_buf=0x7f5fc7fffbf8, buf_len=0) at /home/jan/mysql/10.4/sql/wsrep_applier.cc:200
              #32 0x000055bd66b2183e in Wsrep_applier_service::apply_write_set (this=0x7f5fd43dfca0, ws_meta=..., data=...) at /home/jan/mysql/10.4/sql/wsrep_high_priority_service.cc:550
              #33 0x000055bd675613b9 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:327
              #34 0x000055bd6756532c in wsrep::server_state::on_apply (this=0x55bd694aafe0, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:1137
              #35 0x000055bd675803e5 in wsrep::high_priority_service::apply (this=0x7f5fd43dfca0, ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/include/wsrep/high_priority_service.hpp:47
              #36 0x000055bd6757d343 in (anonymous namespace)::apply_cb (ctx=0x7f5fd43dfca0, wsh=0x7f5fd43deec0, flags=65, buf=0x7f5fd43deed0, meta=0x7f5fd43df190, exit_loop=0x7f5fd43df14f) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:504
              #37 0x00007f5fd6dd6bf5 in galera::TrxHandleSlave::apply (this=this@entry=0x7f5fb0030700, recv_ctx=recv_ctx@entry=0x7f5fd43dfca0, apply_cb=0x55bd6757d110 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=exit_loop@entry=@0x7f5fd43df14f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:391
              #38 0x00007f5fd6de6216 in galera::ReplicatorSMM::apply_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:504
              #39 0x00007f5fd6de9fd0 in galera::ReplicatorSMM::process_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts_ptr=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:2123
              #40 0x00007f5fd6e1dd20 in galera::GcsActionSource::process_writeset (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, act=..., exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:62
              #41 0x00007f5fd6e1e9d2 in galera::GcsActionSource::process (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:183
              #42 0x00007f5fd6dea4c0 in galera::ReplicatorSMM::async_recv (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:390
              #43 0x00007f5fd6dc31f1 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:236
              #44 0x000055bd6757e510 in wsrep::wsrep_provider_v26::run_applier (this=0x55bd6939c410, applier_ctx=0x7f5fd43dfca0) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:796
              #45 0x000055bd66b41359 in wsrep_replication_process (thd=0x7f5fb0000d90, arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_thd.cc:57
              #46 0x000055bd66b31d01 in start_wsrep_THD (arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2915
              #47 0x000055bd6746e4bf in pfs_spawn_thread (arg=0x55bd69501450) at /home/jan/mysql/10.4/storage/perfschema/pfs.cc:1869
              #48 0x00007f5fdc3db450 in start_thread (arg=0x7f5fd43e0640) at pthread_create.c:473
              #49 0x00007f5fdbf72d53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
               
               - saving '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/' to '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/'
              

            jplindst Jan Lindström (Inactive) added a comment - Problem in 10.4 as follows branch : bb-10.4- MDEV-25114 test case : galera.MW-286 stack trace: Thread 1 (Thread 0x7f5fd43e0640 (LWP 924119)): #0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54 #1 0x000055bd674e2f2e in my_write_core (sig=6) at /home/jan/mysql/10.4/mysys/stacktrace.c:386 #2 0x000055bd66c26a9d in handle_fatal_signal (sig=6) at /home/jan/mysql/10.4/sql/signal_handler.cc:356 #3 <signal handler called> #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 #5 0x00007f5fdbe81864 in __GI_abort () at abort.c:79 #6 0x00007f5fdbe81749 in __assert_fail_base (fmt=0x7f5fdc00af78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=<optimized out>) at assert.c:92 #7 0x00007f5fdbe933d6 in __GI___assert_fail (assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=0x55bd677dfa40 "void MutexDebug<Mutex>::enter(const Mutex&, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101 #8 0x000055bd66eb265a in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09d0 <lock_sys+80>, mutex=..., filename=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/sync0policy.h:121 #9 0x000055bd66eb0599 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09c0 <lock_sys+64>, n_spins=30, n_delay=4, name=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/ib0mutex.h:592 #10 0x000055bd66e843dc in innobase_kill_query (thd=0x7f5fa0002120) at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:5195 #11 0x000055bd66c2b196 in kill_handlerton (thd=0x7f5fa0002120, plugin=0x7f5fb0034500, level=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/handler.cc:865 #12 0x000055bd668d4eff in plugin_foreach_with_mask (thd=0x7f5fa0002120, func=0x55bd66c2b056 <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/sql_plugin.cc:2474 #13 0x000055bd66c2b1f8 in ha_kill_query (thd=0x7f5fa0002120, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.4/sql/handler.cc:872 #14 0x000055bd6683e273 in THD::awake_no_mutex (this=0x7f5fa0002120, state_to_set=KILL_QUERY) at /home/jan/mysql/10.4/sql/sql_class.cc:1899 #15 0x000055bd66b4dcc9 in wsrep_thd_bf_abort (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/service_wsrep.cc:252 #16 0x000055bd66ea1888 in wsrep_innobase_kill_one_trx (bf_thd=0x7f5fb0000d90, victim_trx=0x7f5fc4bd5218, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18725 #17 0x000055bd66ea1c48 in wsrep_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18773 #18 0x000055bd66c3df05 in ha_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/handler.cc:7129 #19 0x000055bd66b30040 in wsrep_handle_mdl_conflict (requestor_ctx=0x7f5fb0000eb0, ticket=0x7f5fa0143f20, key=0x7f5fa00944d0) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2386 #20 0x000055bd66a510ed in MDL_lock::can_grant_lock (this=0x7f5fa00944d0, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f5fb0000eb0, ignore_lock_priority=false) at /home/jan/mysql/10.4/sql/mdl.cc:1798 #21 0x000055bd66a51a5e in MDL_context::try_acquire_lock_impl (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, out_ticket=0x7f5fd43dddd8) at /home/jan/mysql/10.4/sql/mdl.cc:2129 #22 0x000055bd66a51e6e in MDL_context::acquire_lock (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, lock_wait_timeout=86400) at /home/jan/mysql/10.4/sql/mdl.cc:2276 #23 0x000055bd6680e7cb in open_table_get_mdl_lock (thd=0x7f5fb0000d90, ot_ctx=0x7f5fd43de4a0, mdl_request=0x7f5fb002c6b0, flags=0, mdl_ticket=0x7f5fd43de100) at /home/jan/mysql/10.4/sql/sql_base.cc:1647 #24 0x000055bd6680f15b in open_table (thd=0x7f5fb0000d90, table_list=0x7f5fb002c260, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:1915 #25 0x000055bd66813523 in open_and_process_table (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8, has_prelocking_list=false, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:3890 #26 0x000055bd668147f7 in open_tables (thd=0x7f5fb0000d90, options=..., start=0x7f5fd43de518, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:4371 #27 0x000055bd6681676d in open_and_lock_tables (thd=0x7f5fb0000d90, options=..., tables=0x7f5fb002c260, derived=false, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:5317 #28 0x000055bd667ce32f in open_and_lock_tables (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, derived=false, flags=0) at /home/jan/mysql/10.4/sql/sql_base.h:503 #29 0x000055bd66d96de7 in Rows_log_event::do_apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.cc:11409 #30 0x000055bd667b6b41 in Log_event::apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.h:1492 #31 0x000055bd66b40a1b in wsrep_apply_events (thd=0x7f5fb0000d90, rli=0x7f5fb0016fd0, events_buf=0x7f5fc7fffbf8, buf_len=0) at /home/jan/mysql/10.4/sql/wsrep_applier.cc:200 #32 0x000055bd66b2183e in Wsrep_applier_service::apply_write_set (this=0x7f5fd43dfca0, ws_meta=..., data=...) at /home/jan/mysql/10.4/sql/wsrep_high_priority_service.cc:550 #33 0x000055bd675613b9 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:327 #34 0x000055bd6756532c in wsrep::server_state::on_apply (this=0x55bd694aafe0, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:1137 #35 0x000055bd675803e5 in wsrep::high_priority_service::apply (this=0x7f5fd43dfca0, ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/include/wsrep/high_priority_service.hpp:47 #36 0x000055bd6757d343 in (anonymous namespace)::apply_cb (ctx=0x7f5fd43dfca0, wsh=0x7f5fd43deec0, flags=65, buf=0x7f5fd43deed0, meta=0x7f5fd43df190, exit_loop=0x7f5fd43df14f) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:504 #37 0x00007f5fd6dd6bf5 in galera::TrxHandleSlave::apply (this=this@entry=0x7f5fb0030700, recv_ctx=recv_ctx@entry=0x7f5fd43dfca0, apply_cb=0x55bd6757d110 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=exit_loop@entry=@0x7f5fd43df14f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:391 #38 0x00007f5fd6de6216 in galera::ReplicatorSMM::apply_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:504 #39 0x00007f5fd6de9fd0 in galera::ReplicatorSMM::process_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts_ptr=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:2123 #40 0x00007f5fd6e1dd20 in galera::GcsActionSource::process_writeset (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, act=..., exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:62 #41 0x00007f5fd6e1e9d2 in galera::GcsActionSource::process (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:183 #42 0x00007f5fd6dea4c0 in galera::ReplicatorSMM::async_recv (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:390 #43 0x00007f5fd6dc31f1 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:236 #44 0x000055bd6757e510 in wsrep::wsrep_provider_v26::run_applier (this=0x55bd6939c410, applier_ctx=0x7f5fd43dfca0) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:796 #45 0x000055bd66b41359 in wsrep_replication_process (thd=0x7f5fb0000d90, arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_thd.cc:57 #46 0x000055bd66b31d01 in start_wsrep_THD (arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2915 #47 0x000055bd6746e4bf in pfs_spawn_thread (arg=0x55bd69501450) at /home/jan/mysql/10.4/storage/perfschema/pfs.cc:1869 #48 0x00007f5fdc3db450 in start_thread (arg=0x7f5fd43e0640) at pthread_create.c:473 #49 0x00007f5fdbf72d53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95   - saving '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/' to '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/'
            jplindst Jan Lindström (Inactive) added a comment - - edited serg marko This is now ready for review. Latest versions: 10.2: https://github.com/MariaDB/server/commit/8f075c6d3dde8a7eec3ce901a0699d1478e5c065 10.3: https://github.com/MariaDB/server/commit/1eb067c5a29cc7f7245e8e7056e687b74494476e 10.4: https://github.com/MariaDB/server/commit/f79227a61f80e6590d967c110b0f9cf6f3d8bffe 10.5: https://github.com/MariaDB/server/commit/d683b6ece75a07d8cfaf9301532184cde9e3a1f7 10.6: https://github.com/MariaDB/server/commit/14b125c8b54439360e2277e725c56af0ee52d029 Next step is to do pquery QA runs on these.

            jplindst pquery run looks good. Did not hit any new crashes related to this issue.
            pquery run test case:

            • multi-threaded
            • multi master
            • SQL file prepared with KILL CONNECTION/QUERY statements
            ramesh Ramesh Sivaraman added a comment - jplindst pquery run looks good. Did not hit any new crashes related to this issue. pquery run test case: multi-threaded multi master SQL file prepared with KILL CONNECTION/QUERY statements

            I posted a few comments to the 10.2 version of this:

            • The claimed revert of MDEV-23328 includes some strange changes, including introducing a potential race condition to wsrep_abort_transaction().
            • The proposed fix appears to introduce a race condition in THD::notify_shared_lock(). I am not very familiar with this code, but I suspect that as soon as THD::LOCK_thd_data on the to-be-aborted session is released, that session object may be concurrently aborted by some other thread, and could be assigned to another transaction before wsrep_abort_thd() is invoked. I think that this (as well as the above mentioned change to wsrep_abort_transaction() needs to be tested with an extra sleep or yield added between the mutex release and the call.
            • If I look at the total across the 2 commits, I see that the function wsrep_innobase_kill_one_trx() has been moved somewhere else in the file, which makes it harder to read the actual changes. Please try to generate a minimal diff, so that I can review deeper.

            I see that there is a 10.6 version as well, not affecting InnoDB, and seemingly not affecting anything outside Galera.

            marko Marko Mäkelä added a comment - I posted a few comments to the 10.2 version of this: The claimed revert of MDEV-23328 includes some strange changes, including introducing a potential race condition to wsrep_abort_transaction() . The proposed fix appears to introduce a race condition in THD::notify_shared_lock() . I am not very familiar with this code, but I suspect that as soon as THD::LOCK_thd_data on the to-be-aborted session is released, that session object may be concurrently aborted by some other thread, and could be assigned to another transaction before wsrep_abort_thd() is invoked. I think that this (as well as the above mentioned change to wsrep_abort_transaction() needs to be tested with an extra sleep or yield added between the mutex release and the call. If I look at the total across the 2 commits, I see that the function wsrep_innobase_kill_one_trx() has been moved somewhere else in the file, which makes it harder to read the actual changes. Please try to generate a minimal diff, so that I can review deeper. I see that there is a 10.6 version as well, not affecting InnoDB, and seemingly not affecting anything outside Galera.

            The latest revision for 10.2 looks mostly OK to me. My only concern is a potential race condition in wsrep_close_client_connections(), but because I am not familiar with that code, I may be mistaken there.

            marko Marko Mäkelä added a comment - The latest revision for 10.2 looks mostly OK to me. My only concern is a potential race condition in wsrep_close_client_connections() , but because I am not familiar with that code, I may be mistaken there.
            jplindst Jan Lindström (Inactive) added a comment - - edited serg Can you review latest versions in here 10.2 : https://github.com/MariaDB/server/commit/4b164f176e6ec34864afdaa3dc7be23c1ce09c79 10.3 : https://github.com/MariaDB/server/commit/8b29e8dec808f9f17d5f72c3209322dc72fbe4d3 10.4 : https://github.com/MariaDB/server/commit/5e9c98bc0595ea5e8ce297adf9ef31bfcabb638d 10.5 : https://github.com/MariaDB/server/commit/da9a37bf4181d92699af40fdd82ab5b7ae1bf307 10.6 : https://github.com/MariaDB/server/commit/2b9b10e05bbf67d8e6ee67337182b114e88656e2

            Suggested way to fix this issue is not correct. I will fall back to KILL as TOI solution.

            jplindst Jan Lindström (Inactive) added a comment - Suggested way to fix this issue is not correct. I will fall back to KILL as TOI solution.
            jplindst Jan Lindström (Inactive) added a comment - - edited

            serg Can you please review following:

            jplindst Jan Lindström (Inactive) added a comment - - edited serg Can you please review following: 10.2 branch: bb-10.2-KILL-as-TOI-galera commit: https://github.com/MariaDB/server/commit/6d0c1f3ae12593470f2a556c33e586bc08c677e8 10.3 branch: bb-10.3-KILL-as-TOI-galera commit: https://github.com/MariaDB/server/commit/ed85dc379fe686ee2bd8707826864e3d7c62877c
            jplindst Jan Lindström (Inactive) added a comment - - edited Final versions: branch: bb-10.2-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/f3e5e3d897c90dcf04586e57c6bec5ef6fabb8e5 branch: bb-10.3-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/002192b92cf3c9e206d711ca36cdf6694a457a8f branch: bb-10.4-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/778d88ea2d9ec9b2083a6eaadea5d31f9571ff14 branch: bb-10.5-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/2c8f52ea53d7532e395fa92c3b0e9c5dfb619403

            rebase,testing and push 10.5

            jplindst Jan Lindström (Inactive) added a comment - rebase,testing and push 10.5
            Baroti Steve Baroti added a comment -

            OMG, we are awaiting for this fix, thank you! After the September's 10.3.24 -> 10.3.31 upgrade, our three nodes dev cluster (10.3.31) has its node's mariadb service crashing, each node, every one or two hours. Wanted to revert, but we have to much data to restore. A festival of mariadb restarting nodes! Thanks again for your great efforts! Fingers crossed!

            Baroti Steve Baroti added a comment - OMG, we are awaiting for this fix, thank you! After the September's 10.3.24 -> 10.3.31 upgrade, our three nodes dev cluster (10.3.31) has its node's mariadb service crashing, each node, every one or two hours. Wanted to revert, but we have to much data to restore. A festival of mariadb restarting nodes! Thanks again for your great efforts! Fingers crossed!

            In 10.6.0, this was fixed in a simpler way by MDEV-24915.

            marko Marko Mäkelä added a comment - In 10.6.0, this was fixed in a simpler way by MDEV-24915 .

            People

              jplindst Jan Lindström (Inactive)
              emaijala Ere Maijala
              Votes:
              16 Vote for this issue
              Watchers:
              37 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.