Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29512

WSREP: cluster conflict causes cluster deadlock

Details

    Description

      On a galera cluster with 3 nodes we have an even running the query below:

      2022-09-12  6:02:11 1 [Note] WSREP: cluster conflict due to high priority abort for threads:
      2022-09-12  6:02:11 1 [Note] WSREP: Winning thread: 
         THD: 1, mode: high priority, state: exec, conflict: executing, seqno: 76697477
         SQL: delete entry from entry inner join entry_file on entry_file.id=entry.file where entry_file.date_upload < now() -interval 366 day and  entry_file.date_upload > now() - interval 370 day and entry.id not in (select jm.entryid from entrymail_records jm) and entry.country_code <> 'BE'^V<AF>^^c^S^A
      2022-09-12  6:02:11 1 [Note] WSREP: Victim thread: 
         THD: 2401338, mode: local, state: exec, conflict: certifying, seqno: -1
         SQL: delete entry from entry inner join entry_file on entry_file.id=entry.file where entry_file.date_upload < now() -interval 366 day and  entry_file.date_upload > now() - interval 370 day and entry.id not in (select jm.entryid from entrymail_records jm) and entry.country_code <> 'BE'
      2022-09-12  6:02:11 1 [Note] WSREP: context: /home/buildbot/buildbot/build/mariadb-10.5.15/storage/innobase/handler/ha_innodb.cc:18802
      

      Now this caused a conflict, and WSREP aborted a thread.

      But this abort caused a deadlock on the node affecting all writes on the whole mariadb cluster.
      All writes on other nodes were hanging in a 'Commit' state.

      MariaDB also cannot be stopped correctly, only a kill -9 can stop the instance.
      I tried to stop the instance, and it failed, but after this I created a gdb coredump.

      Some relevant traces:

      Thread 5 (Thread 0x7f47f82db700 (LWP 78621)):
      #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
      #1  0x00007f480139a87a in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f44a0665340) at ../nptl/pthread_mutex_lock.c:135
      #2  0x000055c808b3ecc3 in inline_mysql_mutex_lock (src_line=591, src_file=0x55c809370458 "/home/buildbot/buildbot/build/mariadb-10.5.15/sql/sql_repl.cc", that=0x7f44a0665340) at ./include/mysql/psi/mysql_thread.h:752
      #3  log_in_use_callback (log_name=0x7f47f82d8890 "/var/lib/mysql/cust-db002-bin.000145", thd=0x7f44a06636a8) at ./sql/sql_repl.cc:591
      #4  THD_list_iterator::iterate<char const> (arg=<optimized out>, action=<optimized out>, this=<optimized out>) at ./sql/sql_class.h:1077
      #5  log_in_use (log_name=log_name@entry=0x7f47f82d8890 "/var/lib/mysql/cust-db002-bin.000145") at ./sql/sql_repl.cc:601
      #6  0x000055c808dfbfa7 in MYSQL_BIN_LOG::can_purge_log (this=0x55c809b76dc0 <mysql_bin_log>, log_file_name_arg=0x7f47f82d8890 "/var/lib/mysql/cust-db002-bin.000145") at ./sql/log.cc:5149
      #7  MYSQL_BIN_LOG::can_purge_log (this=0x55c809b76dc0 <mysql_bin_log>, log_file_name_arg=0x7f47f82d8890 "/var/lib/mysql/cust-db002-bin.000145") at ./sql/log.cc:5132
      #8  0x000055c808e04075 in MYSQL_BIN_LOG::purge_logs_before_date (this=0x55c809b76dc0 <mysql_bin_log>, purge_time=1662091333) at ./sql/log.cc:5075
      #9  0x000055c808e09774 in MYSQL_BIN_LOG::trx_group_commit_leader (this=this@entry=0x55c809b76dc0 <mysql_bin_log>, leader=leader@entry=0x7f47f82d8c00) at ./sql/log.cc:8486
      #10 0x000055c808e09a14 in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=0x55c809b76dc0 <mysql_bin_log>, entry=entry@entry=0x7f47f82d8c00) at ./sql/log.cc:8046
      #11 0x000055c808e09f07 in MYSQL_BIN_LOG::write_transaction_to_binlog (this=0x55c809b76dc0 <mysql_bin_log>, thd=<optimized out>, cache_mngr=0x7f479c030658, end_ev=<optimized out>, all=<optimized out>, using_stmt_cache=<optimized out>, using_trx_cache=true) at ./sql/log.cc:7641
      #12 0x000055c808e09ff7 in binlog_flush_cache (thd=thd@entry=0x7f479c000c18, cache_mngr=cache_mngr@entry=0x7f479c030658, end_ev=end_ev@entry=0x7f47f82d8d60, all=all@entry=true, using_stmt=using_stmt@entry=true, using_trx=using_trx@entry=true) at ./sql/log.cc:1780
      #13 0x000055c808e0b953 in binlog_commit_flush_xid_caches (xid=76697477, all=true, cache_mngr=0x7f479c030658, thd=0x7f479c000c18) at ./sql/log.cc:1928
      #14 MYSQL_BIN_LOG::log_and_order (this=<optimized out>, thd=0x7f479c000c18, xid=76697477, all=<optimized out>, need_prepare_ordered=<optimized out>, need_commit_ordered=<optimized out>) at ./sql/log.cc:10019
      #15 0x000055c808d23637 in ha_commit_trans (thd=thd@entry=0x7f479c000c18, all=all@entry=true) at ./sql/handler.cc:1789
      #16 0x000055c808c26611 in trans_commit (thd=thd@entry=0x7f479c000c18) at ./sql/transaction.cc:266
      #17 0x000055c808fb69f8 in Wsrep_high_priority_service::commit (this=0x7f47f82dacc0, ws_handle=..., ws_meta=...) at ./sql/wsrep_high_priority_service.cc:329
      #18 0x000055c80932fe66 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at ./wsrep-lib/src/server_state.cpp:332
      #19 0x000055c809330aed in wsrep::server_state::on_apply (this=<optimized out>, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at ./wsrep-lib/src/server_state.cpp:1114
      #20 0x000055c809344314 in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x7f47f82dacc0) at ./wsrep-lib/include/wsrep/high_priority_service.hpp:47
      #21 (anonymous namespace)::apply_cb (ctx=0x7f47f82dacc0, wsh=<optimized out>, flags=<optimized out>, buf=<optimized out>, meta=<optimized out>, exit_loop=0x7f47f82d9f4f) at ./wsrep-lib/src/wsrep_provider_v26.cpp:507
      #22 0x00007f47f946698b in galera::TrxHandleSlave::apply (this=this@entry=0x7f431cbe7c70, recv_ctx=recv_ctx@entry=0x7f47f82dacc0, apply_cb=0x55c8093441e0 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=exit_loop@entry=@0x7f47f82d9f4f: false) at ./galera/src/trx_handle.cpp:396
      #23 0x00007f47f9478290 in galera::ReplicatorSMM::apply_trx (this=0x55c80c1e2790, recv_ctx=0x7f47f82dacc0, ts=...) at ./galera/src/replicator_smm.cpp:516
      #24 0x00007f47f947a351 in galera::ReplicatorSMM::process_trx (this=0x55c80c1e2790, recv_ctx=0x7f47f82dacc0, ts_ptr=...) at ./galera/src/replicator_smm.cpp:2136
      #25 0x00007f47f94a9d0b in galera::GcsActionSource::process_writeset (this=0x55c80c2dbf70, recv_ctx=0x7f47f82dacc0, act=..., exit_loop=@0x7f47f82da8cf: false) at ./galera/src/gcs_action_source.cpp:62
      #26 0x00007f47f94aa69a in galera::GcsActionSource::dispatch (this=0x55c80c2dbf70, recv_ctx=0x7f47f82dacc0, act=..., exit_loop=<optimized out>) at ./galera/src/gcs_action_source.cpp:110
      #27 0x00007f47f94ab1c1 in galera::GcsActionSource::process (this=0x55c80c2dbf70, recv_ctx=0x7f47f82dacc0, exit_loop=@0x7f47f82da8cf: false) at ./galera/src/gcs_action_source.cpp:186
      #28 0x00007f47f947b420 in galera::ReplicatorSMM::async_recv (this=0x55c80c1e2790, recv_ctx=0x7f47f82dacc0) at ./galera/src/replicator_smm.cpp:402
      #29 0x00007f47f945551b in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at ./galera/src/wsrep_provider.cpp:263
      #30 0x000055c8093450ee in wsrep::wsrep_provider_v26::run_applier (this=<optimized out>, applier_ctx=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:858
      #31 0x000055c808fd11c3 in wsrep_replication_process (thd=0x7f479c000c18, arg=<optimized out>) at ./wsrep-lib/include/wsrep/server_state.hpp:320
      #32 0x000055c808fc1f7f in start_wsrep_THD (arg=arg@entry=0x55c80c42a580) at ./sql/wsrep_mysqld.h:560
      #33 0x000055c808f4fa04 in pfs_spawn_thread (arg=0x55c80c415458) at ./storage/perfschema/pfs.cc:2201
      #34 0x00007f4801397fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #35 0x00007f4800fbaeff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Lock seems to be owned by thread 99:

      (gdb) print *(pthread_mutex_t*)0x7f44a0665340
      $3 = {__data = {__lock = 2, __count = 0, __owner = 105061, __nusers = 1, __kind = 3, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000e\232\001\000\001\000\000\000\003", '\000' <repeats 22 times>, __align = 2}
      (gdb) thread find 105061
      Thread 99 has target id 'Thread 0x7f44c7687700 (LWP 105061)'
      

      Then a lot of threads in the following state:

      Thread 14 (Thread 0x7f4790f63700 (LWP 78879)):
      #0  futex_abstimed_wait (private=0, abstime=0x0, expected=3, futex_word=0x55c809b4216c <server_threads+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:172
      #1  __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x55c809b42160 <server_threads+32>, rwlock@entry=0x20) at pthread_rwlock_common.c:706
      #2  __GI___pthread_rwlock_wrlock (rwlock=rwlock@entry=0x55c809b42160 <server_threads+32>) at pthread_rwlock_wrlock.c:27
      #3  0x000055c808a5a139 in inline_mysql_rwlock_wrlock (src_line=7501, src_file=0x55c80935a140 "/home/buildbot/buildbot/build/mariadb-10.5.15/sql/sql_class.h", that=0x55c809b42160 <server_threads+32>) at ./include/mysql/psi/mysql_thread.h:1008
      #4  THD_list::erase (thd=0x7f478c0113a8, this=0x55c809b42140 <server_threads>) at ./sql/sql_class.h:7501
      #5  unlink_thd (thd=0x7f478c0113a8) at ./sql/mysqld.cc:2578
      #6  0x000055c808c1757f in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55c817dd3828, put_in_cache=put_in_cache@entry=true) at ./sql/sql_connect.cc:1429
      #7  0x000055c808c17c0d in handle_one_connection (arg=arg@entry=0x55c817dd3828) at ./sql/sql_connect.cc:1312
      #8  0x000055c808f4fa04 in pfs_spawn_thread (arg=0x55c817dddc68) at ./storage/perfschema/pfs.cc:2201
      #9  0x00007f4801397fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #10 0x00007f4800fbaeff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      And one galera thread that seems to hang:

      Thread 99 (Thread 0x7f44c7687700 (LWP 105061)):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f44c7685490, expected=0, futex_word=0x7f44a17fb248) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
      #1  __pthread_cond_wait_common (abstime=0x7f44c7685490, mutex=0x55c80c1e3a38, cond=0x7f44a17fb220) at pthread_cond_wait.c:539
      #2  __pthread_cond_timedwait (cond=cond@entry=0x7f44a17fb220, mutex=mutex@entry=0x55c80c1e3a38, abstime=abstime@entry=0x7f44c7685490) at pthread_cond_wait.c:667
      #3  0x00007f47f9472346 in gu_cond_timedwait_SYS (ts=0x7f44c7685490, mutex=0x55c80c1e3a38, cond=<optimized out>) at ./galerautils/src/gu_threads.h:264
      #4  gu::Lock::wait (this=<synthetic pointer>, date=..., cond=...) at ./galerautils/src/gu_lock.hpp:64
      #5  galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=..., gtid=..., this=0x55c80c1e3a38) at ./galera/src/monitor.hpp:399
      #6  galera::ReplicatorSMM::sync_wait (this=0x55c80c1e2790, upto=<optimized out>, tout=<optimized out>, gtid=0x0) at ./galera/src/replicator_smm.cpp:1603
      #7  0x00007f47f945557e in galera_sync_wait (wsrep=<optimized out>, upto=<optimized out>, tout=<optimized out>, gtid=<optimized out>) at ./galera/src/wsrep_provider.cpp:502
      #8  0x000055c80934560a in wsrep::wsrep_provider_v26::wait_for_gtid (this=<optimized out>, gtid=..., timeout=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:1056
      #9  0x000055c808fd85a9 in Wsrep_server_service::set_position (this=<optimized out>, c=..., gtid=...) at ./wsrep-lib/include/wsrep/server_state.hpp:320
      #10 0x000055c809335d36 in wsrep::transaction::release_commit_order (this=0x7f44a0669958, lock=...) at ./wsrep-lib/include/wsrep/provider.hpp:143
      #11 0x000055c809340b4a in wsrep::transaction::after_statement (this=this@entry=0x7f44a0669958) at ./wsrep-lib/src/transaction.cpp:890
      #12 0x000055c809322c79 in wsrep::client_state::after_statement (this=this@entry=0x7f44a06698f0) at ./wsrep-lib/src/client_state.cpp:278
      #13 0x000055c808a885d9 in wsrep_after_statement (thd=<optimized out>) at ./sql/sql_class.h:5102
      #14 sp_head::execute (this=0x7f44a18757e0, thd=0x7f44a06636a8, merge_da_on_success=true) at ./sql/sp_head.cc:1476
      #15 0x000055c808a896a7 in sp_head::execute_procedure (this=this@entry=0x7f44a18757e0, thd=thd@entry=0x7f44a06636a8, args=args@entry=0x7f44c7686380) at ./sql/sp_head.cc:2449
      #16 0x000055c808bfcb82 in Event_job_data::execute (this=this@entry=0x7f44c7686b40, thd=thd@entry=0x7f44a06636a8, drop=<optimized out>) at ./sql/event_data_objects.cc:1485
      #17 0x000055c808e88d3d in Event_worker_thread::run (this=0x7f44c7686ddf, thd=0x7f44a06636a8, event=0x7f44a0fe8380) at ./sql/event_scheduler.cc:323
      #18 0x000055c808e88dca in event_worker_thread (arg=arg@entry=0x7f44a0fe8380) at ./sql/event_scheduler.cc:268
      #19 0x000055c808f4fa04 in pfs_spawn_thread (arg=0x7f44a0e31198) at ./storage/perfschema/pfs.cc:2201
      #20 0x00007f4801397fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #21 0x00007f4800fbaeff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      It seems very similar to MDEV-25368, but that should already have been fixed in 10.5.15?

      Corefile & libs have been uploaded to the MariaDB FTP!
      This is a debian 10 system with MariaDB 1:10.5.15+maria~buster installed.

      Attachments

        Issue Links

          Activity

            Thanks for fixing this!
            Does this affect only 10.5 or also newer versions?

            dupondje Jean-Louis Dupond added a comment - Thanks for fixing this! Does this affect only 10.5 or also newer versions?
            seppo Seppo Jaakola added a comment -

            I would expect this issue to affect all versions 10.4 and later. I have now a test for reproducing the hanging, and it should be simple to verify the affected versions. The fix looks effective in my testing.

            seppo Seppo Jaakola added a comment - I would expect this issue to affect all versions 10.4 and later. I have now a test for reproducing the hanging, and it should be simple to verify the affected versions. The fix looks effective in my testing.
            seppo Seppo Jaakola added a comment -

            Fix is pushed in wsrep-lib
            PR contains a new mtr test for reproducing the issue, all versions 10.4 and up are vulnerable

            seppo Seppo Jaakola added a comment - Fix is pushed in wsrep-lib PR contains a new mtr test for reproducing the issue, all versions 10.4 and up are vulnerable
            seppo Seppo Jaakola added a comment -

            PR containing the fix (in wsrep-lib) and mtr test was submitted today.
            The issue is not limited to event processing, it can happen with regular transactions as well (e.g. the mtr test does not use events).
            The conditions for the issue to happen are:

            • binlog purging must happen (i.e. binlog expiration is configured)
            • cluster wide conflict must happen, and the BF abort victim should be ordered later than the binlog purge processor
            seppo Seppo Jaakola added a comment - PR containing the fix (in wsrep-lib) and mtr test was submitted today. The issue is not limited to event processing, it can happen with regular transactions as well (e.g. the mtr test does not use events). The conditions for the issue to happen are: binlog purging must happen (i.e. binlog expiration is configured) cluster wide conflict must happen, and the BF abort victim should be ordered later than the binlog purge processor
            jplindst Jan Lindström (Inactive) added a comment - 10.5: bb-10.5- MDEV-29880 -galera 10.6: bb-10.6- MDEV-29880 -galera

            People

              seppo Seppo Jaakola
              dupondje Jean-Louis Dupond
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.