Details
Description
Server starts crashing after upgrade to Mariadb 10.4.11. We have 3 node galera cluster. Error is probably related to dropping and recreating table, which is used by other node.
Logs from few last crashes:
led 24 22:00:19 db2 mysqld[18625]: 2020-01-24 22:00:19 16 [Note] WSREP: MDL BF-BF conflict
|
led 24 22:00:19 db2 mysqld[18625]: schema: s_demo
|
led 24 22:00:19 db2 mysqld[18625]: [157B blob data]
|
led 24 22:00:19 db2 mysqld[18625]: granted: (11 seqno 804791504 wsrep (toi, exec, committed) cmd 0 1 CREATE TABLE IF NOT EXISTS `shopio_vouchers_products` (
|
led 24 22:00:19 db2 mysqld[18625]: `id` int(11) NOT NULL AUTO_INCREMENT,
|
led 24 22:00:19 db2 mysqld[18625]: `vouchers_id` int(11) NOT NULL,
|
led 24 22:00:19 db2 mysqld[18625]: `products_id` int(11) NOT NULL,
|
led 24 22:00:19 db2 mysqld[18625]: PRIMARY KEY (`id`),
|
led 24 22:00:19 db2 mysqld[18625]: UNIQUE KEY `unique_ids` (`vouchers_id`,`products_id`),
|
led 24 22:00:19 db2 mysqld[18625]: KEY `products_id` (`products_id`),
|
led 24 22:00:19 db2 mysqld[18625]: CONSTRAINT FOREIGN KEY (`products_id`) REFERENCES `shopio_products` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
|
led 24 22:00:19 db2 mysqld[18625]: CONSTRAINT FOREIGN KEY (`vouchers_id`) REFERENCES `shopio_vouchers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
|
led 24 22:00:19 db2 mysqld[18625]: ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4)
|
led 24 22:00:19 db2 mysqld[18625]: 2020-01-24 22:00:19 16 [ERROR] Aborting
|
led 24 22:08:32 db2 systemd[1]: Stopping MariaDB 10.4.11 database server...
|
led 24 22:08:32 db2 mysqld[18625]: 2020-01-24 22:08:32 0 [Warning] WSREP: server: db2 unallowed state transition: disconnecting -> disconnecting
|
led 24 22:08:32 db2 mysqld[18625]: mysqld: /home/buildbot/buildbot/build/mariadb-10.4.11/wsrep-lib/src/server_state.cpp:1340: void wsrep::server_state::state(wsrep::unique_lock<wsrep::mutex>&, wsrep::server_state::state): Assertion `0' failed.
|
led 24 22:08:32 db2 mysqld[18625]: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%200124 22:08:32 [ERROR] mysqld got signal 6 ;
|
led 24 22:08:32 db2 mysqld[18625]: This could be because you hit a bug. It is also possible that this binary
|
led 24 22:08:32 db2 mysqld[18625]: or one of the libraries it was linked against is corrupt, improperly built,
|
led 24 22:08:32 db2 mysqld[18625]: or misconfigured. This error can also be caused by malfunctioning hardware.
|
led 24 22:08:32 db2 mysqld[18625]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
led 24 22:08:32 db2 mysqld[18625]: We will try our best to scrape up some info that will hopefully help
|
led 24 22:08:32 db2 mysqld[18625]: diagnose the problem, but since we have already crashed,
|
led 24 22:08:32 db2 mysqld[18625]: something is definitely wrong and this may fail.
|
led 24 22:08:32 db2 mysqld[18625]: Server version: 10.4.11-MariaDB-1:10.4.11+maria~stretch-log
|
led 24 22:08:32 db2 mysqld[18625]: key_buffer_size=1048576
|
led 24 22:08:32 db2 mysqld[18625]: read_buffer_size=262144
|
led 24 22:08:32 db2 mysqld[18625]: max_used_connections=44
|
led 24 22:08:32 db2 mysqld[18625]: max_threads=2327
|
led 24 22:08:32 db2 mysqld[18625]: thread_count=21
|
led 24 22:08:32 db2 mysqld[18625]: It is possible that mysqld could use up to
|
led 24 22:08:32 db2 mysqld[18625]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5419764 K bytes of memory
|
led 24 22:08:32 db2 mysqld[18625]: Hope that's ok; if not, decrease some variables in the equation.
|
led 24 22:08:32 db2 mysqld[18625]: Thread pointer: 0x0
|
led 24 22:08:32 db2 mysqld[18625]: Attempting backtrace. You can use the following information to find out
|
led 24 22:08:32 db2 mysqld[18625]: where mysqld died. If you see no messages after this, something went
|
led 24 22:08:32 db2 mysqld[18625]: terribly wrong...
|
led 24 22:08:32 db2 mysqld[18625]: stack_bottom = 0x0 thread_stack 0x31000
|
led 24 22:08:32 db2 mysqld[18625]: *** buffer overflow detected ***: /usr/sbin/mysqld terminated
|
led 24 22:08:32 db2 mysqld[18625]: ======= Backtrace: =========
|
led 24 22:08:32 db2 mysqld[18625]: /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f39ad77cbfb]
|
led 24 22:08:32 db2 mysqld[18625]: /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f39ad805437]
|
led 24 22:08:32 db2 mysqld[18625]: /lib/x86_64-linux-gnu/libc.so.6(+0xf7570)[0x7f39ad803570]
|
led 24 22:08:32 db2 mysqld[18625]: /lib/x86_64-linux-gnu/libc.so.6(+0xf93aa)[0x7f39ad8053aa]
|
led 24 22:08:32 db2 mysqld[18625]: /usr/sbin/mysqld(my_addr_resolve+0xe2)[0x559a16f0f492]
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 2 conflict_state: 0 seqno: 805439671)
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [ERROR] Slave SQL: Error executing row event: 'Table 's_demo.shopio_vouchers_products' doesn't exist', Internal MariaDB error code: 1146
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Warning] WSREP: Event 23 Update_rows_v1 apply failed: 1146, seqno 805439671
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [ERROR] WSREP: Failed to apply write set: gtid: 54078096-9d91-11e9-8472-a7c937a2aaa8:805439671 server_id: 34aa6dfd-3d7d-11ea-9f02-1e9305220e61 client_id: 1
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Note] WSREP: Closing send monitor...
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Note] WSREP: Closed send monitor.
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Note] WSREP: gcomm: terminating thread
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Note] WSREP: gcomm: joining thread
|
led 25 02:00:21 db2 mysqld[11890]: 2020-01-25 2:00:21 1 [Note] WSREP: gcomm: closing backend
|
led 25 07:00:19 db2 mysqld[17264]: 2020-01-25 7:00:19 16 [Note] WSREP: MDL BF-BF conflict
|
led 25 07:00:19 db2 mysqld[17264]: schema: s_demo
|
led 25 07:00:19 db2 mysqld[17264]: request: (16 seqno 806396466 wsrep (toi, exec, committed) cmd 0 3 /*!40000 ALTER TABLE `shopio_vouchers_products` DISABLE KEYS */)
|
led 25 07:00:19 db2 mysqld[17264]: [157B blob data]
|
led 25 07:00:19 db2 mysqld[17264]: 2020-01-25 7:00:19 16 [ERROR] Aborting
|
led 25 07:10:24 db2 systemd[1]: Stopping MariaDB 10.4.11 database server...
|
led 25 07:10:24 db2 mysqld[17264]: 2020-01-25 7:10:24 0 [Warning] WSREP: server: db2 unallowed state transition: disconnecting -> disconnecting
|
led 25 07:10:24 db2 mysqld[17264]: mysqld: /home/buildbot/buildbot/build/mariadb-10.4.11/wsrep-lib/src/server_state.cpp:1340: void wsrep::server_state::state(wsrep::unique_lock<wsrep::mutex>&, wsrep::server_state
|
led 25 07:10:24 db2 mysqld[17264]: 200125 7:10:24 [ERROR] mysqld got signal 6 ;
|
led 25 07:10:24 db2 mysqld[17264]: This could be because you hit a bug. It is also possible that this binary
|
led 25 07:10:24 db2 mysqld[17264]: or one of the libraries it was linked against is corrupt, improperly built,
|
led 25 07:10:24 db2 mysqld[17264]: or misconfigured. This error can also be caused by malfunctioning hardware.
|
led 25 07:10:24 db2 mysqld[17264]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
led 25 07:10:24 db2 mysqld[17264]: We will try our best to scrape up some info that will hopefully help
|
led 25 07:10:24 db2 mysqld[17264]: diagnose the problem, but since we have already crashed,
|
led 25 07:10:24 db2 mysqld[17264]: something is definitely wrong and this may fail.
|
led 25 07:10:24 db2 mysqld[17264]: Server version: 10.4.11-MariaDB-1:10.4.11+maria~stretch-log
|
led 25 07:10:24 db2 mysqld[17264]: key_buffer_size=1048576
|
led 25 07:10:24 db2 mysqld[17264]: read_buffer_size=262144
|
led 25 07:10:24 db2 mysqld[17264]: max_used_connections=39
|
led 25 07:10:24 db2 mysqld[17264]: max_threads=2327
|
led 25 07:10:24 db2 mysqld[17264]: thread_count=26
|
led 25 07:10:24 db2 mysqld[17264]: It is possible that mysqld could use up to
|
led 25 07:10:24 db2 mysqld[17264]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5419764 K bytes of memory
|
Attachments
Issue Links
- relates to
-
MDEV-24119 MDL BF-BF Conflict caused by TRUNCATE TABLE
-
- Closed
-
-
MDEV-23997 truncate is crashing mariadb galera cluster slave nodes
-
- Closed
-
The followup by Fišer accurately describes the hanging mariadbd 10.4.20 that I ran into just now:
2021-11-17 16:10:06 22 [Note] WSREP: MDL BF-BF conflict
schema: redacted_2021
request: (22 seqno 31261949 wsrep (toi, exec, committed) cmd 0 3 ALTER TABLE `stc_16_cblist` ADD `field_column` text)
granted: (17 seqno 31261952 wsrep (high priority, exec, committing) cmd 0 161 (null))
2021-11-17 16:10:06 22 [ERROR] Aborting
The SHOW PROCESSLIST shows all threads in Daemon or Sleep state:
I salvaged a core dump. But this setup has a interesting peculiarity with these dumps, which I haven't been able to explain
. There seem to be certain threads missing in the core dump, which results in this behaviour:
# gdb `which mysqld` ./core -batch -ex 'info threads' | tail -n1
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
warning: Couldn't find general-purpose registers in core file.
PC register is not available
61 Thread 0x7f3a8cd756c0 (LWP 1)
That "PC register is not available" error fails all "thread apply all" lookups, but we can get at some/most
of the threads using a workaround:
for x in `seq 1 61`; do echo -ex; echo "thread apply $x bt full"; done |
xargs -d\\n gdb `which mysqld` ./core -batch
Threads are all waiting for something non-obvious:
# tail -n61 info-threads.txt | sed -e 's/^[*]/ /' | awk '{print $6}' | sort | uniq -c
1
19 __GI___nanosleep
1 __GI___poll
1 __GI___sigtimedwait
8 futex_abstimed_wait_cancelable
31 futex_wait_cancelable
The Aborting thread appears to be number 47:
Thread 47 (LWP 1202077):
#0 0x00007f3a8b485ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x559b8bcd38f0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x559b8bcd38a0, cond=0x559b8bcd38c8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=cond@entry=0x559b8bcd38c8, mutex=0x559b8bcd38a0) at pthread_cond_wait.c:655
#3 0x00007f3a5da09845 in gu::Lock::wait (this=0x7f38e01acb00, cond=...) at ./galerautils/src/gu_lock.hpp:46
#4 galera::ReplicatorSMM::wait_for_CLOSED (this=this@entry=0x559b8bcd3760, lock=...) at ./galera/src/replicator_smm.cpp:273
#5 0x00007f3a5da098c5 in galera::ReplicatorSMM::close (this=0x559b8bcd3760) at ./galera/src/replicator_smm.cpp:364
#6 0x00007f3a5d9e681b in galera_disconnect (gh=<optimized out>) at ./galera/src/wsrep_provider.cpp:208
#7 0x0000559b89905e51 in wsrep::wsrep_provider_v26::disconnect (this=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:759
#8 0x0000559b898f329b in wsrep::server_state::disconnect (this=0x559b8bc9e790) at ./wsrep-lib/src/server_state.cpp:540
#9 0x0000559b89011888 in unireg_abort (exit_code=exit_code@entry=1) at ./sql/mysqld.cc:1862
#10 0x0000559b8924d4d9 in wsrep_handle_mdl_conflict (requestor_ctx=requestor_ctx@entry=0x7f361c000d28, ticket=ticket@entry=0x7f3638020b80, key=key@entry=0x7f38f86299f8) at ./sql/wsrep_mysqld.cc:2324
#11 0x0000559b891c71f6 in MDL_lock::can_grant_lock (this=this@entry=0x7f38f86299f8, type_arg=<optimized out>, requestor_ctx=requestor_ctx@entry=0x7f361c000d28, ignore_lock_priority=ignore_lock_priority@entry=false) at ./sql/mdl.cc:1769
#12 0x0000559b891c7915 in MDL_context::try_acquire_lock_impl (this=this@entry=0x7f361c000d28, mdl_request=mdl_request@entry=0x7f38e01ad080, out_ticket=out_ticket@entry=0x7f38e01acf98) at ./sql/mdl.cc:2100
#13 0x0000559b891c7a4e in MDL_context::acquire_lock (this=this@entry=0x7f361c000d28, mdl_request=mdl_request@entry=0x7f38e01ad080, lock_wait_timeout=lock_wait_timeout@entry=86400) at ./sql/mdl.cc:2247
#14 0x0000559b891c85c2 in MDL_context::upgrade_shared_lock (this=this@entry=0x7f361c000d28, mdl_ticket=0x7f361c07de10, new_type=new_type@entry=MDL_EXCLUSIVE, lock_wait_timeout=86400) at ./sql/mdl.cc:2523
#15 0x0000559b8916c826 in mysql_inplace_alter_table (target_mdl_request=0x7f38e01ae300, alter_ctx=0x7f38e01af420, ha_alter_info=0x7f38e01ad3d0, altered_table=0x7f38e01ad470, table=0x7f362c0f19e8, table_list=0x7f361c00be40, thd=0x7f361c000c08) at ./sql/sql_table.cc:7730
#16 mysql_alter_table (thd=thd@entry=0x7f361c000c08, new_db=new_db@entry=0x7f361c005230, new_name=new_name@entry=0x7f361c005658, create_info=create_info@entry=0x7f38e01b0010, table_list=<optimized out>, table_list@entry=0x7f361c00be40, alter_info=alter_info@entry=0x7f38e01aff50, order_num=0, order=0x0, ignore=false) at ./sql/sql_table.cc:10248
#17 0x0000559b891c28ce in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x7f361c000c08) at ./sql/sql_alter.cc:520
#18 0x0000559b890d58c7 in mysql_execute_command (thd=thd@entry=0x7f361c000c08) at ./sql/sql_parse.cc:6189
#19 0x0000559b890db70a in mysql_parse (thd=0x7f361c000c08, rawbuf=<optimized out>, length=51, parser_state=parser_state@entry=0x7f38e01b2740, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at ./sql/sql_parse.cc:7992
#20 0x0000559b893e6f66 in Query_log_event::do_apply_event (this=0x7f361c0660d8, rgi=0x7f361c01c4b0, query_arg=0x7f361c0dfaca "ALTER TABLE `stc_16_cblist` ADD `field_column` text", q_len_arg=<optimized out>) at ./sql/log_event.cc:5734
#21 0x0000559b8925e3b8 in Log_event::apply_event (rgi=0x7f361c01c4b0, this=0x7f361c0660d8) at ./sql/log_event.h:1492
#22 wsrep_apply_events (thd=thd@entry=0x7f361c000c08, rli=<optimized out>, events_buf=<optimized out>, buf_len=0) at ./sql/wsrep_applier.cc:200
#23 0x0000559b89244ab9 in Wsrep_high_priority_service::apply_toi (this=0x7f38e01b3cf0, ws_meta=..., data=...) at ./sql/wsrep_high_priority_service.cc:382
#24 0x0000559b898f1eb3 in apply_toi (data=..., ws_meta=..., ws_handle=..., high_priority_service=..., provider=...) at ./wsrep-lib/src/server_state.cpp:461
#25 wsrep::server_state::on_apply (this=<optimized out>, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at ./wsrep-lib/src/server_state.cpp:1127
#26 0x0000559b899066d4 in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x7f38e01b3cf0) at ./wsrep-lib/include/wsrep/high_priority_service.hpp:47
#27 (anonymous namespace)::apply_cb (ctx=0x7f38e01b3cf0, wsh=<optimized out>, flags=<optimized out>, buf=<optimized out>, meta=<optimized out>, exit_loop=0x7f38e01b300f) at ./wsrep-lib/src/wsrep_provider_v26.cpp:504
#28 0x00007f3a5d9fc265 in galera::TrxHandleSlave::apply (this=this@entry=0x7f36a0097820, recv_ctx=recv_ctx@entry=0x7f38e01b3cf0, apply_cb=0x559b899065a0 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=exit_loop@entry=@0x7f38e01b300f: false) at ./galera/src/trx_handle.cpp:391
#29 0x00007f3a5da0dc10 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x559b8bcd3760, recv_ctx=recv_ctx@entry=0x7f38e01b3cf0, ts=...) at ./galera/src/replicator_smm.cpp:504
#30 0x00007f3a5da140f5 in galera::ReplicatorSMM::process_trx (this=0x559b8bcd3760, recv_ctx=0x7f38e01b3cf0, ts_ptr=...) at ./galera/src/replicator_smm.cpp:2127
#31 0x00007f3a5da466b5 in galera::GcsActionSource::process_writeset (this=0x559b8bce0c60, recv_ctx=0x7f38e01b3cf0, act=..., exit_loop=@0x7f38e01b38ff: false) at ./galera/src/gcs_action_source.cpp:62
#32 0x00007f3a5da472c2 in galera::GcsActionSource::dispatch (this=this@entry=0x559b8bce0c60, recv_ctx=recv_ctx@entry=0x7f38e01b3cf0, act=..., exit_loop=@0x7f38e01b38ff: false) at ./galera/src/gcs_action_source.cpp:110
#33 0x00007f3a5da47582 in galera::GcsActionSource::process (this=0x559b8bce0c60, recv_ctx=0x7f38e01b3cf0, exit_loop=@0x7f38e01b38ff: false) at ./galera/src/gcs_action_source.cpp:183
#34 0x00007f3a5da14b10 in galera::ReplicatorSMM::async_recv (this=0x559b8bcd3760, recv_ctx=0x7f38e01b3cf0) at ./galera/src/replicator_smm.cpp:390
#35 0x00007f3a5d9e69bb in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at ./galera/src/wsrep_provider.cpp:236
#36 0x0000559b8990731e in wsrep::wsrep_provider_v26::run_applier (this=<optimized out>, applier_ctx=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:796
#37 0x0000559b8926051f in wsrep_replication_process (thd=0x7f361c000c08, arg=<optimized out>) at ./sql/wsrep_thd.cc:57
#38 0x0000559b8924f9d3 in start_wsrep_THD (arg=arg@entry=0x559b972686f0) at ./sql/wsrep_mysqld.cc:2892
#39 0x0000559b89824e8a in pfs_spawn_thread (arg=0x559b97268718) at ./storage/perfschema/pfs.cc:1869
#40 0x00007f3a8b47f6db in start_thread (arg=0x7f38e01b4700) at pthread_create.c:463
#41 0x00007f3a8a87d71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I'd love to add more useful information, but I don't know where to go from here. I'll keep the core dump around for a while, in case jplindst or seppo or anyone else wants to know the value of something.
# objdump -x core |
grep -E '^ *[0-9]+ [.](reg|reg2|reg-xstate|note.linuxcore.siginfo)/' |
awk '{print $2}' | sed -e 's/\/[0-9].*//' | sort | uniq -c
60 .note.linuxcore.siginfo
60 .reg
60 .reg-xstate
60 .reg2