Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.18
    • Fix Version/s: 10.4.19
    • Component/s: Galera
    • Labels: None
    • Environment: Debian GNU/Linux 8.11 (jessie) x86_64

    Description

      Yesterday my entire cluster crashed and was eventually reinitialised from scratch. In the process we also updated MariaDB on some of the servers. This afternoon I had this crash on node 5, which then took out 4 other nodes, leaving only 1 (luckily, everything recovered).

      MariaDB 10.4.18
      galera: 4.7(ree4f10fc)

      The crash that started it seems to have been:
      (some information obfuscated)
      ```
      2021-04-09 14:37:31 1 [ERROR] InnoDB: Conflicting lock on table: ``.`table1` index: PRIMARY that has lock
      RECORD LOCKS space id 267 page no 21525 n bits 120 index PRIMARY of table ``.`table1` trx id 196886519 lock mode S locks rec but not gap
      Record lock, heap no 47 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
      0: len 8; hex 000000000057d9c6; asc W ;;
      1: len 6; hex 000000000000; asc ;;
      2: len 7; hex 80000000000000; asc ;;
      3: len 4; hex 8000166d; asc m;;
      4: SQL NULL;
      5: len 11; hex 424d5f5748415453415050; asc astring;;
      6: len 13; hex 2b393730353637353535383033; asc astring;;
      7: len 1; hex 80; asc ;;

      2021-04-09 14:37:31 1 [ERROR] InnoDB: WSREP state:
      2021-04-09 14:37:31 1 [ERROR] WSREP: Thread BF trx_id: 196886521 thread: 1 seqno: 99055 client_state: exec client_mode: high priority transaction_mode: replaying applier: 1 toi: 0 local: 0 query: UPDATE table1 SET `profileId` = number, `isBot` = 0 WHERE `serviceId` = number AND `channelId` = 'astring' AND `channelUserId` = 'astring'?Jp`
      2021-04-09 14:37:31 1 [ERROR] WSREP: Thread BF trx_id: 196886519 thread: 34 seqno: 99054 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO table2 (`uuid`, `type`, `conversationId`, `timestamp`, `channelUserId`, `channelAccountId`) VALUES ('guid', 'string', 11731790, 1617971851864, 5546465, 10799)?Jp`
      2021-04-09 14:37:31 0x7f53edbe1700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.18/storage/innobase/lock/lock0lock.cc line 706
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      210409 14:37:31 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.4.18-MariaDB-1:10.4.18+maria~bionic-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=129
      max_threads=65537
      thread_count=78
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 403037846 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f533005de18
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f53edbe0d28 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5605d336452e]
      /usr/sbin/mysqld(handle_fatal_signal+0x515)[0x5605d2dda2d5]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f59e4c648a0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f59e3598f47]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f59e359a8b1]
      /usr/sbin/mysqld(+0x5889fd)[0x5605d2ae09fd]
      /usr/sbin/mysqld(+0xa71d49)[0x5605d2fc9d49]
      /usr/sbin/mysqld(+0xa72ecd)[0x5605d2fcaecd]
      /usr/sbin/mysqld(+0xa782d1)[0x5605d2fd02d1]
      /usr/sbin/mysqld(+0xb10c37)[0x5605d3068c37]
      /usr/sbin/mysqld(+0xb16d74)[0x5605d306ed74]
      /usr/sbin/mysqld(+0xa353b2)[0x5605d2f8d3b2]
      /usr/sbin/mysqld(_ZN7handler10ha_rnd_posEPhS0_+0x132)[0x5605d2ddf562]
      /usr/sbin/mysqld(+0x7a47c3)[0x5605d2cfc7c3]
      /usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x3a0)[0x5605d2edc4e0]
      /usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0xe0)[0x5605d2edcb90]
      /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x237)[0x5605d2ecfcb7]
      /usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1d8)[0x5605d2d4d488]
      /usr/sbin/mysqld(_ZN22Wsrep_replayer_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xf4)[0x5605d2d344a4]
      /usr/sbin/mysqld(+0xe83432)[0x5605d33db432]
      /usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0x155)[0x5605d33dc245]
      /usr/sbin/mysqld(+0xe98af4)[0x5605d33f0af4]
      /usr/lib/galera/libgalera_smm.so(+0x30fb5)[0x7f59e157cfb5]
      /usr/lib/galera/libgalera_smm.so(+0x4845c)[0x7f59e159445c]
      /usr/lib/galera/libgalera_smm.so(+0x1fd75)[0x7f59e156bd75]
      /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v266replayERKNS_9ws_handleEPNS_21high_priority_serviceE+0x29)[0x5605d33f1459]
      /usr/sbin/mysqld(_ZN20Wsrep_client_service6replayEv+0xf4)[0x5605d2d340d4]
      /usr/sbin/mysqld(_ZN5wsrep11transaction6replayERSt11unique_lockINS_5mutexEE+0x96)[0x5605d33ea996]
      /usr/sbin/mysqld(_ZN5wsrep11transaction15after_statementEv+0x1b4)[0x5605d33ec5a4]
      /usr/sbin/mysqld(_ZN5wsrep12client_state15after_statementEv+0xb9)[0x5605d33cf229]
      /usr/sbin/mysqld(+0x6757ad)[0x5605d2bcd7ad]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x22d2)[0x5605d2bd01e2]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x11c)[0x5605d2bd0aec]
      /usr/sbin/mysqld(_Z11tp_callbackP13TP_connection+0xef)[0x5605d2dab70f]
      /usr/sbin/mysqld(+0xa0b3e0)[0x5605d2f633e0]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f59e4c596db]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f59e367ba3f]
      ```

      Other nodes then tried to recover, performed quorum checks and started state transfer routines, which caused 4 of them to crash with this stack trace:
      ```
      =================================================
      2021-04-09 14:37:44 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      210409 14:37:44 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.4.18-MariaDB-1:10.4.18+maria~bionic-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=91
      max_threads=65537
      thread_count=83
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 403037846 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f74ec000c08
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f75741c9e08 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5632ddda152e]
      /usr/sbin/mysqld(handle_fatal_signal+0x515)[0x5632dd8172d5]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f7b5ac268a0]
      /usr/sbin/mysqld(+0x60843d)[0x5632dd59d43d]
      /usr/sbin/mysqld(_Z11lock_tablesP3THDP10TABLE_LISTjj+0x4a0)[0x5632dd5a3460]
      /usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDRK14DDL_options_stP10TABLE_LISTbjP19Prelocking_strategy+0x8a)[0x5632dd5a4eea]
      /usr/sbin/mysqld(_Z24open_n_lock_single_tableP3THDP10TABLE_LIST13thr_lock_typejP19Prelocking_strategy+0x41)[0x5632dd5a4f91]
      /usr/sbin/mysqld(+0x7f78a7)[0x5632dd78c8a7]
      /usr/sbin/mysqld(+0x7f8324)[0x5632dd78d324]
      /usr/sbin/mysqld(_ZNK12Wsrep_schema12restore_viewEP3THDRKN5wsrep2idE+0x24b)[0x5632dd79039b]
      /usr/sbin/mysqld(_ZN20Wsrep_server_service8log_viewEPN5wsrep21high_priority_serviceERKNS0_4viewE+0x700)[0x5632dd999dc0]
      /usr/sbin/mysqld(_ZN5wsrep12server_state7on_viewERKNS_4viewEPNS_21high_priority_serviceE+0x3c5)[0x5632dde1eef5]
      /usr/sbin/mysqld(+0xe9bc1c)[0x5632dde30c1c]
      /usr/lib/galera/libgalera_smm.so(+0x3ee89)[0x7f7b5754ce89]
      /usr/lib/galera/libgalera_smm.so(+0x3fc98)[0x7f7b5754dc98]
      /usr/lib/galera/libgalera_smm.so(+0x4b188)[0x7f7b57559188]
      /usr/lib/galera/libgalera_smm.so(+0x4b3be)[0x7f7b575593be]
      /usr/lib/galera/libgalera_smm.so(+0x7c848)[0x7f7b5758a848]
      /usr/lib/galera/libgalera_smm.so(+0x7caa2)[0x7f7b5758aaa2]
      /usr/lib/galera/libgalera_smm.so(+0x49890)[0x7f7b57557890]
      /usr/lib/galera/libgalera_smm.so(+0x1c2ab)[0x7f7b5752a2ab]
      /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x5632dde2e1de]
      /usr/sbin/mysqld(+0x7f758f)[0x5632dd78c58f]
      /usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x413)[0x5632dd77bf33]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f7b5ac1b6db]
      2021-04-09 14:37:46 0 [Note] WSREP: (1cc929aa-b399, 'tcp://0.0.0.0:4567') reconnecting to dd6e91d4-a06b (tcp://10.30.126.13:4567), attempt 0
      2021-04-09 14:37:46 0 [Note] WSREP: (1cc929aa-b399, 'tcp://0.0.0.0:4567') reconnecting to cc37987d-b641 (tcp://10.35.126.15:4567), attempt 0
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f7b5963da3f]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 17
      Status: NOT_KILLED
      ```

      This all looks very abnormal... I'm not sure how to analyse this.
      The only thing I could find that looks somewhat similar is MDEV-23851.

      Any help with figuring out what these stack traces mean would be appreciated.

    Activity

            marko Marko Mäkelä added a comment:

            jplindst changed that code in MDEV-24923. The function wsrep_assert_no_bf_bf_wait() that reports the fatal error would only be available in debug builds.

            I think that he should analyze this failure and comment on whether it could be a real error.
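            As an aside, "would only be available in debug builds" (after the MDEV-24923 change) refers to the common InnoDB pattern of compiling such consistency checks only when UNIV_DEBUG is defined. The following is a minimal, hypothetical C++ sketch of that gating; the function and macro names are placeholders, not MariaDB's actual code:

            ```
            #include <cstdio>

            // Hypothetical illustration of UNIV_DEBUG-style gating; not the real
            // wsrep_assert_no_bf_bf_wait() implementation.
            #ifdef UNIV_DEBUG
            static void bf_bf_consistency_check()
            {
              // Debug build: verify the BF-BF lock ordering and abort loudly if it is violated.
              std::puts("debug build: BF-BF consistency check runs here");
            }
            # define BF_BF_CHECK() bf_bf_consistency_check()
            #else
            // Release build: the check is compiled out entirely.
            # define BF_BF_CHECK() do {} while (0)
            #endif

            int main()
            {
              BF_BF_CHECK();  // present only when built with -DUNIV_DEBUG
              return 0;
            }
            ```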

            djhartman Derk-Jan Hartman added a comment:

            For various reasons we have since updated the cluster and we are now on 10.5.9. We also switched wsrep_sst_method from rsync to mariabackup.

            I think, however, that we are running into a similar problem; see also MDEV-25405. So far the problem just hasn't been as catastrophic as on our older setup, where it caused a complete cluster lockup on the nodes.
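            For reference, that SST switch is a single configuration change. The snippet below is a minimal, hypothetical my.cnf sketch (the provider path matches the stack traces above; the cluster addresses and SST credentials are placeholders, not values from this report):

            ```
            # galera.cnf -- illustrative sketch only
            [mysqld]
            wsrep_on              = ON
            wsrep_provider        = /usr/lib/galera/libgalera_smm.so
            wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3   # placeholder node IPs
            # State snapshot transfer method: rsync -> mariabackup
            wsrep_sst_method      = mariabackup
            wsrep_sst_auth        = sst_user:sst_password                # placeholder credentials
            ```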

            jplindst Jan Lindström (Inactive) added a comment:

            trx_id: 196886519 has seqno: 99054 and is holding the lock, while trx_id: 196886521 with seqno: 99055 is requesting a conflicting record lock. Because trx_id: 196886521 has the larger seqno, it should wait for its turn, i.e. do a lock wait. This has now been fixed in MDEV-24923.
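            To make that ordering rule concrete, here is a small, hypothetical C++ sketch (not the actual MariaDB code; the type and function names are invented) of the decision between queueing a lock wait and flagging a genuine BF-BF ordering violation:

            ```
            #include <cassert>
            #include <cstdint>

            // Hypothetical model of a brute-force (BF) applier transaction.
            struct bf_trx {
              uint64_t trx_id;  // InnoDB transaction id
              int64_t  seqno;   // Galera certification sequence number
            };

            // Decide what a conflicting BF record-lock request should do.
            // Returns true when the requester should simply enqueue a lock wait.
            bool bf_bf_should_wait(const bf_trx& holder, const bf_trx& requester) {
              // Later write set waits for the earlier one (certification order).
              if (requester.seqno > holder.seqno)
                return true;
              // Requester is ordered before the holder: replication ordering has been
              // violated, which is what the (debug) assertion is meant to catch.
              return false;
            }

            int main() {
              // The pair from this report: holder 196886519/99054, requester 196886521/99055.
              bf_trx holder{196886519, 99054};
              bf_trx requester{196886521, 99055};
              assert(bf_bf_should_wait(holder, requester));  // requester does a lock wait
              return 0;
            }
            ```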

            djhartman Derk-Jan Hartman added a comment:

            As far as I can tell, 10.4.19 and 10.5.10 have not yet been packaged for Ubuntu. But as soon as they are available we will update, I guess, because the HA part isn't too HA right now.
            thefsb Tom added a comment (edited):

            How long should I expect to wait until a Debian package is available?

            I had a cluster of 3 nodes running 10.4.15. One was upgraded to 10.4.18 on Thursday, and on Saturday morning I got a crash like the one Derk-Jan Hartman reported. I worry that we'll get more such crashes on that node if I don't revert it to 10.4.15.

            Had another crash this afternoon. Will look at reverting now.


            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: djhartman Derk-Jan Hartman
              Votes: 0
              Watchers: 5
