After a manual SST with mariabackup, IST fails with an assertion failure at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687.
We tried the manual SST + IST twice; both times the WSREP error before the assertion concerned the same table and index with the conflicting lock, and the same INSERT (with different values).
Eugene added a comment (edited):
Faced exactly the same issue. Nodes can perform neither IST nor SST, regardless of how the transfer was started. As a result, cluster members cannot start, because they cannot sync after a restart. The crash always refers to the same line:
2020-11-17 01:10:56 19 [ERROR] WSREP: Thread BF trx_id: 4264440786 thread: 35 seqno: 1143042227 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
2020-11-17 01:10:56 0x7fb13c4c4640 InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.17/work/mysql/storage/innobase/lock/lock0lock.cc line 687
Another node:
2020-11-18 19:18:05 24 [ERROR] WSREP: Thread BF trx_id: 4021935550 thread: 43 seqno: 1155955449 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
2020-11-18 19:18:05 0x7f0fb8fae640 InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.17/work/mysql/storage/innobase/lock/lock0lock.cc line 687
Running MariaDB 10.4.17 with galera-26.4.6 on Gentoo.
Alexander Glaser added a comment (edited):
We did some further testing, and this error does not happen only during IST. We stopped the tool that creates the INSERT statements which cause the problem during backup and IST; the second node could then join. When we started the tool again, after some time the second node crashed again with the same error.
2020-11-18 21:40:04 49 [ERROR] InnoDB: Conflicting lock on table: `web`.`locations` index: unique_locations that has lock
RECORD LOCKS space id 32369 page no 177144 n bits 824 index unique_locations of table `web`.`locations` trx id 34438968344 lock mode S
Record lock, heap no 380 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 4; hex 0000413f; asc A?;;
1: len 5; hex 99a12e5ca7; asc .\ ;;
2: len 4; hex 006fe223; asc o #;;
Record lock, heap no 752 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
0: len 4; hex 0000413d; asc A=;;
1: len 5; hex 99a7e4ea66; asc f;;
2: len 4; hex 06094458; asc DX;;
2020-11-18 21:40:04 49 [ERROR] InnoDB: WSREP state:
2020-11-18 21:40:04 49 [ERROR] WSREP: Thread BF trx_id: 34438968341 thread: 49 seqno: 85724971 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 00:20:47.0','2020-11-18 05:15:44.0','51.3959065','11.933976000000001',16701,'44657','2',1,40527,GREATEST(0,(SELECT ROUND(40527) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 00:20:47.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))
2020-11-18 21:40:04 49 [ERROR] WSREP: Thread BF trx_id: 34438968344 thread: 33 seqno: 85724992 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 14:41:38.0',null,'51.39018214285715','11.945860714285715',16701,'44646','7',1,40530,GREATEST(0,(SELECT ROUND(40530) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 14:41:38.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))
2020-11-18 21:40:04 0x7fc069fb0700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
201118 21:40:04 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.4.17-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=302
thread_count=49
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 795659 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fbf300009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fc069fafd20 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x562997bf3ffe]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x5629976859ef]
/lib64/libpthread.so.0(+0xf6d0)[0x7fd2537876d0]
:0(__GI_raise)[0x7fd251a5b277]
:0(__GI_abort)[0x7fd251a5c968]
/usr/sbin/mysqld(+0x5c2de8)[0x56299737dde8]
/usr/sbin/mysqld(+0xaba93a)[0x56299787593a]
/usr/sbin/mysqld(+0xabc144)[0x562997877144]
/usr/sbin/mysqld(+0xac30fb)[0x56299787e0fb]
/usr/sbin/mysqld(+0xbe5638)[0x5629979a0638]
/usr/sbin/mysqld(+0xb2ffde)[0x5629978eafde]
/usr/sbin/mysqld(+0xb30cd3)[0x5629978ebcd3]
/usr/sbin/mysqld(+0xb311a3)[0x5629978ec1a3]
/usr/sbin/mysqld(+0xb4370f)[0x5629978fe70f]
/usr/sbin/mysqld(+0xa83235)[0x56299783e235]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPKh+0x2d0)[0x5629976912f0]
/usr/sbin/mysqld(_ZN14Rows_log_event9write_rowEP14rpl_group_infob+0x235)[0x56299778d8c5]
/usr/sbin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEP14rpl_group_info+0x7d)[0x56299778dd2d]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x2dc)[0x562997781cfc]
/usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x23d)[0x5629975f7edd]
/usr/sbin/mysqld(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0x9d)[0x5629975e375d]
/usr/sbin/mysqld(+0xecf3cf)[0x562997c8a3cf]
/usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0xf5)[0x562997c8aef5]
/usr/sbin/mysqld(+0xee6de8)[0x562997ca1de8]
src/trx_handle.cpp:387 (galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7fd24dd6b560]
src/replicator_smm.cpp:504 (galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandleSlave&))[0x7fd24dda349a]
src/replicator_smm.cpp:2145 (galera::ReplicatorSMM::process_trx(void*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7fd24dda8fd1]
src/gcs_action_source.cpp:63 (galera::GcsActionSource::process_writeset(void*, gcs_action const&, bool&))[0x7fd24dd825f9]
src/gcs_action_source.cpp:110 (galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7fd24dd82c3a]
src/gcs_action_source.cpp:29 (~Release)[0x7fd24dd82e6e]
src/replicator_smm.cpp:390 (galera::ReplicatorSMM::async_recv(void*))[0x7fd24dda94bb]
src/wsrep_provider.cpp:263 (galera_recv)[0x7fd24ddbbf38]
/usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x562997ca2c8e]
/usr/sbin/mysqld(+0x83e9f2)[0x5629975f99f2]
/usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x35d)[0x5629975edd3d]
pthread_create.c:0(start_thread)[0x7fd25377fe25]
/lib64/libc.so.6(clone+0x6d)[0x7fd251b23bad]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fd10f9147db): INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 00:20:47.0','2020-11-18 05:15:44.0','51.3959065','11.933976000000001',16701,'44657','2',1,40527,GREATEST(0,(SELECT ROUND(40527) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 00:20:47.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))
Connection ID (thread ID): 49
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 514253 514253 processes
Max open files 300000 300000 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 514253 514253 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h
We have now rewritten the INSERT statement so that it does not use the destination table in the VALUES part. The node still crashes when using this query. The definition of the index "unique_locations" is UNIQUE INDEX `unique_locations` (`car_id`, `timestamp_from`) USING BTREE.
2020-11-20 13:56:55 31 [ERROR] InnoDB: Conflicting lock on table: `web`.`locations` index: unique_locations that has lock
RECORD LOCKS space id 32369 page no 514311 n bits 952 index unique_locations of table `web`.`locations` trx id 34536873708 lock mode S
Record lock, heap no 80 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 4; hex 00017fca; asc ;;
1: len 5; hex 99a6f6598a; asc Y ;;
2: len 4; hex 0471a87f; asc q ;;
Record lock, heap no 560 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
0: len 4; hex 00017fc6; asc ;;
1: len 5; hex 99a7e85a49; asc ZI;;
2: len 4; hex 060c1142; asc B;;
2020-11-20 13:56:55 31 [ERROR] InnoDB: WSREP state:
2020-11-20 13:56:55 31 [ERROR] WSREP: Thread BF trx_id: 34536873707 thread: 31 seqno: 92889532 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (115597,(SELECT current_km_offset FROM cars WHERE id=98246),'2020-11-20 07:18:40.0',null,'51.399487','12.039409',98246,'44646','1',1,4595,2)
2020-11-20 13:56:55 0x7fcc9c357700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
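For context, a minimal sketch of the table shape involved, reconstructed from the logs and the index definition quoted above. Only the table name, the column names appearing in the INSERT, and the unique index definition are confirmed; the column types and everything else are assumptions for illustration:

```sql
-- Hypothetical minimal shape of `web`.`locations`; not the actual DDL.
CREATE TABLE locations (
  id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
  car_id         INT NOT NULL,
  timestamp_from DATETIME(1) NOT NULL,  -- fractional second guessed from '...47.0'
  km             INT,
  -- ... the other columns from the INSERT (pointer_id, km_offset, lat, lon,
  -- uicstation_id, position_count, utilized, delta_km) are omitted here ...
  PRIMARY KEY (id),
  UNIQUE INDEX `unique_locations` (`car_id`, `timestamp_from`) USING BTREE
) ENGINE=InnoDB;
```

The conflicting S record locks in the logs are reported on this unique secondary index, i.e. on (car_id, timestamp_from) entries, not on the primary key.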
Gianni Angelozzi added a comment:
Is there a way to prevent a complete loss of the cluster when something this bad happens? Every node fails hard at the same time, and the originating node is then the only one online, but it won't accept queries because it has no quorum. Bugs like these make creating a cluster useless.
I know that things can go wrong all the time, and I appreciate a free product like this, but how can I prevent such a situation from happening again? I can sustain the loss of nodes (even a single node can run my app; I created the cluster for redundancy, not for scaling). Do I need a read-only replica as a hot spare to become the master? Could that fail too? How are you handling this situation?
Thanks everyone for the hard work
Christopher Svensson added a comment (edited):
We also experienced this after an upgrade from 10.5.5 to 10.5.8:
2020-12-04 8:58:21 0 [Note] WSREP: Receiving IST... 0.0% ( 0/160 events) complete.
2020-12-04 8:58:21 0 [Note] WSREP: IST preload starting at 11365559
2020-12-04 8:58:21 0 [Note] WSREP: Service thread queue flushed.
2020-12-04 8:58:21 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:11365558, protocol version: 5
2020-12-04 8:58:21 10 [ERROR] InnoDB: Conflicting lock on table: `app-scheduler`.`jobs` index: fk_schedule_id that has lock
RECORD LOCKS space id 3701 page no 1590 n bits 272 index fk_schedule_id of table `app-scheduler`.`jobs` trx id 24901901 lock_mode X locks rec but not gap
Record lock, heap no 121
2020-12-04 8:58:21 10 [ERROR] InnoDB: WSREP state:
2020-12-04 8:58:21 10 [ERROR] WSREP: Thread BF trx_id: 24901902 thread: 10 seqno: 11365642 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: /* delete net.iponly.app_scheduler.entities.Schedule */ delete from schedules where id='99576e11-b63b-44a8-a5b1-26cc583fe20d'
2020-12-04 8:58:21 10 [ERROR] WSREP: Thread BF trx_id: 24901901 thread: 6 seqno: 11365641 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: /* delete net.iponly.app_scheduler.entities.Job */ delete from jobs where id='11397042-f4c7-4243-8bfd-ad38c3cb8767'
2020-12-04 08:58:21 0x7f93a80bf700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/lock/lock0lock.cc line 655
We're using mariabackup for SST.
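For reference, the two applier transactions in the log above delete a child row and its parent row concurrently, and the conflicting record lock is reported on the foreign-key index. A minimal sketch of the schema implied by the log (only the table names, the index name fk_schedule_id, and the UUID-style ids come from the log; the column names, types, and constraint are assumptions):

```sql
-- Hypothetical reconstruction of the `app-scheduler` tables; not the actual DDL.
CREATE TABLE schedules (
  id CHAR(36) NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE jobs (
  id          CHAR(36) NOT NULL PRIMARY KEY,
  schedule_id CHAR(36),
  KEY fk_schedule_id (schedule_id),
  CONSTRAINT fk_schedule_id FOREIGN KEY (schedule_id) REFERENCES schedules (id)
) ENGINE=InnoDB;

-- Per the log: seqno 11365641 deletes the child row from jobs, seqno 11365642
-- deletes the parent row from schedules; the X record lock that triggers the
-- "Conflicting lock" error sits on the secondary index fk_schedule_id.
```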
Bart Smienk added a comment (edited):
Same issue over here; we are currently downgrading from 10.3.27 to 10.3.25 (the oldest available in the repos) to see if that resolves our issues.
We upgraded a few weeks ago from 10.3.18, which did not present any issues, but the secondary cluster node now crashes multiple times per week. We're running a master/master setup (with garb as witness).
Our cluster successfully does SST or IST, but it crashes at random when some locks conflict. Lock conflicts happened every now and then in the past as well, but they never resulted in a crash of the cluster.
Seppo Jaakola added a comment:
Submitted two pull requests to fix this issue and MDEV-23851, which appear to be duplicates. The reason for the crashes is a recently added, too strict assertion, which fires during safe execution. This is fixed by pull request https://github.com/MariaDB/server/pull/1723.
There is also a real bug resulting from a rarer scenario, where a unique secondary key value is modified in a certain way (often caused by REPLACE statement usage), resulting in secondary index gap locking, while a simultaneous INSERT replicated from another node inserts into the locked gap. This issue is fixed in pull request https://github.com/MariaDB/server/pull/1724.
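A hedged sketch of the rarer scenario described above (the table, column names, and values here are purely illustrative, not taken from this report):

```sql
-- Illustrative only: a minimal table with a unique secondary key.
CREATE TABLE t (
  id INT PRIMARY KEY,
  k  INT,
  UNIQUE KEY uk (k)
) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 10), (2, 30);

-- Local client: REPLACE modifies the unique secondary key value of row 1.
-- Internally this is a delete + insert on index uk, which can leave gap
-- locking on the unique secondary index.
REPLACE INTO t VALUES (1, 20);

-- Meanwhile, a high-priority (BF) applier thread on the same node replicates
-- an INSERT from another node whose key falls into the locked gap. In the
-- affected versions this combination tripped the assertion in lock0lock.cc
-- instead of being resolved in favour of the applier.
INSERT INTO t VALUES (3, 15);
```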
Seppo Jaakola added a comment:
Two pull requests submitted for this:
https://github.com/MariaDB/server/pull/1723
https://github.com/MariaDB/server/pull/1724
Bart Smienk added a comment:
Thanks for your work Seppo, much appreciated.
I'll keep an eye on future releases to see if your fix has been included, so we can update the cluster again to the latest release.