[MDEV-24229] During Galera IST: Assertion failure in file lock0lock.cc at line 687 Created: 2020-11-17  Updated: 2021-04-26  Resolved: 2020-12-28

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.17
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: Alexander Glaser Assignee: Jan Lindström (Inactive)
Resolution: Duplicate Votes: 14
Labels: None
Environment:

Attachments: Text File mariadb10.3.27-crashlog.txt     Text File server_error.log    
Issue Links:
Duplicate
is duplicated by MDEV-24406 Galera crashes when deleting records Closed
is duplicated by MDEV-24437 Galera 4 read node crashes after DML ... Closed
PartOf
includes MDEV-23851 Galera assertion at lock0lock.cc line... Closed
Relates
relates to MDEV-23851 Galera assertion at lock0lock.cc line... Closed
relates to MDEV-24989 Galera assertion at lock0lock.cc line... Closed

 Description   

After a manual SST with mariabackup the IST fails with a assertion failure at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687.

We tried the manual SST + IST twice and the WSREP Error before the assertion concerned both times the same table and index with the conflicting lock and the same insert (with different values).



 Comments   
Comment by Eugene [ 2020-11-18 ]

Faced absolutely same issue. Nodes can't perform neither IST nor SST, regardless of how it was started. As a result, cluster members can't start as they can't sync after restart. Crash always refer same line:

2020-11-17  1:10:56 19 [ERROR] WSREP: Thread BF trx_id: 4264440786 thread: 35 seqno: 1143042227 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
2020-11-17 01:10:56 0x7fb13c4c4640  InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.17/work/mysql/storage/innobase/lock/lock0lock.cc line 687

Another node:

2020-11-18 19:18:05 24 [ERROR] WSREP: Thread BF trx_id: 4021935550 thread: 43 seqno: 1155955449 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
2020-11-18 19:18:05 0x7f0fb8fae640  InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.17/work/mysql/storage/innobase/lock/lock0lock.cc line 687

Running mariadb 10.4.17 with galera-26.4.6 on Gentoo.

Comment by Alexander Glaser [ 2020-11-19 ]

We did some further testing and this error does not only happen during the IST. We stopped the Tool that creates the INSERT Statements which are causing the problem during backup and IST. The second node now can join. Then we started the tool again and after some time the second node crashed again with the same error.

2020-11-18 21:40:04 49 [ERROR] InnoDB: Conflicting lock on table: `web`.`locations` index: unique_locations that has lock 
RECORD LOCKS space id 32369 page no 177144 n bits 824 index unique_locations of table `web`.`locations` trx id 34438968344 lock mode S
Record lock, heap no 380 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 0000413f; asc   A?;;
 1: len 5; hex 99a12e5ca7; asc   .\ ;;
 2: len 4; hex 006fe223; asc  o #;;
 
Record lock, heap no 752 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 0000413d; asc   A=;;
 1: len 5; hex 99a7e4ea66; asc     f;;
 2: len 4; hex 06094458; asc   DX;;
 
2020-11-18 21:40:04 49 [ERROR] InnoDB: WSREP state: 
2020-11-18 21:40:04 49 [ERROR] WSREP: Thread BF trx_id: 34438968341 thread: 49 seqno: 85724971 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 00:20:47.0','2020-11-18 05:15:44.0','51.3959065','11.933976000000001',16701,'44657','2',1,40527,GREATEST(0,(SELECT ROUND(40527) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 00:20:47.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))£†µ_
2020-11-18 21:40:04 49 [ERROR] WSREP: Thread BF trx_id: 34438968344 thread: 33 seqno: 85724992 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 14:41:38.0',null,'51.39018214285715','11.945860714285715',16701,'44646','7',1,40530,GREATEST(0,(SELECT ROUND(40530) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 14:41:38.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))£†µ_
2020-11-18 21:40:04 0x7fc069fb0700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
201118 21:40:04 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.4.17-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=302
thread_count=49
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 795659 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fbf300009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fc069fafd20 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x562997bf3ffe]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x5629976859ef]
/lib64/libpthread.so.0(+0xf6d0)[0x7fd2537876d0]
:0(__GI_raise)[0x7fd251a5b277]
:0(__GI_abort)[0x7fd251a5c968]
/usr/sbin/mysqld(+0x5c2de8)[0x56299737dde8]
/usr/sbin/mysqld(+0xaba93a)[0x56299787593a]
/usr/sbin/mysqld(+0xabc144)[0x562997877144]
/usr/sbin/mysqld(+0xac30fb)[0x56299787e0fb]
/usr/sbin/mysqld(+0xbe5638)[0x5629979a0638]
/usr/sbin/mysqld(+0xb2ffde)[0x5629978eafde]
/usr/sbin/mysqld(+0xb30cd3)[0x5629978ebcd3]
/usr/sbin/mysqld(+0xb311a3)[0x5629978ec1a3]
/usr/sbin/mysqld(+0xb4370f)[0x5629978fe70f]
/usr/sbin/mysqld(+0xa83235)[0x56299783e235]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPKh+0x2d0)[0x5629976912f0]
/usr/sbin/mysqld(_ZN14Rows_log_event9write_rowEP14rpl_group_infob+0x235)[0x56299778d8c5]
/usr/sbin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEP14rpl_group_info+0x7d)[0x56299778dd2d]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x2dc)[0x562997781cfc]
/usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x23d)[0x5629975f7edd]
/usr/sbin/mysqld(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0x9d)[0x5629975e375d]
/usr/sbin/mysqld(+0xecf3cf)[0x562997c8a3cf]
/usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0xf5)[0x562997c8aef5]
/usr/sbin/mysqld(+0xee6de8)[0x562997ca1de8]
src/trx_handle.cpp:387(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7fd24dd6b560]
src/replicator_smm.cpp:504(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandleSlave&))[0x7fd24dda349a]
src/replicator_smm.cpp:2145(galera::ReplicatorSMM::process_trx(void*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7fd24dda8fd1]
src/gcs_action_source.cpp:63(galera::GcsActionSource::process_writeset(void*, gcs_action const&, bool&))[0x7fd24dd825f9]
src/gcs_action_source.cpp:110(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7fd24dd82c3a]
src/gcs_action_source.cpp:29(~Release)[0x7fd24dd82e6e]
src/replicator_smm.cpp:390(galera::ReplicatorSMM::async_recv(void*))[0x7fd24dda94bb]
src/wsrep_provider.cpp:263(galera_recv)[0x7fd24ddbbf38]
/usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x562997ca2c8e]
/usr/sbin/mysqld(+0x83e9f2)[0x5629975f99f2]
/usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x35d)[0x5629975edd3d]
pthread_create.c:0(start_thread)[0x7fd25377fe25]
/lib64/libc.so.6(clone+0x6d)[0x7fd251b23bad]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fd10f9147db): INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (4492,(SELECT current_km_offset FROM cars WHERE id=16701),'2020-11-18 00:20:47.0','2020-11-18 05:15:44.0','51.3959065','11.933976000000001',16701,'44657','2',1,40527,GREATEST(0,(SELECT ROUND(40527) - IFNULL(MAX(km),0) FROM (SELECT km FROM locations WHERE timestamp_from < '2020-11-18 00:20:47.0' AND car_id=16701 ORDER BY timestamp_from DESC LIMIT 1) x)))
 
Connection ID (thread ID): 49
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             514253               514253               processes 
Max open files            300000               300000               files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       514253               514253               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h

Comment by Alexander Glaser [ 2020-11-20 ]

We have now rewritten the INSERT Statement so that it does not use the destination table in the VALUES part. The node still crashes on when using this query. The definition for index "unique_locations" is UNIQUE INDEX `unique_locations` (`car_id`, `timestamp_from`) USING BTREE

2020-11-20 13:56:55 31 [ERROR] InnoDB: Conflicting lock on table: `web`.`locations` index: unique_locations that has lock 
RECORD LOCKS space id 32369 page no 514311 n bits 952 index unique_locations of table `web`.`locations` trx id 34536873708 lock mode S
Record lock, heap no 80 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 00017fca; asc     ;;
 1: len 5; hex 99a6f6598a; asc    Y ;;
 2: len 4; hex 0471a87f; asc  q  ;;
 
Record lock, heap no 560 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 00017fc6; asc     ;;
 1: len 5; hex 99a7e85a49; asc    ZI;;
 2: len 4; hex 060c1142; asc    B;;
 
2020-11-20 13:56:55 31 [ERROR] InnoDB: WSREP state: 
2020-11-20 13:56:55 31 [ERROR] WSREP: Thread BF trx_id: 34536873707 thread: 31 seqno: 92889532 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO locations ( pointer_id, km_offset, timestamp_from, timestamp_to, lat, lon, car_id, uicstation_id, position_count, utilized, km, delta_km) VALUES (115597,(SELECT current_km_offset FROM cars WHERE id=98246),'2020-11-20 07:18:40.0',null,'51.399487','12.039409',98246,'44646','1',1,4595,2)n·_
2020-11-20 13:56:55 31 [ERROR] WSREP: Thread BF trx_id: 34536873708 thread: 12 seqno: 92889530 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
2020-11-20 13:56:55 0x7fcc9c357700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
201120 13:56:55 [ERROR] mysqld got signal 6 ;

Comment by Gianni Angelozzi [ 2020-11-23 ]

Is there a way to prevent a complete loss of the cluster when something this bad happens? Every node will fail hard at the same time and then the originating node will be the only one online, but won't accept queries because it has no quorum. Bugs like these makes creating a cluster useless.

I know that things can go wrong all the time and I appreciate a free product like this, but how can I prevent such a situation from happening again? I can sustain loss of nodes for redundancy (even a single node can run my app, I've created a cluster for redundancy not for scaling). Do I need a read-only replica for hot-spare to became master? Could that fail too? How are you handling this situation?

Thanks everyone for the hard work

Comment by Ronald Kurz [ 2020-11-24 ]

We have a similar error with the same version on galera cluster:

Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] InnoDB: *** Priority TRANSACTION:
Nov 22 23:01:25 server01.domain mysqld[59821]: TRANSACTION 69096951, ACTIVE 0 sec starting index read
Nov 22 23:01:25 server01.domain mysqld[59821]: mysql tables in use 2, locked 2
Nov 22 23:01:25 server01.domain mysqld[59821]: 1 lock struct(s), heap size 1128, 0 row lock(s)
Nov 22 23:01:25 server01.domain mysqld[59821]: MySQL thread id 12, OS thread handle 140242091112192, query id 7942930 Update_rows_log_event::find_row(3
Nov 22 23:01:25 server01.domain mysqld[59821]: INSERT INTO tab01 ( pk_userid, fk_orderinfoid, field01, homedir, clientip, logincount ) VALUES ( 'u
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] InnoDB: *** Victim TRANSACTION:
Nov 22 23:01:25 server01.domain mysqld[59821]: TRANSACTION 69096950, ACTIVE 0 sec
Nov 22 23:01:25 server01.domain mysqld[59821]: mysql tables in use 2, locked 2
Nov 22 23:01:25 server01.domain mysqld[59821]: 6 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
Nov 22 23:01:25 server01.domain mysqld[59821]: MySQL thread id 54640, OS thread handle 140210684311296, query id 7942928 appserver01.domain
Nov 22 23:01:25 server01.domain mysqld[59821]: INSERT IGNORE INTO tab02 ( fk_userid, fk_netareaid ) VALUES ( 'value01', ( SELECT pk_ne
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
Nov 22 23:01:25 server01.domain mysqld[59821]: RECORD LOCKS space id 257 page no 12 n bits 136 index PRIMARY of table `db01`.`tab01` trx id 690969
Nov 22 23:01:25 server01.domain mysqld[59821]: Record lock, heap no 64 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
Nov 22 23:01:25 server01.domain mysqld[59821]: 0: len 10; hex 75746131383035303033; asc value01;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 1: len 6; hex 000000000000; asc       ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 2: len 7; hex 80000000000000; asc        ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 3: len 4; hex 0052f5be; asc  R  ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 4: len 30; hex 243624726f756e64733d35303030244b595931736d623345734767734d56; asc $6$rounds=5000$KYY1smb3
Nov 22 23:01:25 server01.domain mysqld[59821]: 5: len 30; hex 2f646174612f72656c65617365642f3230313132322d3233303032362d75; asc /data/released/201122-2
Nov 22 23:01:25 server01.domain mysqld[59821]: 6: SQL NULL;
Nov 22 23:01:25 server01.domain mysqld[59821]: 7: len 3; hex 000000; asc    ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 8: len 4; hex 00014370; asc   Cp;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 9: len 2; hex 0000; asc   ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 10: len 4; hex 00014370; asc   Cp;;
Nov 22 23:01:25 server01.domain mysqld[59821]: [788B blob data]
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] InnoDB:  SQL2: INSERT IGNORE INTO tab02 ( fk_userid, fk_neta
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] WSREP: cluster conflict due to high priority abort for threads:
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] WSREP: Winning thread:
Nov 22 23:01:25 server01.domain mysqld[59821]: THD: 12, mode: high priority, state: exec, conflict: executing, seqno: 34231195
Nov 22 23:01:25 server01.domain mysqld[59821]: [748B blob data]
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] WSREP: Victim thread:
Nov 22 23:01:25 server01.domain mysqld[59821]: THD: 54640, mode: local, state: exec, conflict: certifying, seqno: -1
Nov 22 23:01:25 server01.domain mysqld[59821]: SQL: INSERT IGNORE INTO tab02 ( fk_userid, fk_netareaid ) VALUES ( 'value01', ( SELECT
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Note] WSREP: context: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.17/storage/innobase/lock/lock0lock.cc line 687
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 12 [Warning] Aborted connection 54640 to db: db01
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 1 [ERROR] InnoDB: Conflicting lock on table: `db01`.`tab01` index: PRIMARY that
Nov 22 23:01:25 server01.domain mysqld[59821]: RECORD LOCKS space id 257 page no 12 n bits 136 index PRIMARY of table `db01`.`tab01` trx id 690969
Nov 22 23:01:25 server01.domain mysqld[59821]: Record lock, heap no 64 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
Nov 22 23:01:25 server01.domain mysqld[59821]: 0: len 10; hex 75746131383035303033; asc value01;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 1: len 6; hex 000000000000; asc       ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 2: len 7; hex 80000000000000; asc        ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 3: len 4; hex 0052f5be; asc  R  ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 4: len 30; hex 243624726f756e64733d35303030244b595931736d623345734767734d56; asc $6$rounds=5000$KYY1smb3
Nov 22 23:01:25 server01.domain mysqld[59821]: 5: len 30; hex 2f646174612f72656c65617365642f3230313132322d3233303032362d75; asc /data/released/201122-2
Nov 22 23:01:25 server01.domain mysqld[59821]: 6: SQL NULL;
Nov 22 23:01:25 server01.domain mysqld[59821]: 7: len 3; hex 000000; asc    ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 8: len 4; hex 00014370; asc   Cp;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 9: len 2; hex 0000; asc   ;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 10: len 4; hex 00014370; asc   Cp;;
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 1 [ERROR] InnoDB: WSREP state:
Nov 22 23:01:25 server01.domain mysqld[59821]: [362B blob data]
Nov 22 23:01:25 server01.domain mysqld[59821]: [942B blob data]
Nov 22 23:01:25 server01.domain mysqld[59821]: 2020-11-22 23:01:25 0x7f8558083700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_fo
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: We intentionally generate a memory trap.
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: If you get repeated assertion failures or crashes, even
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: immediately after the mysqld startup, there may be
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Nov 22 23:01:25 server01.domain mysqld[59821]: InnoDB: about forcing recovery.
Nov 22 23:01:25 server01.domain mysqld[59821]: 201122 23:01:25 [ERROR] mysqld got signal 6 ;
Nov 22 23:01:25 server01.domain mysqld[59821]: This could be because you hit a bug. It is also possible that this binary
Nov 22 23:01:25 server01.domain mysqld[59821]: or one of the libraries it was linked against is corrupt, improperly built,
Nov 22 23:01:25 server01.domain mysqld[59821]: or misconfigured. This error can also be caused by malfunctioning hardware.
Nov 22 23:01:25 server01.domain mysqld[59821]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Nov 22 23:01:25 server01.domain mysqld[59821]: We will try our best to scrape up some info that will hopefully help
Nov 22 23:01:25 server01.domain mysqld[59821]: diagnose the problem, but since we have already crashed,
Nov 22 23:01:25 server01.domain mysqld[59821]: something is definitely wrong and this may fail.
Nov 22 23:01:25 server01.domain mysqld[59821]: Server version: 10.4.17-MariaDB-log
Nov 22 23:01:25 server01.domain mysqld[59821]: key_buffer_size=134217728
Nov 22 23:01:25 server01.domain mysqld[59821]: read_buffer_size=262144
Nov 22 23:01:25 server01.domain mysqld[59821]: max_used_connections=20
Nov 22 23:01:25 server01.domain mysqld[59821]: max_threads=153
Nov 22 23:01:25 server01.domain mysqld[59821]: thread_count=37
Nov 22 23:01:25 server01.domain mysqld[59821]: It is possible that mysqld could use up to
Nov 22 23:01:25 server01.domain mysqld[59821]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 487338 K  bytes of memory
Nov 22 23:01:25 server01.domain mysqld[59821]: Hope that's ok; if not, decrease some variables in the equation.
Nov 22 23:01:25 server01.domain mysqld[59821]: Thread pointer: 0x7f8c80056148
Nov 22 23:01:25 server01.domain mysqld[59821]: Attempting backtrace. You can use the following information to find out
Nov 22 23:01:25 server01.domain mysqld[59821]: where mysqld died. If you see no messages after this, something went
Nov 22 23:01:25 server01.domain mysqld[59821]: terribly wrong...
Nov 22 23:01:25 server01.domain mysqld[59821]: stack_bottom = 0x7f8558082cf0 thread_stack 0x49000
Nov 22 23:01:26 server01.domain mysqld[59821]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55fb71cefffe]
Nov 22 23:01:26 server01.domain mysqld[59821]: /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55fb717819ef]
Nov 22 23:01:26 server01.domain mysqld[59821]: /lib64/libpthread.so.0(+0xf630)[0x7f8cd8875630]
Nov 22 23:01:27 server01.domain mysqld[59821]: :0(__GI_raise)[0x7f8cd6b45387]
Nov 22 23:01:27 server01.domain mysqld[59821]: :0(__GI_abort)[0x7f8cd6b46a78]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0x5c2de8)[0x55fb71479de8]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xaba93a)[0x55fb7197193a]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xabc144)[0x55fb71973144]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xac3a30)[0x55fb7197aa30]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xb2b196)[0x55fb719e2196]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xb2cf93)[0x55fb719e3f93]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xb30da1)[0x55fb719e7da1]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xb311a3)[0x55fb719e81a3]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xb4370f)[0x55fb719fa70f]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xa83235)[0x55fb7193a235]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN7handler12ha_write_rowEPKh+0x2d0)[0x55fb7178d2f0]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN14Rows_log_event9write_rowEP14rpl_group_infob+0x235)[0x55fb718898c5]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEP14rpl_group_info+0x7d)[0x55fb71889d2d]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x2dc)[0x55fb7187dcfc]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x23d)[0x55fb716f3edd]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN22Wsrep_replayer_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xecf3cf)[0x55fb71d863cf]
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_m
Nov 22 23:01:27 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0xee6de8)[0x55fb71d9dde8]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/lib64/galera-4/libgalera_smm.so(+0x167560)[0x7f8cd34f4560]
Nov 22 23:01:28 server01.domain mysqld[59821]: src/trx_handle.cpp:387(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle c
Nov 22 23:01:28 server01.domain mysqld[59821]: src/trx_handle.hpp:826(galera::TrxHandleMaster::lock())[0x7f8cd3545c3b]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v266replayERKNS_9ws_handleEPNS_21high_priority_serviceE+0x29)
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN20Wsrep_client_service6replayEv+0xf7)[0x55fb716df097]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN5wsrep11transaction6replayERNS_11unique_lockINS_5mutexEEE+0x58)[0x55fb71d98f88]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN5wsrep11transaction15after_statementEv+0x1d4)[0x55fb71d9a4d4]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_ZN5wsrep12client_state15after_statementEv+0x4d)[0x55fb71d757bd]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(+0x5adea0)[0x55fb71464ea0]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x2908)[0x55fb7157d3e8]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55fb7157d9a9]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x221)[0x55fb7165dae1]
Nov 22 23:01:28 server01.domain mysqld[59821]: /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55fb7165db9d]
Nov 22 23:01:28 server01.domain mysqld[59821]: pthread_create.c:0(start_thread)[0x7f8cd886dea5]
Nov 22 23:01:29 server01.domain mysqld[59821]: /lib64/libc.so.6(clone+0x6d)[0x7f8cd6c0d96d]
Nov 22 23:01:29 server01.domain mysqld[59821]: Trying to get some variables.
Nov 22 23:01:29 server01.domain mysqld[59821]: Some pointers may be invalid and cause the dump to abort.
Nov 22 23:01:29 server01.domain mysqld[59821]: Query (0x7f8cca895ff3): INSERT IGNORE INTO tab02 ( fk_userid, fk_netareaid ) VALUES ( 'db01
Nov 22 23:01:29 server01.domain mysqld[59821]: Connection ID (thread ID): 1
Nov 22 23:01:30 server01.domain systemd[1]: mariadb.service: main process exited, code=killed, status=6/ABRT
Nov 22 23:01:30 server01.domain systemd[1]: Unit mariadb.service entered failed state.
Nov 22 23:01:30 server01.domain systemd[1]: mariadb.service failed.
Nov 22 23:01:35 server01.domain systemd[1]: mariadb.service holdoff time over, scheduling restart.
Nov 22 23:01:35 server01.domain systemd[1]: Stopped MariaDB 10.4.17 database server.
Nov 22 23:01:35 server01.domain systemd[1]: Starting MariaDB 10.4.17 database server...
Nov 22 23:01:41 server01.domain sh[70368]: WSREP: Recovered position d62b1247-40e7-11e9-a393-137f199c8420:34231194
...
Nov 22 23:01:47 server01.domain mysqld[70635]: 2020-11-22 23:01:47 1 [Note] WSREP: Synchronized with group, ready for connections

Comment by MikaH [ 2020-11-30 ]

The s**t hit the fan also with 10.3.26 after we upgraded from 10.3.23:

2020-11-30  7:23:59 1 [Note] WSREP: Receiving IST: 55559 writesets, seqnos 338735304-338790863
2020-11-30  7:23:59 0 [Note] WSREP: Receiving IST...  0.0% (    0/55559 events) complete.
2020-11-30  7:23:59 0 [Note] Reading of all Master_info entries succeeded
2020-11-30  7:23:59 0 [Note] Added new Master_info '' to hash table
2020-11-30  7:23:59 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.3.26-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2020-11-30  7:24:02 89 [ERROR] InnoDB: Conflicting lock on table: `webdb`.`ordered_contracts_orderrequest` index: PRIMARY that has lock
RECORD LOCKS space id 35143 page no 6345655 n bits 72 index PRIMARY of table `webdb`.`ordered_contracts_orderrequest` trx id 7193044632 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
 
2020-11-30  7:24:02 89 [ERROR] InnoDB: WSREP state:
2020-11-30  7:24:02 89 [ERROR] WSREP: Thread BF trx_id: 7193044625 thread: 89 seqno: 338782886 query_state: executing conf_state: no conflict exec_mode: applier applier: 1 query: INSERT INTO `ordered_contracts_orderrequest` (`contract_id`, `external_id`, `requested_at`, `finished_at`, `type`, `status`, `data`, `flow_data`) VALUES (xxxxxxxx, 'xxxxxxxxxxxxxxxxxxxxx', '2020-xx-xx xx:xx:32', NULL, 'create_contract', 'ongoing', '{\"tasks\": {\"nodes\": [{\"tasks\": [{\"task_type\": \"xxxxxxx\", \"data\": {\"pk\": xxxxxxxxxxxx, \"changes\": {\"confirmation_code\": \"xxxxxxxxxxxx\", \"offer\": \"C_ACQ_PRE_POST_B2C_ALL\", \"customer\": xxxxxxxxxxx, \"xxxxxxx\": {\"xxxxxxxx\": \"xxxxxxxxxxxx\", \"reseller\": \"xxxxxxxx\", \"chain\": \"xxxx\", \"channel\": \"xxxx\", \"sales_id\": \"xxxxxxxx\", \"sales_type\": \"acquisition\", \"xxxxx\": \"xxxxxxxxxxxx\", \"xxxxxxxxx\": null, \"xxxxxxxx\": null, \"extra_information\": {}, \"xxxxxxxxxx\": null}, \"products\": [{\"product_type\": \"xxxxxxxxx\",
2020-11-30  7:24:02 89 [ERROR] WSREP: Thread BF trx_id: 7193044632 thread: 1 seqno: 338782905 query_state: idle conf_state: no conflict exec_mode: applier applier: 1 query: NULL
2020-11-30 07:24:02 0x7eb44f94b700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.26/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.

Note: Automatic IST with rsync.

Comment by Christopher Svensson [ 2020-12-04 ]

We also experienced this after an upgrade from 10.5.5 to 10.5.8:

2020-12-04  8:58:21 0 [Note] WSREP: Receiving IST...  0.0% (  0/160 events) complete.
2020-12-04  8:58:21 0 [Note] WSREP: IST preload starting at 11365559
2020-12-04  8:58:21 0 [Note] WSREP: Service thread queue flushed.
2020-12-04  8:58:21 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:11365558, protocol version: 5
2020-12-04  8:58:21 10 [ERROR] InnoDB: Conflicting lock on table: `app-scheduler`.`jobs` index: fk_schedule_id that has lock
RECORD LOCKS space id 3701 page no 1590 n bits 272 index fk_schedule_id of table `app-scheduler`.`jobs` trx id 24901901 lock_mode X locks rec but not gap
Record lock, heap no 121
2020-12-04  8:58:21 10 [ERROR] InnoDB: WSREP state:
2020-12-04  8:58:21 10 [ERROR] WSREP: Thread BF trx_id: 24901902 thread: 10 seqno: 11365642 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: /* delete net.iponly.app_scheduler.entities.Schedule */ delete from schedules where id='99576e11-b63b-44a8-a5b1-26cc583fe20d'<A3><EB><C9>_
^S^A
2020-12-04  8:58:21 10 [ERROR] WSREP: Thread BF trx_id: 24901901 thread: 6 seqno: 11365641 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: /* delete net.iponly.app_scheduler.entities.Job */ delete from jobs where id='11397042-f4c7-4243-8bfd-ad38c3cb8767'<A3><EB><C9>_^S^A
2020-12-04 08:58:21 0x7f93a80bf700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/lock/lock0lock.cc line 655

We're using mariabackup for SST.

Comment by Bart Smienk [ 2020-12-17 ]

Same issue over here, currently downgrading to 10.3.25 (oldest available in the repos) from 10.3.27 at the moment to see if that resolves our issues.

Upgraded a few weeks ago from 10.3.18 which did not present any issues, but the secondary cluster node now crashes multiple times per week. We're running a master/master setup (with garb as witness)

Our cluster successfully does SST or IST, but just crashes at random when some locks conflict. This happened every now and then already in the past but never resulted in a crash of the cluster

Comment by Seppo Jaakola [ 2020-12-21 ]

Submitted two pull requests to fix this issue and MDEV-23851, which appear to be duplicates. Reason for the crashes is recently added too strict assertion, which fires in safe execution. This is fixed by the pull request: https://github.com/MariaDB/server/pull/1723.

There is also a real bug resulting from a more rare scenario, where unique secondary key value is modified in certain way (often caused by REPLACE statement usage), resulting in secondary index gap locking and there happens simultaneous INSERT replicated from other node and inserting into the locked gap. This issue has a fix in pull request: https://github.com/MariaDB/server/pull/1724

Comment by Seppo Jaakola [ 2020-12-21 ]

two pull requests submitted for this:
https://github.com/MariaDB/server/pull/1723
https://github.com/MariaDB/server/pull/1724

Comment by Bart Smienk [ 2020-12-21 ]

Thanks for your work Seppo, much appreciated.
I'll keep an eye on future releases to see if your fix has been included, so we can update the cluster again to the latest release.

Comment by Jan Lindström (Inactive) [ 2020-12-28 ]

Duplicate of MDEV-23851

Generated at Thu Feb 08 09:28:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.