[MDEV-33055] Replica crash loop after Assertion Created: 2023-12-18  Updated: 2024-01-23  Resolved: 2024-01-23

Status: Closed
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: 10.6.14
Fix Version/s: 10.6.15

Type: Bug Priority: Major
Reporter: Andrea Ponzo Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Environment: CentOS Linux 7


Issue Links:
Relates
relates to MDEV-23713 Replication stops with "Index for tab... Stalled

 Description   

I'm seeing this assertion failure on 2 replica servers that share the same master.
All servers are running version 10.6.14-MariaDB-log.

2023-12-15 06:13:27 0x7f9494dc9700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.6.14/storage/innobase/row/row0ins.cc line 222
InnoDB: Failing assertion: !cursor->index()->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
231215  6:13:27 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.6.14-MariaDB-log source revision: c93754d45e5d9379e3e23d7ada1d5f21d2711f66
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=17
max_threads=752
thread_count=22
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6407326 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f94bc015df8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f9494dc8c40 thread_stack 0x49000
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55725ea07cde]
mysys/stacktrace.c:216(my_print_stacktrace)[0x55725e45ac27]
sigaction.c:0(__restore_rt)[0x7f96304a1630]
/lib64/libc.so.6(gsignal+0x37)[0x7f962f8ec387]
/lib64/libc.so.6(abort+0x148)[0x7f962f8eda78]
/usr/sbin/mariadbd(+0x691602)[0x55725e100602]
ut/ut0rbt.cc:220(??)[0x55725e80792c]
row/row0ins.cc:2203(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55725e80a484]
row/row0ins.cc:3322(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55725e83f964]
row/row0upd.cc:2034(row_upd_sec_index_entry)[0x55725e83fda1]
row/row0upd.cc:2787(row_upd)[0x55725e81b1f1]
row/row0mysql.cc:1693(row_update_for_mysql(row_prebuilt_t*))[0x55725e75804c]
handler/ha_innodb.cc:8695(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x55725e469e8a]
sql/handler.cc:7684(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x55725e57f7b8]
sql/log_event_server.cc:8519(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x55725e573854]
sql/log_event_server.cc:5768(Rows_log_event::do_apply_event(rpl_group_info*))[0x55725e17d5cf]
sql/log_event.h:1500(Log_event::apply_event(rpl_group_info*))[0x55725e3925fe]
sql/rpl_parallel.cc:61(rpt_handle_event)[0x55725e0f0a9e]
sql/rpl_parallel.cc:1021(retry_event_group)[0x55725e395e86]
sql/rpl_parallel.cc:1407(handle_rpl_parallel_thread)[0x55725e6a71dc]
pthread_create.c:0(start_thread)[0x7f9630499ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f962f9b4b0d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 1854
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=off,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway.
Query:
 
Writing a core file...
Working directory at /usr/local/appian/local-data/mysqldata/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             126545               126545               processes
Max open files            100000               100000               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       126545               126545               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
 
Kernel version: Linux version 3.10.0-1160.105.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Dec 7 15:39:45 UTC 2023

This triggered a crash loop that generated many core files; if needed, I can provide one of the core dumps.

Could this issue be related to MDEV-22739?
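
For anyone triaging a similar crash loop, here is a minimal sketch of how the repeated assertion could be counted in an error log. It is not a MariaDB tool: the function name `summarize_assertions` and both regular expressions are assumptions based only on the two "InnoDB: Assertion failure" and "InnoDB: Failing assertion" lines shown above, and the file path is rejoined in case the log wraps it across lines.

```python
import re
from collections import Counter

# Patterns modeled on the two InnoDB lines in the log above (assumed format).
FILE_RE = re.compile(
    r"InnoDB: Assertion failure in file (.+?)\s+line (\d+)", re.DOTALL
)
EXPR_RE = re.compile(r"InnoDB: Failing assertion: (.+)")


def summarize_assertions(log_text):
    """Count (source file, line, failing expression) triples in an error log."""
    locations = [
        # Strip whitespace so a path wrapped across lines is rejoined.
        (re.sub(r"\s+", "", path), int(line))
        for path, line in FILE_RE.findall(log_text)
    ]
    expressions = [expr.strip() for expr in EXPR_RE.findall(log_text)]
    # Pair each location with the failing expression reported after it.
    return Counter(
        (path, line, expr)
        for (path, line), expr in zip(locations, expressions)
    )
```

A single triple with a high count suggests that every restart is hitting the same assertion, as opposed to several unrelated failures.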



 Comments   
Comment by Alice Sherepa [ 2023-12-19 ]

Is it possible for you to upgrade to the recent version, 10.6.16, and check whether the crash is still repeatable? (Yes, it might be related to MDEV-22739; that patch is in 10.6.15.)
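
As a rough illustration of the version check implied here, the sketch below compares a server version string (such as the one printed in the crash log) against 10.6.15, the release stated above to contain the MDEV-22739 patch. The helper names `parse_version` and `has_fix` are hypothetical, and the comparison is deliberately limited to the 10.6 series, since this ticket says nothing about other series.

```python
import re

# Assumption taken from the comment above: the MDEV-22739 patch is in 10.6.15.
FIRST_FIXED = (10, 6, 15)


def parse_version(version_string):
    """Extract (major, minor, patch) from e.g. '10.6.14-MariaDB-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version_string, first_fixed=FIRST_FIXED):
    """Return True if the version is at or after first_fixed in the same series."""
    version = parse_version(version_string)
    # Refuse to guess about other release series; only 10.6 is covered here.
    if version[:2] != first_fixed[:2]:
        raise ValueError("comparison is only meaningful within the same series")
    return version >= first_fixed
```

For example, `has_fix("10.6.14-MariaDB-log")` evaluates to `False`, while `has_fix("10.6.16-MariaDB")` evaluates to `True`.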

Comment by Andrea Ponzo [ 2023-12-20 ]

Thanks Alice,
will do and keep you posted here.

Comment by Andrea Ponzo [ 2024-01-23 ]

Hello,
sorry for the long delay in answering.
Just to confirm that after upgrading to 10.6.15 we are no longer facing this issue.
Thanks

Comment by Sergei Golubchik [ 2024-01-23 ]

Thanks for confirming!

Generated at Thu Feb 08 10:36:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.