[MDEV-20095] MariaDB server replication crashing Created: 2019-07-18  Updated: 2023-04-17

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.3.14, 10.3.16
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Will Reiske Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: crash, innodb, replication, slave_reconnect_skip_events
Environment:

CentOS Linux release 7.6.1810 (Core)
32x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
mysql Ver 15.1 Distrib 10.3.16-MariaDB, for Linux (x86_64) using readline 5.1


Attachments: PNG File chrome_auXwiTGMPs.png     Text File mariadb-stacktrace-error.log.txt     Text File slave-status.txt    
Issue Links:
Duplicate
duplicates MDEV-14903 Slave applier segfaults after reconne... Open
Relates
relates to MDEV-14903 Slave applier segfaults after reconne... Open

 Description   

Hello!

We are having issues with a replica crashing; the error log is attached. We have a setup with 1 master and 2 replicas. Over the weekend, one of the 2 replicas crashed (mysqld died unexpectedly; note that this happened on only 1 replica) and it will not catch up on replication. MySQL starts and we can issue a 'start slave'; however, after a period of time mysqld crashes with the attached backtrace and we have to 'start slave' again. The slave is now 5 days behind in replication and will not stay alive long enough to catch up.

I upgraded the replica from 10.3.14 to 10.3.16 to see if it would fix the issue, but it still occurs on 10.3.16.

Any ideas?

======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f7438388b67]
/lib64/libc.so.6(+0x115ce2)[0x7f7438386ce2]
/lib64/libc.so.6(+0x117ac7)[0x7f7438388ac7]
/usr/sbin/mysqld(my_addr_resolve+0xda)[0x5597de78c1aa]
/usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x5597de7757e2]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x5597de216dff]
/lib64/libpthread.so.0(+0xf5d0)[0x7f7439fd45d0]
/usr/sbin/mysqld(+0x875be0)[0x5597de305be0]
/usr/sbin/mysqld(_Z19mysql_unlock_tablesP3THDP13st_mysql_lockb+0x12c)[0x5597de305e4c]
/usr/sbin/mysqld(_Z19close_thread_tablesP3THD+0x141)[0x5597ddfdfdc1]
/usr/sbin/mysqld(_ZN14rpl_group_info25slave_close_thread_tablesEP3THD+0x43)[0x5597de110af3]
/usr/sbin/mysqld(_ZN14rpl_group_info15cleanup_contextEP3THDb+0x28)[0x5597de110df8]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x479)[0x5597de30eaf9]
/usr/sbin/mysqld(+0x50af23)[0x5597ddf9af23]
/usr/sbin/mysqld(handle_slave_sql+0x161c)[0x5597ddfa439c]
/usr/sbin/mysqld(+0xc987cd)[0x5597de7287cd]
/lib64/libpthread.so.0(+0x7dd5)[0x7f7439fccdd5]
/lib64/libc.so.6(clone+0x6d)[0x7f743836f02d]
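
To make the backtrace above easier to read, here is a minimal Python sketch that splits each frame into its parts and recovers the qualified function name from the simpler mangled C++ symbols. The helper names `parse_frame` and `qualified_name` are hypothetical and not part of MariaDB. The demangling only handles the `_Z<len><name>` and `_ZN<len><name>...E` forms seen in this trace and ignores argument types; a real demangler such as `c++filt` should be preferred when available.

```python
import re

# One glibc-style frame: binary(symbol+0xoffset)[0xaddress]; symbol may be empty.
FRAME_RE = re.compile(
    r"^(?P<binary>[^()\s]+)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


def qualified_name(symbol):
    """Return the qualified name of a simple Itanium-ABI mangled symbol.

    Only `_Z<len><name>` and `_ZN(<len><name>)+E` are handled (assumption:
    enough for this trace). Anything else is returned unchanged.
    """
    if not symbol.startswith("_Z"):
        return symbol
    nested = symbol.startswith("_ZN")
    pos = 3 if nested else 2
    parts = []
    while pos < len(symbol):
        m = re.match(r"\d+", symbol[pos:])
        if not m:
            break  # reached 'E' or the argument-type encoding
        length = int(m.group())
        start = pos + m.end()
        parts.append(symbol[start:start + length])
        pos = start + length
        if not nested:
            break  # a non-nested name has exactly one component
    return "::".join(parts) if parts else symbol


def parse_frame(line):
    """Parse one backtrace line into a dict, or return None if it does not match."""
    m = FRAME_RE.match(line.strip())
    if not m:
        return None
    frame = m.groupdict()
    frame["function"] = qualified_name(frame["symbol"]) or None
    return frame


if __name__ == "__main__":
    sample = [
        "/usr/sbin/mysqld(_Z19mysql_unlock_tablesP3THDP13st_mysql_lockb+0x12c)[0x5597de305e4c]",
        "/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x479)[0x5597de30eaf9]",
        "/usr/sbin/mysqld(+0x875be0)[0x5597de305be0]",
    ]
    for line in sample:
        frame = parse_frame(line)
        print(frame["binary"], frame["function"], frame["offset"])
```

Read this way, the crashing path runs from `handle_slave_sql` through `Rows_log_event::do_apply_event`, `rpl_group_info::cleanup_context`, `rpl_group_info::slave_close_thread_tables` and `close_thread_tables` into `mysql_unlock_tables`, followed by one unresolved frame in mysqld just before the signal handler.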



 Comments   
Comment by Andrei Elkin [ 2019-12-17 ]

Analyzed briefly and compared with the already-reported MDEV-14903 linked above.
Conclusion: this report duplicates the earlier one, which
thankfully has come close to the root cause of the issue.

Generated at Thu Feb 08 08:56:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.