[MDEV-35054] Replication failure followed by Mariadb crash - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.11.5
Fix Version/s: 10.11
Component/s: Replication
Labels:
- crash
- replication

Description

Hello Team,

We observed a replication failure between the database servers db1 and db2, followed by a crash of the MariaDB service on db1 due to signal 11 (segmentation fault). Could you please investigate the root cause for both the replication failure and the subsequent crash?

Replication Failure (Sept 30, 2024 - 22:47:31):
Error Overview: The replication failure was triggered due to a checksum verification error while reading from the relay log.

Corruption in the relay log or the master’s binary log.

The replication SQL thread encountered an issue with parsing the relay log event, which eventually led to an I/O error and the termination of the slave SQL thread.

mysql logs:

*2024-09-30 22:47:31 27 [ERROR] Replication event checksum verification failed while reading from a log file
2024-09-30 22:47:31 27 [ERROR] Error in *Log_event::read_log_event(): 'Replication event checksum verification failed while reading from a log file', data_len: 680, event_type: 160
2024-09-30 22:47:31 27 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
2024-09-30 22:47:31 27 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Internal MariaDB error code: 1594
2024-09-30 22:47:31 27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db2-binary.231673' position 125252441
2024-09-30 22:47:31 27 [Note] Slave SQL thread exiting, replication stopped in log 'db2-binary.231673' at position 125272617, master: 10.76.24.77:3306
2024-09-30 22:48:18 26 [ERROR] Could not use ./db1-relay.272126 for logging (error 11). Turning logging off for the whole duration of the MariaDB server process. To turn it on again: fix the cause, shutdown the MariaDB server and restart it.
2024-09-30 22:48:18 26 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
2024-09-30 22:48:18 26 [Note] Slave I/O thread exiting, read up to log 'db2-binary.231674', position 4, master 10.76.24.77:3306
2024-10-01 0:14:32 35483020 [Note] Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.24.77:3306' in log 'db2-binary.231674' at position 4
2024-10-01 0:14:32 35483020 [Note] Slave I/O thread: connected to master 'gtid@10.76.24.77:3306',replication started in log 'db2-binary.231674' at position 4
241001 0:14:32 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.11.5-MariaDB-log source revision: 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=8
max_threads=802
thread_count=27
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831964 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f9be8001598
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f9f495c64b8 thread_stack 0x49000
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x558946a8a12e]
mysys/stacktrace.c:216(my_print_stacktrace)[0x5589465835b5]
/lib64/libpthread.so.0(+0x12cf0)[0x7fc54860ecf0]
bits/string_fortified.h:34(check_binlog_magic(st_io_cache*, char const**))[0x5589466795e8]
sql/rpl_rli.cc:717(init_relay_log_pos(Relay_log_info*, char const*, unsigned long long, bool, char const**, bool))[0x55894644786a]
sql/slave.cc:5493(handle_slave_sql)[0x55894628ee7a]
perfschema/pfs.cc:2204(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55894678ac9d]
/lib64/libpthread.so.0(+0x81ca)[0x7fc5486041ca]
/lib64/libc.so.6(clone+0x43)[0x7fc547954e73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 35483021
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.
Query:

Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size unlimited unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 2061710 2061710 processes
Max open files 32768 32768 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 2061710 2061710 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: /var/coredumps/core.%e.%p

Kernel version: Linux version 4.18.0-477.13.1.el8_8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Thu May 18 10:27:05 EDT 2023

2024-10-01 0:15:21 0 [Note] Starting MariaDB 10.11.5-MariaDB-log source revision 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c as process 194175
2024-10-01 0:15:21 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2024-10-01 0:15:21 0 [Note] InnoDB: Number of transaction pools: 1
2024-10-01 0:15:21 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2024-10-01 0:15:22 0 [Note] InnoDB: Using Linux native AIO
2024-10-01 0:15:22 0 [Note] InnoDB: Initializing buffer pool, total size = 150.000GiB, chunk size = 2.344GiB
2024-10-01 0:15:22 0 [Note] InnoDB: Completed initialization of buffer pool
2024-10-01 0:15:22 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
2024-10-01 0:15:23 0 [Note] InnoDB: End of log at LSN=48180725989630
2024-10-01 0:15:23 0 [Note] InnoDB: 128 rollback segments are active.
2024-10-01 0:15:24 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2024-10-01 0:15:24 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2024-10-01 0:15:24 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2024-10-01 0:15:24 0 [Note] InnoDB: log sequence number 48180725989630; transaction id 151460736751
2024-10-01 0:15:24 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2024-10-01 0:15:24 0 [Note] Plugin 'FEEDBACK' is disabled.
2024-10-01 0:15:24 0 [Warning] 'thread-concurrency' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
2024-10-01 0:15:24 0 [Warning] 'innodb-thread-concurrency' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
2024-10-01 0:15:24 0 [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
2024-10-01 0:15:24 0 [Note] Recovering after a crash using db1-binary
2024-10-01 0:15:24 0 [Note] Starting table crash recovery...
2024-10-01 0:15:24 0 [Note] Crash table recovery finished.
2024-10-01 0:15:24 0 [Note] Server socket created on IP: '0.0.0.0'.
2024-10-01 0:15:24 0 [Note] Server socket created on IP: '::'.
2024-10-01 0:15:24 0 [Warning] 'user' entry '@alepo-db2-ilb' ignored in --skip-name-resolve mode.
2024-10-01 0:15:24 0 [Warning] 'user' entry 'root@localhost.localdomain' ignored in --skip-name-resolve mode.
2024-10-01 0:15:24 0 [Warning] 'proxies_priv' entry '@% root@alepo-db2-ilb' ignored in --skip-name-resolve mode.
2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb3.info' Relay_info:'relay-log-s@002ddb3.info'
2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb3.info'
2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
2024-10-01 0:15:24 4 [ERROR] Master 's-db3': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread killed while connecting to master
2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
2024-10-01 0:15:24 3 [Note] Started replication for 's-db3'
2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb4.info' Relay_info:'relay-log-s@002ddb4.info'
2024-10-01 0:15:24 5 [Note] Master 's-db3': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb3.000040' position: 4
2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb4.info'
2024-10-01 0:15:24 18 [Note] Master 's-db4': Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.25.11:3306' in log 'db4-binary.208330' at position 44444885
2024-10-01 0:15:24 3 [Note] Started replication for 's-db4'
2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb2.info' Relay_info:'relay-log-s@002ddb2.info'
2024-10-01 0:15:24 19 [Note] Master 's-db4': Slave SQL thread initialized, starting replication in log 'db4-binary.208330' at position 44444885, relay log './db1-relay-s@002ddb4.115156' position: 783
2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb2.info'
2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
2024-10-01 0:15:24 20 [ERROR] Master 's-db2': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread killed while connecting to master
2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
2024-10-01 0:15:24 3 [Note] Started replication for 's-db2'
2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb5.info' Relay_info:'relay-log-s@002ddb5.info'
2024-10-01 0:15:24 21 [Note] Master 's-db2': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb2.000039' position: 4
2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb5.info'
2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
2024-10-01 0:15:24 22 [ERROR] Master 's-db5': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread killed while connecting to master
2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
2024-10-01 0:15:24 3 [Note] Started replication for 's-db5'
2024-10-01 0:15:24 23 [Note] Master 's-db5': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb5.000031' position: 4
2024-10-01 0:15:24 25 [Note] Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.24.77:3306' in log 'db2-binary.231673' at position 209716353
2024-10-01 0:15:24 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.11.5-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2024-10-01 0:15:24 26 [Note] Slave SQL thread initialized, starting replication in log 'db2-binary.231673' at position 125272617, relay log './db1-relay.272125' position: 58163882
2024-10-01 0:15:24 25 [Note] Slave I/O thread: connected to master 'gtid@10.76.24.77:3306',replication started in log 'db2-binary.231673' at position 209716353
2024-10-01 0:15:24 18 [Note] Master 's-db4': Slave I/O thread: connected to master 'gtid@10.76.25.11:3306',replication started in log 'db4-binary.208330' at position 44444885
2024-10-01 0:16:16 330 [Note] Start binlog_dump to slave_server(2), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
2024-10-01 0:16:16 331 [Note] Start binlog_dump to slave_server(3), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
2024-10-01 0:16:16 333 [Note] Start binlog_dump to slave_server(4), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
2024-10-01 0:17:44 0 [Note] InnoDB: Buffer pool(s) load completed at 241001 0:17:44
2024-10-01 2:00:02 330 [Warning] Aborted connection 330 to db: 'unconnected' user: 'gtid' host: '10.76.24.77' (Got an error writing communication packets)
2024-10-01 2:00:02 352786 [Note] Start binlog_dump to slave_server(2), pos(db1-binary.231093, 148119556), using_gtid(0), gtid('')

[root@ALEPO-DB1-ILB coredumps]# systemctl status mysqld
● mariadb.service - MariaDB 10.11.5 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf

Active: active (running) since Tue 2024-10-01 00:15:24 PKT; 11h ago*
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Process: 194226 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCC>
Process: 193951 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/g>
Process: 193949 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCE>
Main PID: 194175 (mariadbd)
Status: "Taking your SQL requests now..."
Tasks: 38 (limit: 3298736)
Memory: 165.9G
CGroup: /system.slice/mariadb.service
└─194175 /usr/sbin/mariadbd

Oct 01 00:15:21 ALEPO-DB1-ILB systemd[1]: Starting MariaDB 10.11.5 database server...
Oct 01 00:15:21 ALEPO-DB1-ILB mariadbd[194175]: 2024-10-01 0:15:21 0 [Warning] Could not increase number of max_open_fi>
Oct 01 00:15:24 ALEPO-DB1-ILB systemd[1]: Started MariaDB 10.11.5 database server.
lines 1-20/20 (END)

MYSQL VERSION:
mariadb Ver 15.1 Distrib 10.11.5-MariaDB, for Linux (x86_64) using readline 5.1

Kindly find attached mysqlbinlog file, core dump file, coredump analysis, show slave status output for your reference:

Core file :
https://easyupload.io/bxx7f9
password : 12345678

mysqlbin log file:
https://easyupload.io/d7ogcw
password:12345678

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

gdb.txt
2.63 MB
2024-10-01 10:05
messages_01102024
140 kB
2024-10-01 10:05
mysqld_logs_Oct_1
14 kB
2024-10-01 10:05
sar30
1.18 MB
2024-10-01 10:05
slave status.txt
3 kB
2024-10-01 10:14

Activity

Priya Kunte added a comment - 2024-10-01 10:14

slave status.txt

Priya Kunte added a comment - 2024-10-01 10:14 slave status.txt

MariaDB Server

Replication failure followed by Mariadb crash

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration