Details
Bug
Status: Open
Major
Resolution: Unresolved
10.11.14
None
OS: Debian 12
Can result in unexpected behaviour
Description
In a MariaDB primary-replica setup, our replica server (db3) stopped receiving new data from the primary even though both Slave_IO_Running and Slave_SQL_Running reported "Yes".
We noticed the issue when "SHOW SLAVE HOSTS;" on the primary reported only one replica instead of the expected two.
In an attempt to fix the issue, we ran "STOP SLAVE" and "START SLAVE" on the replica. When we checked the replica status after the restart, it reported the error shown below.
We believe this issue has effectively halted replication on this replica for several months. Because MariaDB never failed outright and never reported Slave_IO_Running or Slave_SQL_Running as "No", our monitoring software did not detect a problem.
We do not know which steps reproduce the issue.
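Since monitoring that looks only at the two "Running" flags missed this stall, a check could also inspect Slave_IO_State. The following is a minimal sketch, not our actual monitoring code: the function names and the substring match are assumptions for illustration, and it parses the text form of the vertical SHOW SLAVE STATUS output.

```python
def parse_slave_status(text):
    # Turn "Field: value" lines of the vertical status output into a dict.
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the row banner ("*** 1. row ***") and lines without a separator.
        if line.startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    # Return a list of human-readable problems; an empty list means none found.
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread)!r}")
    # Assumption: a persistent wait for relay log space indicates a stall.
    if "relay log space" in status.get("Slave_IO_State", ""):
        problems.append("I/O thread is waiting for relay log space")
    return problems


sample = """
Slave_IO_State: Waiting for the slave SQL thread to free enough relay log space
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
"""
print(replication_problems(parse_slave_status(sample)))
```

A short wait in this state may be normal on a busy replica, so a real check would probably alert only if the state persists across several polls.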
MariaDB [(none)]> show slave hosts;
+-----------+----------------+------+-----------+
| Server_id | Host           | Port | Master_id |
+-----------+----------------+------+-----------+
|         2 | db2            | 3306 |         1 |
+-----------+----------------+------+-----------+
1 row in set (0.000 sec)
DB3 (Replica) "SHOW SLAVE STATUS;" before/after restart output:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for the slave SQL thread to free enough relay log space
Master_Host: db1.domain.com
Master_User: rep_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin-log.000108
Read_Master_Log_Pos: 169830505
Relay_Log_File: bin-relay.000017
Relay_Log_Pos: 169830760
Relay_Master_Log_File: bin-log.000108
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 169830463
Relay_Log_Space: 1077914767
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-424511
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Slave_DDL_Groups: 3441
Slave_Non_Transactional_Groups: 96
Slave_Transactional_Groups: 105912
Replicate_Rewrite_DB:
1 row in set (0.000 sec)

MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (5 min 7.683 sec)

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.059 sec)

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: db1.domain.com
Master_User: rep_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin-log.000108
Read_Master_Log_Pos: 169830463
Relay_Log_File: bin-relay.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: bin-log.000108
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 169830463
Relay_Log_Space: 296
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-424511
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Slave_DDL_Groups: 3441
Slave_Non_Transactional_Groups: 96
Slave_Transactional_Groups: 105912
Replicate_Rewrite_DB:
1 row in set (0.000 sec)

MariaDB [(none)]>
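To make the numbers in the first status output easier to compare, the short sketch below does the arithmetic. It assumes the '1G' in relay-log-space-limit means 2^30 bytes; the variable names are ours and only the values come from the output above.

```python
GIB = 1024 ** 3

relay_log_space_limit = 1 * GIB      # relay-log-space-limit = '1G' (assumed 2**30 bytes)
relay_log_space = 1_077_914_767      # Relay_Log_Space before the restart
read_master_log_pos = 169_830_505    # Read_Master_Log_Pos before the restart
exec_master_log_pos = 169_830_463    # Exec_Master_Log_Pos before the restart

# How far the relay logs on disk exceed the configured limit.
over_limit = relay_log_space - relay_log_space_limit
# How many bytes of the primary's binlog were read but not yet applied.
unapplied = read_master_log_pos - exec_master_log_pos

print(over_limit)   # 4172943
print(unapplied)    # 42
```

Under that assumption, the relay logs were about 4 MB over the limit while the I/O thread had read only 42 bytes past the position the SQL thread had applied. At the same time the I/O thread reported waiting for the SQL thread to free space and the SQL thread reported waiting for more updates, so the two threads appear to have been waiting on each other.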
db3.err, truncated for brevity:
2026-01-30 21:54:35 356 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
2026-01-30 21:54:35 356 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000100' at position 804997841; GTID position '0-1-424259', GTID event skip 48998
2026-01-30 21:54:36 356 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424260. It is to skip 48998 already received events including the gtid one
2026-01-30 22:00:50 356 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
2026-01-30 22:00:50 356 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000104' at position 1077960884; GTID position '0-1-424421', GTID event skip 33015
2026-01-30 22:00:50 356 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424422. It is to skip 33015 already received events including the gtid one
2026-01-30 22:27:50 357 [Note] Error reading relay log event: slave SQL thread was killed
2026-01-30 22:27:50 357 [Note] Slave SQL thread exiting, replication stopped in log 'bin-log.000104' at position 163705474; GTID position '0-1-424421', master: db1.domain.com:3306
2026-01-30 22:32:09 356 [ERROR] Slave I/O thread aborted while waiting for relay log space
2026-01-30 22:32:09 356 [Note] Slave I/O thread exiting, read up to log 'bin-log.000104', position 163705516; GTID position 0-1-424421, master db1.domain.com:3306
2026-01-30 22:32:15 5212 [Note] Slave I/O thread: Start asynchronous replication to master 'rep_user@db1.domain.com:3306' in log 'bin-log.000104' at position 163705474
2026-01-30 22:32:15 5213 [Note] Slave SQL thread initialized, starting replication in log 'bin-log.000104' at position 163705474, relay log '/db/mysql/log-relay/bin-relay.000001' position: 4; GTID position '0-1-424421'
2026-01-30 22:32:15 5212 [Note] Slave I/O thread: connected to master 'rep_user@db1.domain.com:3306',replication starts at GTID position '0-1-424421'
2026-01-30 22:36:22 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
2026-01-30 22:36:22 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000105' at position 145331453; GTID position '0-1-424433', GTID event skip 6627
2026-01-30 22:36:22 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424434. It is to skip 6627 already received events including the gtid one
2026-01-30 22:37:55 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
2026-01-30 22:37:55 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000108' at position 479305461; GTID position '0-1-424511', GTID event skip 9403
2026-01-30 22:37:55 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424512. It is to skip 9403 already received events including the gtid one
2026-01-30 22:44:21 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
2026-01-30 22:44:21 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000108' at position 1077911864; GTID position '0-1-424511', GTID event skip 27933
2026-01-30 22:44:21 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424512. It is to skip 27933 already received events including the gtid one

...

2026-04-14 14:09:45 5213 [Note] Error reading relay log event: slave SQL thread was killed
2026-04-14 14:09:45 5213 [Note] Slave SQL thread exiting, replication stopped in log 'bin-log.000108' at position 169830463; GTID position '0-1-424511', master: db1.domain.com:3306
2026-04-14 14:14:52 5212 [ERROR] Slave I/O thread aborted while waiting for relay log space
2026-04-14 14:14:52 5212 [Note] Slave I/O thread exiting, read up to log 'bin-log.000108', position 169830505; GTID position 0-1-424511, master db1.domain.com:3306
2026-04-14 14:15:01 29825 [Note] Slave I/O thread: Start asynchronous replication to master 'rep_user@db1.domain.com:3306' in log 'bin-log.000108' at position 169830463
2026-04-14 14:15:01 29826 [Note] Slave SQL thread initialized, starting replication in log 'bin-log.000108' at position 169830463, relay log '/db/mysql/log-relay/bin-relay.000001' position: 4; GTID position '0-1-424511'
2026-04-14 14:15:01 29825 [Note] Slave I/O thread: connected to master 'rep_user@db1.domain.com:3306',replication starts at GTID position '0-1-424511'
2026-04-14 14:15:01 29825 [ERROR] Error reading packet from server: Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged. (server_errno=1236)
2026-04-14 14:15:01 29825 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.', Internal MariaDB error code: 1236
2026-04-14 14:15:01 29825 [Note] Slave I/O thread exiting, read up to log 'bin-log.000108', position 169830463; GTID position 0-1-424511, master db1.domain.com:3306
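The error log shows three recurring signatures: lost connections (server_errno=2013), the I/O thread aborting while waiting for relay log space, and the final error 1236. A rough way to count them in a log like db3.err is sketched below. The category names and patterns are our own choices for illustration, and the sample holds three lines copied from the excerpt above.

```python
import re
from collections import Counter

# Hypothetical category names mapped to substrings seen in the log excerpt.
PATTERNS = {
    "lost_connection": re.compile(r"server_errno=2013"),
    "relay_space_abort": re.compile(r"aborted while waiting for relay log space"),
    "gtid_not_found": re.compile(r"server_errno=1236"),
}


def summarize(lines):
    # Count how many lines match each signature.
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample_lines = [
    "2026-01-30 21:54:35 356 [ERROR] Error reading packet from server: "
    "Lost connection to server during query (server_errno=2013)",
    "2026-01-30 22:32:09 356 [ERROR] Slave I/O thread aborted while waiting "
    "for relay log space",
    "2026-04-14 14:15:01 29825 [ERROR] Error reading packet from server: "
    "Could not find GTID state requested by slave in any binlog files. "
    "Probably the slave state is too old and required binlog files have "
    "been purged. (server_errno=1236)",
]
print(dict(summarize(sample_lines)))
```

Run against the full file (for example `summarize(open("db3.err"))`, assuming that is where the log lives), the counts would show how often each condition occurred over the period.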
|
MariaDB configuration file for DB3:
[mysqld]
datadir = '/db/mysql/data'
socket = '/db/mysql/mysql.sock'
tmpdir = '/db/mysql/tmp'
bind_address = '0.0.0.0'
port = '3306'

server-id = '3'
gtid_strict_mode = '1'
gtid_ignore_duplicates = '1'
log_slave_updates = '1'
relay-log-space-limit = '1G'
relay_log_purge = '1'
read_only = '1'
skip-slave-start
relay-log = '/db/mysql/log-relay/bin-relay'
report_host = 'db3'

log-error
log-warnings
log_bin = '/db/mysql/log-bin/bin-log'
expire_logs_days = '14'
log_bin_trust_function_creators = '1'
max_binlog_size = '250M'
binlog_format = 'ROW'
sync_binlog = '1'
sql_mode = ''

thread_cache_size = '100'
max_connections = '1000'
tmp_table_size = '100M'
max_heap_table_size = '100M'
max_allowed_packet = '1G'
query_cache_size = '0'
query_cache_type = '0'
table_definition_cache = '50000'
table_open_cache = '50000'
open_files_limit = '100000'
wait_timeout = '3600'

user = 'mysql'

innodb-buffer-pool-instances = '4'
innodb_file_per_table = '1'
innodb_data_home_dir = '/db/mysql/innodb'
innodb_log_group_home_dir = '/db/mysql/innodb'
innodb_buffer_pool_size = '1G'
innodb_log_file_size = '1G'
innodb_log_files_in_group = '2'
innodb_thread_concurrency = '8'
innodb_flush_method = 'O_DIRECT'
innodb_flush_log_at_trx_commit = '1'
innodb_io_capacity = '2100'
innodb_open_files = '50000'

concurrent_insert = 2

# Full Text Search
ft_min_word_len = '3'
ft_max_word_len = '35'

# Encryption
loose-innodb-encryption-threads = '4'
loose-innodb-encryption-rotate-key-age = '1'

# UTF-8 Support
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_unicode_ci
skip-character-set-client-handshake

skip-name-resolve

!include /etc/mysql/mariadb.conf.d/file_key_management.cnf

[client]
socket = '/db/mysql/mysql.sock'
socket = /db/mysql/mysql.sock
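For reference, the size-related options in this configuration can be compared in bytes with a small sketch like the one below. The parser is a simplification written for this report (it ignores section headers and include directives, and keeps the last value for repeated keys), and it assumes K/M/G suffixes are binary multiples. Whether the relationship between these values matters for the stall is not established here.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    # Convert values such as '1G' or '250M' to bytes (binary multiples assumed).
    value = value.strip().strip("'\"")
    suffix = value[-1:].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def read_options(text):
    # Collect "key = value" lines; sections, comments and directives are skipped.
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "[", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Dashes and underscores in option names are treated as equivalent.
        options[key.strip().replace("-", "_")] = value.strip()
    return options


config_excerpt = """
[mysqld]
relay-log-space-limit = '1G'
max_binlog_size = '250M'
max_allowed_packet = '1G'
"""
options = read_options(config_excerpt)
for name in ("relay_log_space_limit", "max_binlog_size", "max_allowed_packet"):
    print(name, parse_size(options[name]))
```

With the values from this file, relay-log-space-limit and max_allowed_packet both come out as 1073741824 bytes, and max_binlog_size as 262144000 bytes.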
Issue Links
- relates to: MDEV-38906 Do not resume IO Threads in the middle of an event group (Open)