[MDEV-28306] Slave I/O: Fatal error: Failed to run 'after_read_event' hook, Internal MariaDB error code: 1593 Created: 2022-04-13  Updated: 2023-10-24

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.7
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Harvey Cooper Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS Stream 8 x86_64


Issue Links:
Relates
relates to MDEV-32551 "Read semi-sync reply magic number er... Closed

 Description   

SHOW SLAVE STATUS \G
                   Master_Host: 172.31.31.10
                   Master_User: orchestrator
                   Master_Port: 3306
              Slave_IO_Running: No
             Slave_SQL_Running: Yes
                 Last_IO_Errno: 1593
                 Last_IO_Error: Fatal error: Failed to run 'after_read_event' hook
 
2022-04-11 18:00:11 83 [Note] Slave I/O thread: connected to master 'orchestrator@172.31.31.10:3306',replication starts at GTID position '1-1-338'
2022-04-11 18:00:11 83 [ERROR] Missing magic number for semi-sync packet, packet len: 53
2022-04-11 18:00:11 83 [ERROR] Slave I/O: Fatal error: Failed to run 'after_read_event' hook, Internal MariaDB error code: 1593
2022-04-11 18:00:11 83 [Note] Slave I/O thread exiting, read up to log 'mysql-bin-13306.000001', position 86652; GTID position 1-1-338, master 172.31.31.10:3306

I have a simple MariaDB cluster, the master(172.31.31.10), and a slave(172.31.31.12).
And I randomly drop ingress packets in master and slave by the following commands:

iptables -A INPUT -p tcp --destination-port 3306 -j DROP
iptables -A INPUT -p icmp -j DROP

also I'll randomly remove above iptables rules to simulate network jitter.

And under some circumstances, I met the Fatal error: Failed to run 'after_read_event' hook, I don't know about the root cause, it seems some kinda race condition when initializing the semi-sync replication.

Also note that, when I met this kinda error, I ran START SLAVE IO_THREAD the replication will back to normal.



 Comments   
Comment by Daniel Black [ 2022-04-13 ]

Thanks for the bug report. Can you share any replication setup system variables used and perhaps the contents of the binary log at the point at which it failed.

If the binary log contents are too private to be public, then https://mariadb.com/kb/en/meta/mariadb-ftp-server/ is available for communication to developers directly.

Comment by Harvey Cooper [ 2022-04-13 ]

Replication setup

# Slave config
[mariadb]
# Disable reverse DNS lookup for new connection
skip-name-resolve
# Enable GTID
gtid-domain-id=1
# Always enale gtid strict mode to avoid data inconsistency
# when the master is changed to another host
gtid_strict_mode=1
server-id=3
# Binlog name
log-bin=mysql-bin-13308
# Make binlog crash safe
sync_binlog=1
# Updates on a replica received from a primary during replication are logged in the replica's binary log
# i.e. daisy-chain replication
log_slave_updates=1
binlog_format=ROW
relay_log=mysql-relay-bin-13308
rpl_semi_sync_master_enabled=OFF
rpl_semi_sync_slave_enabled=ON
# Wait one slave ACK then commit in master's storage engine
rpl_semi_sync_master_wait_point=AFTER_SYNC
# Timeout in milliseconds, for semi-synchronous replication in the primary.
# If this timeout is exceeded in waiting on a commit for acknowledgement from a replica,
# the primary will revert to asynchronous replication.
rpl_semi_sync_master_timeout=9223372036854775807
read_only=1
 
binlog_row_image=FULL
innodb_buffer_pool_size=1G
innodb_buffer_pool_chunk_size=1G
 
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci

mysql> SHOW VARIABLES LIKE '%rpl%';
+---------------------------------------+---------------------+
| Variable_name                         | Value               |
+---------------------------------------+---------------------+
| rpl_semi_sync_master_enabled          | OFF                 |
| rpl_semi_sync_master_timeout          | 9223372036854775807 |
| rpl_semi_sync_master_trace_level      | 32                  |
| rpl_semi_sync_master_wait_no_slave    | ON                  |
| rpl_semi_sync_master_wait_point       | AFTER_SYNC          |
| rpl_semi_sync_slave_delay_master      | OFF                 |
| rpl_semi_sync_slave_enabled           | ON                  |
| rpl_semi_sync_slave_kill_conn_timeout | 5                   |
| rpl_semi_sync_slave_trace_level       | 32                  |
+---------------------------------------+---------------------+
9 rows in set (0.00 sec)

But the binlog isn't available, I'll try to reproduce this glitch and then fetch the binlog ...

Generated at Thu Feb 08 09:59:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.