Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.2.17, 10.2.18, 10.3.10, 10.8.5, 10.4.28
Environment: CentOS 7.x
Description
We have a 10.2.18 master and 4 slaves, all running 10.2.18. Because of a bug I reported last week (https://jira.mariadb.org/browse/MDEV-17420), we upgraded one of the replicas to 10.3.10 to see whether that bug was still present in the newer version.
After the upgrade, the slave began to sync binary log events from the master, but after a few seconds it stopped with the following errors:
mysqld[15138]: 2018-10-17 9:22:28 156 [ERROR] Slave IO thread did not receive an expected Rows-log end-of-statement for event starting at log 'main.010755' position 89390527 whose last block was seen at log 'main.010755' position 89390527. The end-of-statement should have been delivered before the current one at log 'main.010755' position 89390610
mysqld[15138]: 2018-10-17 9:22:28 156 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
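To make the positions in the first error line easier to compare, here is a minimal Python sketch that pulls out each `log '<file>' position <offset>` pair. The regular expression and the helper name `extract_positions` are illustrative assumptions, not part of MariaDB; the message text is copied from the log above.

```python
import re

# Error text copied from the report (syslog prefix removed).
ERROR_LINE = (
    "Slave IO thread did not receive an expected Rows-log end-of-statement "
    "for event starting at log 'main.010755' position 89390527 whose last "
    "block was seen at log 'main.010755' position 89390527. The "
    "end-of-statement should have been delivered before the current one at "
    "log 'main.010755' position 89390610"
)

# Matches every "log '<file>' position <offset>" pair in the message.
LOG_POS = re.compile(r"log '([^']+)' position (\d+)")


def extract_positions(message):
    """Return the (file, offset) pairs in the order they appear (hypothetical helper)."""
    return [(name, int(pos)) for name, pos in LOG_POS.findall(message)]


positions = extract_positions(ERROR_LINE)
start, last_block, current = positions

# Distance between the start of the Rows event and the event that arrived
# while the end-of-statement marker was still expected.
gap = current[1] - start[1]
print(positions)
print("bytes between event start and current event:", gap)
```

For this message, the event start and the last block seen share the same offset, and the event being queued when the error was raised sits 83 bytes later in the same binary log file.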
We checked the binary log on the master and it does not seem to be corrupted; the other 3 replicas (still on 10.2.18) are working fine. The relay log on the slave also seems OK.
The slave is stopped with the following status:
mainro [(none)]> show slave status\G
*************************** 1. row ***************************
Master_Log_File: main.010755
Read_Master_Log_Pos: 89411517
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 550
Relay_Master_Log_File: main.010755
Slave_IO_Running: No
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 89410432
Relay_Log_Space: 1657
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
Last_SQL_Error:
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 16763
Slave_Non_Transactional_Groups: 7974
Slave_Transactional_Groups: 107415
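For anyone comparing several such outputs, the status above can be loaded into a dictionary with a few lines of Python. This is only a sketch: `parse_status` is a hypothetical helper, and the embedded text is a subset of the fields shown above. Subtracting the two positions is meaningful only because `Master_Log_File` and `Relay_Master_Log_File` name the same file.

```python
# Subset of the SHOW SLAVE STATUS\G output from the report.
STATUS_TEXT = """\
Master_Log_File: main.010755
Read_Master_Log_Pos: 89411517
Relay_Master_Log_File: main.010755
Slave_IO_Running: No
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 89410432
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Error:
"""


def parse_status(text):
    """Turn 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep and not key.startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


status = parse_status(STATUS_TEXT)

# Offsets are comparable only when both refer to the same master binlog file.
same_file = status["Master_Log_File"] == status["Relay_Master_Log_File"]
read_ahead = int(status["Read_Master_Log_Pos"]) - int(status["Exec_Master_Log_Pos"])
io_stopped = status["Slave_IO_Running"] == "No"

print("same binlog file:", same_file)
print("bytes read past last executed position:", read_ahead)
print("I/O thread stopped:", io_stopped)
```

With the values in this report, the I/O thread had read 1085 bytes beyond the last position the SQL thread executed before it stopped with error 1595.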
We have already tried RESET SLAVE, clearing the relay logs, and executing CHANGE MASTER TO again, but the result is the same.
I've dumped the master binary log at that position and, interestingly, the slave I/O thread stops inside a binlog block (the master binlog format is MIXED).
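The report does not say which command was used for the dump. One plausible way to do it, sketched here in Python, is to build a `mysqlbinlog` invocation bounded by the event start from the error log and the `Read_Master_Log_Pos` from the slave status. The helper name `mysqlbinlog_command`, the choice of stop position, and the relative binlog path are assumptions for illustration; the flags themselves are standard `mysqlbinlog` options.

```python
import shlex


def mysqlbinlog_command(binlog_path, start_pos, stop_pos):
    """Build an argv list for decoding a slice of a binary log (hypothetical helper)."""
    return [
        "mysqlbinlog",
        "--base64-output=decode-rows",  # show row events as decoded pseudo-SQL
        "--verbose",                    # include the reconstructed row changes
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
        binlog_path,
    ]


# Assumed values: event start from the error log, stop at Read_Master_Log_Pos.
# The real file would live in the master's binary log directory.
cmd = mysqlbinlog_command("main.010755", 89390527, 89411517)

# Print a shell-ready command line instead of executing it, so the sketch
# runs without a MariaDB installation.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` on the master would execute it; printing it first makes it easy to review the positions before touching a production server.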
The master/slave set that is failing is the main database server for our organization. It has 1.2 TB of data with 38 databases, views, stored procedures, triggers and complex queries.
On the same servers, we have 3 other master/slave sets with the same combination (10.2.18 master and 10.3.10 slaves), and they are working perfectly. Of course, these sets are smaller and simpler than the instance that is failing.
Thanks
Issue Links
- blocks: MDEV-17420 MariaDB slave 10.2 leaks temporary tables (Open)