Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Incomplete
- Affects Version/s: 10.2.17, 10.2.18, 10.3.10, 10.8.5, 10.4.28
- Environment: CentOS 7.x
Description
We have a 10.2.18 master and 4 slaves, also on 10.2.18. Because of a bug that I reported last week (https://jira.mariadb.org/browse/MDEV-17420), we upgraded one of the replicas to 10.3.10 to see whether the bug was still present in that version.
After the upgrade, the slave began to sync binary log events from the master, but after a few seconds it stopped with the following errors:
mysqld[15138]: 2018-10-17 9:22:28 156 [ERROR] Slave IO thread did not receive an expected Rows-log end-of-statement for event starting at log 'main.010755' position 89390527 whose last block was seen at log 'main.010755' position 89390527. The end-of-statement should have been delivered before the current one at log 'main.010755' position 89390610
mysqld[15138]: 2018-10-17 9:22:28 156 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
We checked the binary log on the master and it does not seem to be corrupted; the other 3 replicas (still on 10.2.18) are working fine. The relay log on the slave also seems fine.
The slave is stopped with the following status:
mainro [(none)]> show slave status\G
*************************** 1. row ***************************
Master_Log_File: main.010755
Read_Master_Log_Pos: 89411517
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 550
Relay_Master_Log_File: main.010755
Slave_IO_Running: No
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 89410432
Relay_Log_Space: 1657
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
Last_SQL_Error:
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 16763
Slave_Non_Transactional_Groups: 7974
Slave_Transactional_Groups: 107415
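For anyone triaging a similar stop, the status output above can be checked mechanically. The following is only a sketch: the helper name `parse_slave_status` and the choice of fields are illustrative assumptions, not part of any MariaDB tool. It splits each `Field: value` line at the first colon and reports which thread is stopped and the difference between the two master log positions.

```python
def parse_slave_status(text):
    """Parse vertical SHOW SLAVE STATUS output into a dict of field -> string.

    Hypothetical helper for illustration. Lines without a colon (the prompt
    line and the row separator) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Excerpt of the status output quoted in this report.
STATUS = """\
*************************** 1. row ***************************
Master_Log_File: main.010755
Read_Master_Log_Pos: 89411517
Slave_IO_Running: No
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 89410432
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
"""

status = parse_slave_status(STATUS)
io_stopped = status["Slave_IO_Running"] == "No"
sql_running = status["Slave_SQL_Running"] == "Yes"
# Difference between the position read by the IO thread and the position
# executed by the SQL thread, in bytes of the master binary log.
position_gap = int(status["Read_Master_Log_Pos"]) - int(status["Exec_Master_Log_Pos"])
```

With the values from this report, only the IO thread is stopped and the two positions differ by 1085 bytes.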
We have already tried RESET SLAVE, clearing the relay logs, and executing CHANGE MASTER TO again, but the result is the same.
I've dumped the master binary log at that position and, interestingly, the slave IO thread stops in a binlog block (master binlog format is MIXED).
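A dump like the one described can be reproduced with `mysqlbinlog` and its `--start-position` and `--stop-position` options. The sketch below only builds the command line; the function name and the binlog path `/var/lib/mysql/main.010755` are assumptions for illustration (the report does not state where the file lives), and the two positions are the ones quoted in the error message.

```python
import shlex


def mysqlbinlog_command(binlog_path, start_pos, stop_pos):
    """Build a mysqlbinlog command that decodes events in [start_pos, stop_pos).

    Hypothetical helper: it returns the argument list and does not run anything.
    """
    if start_pos >= stop_pos:
        raise ValueError("start_pos must be lower than stop_pos")
    return [
        "mysqlbinlog",
        "--base64-output=decode-rows",  # show row events as commented pseudo-SQL
        "--verbose",
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
        str(binlog_path),
    ]


# Assumed path; the positions come from the error log in this report.
cmd = mysqlbinlog_command("/var/lib/mysql/main.010755", 89390527, 89390610)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on the master host. Widening the stop position shows the events that follow the one named in the error.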
The master/slave set that is failing is the main database server for our organization. It has 1.2 TB of data with 38 databases, views, stored procedures, triggers and complex queries.
On the same servers, we have 3 other small master/slave sets with the same combination (10.2.18 master and 10.3.10 slaves), and they are working perfectly. Of course, these sets are smaller and simpler than the instance that is failing.
Thanks
Issue Links
- blocks: MDEV-17420 MariaDB slave 10.2 leaks temporary tables (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | | This issue blocks |
Description | (description text as above, with log excerpts wrapped in {{...}}) | (same description text, with log excerpts wrapped in {noformat}) |
Labels | binlog replication slave | binlog need_feedback replication slave |
Labels | binlog need_feedback replication slave | binlog replication slave |
Assignee | Andrei Elkin [ elkin ] |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] |
Fix Version/s | 10.4 [ 22408 ] |
Workflow | MariaDB v3 [ 90165 ] | MariaDB v4 [ 140983 ] |
Fix Version/s | 10.2 [ 14601 ] |
Affects Version/s | 10.8.5 [ 28308 ] |
Affects Version/s | 10.4.28 [ 28509 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Assignee | Andrei Elkin [ elkin ] | suresh ramagiri [ suresh.ramagiri@mariadb.com ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Fix Version/s | 10.3 [ 22126 ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Incomplete [ 4 ] | |
Status | Needs Feedback [ 10501 ] | Closed [ 6 ] |
Zendesk Related Tickets | 165626 172355 |
Can you show the binary log around that position? If you don't want to make it public, can you upload it to ftps://ftp.mariadb.com (doc: https://mariadb.com/kb/en/meta/mariadb-ftp-server/ )? You can make an extract of the binary log by taking the small header and appending the extracted chunks to it (using dd), using the event offset as a literal file offset. mysqlbinlog can show the required offsets to use and can test the resulting file.
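The extraction described above can also be sketched in Python instead of dd. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, `header_end` is the offset at which the leading header events end (taken from mysqlbinlog output, as suggested above), and `start` and `end` are event boundaries also read from mysqlbinlog. The only format detail relied on is that a binary log file begins with the four magic bytes `\xfebin`.

```python
BINLOG_MAGIC = b"\xfebin"  # first four bytes of a binary log file


def extract_binlog_chunk(src, dst, header_end, start, end):
    """Write src[0:header_end] followed by src[start:end] to dst.

    Hypothetical helper. All three offsets are literal file offsets and are
    assumed to fall on event boundaries. Returns the number of bytes written.
    """
    if not (len(BINLOG_MAGIC) <= header_end <= start < end):
        raise ValueError("expected 4 <= header_end <= start < end")
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        header = fin.read(header_end)
        if len(header) != header_end:
            raise ValueError("source file is shorter than header_end")
        if not header.startswith(BINLOG_MAGIC):
            raise ValueError("source does not start with the binlog magic bytes")
        fout.write(header)
        fin.seek(start)  # jump to the first event of interest
        chunk = fin.read(end - start)
        if len(chunk) != end - start:
            raise ValueError("source file is shorter than end")
        fout.write(chunk)
    return len(header) + len(chunk)
```

A call such as `extract_binlog_chunk("main.010755", "extract.bin", header_end, 89390527, 89390610)` would cover the event named in the error message, with `header_end` filled in from mysqlbinlog. Running mysqlbinlog on the output file is the way to check that the extract is readable.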