[MDEV-8428] Mangled DML statements on 2nd level slave when enabling binlog checksums Created: 2015-07-06 Updated: 2015-07-27 Resolved: 2015-07-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.0.19 |
| Fix Version/s: | 10.0.21 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Hartmut Holzgraefe | Assignee: | Michael Widenius |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS 6.x Linux |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
In a two level replication setup that had originally been running with binlog_checksum=NONE enabling binlog_checksum=CRC32 DDL statements executed on the primary master caused SQL execution errors on 2nd level slaves, but not the intermediate master. DDL statements (seen on CREATE TABLE and ALTER TABLE) failed as there were extra random characters added at the end of the SQL statement text. Looking at the failing statement with "mysqlbinlog --hexdump" showed that in addition to the expected four checksum bytes (as seen in the binlog event header) four additional bytes had been added between the end of the statement and the checksum bytes. So it looks as if the intermediate master somehow added a 2nd CRC32 checksum to these events instead of replacing it with a new checksum of its own. Only DDL statements were affected, probably as only these are replicated as SQL text in ROW format. Binlog configuration settings were (after binlog checksums were activated):
The failure only occurred in production and could not be reproduced in a local test setup yet ... |
| Comments |
| Comment by Hartmut Holzgraefe [ 2015-07-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The actual problem seems to be with "CREATE TABLE ... SELECT" ... I can reproduce this with three servers
and minimalistic configuration
by executing this on the master:
2nd level slave fails with
In the master binlog the CREATE event for t2 looks like this:
In the intermediate slaves relay log it looks the same (same content and checksum), the only difference is the position within the relay log file (as expected) In the intermediate slaves binlog on the other hand:
So there we see four extra bytes between the CREATE statement and the checksum bytes, but these are not the original checksum bytes from the upstream binlog event ... | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-07-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Not reproducible with MySQL 5.6.25 ... | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-07-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I did another test run, this time with a debug server and tracing enabled. Cheksum for the 2nd CREATE event was 0x41b3a8d6 (1102293206) this time, and the extra 4 bytes were 0x58c76234 (1489461812).
Both values can be found as "info" output from my_checksum() (unfortunately the DBUG_PRINT in there prints the checksum value in decimal, not hex):
So the extra four bytes were indeed generated as a CRC32 checksum and later overridden by a 2nd event checksum ... | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-07-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mysqld.trace from intermediate slave and all binlog/relaylog files from test run | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2015-07-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The issue seams to be that the slave doesn't take into account that on the slave CREATE ... SELECT is done as one transaction: BEGIN I have now fixed the CREATE ... SELECT issue so that checksum is calculated for the whole transaction and not separately for the CREATE part. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2015-07-27 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fix pushed into 10.0 tree |