[MDEV-8428] Mangled DML statements on 2nd level slave when enabling binlog checksums Created: 2015-07-06  Updated: 2015-07-27  Resolved: 2015-07-27

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.19
Fix Version/s: 10.0.21

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Michael Widenius
Resolution: Fixed Votes: 1
Labels: None
Environment:

CentOS 6.x Linux


Attachments: File files.tar.gz    
Issue Links:
Blocks
blocks MDEV-8429 Change binlog_checksum default to mat... Closed

 Description   

In a two level replication setup that had originally been running with binlog_checksum=NONE enabling binlog_checksum=CRC32 DDL statements executed on the primary master caused SQL execution errors on 2nd level slaves, but not the intermediate master.

DDL statements (seen on CREATE TABLE and ALTER TABLE) failed as there were extra random characters added at the end of the SQL statement text.

Looking at the failing statement with "mysqlbinlog --hexdump" showed that in addition to the expected four checksum bytes (as seen in the binlog event header) four additional bytes had been added between the end of the statement and the checksum bytes.

So it looks as if the intermediate master somehow added a 2nd CRC32 checksum to these events instead of replacing it with a new checksum of its own.

Only DDL statements were affected, probably as only these are replicated as SQL text in ROW format.

Binlog configuration settings were (after binlog checksums were activated):

log-bin          = ../log/binlog
binlog_format    = ROW
sync_binlog      = 1
expire_logs_days = 7
log_slave_updates
max_binlog_size  = 100M
binlog_checksum = CRC32
binlog_ignre_db = ...one database that was not related to the failing statements ...
relay_log = ../log/relaylog
max_relay_log_size = 100M
slave_compressed_protocol = OFF

The failure only occurred in production and could not be reproduced in a local test setup yet ...



 Comments   
Comment by Hartmut Holzgraefe [ 2015-07-06 ]

The actual problem seems to be with "CREATE TABLE ... SELECT" ...

I can reproduce this with three servers

  1 -> 2 -> 3

and minimalistic configuration

[mysqld]
server_id = ...1,2 or 3..
user = mysql
binlog_format = ROW
binlog_checksum = CRC32
log_slave_updates

by executing this on the master:

USE test;
CREATE TABLE t1(id INT PRIMARY KEY);
CREATE TABLE t2 SELECT * FROM t1;

2nd level slave fails with

               Last_SQL_Errno: 1064
               Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '?' at line 3' on query. Default database: 'test'. Query: 'CREATE TABLE `t2` (
  `id` int(11) NOT NULL
)ˠ�'

In the master binlog the CREATE event for t2 looks like this:

# at 799
#150706 19:53:32 server id 1  end_log_pos 619 CRC32 0x310827fa 
# Position
#           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
#      31f  |bc dc 9a 55 |02   |01 00 00 00 |70 00 00 00 |6b 02 00 00 |00 00
#
#      332  06 00 00 00 00 00 00 00  04 00 00 1a 00 00 00 00  |................|
#      342  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
#      352  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
#      362  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
#      372  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
#      382  4f 54 20 4e 55 4c 4c 0a  29 fa 27 08 31           |OT NULL.).'.1|
#
# Event:        Query   thread_id=6     exec_time=0     error_code=0
SET TIMESTAMP=1436212412/*!*/;
CREATE TABLE `t2` (
  `id` int(11) NOT NULL
)
/*!*/;

In the intermediate slaves relay log it looks the same (same content and checksum), the only difference is the position within the relay log file (as expected)

In the intermediate slaves binlog on the other hand:

# at 513
#150706 19:53:32 server id 1  end_log_pos 629 CRC32 0x41b3a8d6 
# Position
#           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
#      201  |bc dc 9a 55 |02   |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00
#
#      214  06 00 00 00 55 00 00 00  04 00 00 1a 00 00 00 00  |....U...........|
#      224  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
#      234  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
#      244  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
#      254  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
#      264  4f 54 20 4e 55 4c 4c 0a  29 cb a0 c2 16 d6 a8 b3  |OT NULL.).......|
#      274  41                                                |A|
#
# Event:        Query   thread_id=6     exec_time=85    error_code=0
SET TIMESTAMP=1436212412/*!*/;
CREATE TABLE `t2` (
  `id` int(11) NOT NULL
)<CB><A0><C2>^V
/*!*/;

So there we see four extra bytes between the CREATE statement and the checksum bytes, but these are not the original checksum bytes from the upstream binlog event ...

Comment by Hartmut Holzgraefe [ 2015-07-07 ]

Not reproducible with MySQL 5.6.25 ...

Comment by Hartmut Holzgraefe [ 2015-07-08 ]

I did another test run, this time with a debug server and tracing enabled.

Cheksum for the 2nd CREATE event was 0x41b3a8d6 (1102293206) this time, and the extra 4 bytes were 0x58c76234 (1489461812).

# at 513
#150708 12:33:33 server id 1  end_log_pos 629 CRC32 0x41b3a8d6 
# Position
#           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
#      201  |9d 18 9d 55 |02   |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00
#
#      214  06 00 00 00 4c 00 00 00  04 00 00 1a 00 00 00 00  |....L...........|
#      224  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
#      234  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
#      244  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
#      254  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
#      264  4f 54 20 4e 55 4c 4c 0a  29 34 62 c7 58 d6 a8 b3  |OT NULL.)4b.X...|
#      274  41                                                |A|
#
# Event: 	Query	thread_id=6	exec_time=76	error_code=0
SET TIMESTAMP=1436358813/*!*/;
CREATE TABLE `t2` (
  `id` int(11) NOT NULL
)4b�X
/*!*/;

Both values can be found as "info" output from my_checksum() (unfortunately the DBUG_PRINT in there prints the checksum value in decimal, not hex):

$ grep -n 1102293206 mysqld.trace 
115043:T@12   : | | | | | | | | | info: crc: 1102293206
$  grep -n 1489461812 mysqld.trace 
114375:T@12   : | | | | | | | | | info: crc: 1489461812

So the extra four bytes were indeed generated as a CRC32 checksum and later overridden by a 2nd event checksum ...

Comment by Hartmut Holzgraefe [ 2015-07-08 ]

mysqld.trace from intermediate slave and all binlog/relaylog files from test run

Comment by Michael Widenius [ 2015-07-18 ]

The issue seams to be that the slave doesn't take into account that on the slave CREATE ... SELECT is done as one transaction:

BEGIN
CREATE...
INSERT row data into table
COMMIT

I have now fixed the CREATE ... SELECT issue so that checksum is calculated for the whole transaction and not separately for the CREATE part.
Will push into 10.0 soon

Comment by Michael Widenius [ 2015-07-27 ]

Fix pushed into 10.0 tree

Generated at Thu Feb 08 07:27:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.