Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8428

Mangled DML statements on 2nd level slave when enabling binlog checksums

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.0.19
    • 10.0.21
    • Replication
    • None
    • CentOS 6.x Linux

    Description

      In a two level replication setup that had originally been running with binlog_checksum=NONE enabling binlog_checksum=CRC32 DDL statements executed on the primary master caused SQL execution errors on 2nd level slaves, but not the intermediate master.

      DDL statements (seen on CREATE TABLE and ALTER TABLE) failed as there were extra random characters added at the end of the SQL statement text.

      Looking at the failing statement with "mysqlbinlog --hexdump" showed that in addition to the expected four checksum bytes (as seen in the binlog event header) four additional bytes had been added between the end of the statement and the checksum bytes.

      So it looks as if the intermediate master somehow added a 2nd CRC32 checksum to these events instead of replacing it with a new checksum of its own.

      Only DDL statements were affected, probably as only these are replicated as SQL text in ROW format.

      Binlog configuration settings were (after binlog checksums were activated):

      log-bin          = ../log/binlog
      binlog_format    = ROW
      sync_binlog      = 1
      expire_logs_days = 7
      log_slave_updates
      max_binlog_size  = 100M
      binlog_checksum = CRC32
      binlog_ignre_db = ...one database that was not related to the failing statements ...
      relay_log = ../log/relaylog
      max_relay_log_size = 100M
      slave_compressed_protocol = OFF

      The failure only occurred in production and could not be reproduced in a local test setup yet ...

      Attachments

        Issue Links

          Activity

            The actual problem seems to be with "CREATE TABLE ... SELECT" ...

            I can reproduce this with three servers

              1 -> 2 -> 3

            and minimalistic configuration

            [mysqld]
            server_id = ...1,2 or 3..
            user = mysql
            binlog_format = ROW
            binlog_checksum = CRC32
            log_slave_updates

            by executing this on the master:

            USE test;
            CREATE TABLE t1(id INT PRIMARY KEY);
            CREATE TABLE t2 SELECT * FROM t1;

            2nd level slave fails with

                           Last_SQL_Errno: 1064
                           Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '?' at line 3' on query. Default database: 'test'. Query: 'CREATE TABLE `t2` (
              `id` int(11) NOT NULL
            )ˠ�'

            In the master binlog the CREATE event for t2 looks like this:

            # at 799
            #150706 19:53:32 server id 1  end_log_pos 619 CRC32 0x310827fa 
            # Position
            #           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
            #      31f  |bc dc 9a 55 |02   |01 00 00 00 |70 00 00 00 |6b 02 00 00 |00 00
            #
            #      332  06 00 00 00 00 00 00 00  04 00 00 1a 00 00 00 00  |................|
            #      342  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
            #      352  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
            #      362  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
            #      372  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
            #      382  4f 54 20 4e 55 4c 4c 0a  29 fa 27 08 31           |OT NULL.).'.1|
            #
            # Event:        Query   thread_id=6     exec_time=0     error_code=0
            SET TIMESTAMP=1436212412/*!*/;
            CREATE TABLE `t2` (
              `id` int(11) NOT NULL
            )
            /*!*/;

            In the intermediate slaves relay log it looks the same (same content and checksum), the only difference is the position within the relay log file (as expected)

            In the intermediate slaves binlog on the other hand:

            # at 513
            #150706 19:53:32 server id 1  end_log_pos 629 CRC32 0x41b3a8d6 
            # Position
            #           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
            #      201  |bc dc 9a 55 |02   |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00
            #
            #      214  06 00 00 00 55 00 00 00  04 00 00 1a 00 00 00 00  |....U...........|
            #      224  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
            #      234  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
            #      244  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
            #      254  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
            #      264  4f 54 20 4e 55 4c 4c 0a  29 cb a0 c2 16 d6 a8 b3  |OT NULL.).......|
            #      274  41                                                |A|
            #
            # Event:        Query   thread_id=6     exec_time=85    error_code=0
            SET TIMESTAMP=1436212412/*!*/;
            CREATE TABLE `t2` (
              `id` int(11) NOT NULL
            )<CB><A0><C2>^V
            /*!*/;

            So there we see four extra bytes between the CREATE statement and the checksum bytes, but these are not the original checksum bytes from the upstream binlog event ...

            hholzgra Hartmut Holzgraefe added a comment - The actual problem seems to be with "CREATE TABLE ... SELECT" ... I can reproduce this with three servers 1 -> 2 -> 3 and minimalistic configuration [mysqld] server_id = ...1,2 or 3.. user = mysql binlog_format = ROW binlog_checksum = CRC32 log_slave_updates by executing this on the master: USE test; CREATE TABLE t1(id INT PRIMARY KEY); CREATE TABLE t2 SELECT * FROM t1; 2nd level slave fails with Last_SQL_Errno: 1064 Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '?' at line 3' on query. Default database: 'test'. Query: 'CREATE TABLE `t2` ( `id` int(11) NOT NULL )ˠ�' In the master binlog the CREATE event for t2 looks like this: # at 799 #150706 19:53:32 server id 1 end_log_pos 619 CRC32 0x310827fa # Position # |Timestamp |Type |Master ID |Size |Master Pos |Flags # 31f |bc dc 9a 55 |02 |01 00 00 00 |70 00 00 00 |6b 02 00 00 |00 00 # # 332 06 00 00 00 00 00 00 00 04 00 00 1a 00 00 00 00 |................| # 342 00 00 01 00 00 00 00 00 00 00 00 06 03 73 74 64 |.............std| # 352 04 21 00 21 00 08 00 74 65 73 74 00 43 52 45 41 |.!.!...test.CREA| # 362 54 45 20 54 41 42 4c 45 20 60 74 32 60 20 28 0a |TE TABLE `t2` (.| # 372 20 20 60 69 64 60 20 69 6e 74 28 31 31 29 20 4e | `id` int(11) N| # 382 4f 54 20 4e 55 4c 4c 0a 29 fa 27 08 31 |OT NULL.).'.1| # # Event: Query thread_id=6 exec_time=0 error_code=0 SET TIMESTAMP=1436212412/*!*/; CREATE TABLE `t2` ( `id` int(11) NOT NULL ) /*!*/; In the intermediate slaves relay log it looks the same (same content and checksum), the only difference is the position within the relay log file (as expected) In the intermediate slaves binlog on the other hand: # at 513 #150706 19:53:32 server id 1 end_log_pos 629 CRC32 0x41b3a8d6 # Position # |Timestamp |Type |Master ID |Size |Master Pos |Flags # 201 |bc dc 9a 55 |02 |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00 # # 214 06 00 00 00 55 00 00 00 04 00 00 1a 00 00 00 00 |....U...........| # 224 00 00 01 00 00 00 00 00 00 00 00 06 03 73 74 64 |.............std| # 234 04 21 00 21 00 08 00 74 65 73 74 00 43 52 45 41 |.!.!...test.CREA| # 244 54 45 20 54 41 42 4c 45 20 60 74 32 60 20 28 0a |TE TABLE `t2` (.| # 254 20 20 60 69 64 60 20 69 6e 74 28 31 31 29 20 4e | `id` int(11) N| # 264 4f 54 20 4e 55 4c 4c 0a 29 cb a0 c2 16 d6 a8 b3 |OT NULL.).......| # 274 41 |A| # # Event: Query thread_id=6 exec_time=85 error_code=0 SET TIMESTAMP=1436212412/*!*/; CREATE TABLE `t2` ( `id` int(11) NOT NULL )<CB><A0><C2>^V /*!*/; So there we see four extra bytes between the CREATE statement and the checksum bytes, but these are not the original checksum bytes from the upstream binlog event ...

            Not reproducible with MySQL 5.6.25 ...

            hholzgra Hartmut Holzgraefe added a comment - Not reproducible with MySQL 5.6.25 ...

            I did another test run, this time with a debug server and tracing enabled.

            Cheksum for the 2nd CREATE event was 0x41b3a8d6 (1102293206) this time, and the extra 4 bytes were 0x58c76234 (1489461812).

            # at 513
            #150708 12:33:33 server id 1  end_log_pos 629 CRC32 0x41b3a8d6 
            # Position
            #           |Timestamp   |Type |Master ID   |Size        |Master Pos  |Flags
            #      201  |9d 18 9d 55 |02   |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00
            #
            #      214  06 00 00 00 4c 00 00 00  04 00 00 1a 00 00 00 00  |....L...........|
            #      224  00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64  |.............std|
            #      234  04 21 00 21 00 08 00 74  65 73 74 00 43 52 45 41  |.!.!...test.CREA|
            #      244  54 45 20 54 41 42 4c 45  20 60 74 32 60 20 28 0a  |TE TABLE `t2` (.|
            #      254  20 20 60 69 64 60 20 69  6e 74 28 31 31 29 20 4e  |  `id` int(11) N|
            #      264  4f 54 20 4e 55 4c 4c 0a  29 34 62 c7 58 d6 a8 b3  |OT NULL.)4b.X...|
            #      274  41                                                |A|
            #
            # Event: 	Query	thread_id=6	exec_time=76	error_code=0
            SET TIMESTAMP=1436358813/*!*/;
            CREATE TABLE `t2` (
              `id` int(11) NOT NULL
            )4b�X
            /*!*/;

            Both values can be found as "info" output from my_checksum() (unfortunately the DBUG_PRINT in there prints the checksum value in decimal, not hex):

            $ grep -n 1102293206 mysqld.trace 
            115043:T@12   : | | | | | | | | | info: crc: 1102293206
            $  grep -n 1489461812 mysqld.trace 
            114375:T@12   : | | | | | | | | | info: crc: 1489461812

            So the extra four bytes were indeed generated as a CRC32 checksum and later overridden by a 2nd event checksum ...

            hholzgra Hartmut Holzgraefe added a comment - I did another test run, this time with a debug server and tracing enabled. Cheksum for the 2nd CREATE event was 0x41b3a8d6 (1102293206) this time, and the extra 4 bytes were 0x58c76234 (1489461812). # at 513 #150708 12:33:33 server id 1 end_log_pos 629 CRC32 0x41b3a8d6 # Position # |Timestamp |Type |Master ID |Size |Master Pos |Flags # 201 |9d 18 9d 55 |02 |01 00 00 00 |74 00 00 00 |75 02 00 00 |00 00 # # 214 06 00 00 00 4c 00 00 00 04 00 00 1a 00 00 00 00 |....L...........| # 224 00 00 01 00 00 00 00 00 00 00 00 06 03 73 74 64 |.............std| # 234 04 21 00 21 00 08 00 74 65 73 74 00 43 52 45 41 |.!.!...test.CREA| # 244 54 45 20 54 41 42 4c 45 20 60 74 32 60 20 28 0a |TE TABLE `t2` (.| # 254 20 20 60 69 64 60 20 69 6e 74 28 31 31 29 20 4e | `id` int(11) N| # 264 4f 54 20 4e 55 4c 4c 0a 29 34 62 c7 58 d6 a8 b3 |OT NULL.)4b.X...| # 274 41 |A| # # Event: Query thread_id=6 exec_time=76 error_code=0 SET TIMESTAMP=1436358813/*!*/; CREATE TABLE `t2` ( `id` int(11) NOT NULL )4b�X /*!*/; Both values can be found as "info" output from my_checksum() (unfortunately the DBUG_PRINT in there prints the checksum value in decimal, not hex): $ grep -n 1102293206 mysqld.trace 115043:T@12 : | | | | | | | | | info: crc: 1102293206 $ grep -n 1489461812 mysqld.trace 114375:T@12 : | | | | | | | | | info: crc: 1489461812 So the extra four bytes were indeed generated as a CRC32 checksum and later overridden by a 2nd event checksum ...

            mysqld.trace from intermediate slave and all binlog/relaylog files from test run

            hholzgra Hartmut Holzgraefe added a comment - mysqld.trace from intermediate slave and all binlog/relaylog files from test run

            The issue seams to be that the slave doesn't take into account that on the slave CREATE ... SELECT is done as one transaction:

            BEGIN
            CREATE...
            INSERT row data into table
            COMMIT

            I have now fixed the CREATE ... SELECT issue so that checksum is calculated for the whole transaction and not separately for the CREATE part.
            Will push into 10.0 soon

            monty Michael Widenius added a comment - The issue seams to be that the slave doesn't take into account that on the slave CREATE ... SELECT is done as one transaction: BEGIN CREATE... INSERT row data into table COMMIT I have now fixed the CREATE ... SELECT issue so that checksum is calculated for the whole transaction and not separately for the CREATE part. Will push into 10.0 soon

            Fix pushed into 10.0 tree

            monty Michael Widenius added a comment - Fix pushed into 10.0 tree

            People

              monty Michael Widenius
              hholzgra Hartmut Holzgraefe
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.