MDEV-8428 (MariaDB Server)

Mangled DDL statements on 2nd level slave when enabling binlog checksums

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.19
    • Fix Version/s: 10.0.21
    • Component/s: Replication
    • Labels: None
    • Environment: CentOS 6.x Linux

    Description

      In a two-level replication setup that had originally been running with binlog_checksum=NONE, enabling binlog_checksum=CRC32 caused DDL statements executed on the primary master to fail with SQL execution errors on the 2nd level slaves, but not on the intermediate master.

      DDL statements (seen with CREATE TABLE and ALTER TABLE) failed because extra random characters had been appended to the end of the SQL statement text.

      Inspecting the failing statement with "mysqlbinlog --hexdump" showed that, in addition to the expected four checksum bytes (as seen in the binlog event header), four additional bytes had been added between the end of the statement and the checksum bytes.

      So it looks as if the intermediate master somehow added a 2nd CRC32 checksum to these events instead of replacing it with a new checksum of its own.
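The hypothesis above can be illustrated with a minimal Python sketch. It is a simplified model only: a real binlog event also carries a header and other fields, and the function names here (`add_checksum`, `relay_correctly`, `relay_with_double_checksum`) are hypothetical, not server code. It assumes a 4-byte little-endian CRC32 appended to the event, and shows that a relay which appends a second checksum, instead of replacing the first, leaves four stray bytes after the statement text once the downstream slave strips the outer checksum.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # a CRC32 checksum occupies four bytes


def add_checksum(body: bytes) -> bytes:
    """Append a little-endian CRC32 of `body` (simplified event framing)."""
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def strip_checksum(event: bytes) -> bytes:
    """Verify and remove the trailing CRC32, returning the event body."""
    body, stored = event[:-CHECKSUM_LEN], event[-CHECKSUM_LEN:]
    if struct.unpack("<I", stored)[0] != (zlib.crc32(body) & 0xFFFFFFFF):
        raise ValueError("checksum mismatch")
    return body


def relay_correctly(event: bytes) -> bytes:
    """Intermediate master replaces the incoming checksum with its own."""
    return add_checksum(strip_checksum(event))


def relay_with_double_checksum(event: bytes) -> bytes:
    """Hypothesised faulty path: a second checksum is appended on top."""
    return add_checksum(event)


statement = b"CREATE TABLE t1 (id INT PRIMARY KEY)"
from_primary = add_checksum(statement)

# What the 2nd level slave sees after stripping the outer checksum.
good = strip_checksum(relay_correctly(from_primary))
bad = strip_checksum(relay_with_double_checksum(from_primary))

print(good == statement)                  # True
print(len(bad) - len(statement))          # 4
print(bad[:len(statement)] == statement)  # True
```

In this model the outer checksum still verifies on the 2nd level slave, so the event is accepted and the four leftover bytes are parsed as part of the SQL text, which matches the reported symptom of random trailing characters.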

      Only DDL statements were affected, probably because they are the only statements replicated as SQL text when binlog_format is ROW.

      Binlog configuration settings were (after binlog checksums were activated):

      log-bin          = ../log/binlog
      binlog_format    = ROW
      sync_binlog      = 1
      expire_logs_days = 7
      log_slave_updates
      max_binlog_size  = 100M
      binlog_checksum  = CRC32
      binlog_ignore_db = ...one database that was not related to the failing statements ...
      relay_log = ../log/relaylog
      max_relay_log_size = 100M
      slave_compressed_protocol = OFF
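As a rough aid for checking which servers in a replication chain are in the situation described here, the following Python sketch parses an option-file fragment and reports whether the server re-logs replicated events with checksums enabled. It is a sketch under stated assumptions: the `[mysqld]` section header is not part of the report, the helper names are hypothetical, only a subset of the listed settings is included, and an unset `binlog_checksum` is treated as NONE.

```python
import configparser

# Subset of the reported settings; the [mysqld] header is an assumption,
# since the report lists only the option lines.
OPTION_FILE = """
[mysqld]
log-bin = ../log/binlog
binlog_format = ROW
log_slave_updates
binlog_checksum = CRC32
relay_log = ../log/relaylog
"""


def load_options(text: str) -> dict:
    """Parse the [mysqld] section, treating '-' and '_' in names alike."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return {k.replace("-", "_"): v for k, v in parser["mysqld"].items()}


def rewrites_checksummed_events(opts: dict) -> bool:
    """True if this server re-logs replicated events with checksums on."""
    binlog_on = "log_bin" in opts
    # A bare "log_slave_updates" line parses to None, meaning enabled.
    relogs = ("log_slave_updates" in opts
              and opts["log_slave_updates"] not in ("0", "OFF", "off"))
    # Assumption: an unset binlog_checksum is treated as NONE.
    checksummed = (opts.get("binlog_checksum") or "NONE").upper() != "NONE"
    return binlog_on and relogs and checksummed


print(rewrites_checksummed_events(load_options(OPTION_FILE)))
```

A server for which this returns True plays the role of the intermediate master in the report: it receives events from upstream and writes them, with a checksum, into its own binlog for the 2nd level slaves.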

      So far the failure has occurred only in production and has not yet been reproduced in a local test setup.

          Activity

            hholzgra Hartmut Holzgraefe created issue -
            hholzgra Hartmut Holzgraefe made changes -
            Field Original Value New Value
            Description: the configuration settings block was wrapped in {noformat} ... {noformat} markup; the description text itself was unchanged.
            hholzgra Hartmut Holzgraefe made changes -
            Description: the first sentence gained the clause "that had originally been running with binlog_checksum=NONE"; the rest of the description was unchanged.
            hholzgra Hartmut Holzgraefe made changes -
            elenst Elena Stepanova made changes -
            Status: Open [ 1 ] -> Confirmed [ 10101 ]
            elenst Elena Stepanova made changes -
            Fix Version/s: 10.0 [ 16000 ]
            Assignee: Kristian Nielsen [ knielsen ]
            Priority: Major [ 3 ] -> Critical [ 2 ]
            hholzgra Hartmut Holzgraefe made changes -
            Attachment: files.tar.gz [ 38700 ]
            nirbhay_c Nirbhay Choubey (Inactive) made changes -
            Assignee: Kristian Nielsen [ knielsen ] -> Nirbhay Choubey [ nirbhay_c ]
            nirbhay_c Nirbhay Choubey (Inactive) made changes -
            Status: Confirmed [ 10101 ] -> In Progress [ 3 ]
            monty Michael Widenius made changes -
            Assignee: Nirbhay Choubey [ nirbhay_c ] -> Michael Widenius [ monty ]
            monty Michael Widenius made changes -
            Fix Version/s: 10.0.21 [ 19406 ]
            Fix Version/s: 10.0 [ 16000 ]
            Resolution: Fixed [ 1 ]
            Status: In Progress [ 3 ] -> Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow: MariaDB v3 [ 70501 ] -> MariaDB v4 [ 149339 ]

            People

              Assignee: monty Michael Widenius
              Reporter: hholzgra Hartmut Holzgraefe
              Votes: 1
              Watchers: 7
