Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33849

Failed to read a certain GTID_EVENT , which has event length as 2e

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • 10.11
    • Replication
    • None

    Description

      We have a client application to read from the mariadb binary log on MariaDB server version 10.11.5-MariaDB-log.

      We started getting below error on trying to ready the mariadb binlog event,

      Failed to replace GTID event with backwards-compatible event: corrupt event.; the first event 'archive_mysql.005039' at 16814957, the last event read from 'archive_mysql.005039' at 16814957, the last byte read from 'archive_mysql.005039' at 16815003.

      1)
      The application works fine with all other GTID_EVENT "START TRANSACTION", but problem is there in reading only this particular event with length 2e. This event has come only once per binlog. Below is that event,

      1. at 16814957
        #240314 17:59:51 server id 1 end_log_pos 16815003 CRC32 0x81e297df
      2. Position
      3. Timestamp Type Master ID Size Master Pos Flags
      4. 100936d |17 3b f3 65 |a2 |01 00 00 00 |2e 00 00 00 |9b 93 00 01 |08 00
        #
      5. 1009380 38 94 2d 0b 00 00 00 00 00 00 00 00 0a 96 9e 61 |8.-............a|
      6. 1009390 04 00 00 00 00 01 ff df 97 e2 81 |...........|
        #
      7. Event: GTID 0-1-187536440 cid=73506454
        /Unable to render embedded object: File (100001 SET @@session.gtid_seq_no=187536440*//*) not found./;
        START TRANSACTION
        /!/;

      There is no problem reading other GTID_EVENT "START TRANSACTION" event as given below,

      1. at 17157329
        #240314 18:00:02 server id 1 end_log_pos 17157373 CRC32 0xd95a7577
      2. Position
      3. Timestamp Type Master ID Size Master Pos Flags
      4. 105ccd1 |22 3b f3 65 |a2 |01 00 00 00 |2c 00 00 00 |fd cc 05 01 |08 00
        #
      5. 105cce4 5c 96 2d 0b 00 00 00 00 00 00 00 00 0a 1e af 61 |\.-............a|
      6. 105ccf4 04 00 00 00 00 77 75 5a d9 |.....wuZ.|
        #
      7. Event: GTID 0-1-187536988 cid=73510686
        /Unable to render embedded object: File (100001 SET @@session.gtid_seq_no=187536988*//*) not found./;
        START TRANSACTION
        /!/;

      So far we are unable to understand the nature of this GTID_EVENT that has 2e length and what could cause this event to appear in the binlog. Can you please provide some information regarding the nature of the event and how does it appear in the binary log.
      Attaching the binary log 5039 , that has the binlog event GTID_EVENT in question at offset 16814957. Also Attaching the output "show variables" command to get the db configurations.

      1.1) What exactly is this GTID_EVENT with length 2e ?
      1.2) How to get this GTID_EVENT in the binlog ?

      2) On searching over , there are two issues reported related to this,

      https://jira.mariadb.org/browse/CONC-650
      https://jira.mariadb.org/browse/MDEV-5754

      But not none of them seems same. The CONC-650 seems related to XA transactions and MDEV-5754 fixed in 10.0.9. The version we are using is 10.11.5.

      In CONC-650 , it is suggested to set the mariadb_slave_capability to 4 , which I understand it MARIA_SLAVE_CAPABILITY_GTID. After setting the mariadb_slave_capability to 4, the client application is receiving that GTID_EVENT properly , without any problem. But we are not certain what would be the implications of setting this capability.

      2.1) What is the default capability of the slave application ?
      2.2) What are the effects of setting mariadb_slave_capability to 4 ? Could there be a problem somewhere , in some scenarios ?

      Attachments

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - - edited

            > We have a client application to read from the mariadb binary log on
            > MariaDB server version 10.11.5-MariaDB-log.

            From the description, it sounds like this application is connecting to the
            mariadb server to read binlog data from the running server using
            COM_BINLOG_DUMP.

            > 2.2) What are the effects of setting mariadb_slave_capability to 4 ? Could
            > there be a problem somewhere , in some scenarios ?

            The consequence of setting mariadb_slave_capability to 4
            (MARIA_SLAVE_CAPABILITY_GTID) is that the server will send its binlog data
            unmodified to your application, including mariadb-specific events such as
            ANNOTATE_ROWS_EVENT, BINLOG_CHECKPOINT_EVENT, and GTID_EVENT. It sounds like
            this is what you want to do.

            The consequence of not setting mariadb-specific to 4 is that the server
            will think your application is a MySQL or MariaDB pre-5.3 slave that does
            not know about these new events and would break if it receives them. Thus
            the server will re-write the data on the fly to replace these events with
            something that the old servers can handle.

            > 2.1) What is the default capability of the slave application ?

            The capability sent by all MariaDB versions since 10.0 is 4
            (MARIA_SLAVE_CAPABILITY_GTID).

            So this is a backwards-compatibility mechanism, to allow MariaDB to
            replicate to server versions earlier than 10.0 (or MySQL versions), without
            the old slave getting an error from event types it doesn't recognize.

            > Failed to replace GTID event with backwards-compatible event: corrupt event.; the first event 'archive_mysql.005039' at 16814957, the last event read from 'archive_mysql.005039' at 16814957, the last byte read from 'archive_mysql.005039' at 16815003.
            

            This is a bug in the MariaDB code. It causes certain kinds of GTID events to
            fail to be rewritten for old slave versions. And your application, since it
            is not setting mariadb_slave_capability and thus treated the same as an old
            slave.

            This bug is actually fixed in this patch:

            https://github.com/MariaDB/server/pull/3035

            But that has not yet been merged, and looks like it will only be fixed in
            11.4.

            bnestere, for your info, it looks like your patch for MDEV-7850 will fix
            this bug as well.

            > Can you please provide some information regarding the nature of the event
            > and how does it appear in the binary log.

            >     100936d |17 3b f3 65 |a2 |01 00 00 00 |2e 00 00 00 |9b 93 00 01 |08 00
            >     1009380 38 94 2d 0b 00 00 00 00 00 00 00 00 0a 96 9e 61 |8.-............a|
            >     1009390 04 00 00 00 00 01 ff df 97 e2 81 |...........|
            

            This GTID event has a couple extra bytes 01 ff at the end (before the CRC32
            checksum).
            These are the flags_extra containing FL_EXTRA_MULTI_ENGINE and the value
            UCHAR_MAX for extra_engines.

            The bug is that such GTID events with FL_EXTRA_MULTI_ENGINE set are not
            converted correctly for old slaves, instead throwing the error you see.

            > Attaching the binary log 5039 , that has the binlog event GTID_EVENT in
            > question at offset 16814957.

            Thanks for attaching the binlog. There we can see that this an empty
            transaction, probably resulting from some quirk in the server code when
            rolling back a savepoint:

              START TRANSACTION
              SAVEPOINT `save_invoice_draftc516c65faf7b06712782c44132be0efb3e20f5cea0550fc86e990e20`
              COMMIT
            

            The value UCHAR_MAX for extra_engines means the number of storage engines
            participating in the (empty) transactions is 0.

            > 1.1) What exactly is this GTID_EVENT with length 2e ?
            > 1.2) How to get this GTID_EVENT in the binlog ?

            This is a GTID event containing the flag FL_EXTRA_MULTI_ENGINE as well as
            the field extra_engines.

            It looks like one can obtain this kind of event from committing an empty
            transaction after rolling back to a savepoint. Probably it can also be
            obtained from a transaction that changes both InnoDB and RocksDB tables in
            the same commit.

            Note that for this to trigger, the FL_GROUP_COMMIT_ID flag also has to be
            set (seen as cid=73506454 in your binlog output). This happens when the
            transaction was group-committed to the binlog along with some other
            transaction(s); whether this happens or not depends on the exact timing of
            different commits running in parallel on the master.

            > After setting the mariadb_slave_capability to 4, the client application is
            > receiving that GTID_EVENT properly , without any problem. But we are not
            > certain what would be the implications of setting this capability.

            Yes, because then the server knows your application can understand the GTID
            event, so the event is sent without rewriting, bypassing the bug. There
            should be no bad implications for your application since it already knows
            how to parse the GTID events.

            knielsen Kristian Nielsen added a comment - - edited > We have a client application to read from the mariadb binary log on > MariaDB server version 10.11.5-MariaDB-log. From the description, it sounds like this application is connecting to the mariadb server to read binlog data from the running server using COM_BINLOG_DUMP. > 2.2) What are the effects of setting mariadb_slave_capability to 4 ? Could > there be a problem somewhere , in some scenarios ? The consequence of setting mariadb_slave_capability to 4 (MARIA_SLAVE_CAPABILITY_GTID) is that the server will send its binlog data unmodified to your application, including mariadb-specific events such as ANNOTATE_ROWS_EVENT, BINLOG_CHECKPOINT_EVENT, and GTID_EVENT. It sounds like this is what you want to do. The consequence of not setting mariadb-specific to 4 is that the server will think your application is a MySQL or MariaDB pre-5.3 slave that does not know about these new events and would break if it receives them. Thus the server will re-write the data on the fly to replace these events with something that the old servers can handle. > 2.1) What is the default capability of the slave application ? The capability sent by all MariaDB versions since 10.0 is 4 (MARIA_SLAVE_CAPABILITY_GTID). So this is a backwards-compatibility mechanism, to allow MariaDB to replicate to server versions earlier than 10.0 (or MySQL versions), without the old slave getting an error from event types it doesn't recognize. > Failed to replace GTID event with backwards-compatible event: corrupt event.; the first event 'archive_mysql.005039' at 16814957, the last event read from 'archive_mysql.005039' at 16814957, the last byte read from 'archive_mysql.005039' at 16815003. This is a bug in the MariaDB code. It causes certain kinds of GTID events to fail to be rewritten for old slave versions. And your application, since it is not setting mariadb_slave_capability and thus treated the same as an old slave. This bug is actually fixed in this patch: https://github.com/MariaDB/server/pull/3035 But that has not yet been merged, and looks like it will only be fixed in 11.4. bnestere , for your info, it looks like your patch for MDEV-7850 will fix this bug as well. > Can you please provide some information regarding the nature of the event > and how does it appear in the binary log. > 100936d |17 3b f3 65 |a2 |01 00 00 00 |2e 00 00 00 |9b 93 00 01 |08 00 > 1009380 38 94 2d 0b 00 00 00 00 00 00 00 00 0a 96 9e 61 |8.-............a| > 1009390 04 00 00 00 00 01 ff df 97 e2 81 |...........| This GTID event has a couple extra bytes 01 ff at the end (before the CRC32 checksum). These are the flags_extra containing FL_EXTRA_MULTI_ENGINE and the value UCHAR_MAX for extra_engines. The bug is that such GTID events with FL_EXTRA_MULTI_ENGINE set are not converted correctly for old slaves, instead throwing the error you see. > Attaching the binary log 5039 , that has the binlog event GTID_EVENT in > question at offset 16814957. Thanks for attaching the binlog. There we can see that this an empty transaction, probably resulting from some quirk in the server code when rolling back a savepoint: START TRANSACTION SAVEPOINT `save_invoice_draftc516c65faf7b06712782c44132be0efb3e20f5cea0550fc86e990e20` COMMIT The value UCHAR_MAX for extra_engines means the number of storage engines participating in the (empty) transactions is 0. > 1.1) What exactly is this GTID_EVENT with length 2e ? > 1.2) How to get this GTID_EVENT in the binlog ? This is a GTID event containing the flag FL_EXTRA_MULTI_ENGINE as well as the field extra_engines. It looks like one can obtain this kind of event from committing an empty transaction after rolling back to a savepoint. Probably it can also be obtained from a transaction that changes both InnoDB and RocksDB tables in the same commit. Note that for this to trigger, the FL_GROUP_COMMIT_ID flag also has to be set (seen as cid=73506454 in your binlog output). This happens when the transaction was group-committed to the binlog along with some other transaction(s); whether this happens or not depends on the exact timing of different commits running in parallel on the master. > After setting the mariadb_slave_capability to 4, the client application is > receiving that GTID_EVENT properly , without any problem. But we are not > certain what would be the implications of setting this capability. Yes, because then the server knows your application can understand the GTID event, so the event is sent without rewriting, bypassing the bug. There should be no bad implications for your application since it already knows how to parse the GTID events.
            ashikkum19 Ashish added a comment - - edited

            Thanks a lot for all the explanations , very helpful. Just have one question, if we do not send the slave capability level to 4 to the server in the client application , what capability level the server shall understand for such client ?

            Thanks,

            ashikkum19 Ashish added a comment - - edited Thanks a lot for all the explanations , very helpful. Just have one question, if we do not send the slave capability level to 4 to the server in the client application , what capability level the server shall understand for such client ? Thanks,

            Sending no capability is the same as sending capability 0 (MARIA_SLAVE_CAPABILITY_UNKNOWN). This corresponds to a MariaDB server older than 5.3, or a MySQL server.

            There are some comments in the source code in sql/log_event.h that describe the available capabilities:

            /*
              These are capability numbers for MariaDB slave servers.
             
              Newer MariaDB slaves set this to inform the master about their capabilities.
              This allows the master to decide which events it can send to the slave
              without breaking replication on old slaves that maybe do not understand
              all events from newer masters.
             
              As new releases are backwards compatible, a given capability implies also
              all capabilities with smaller number.
             
              Older MariaDB slaves and other MySQL slave servers do not set this, so they
              are recorded with capability 0.
            */
             
            /* MySQL or old MariaDB slave with no announced capability. */
            #define MARIA_SLAVE_CAPABILITY_UNKNOWN 0
            /* MariaDB >= 5.3, which understands ANNOTATE_ROWS_EVENT. */
            #define MARIA_SLAVE_CAPABILITY_ANNOTATE 1
            /*
              MariaDB >= 5.5. This version has the capability to tolerate events omitted
              from the binlog stream without breaking replication (MySQL slaves fail
              because they mis-compute the offsets into the master's binlog).
            */
            #define MARIA_SLAVE_CAPABILITY_TOLERATE_HOLES 2
            /* MariaDB >= 10.0, which knows about binlog_checkpoint_log_event. */
            #define MARIA_SLAVE_CAPABILITY_BINLOG_CHECKPOINT 3
            /* MariaDB >= 10.0.1, which knows about global transaction id events. */
            #define MARIA_SLAVE_CAPABILITY_GTID 4
            

            knielsen Kristian Nielsen added a comment - Sending no capability is the same as sending capability 0 (MARIA_SLAVE_CAPABILITY_UNKNOWN). This corresponds to a MariaDB server older than 5.3, or a MySQL server. There are some comments in the source code in sql/log_event.h that describe the available capabilities: /* These are capability numbers for MariaDB slave servers.   Newer MariaDB slaves set this to inform the master about their capabilities. This allows the master to decide which events it can send to the slave without breaking replication on old slaves that maybe do not understand all events from newer masters.   As new releases are backwards compatible, a given capability implies also all capabilities with smaller number.   Older MariaDB slaves and other MySQL slave servers do not set this, so they are recorded with capability 0. */   /* MySQL or old MariaDB slave with no announced capability. */ #define MARIA_SLAVE_CAPABILITY_UNKNOWN 0 /* MariaDB >= 5.3, which understands ANNOTATE_ROWS_EVENT. */ #define MARIA_SLAVE_CAPABILITY_ANNOTATE 1 /* MariaDB >= 5.5. This version has the capability to tolerate events omitted from the binlog stream without breaking replication (MySQL slaves fail because they mis-compute the offsets into the master's binlog). */ #define MARIA_SLAVE_CAPABILITY_TOLERATE_HOLES 2 /* MariaDB >= 10.0, which knows about binlog_checkpoint_log_event. */ #define MARIA_SLAVE_CAPABILITY_BINLOG_CHECKPOINT 3 /* MariaDB >= 10.0.1, which knows about global transaction id events. */ #define MARIA_SLAVE_CAPABILITY_GTID 4
            ashikkum19 Ashish added a comment -

            Thank You Ver Much.

            ashikkum19 Ashish added a comment - Thank You Ver Much.

            People

              bnestere Brandon Nesterenko
              ashikkum19 Ashish
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.