Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32570

Fragment ROW replication events larger than max_packet_size

Details

    Description

      When ROW based replication is used and the transaction size, converted to base64 encoding, exceeds max_packet_size, replication breaks:

      Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mariadb-bin.109824' at 4, the last event read from 'mariadb-bin.109825' at 15113554, the last byte read from 'mariadb-bin.109825' at 15113573.'
      

      Unfortunately, max_packet_size maximum allowed value is 1G.

      A blob over 500M large will cause the binlogged event to be larger than 1G so that it will crash replication, with NO workaround other than manually extracting the transaction from master's binlog and apply to slave.

      This ticket involves fragmenting row events larger than max_packet_size into multiple events. Additionally, Update_rows_log_events can be split into two series of events: 1) a series of deletes that cover the OLD rows of the update, and then 2) a series of inserts that cover the NEW rows of the update.

      Original solution kept for reference:

      My proposal: before binlogging the event on master, when the computed base64-encoded size is greater than max_packet_size, just convert format to STATEMENT for this event only.

      Link to corresponding Zulip topic is here.

      Attachments

        Issue Links

          Activity

            Couldn't Query_log_event with the BINLOG statement inside serve as a generic "container event" which is perfectly compatible with both old and new slaves?

            serg Sergei Golubchik added a comment - Couldn't Query_log_event with the BINLOG statement inside serve as a generic "container event" which is perfectly compatible with both old and new slaves?
            Elkin Andrei Elkin added a comment - - edited

            serg, do you mean Query_log_event(query=BINLOG '...',...) with the proposed @f1, @f2... as a part if its context?

            Elkin Andrei Elkin added a comment - - edited serg , do you mean Query_log_event(query=BINLOG '...',...) with the proposed @f1, @f2... as a part if its context?

            Apart from the issue of solving the problem for old slaves, the right
            solution is to introduce proper events for the specific purpose. So better
            to have a real encapsulation events than some simulation using BINLOG.
            Similar to how it was a mistake that we use special query events for COMMIT
            and ROLLBACK and so on instead of real end-of-event-group event.

            But can a BINLOG statement be used to solve the problem without changes on
            the slave? I looked at the code, and it seems there are are few problems:

            1. The BINLOG statement has to contain the complete event. It is not
            possible to do BINLOG <1st part>; BINLOG <2nd part>; and the server will
            combine the data into a single event.

            2. There is a special syntax BINLOG @a, @b; which can take only two specific
            user variables and combine the data. So SET @a=<1st part>; SET @b=<2nd part>;
            BINLOG @a, @b; can extend the limit to around 1.5 GB considering the
            overhead of base64.

            3. Using something like SET @a= <1st part>; SET @a= CONCAT(@a, <2nd part>);
            ... ; SET @b=(<last part>); BINLOG @a, @b; can extend the limit to around 4
            GB (2GB for Update_rows_log_event with before and after image), the hard
            limit for the length of an event. This feels a bit too hackish though,
            probably has a O(N**2) overhead on the size of the event.

            4. Even if the old slave can work with an event the new master fragmented
            using BINLOG like in (3), the old slave will still write its own binlog in a
            way that will break a next-level slave.

            To me, these problems make the approach of fragmenting using BINLOG not
            worth it. I don't think it is sufficiently important to make old slave work
            with >1GB blob to implement this trickery, even if it was possible somehow.
            And as soon as we need to make any change to a slave to make it work, we are
            better off implementing the clean solution with a real fragmented event.

            It is good to consider the possibility though, of course.

            knielsen Kristian Nielsen added a comment - Apart from the issue of solving the problem for old slaves, the right solution is to introduce proper events for the specific purpose. So better to have a real encapsulation events than some simulation using BINLOG. Similar to how it was a mistake that we use special query events for COMMIT and ROLLBACK and so on instead of real end-of-event-group event. But can a BINLOG statement be used to solve the problem without changes on the slave? I looked at the code, and it seems there are are few problems: 1. The BINLOG statement has to contain the complete event. It is not possible to do BINLOG <1st part>; BINLOG <2nd part>; and the server will combine the data into a single event. 2. There is a special syntax BINLOG @a, @b; which can take only two specific user variables and combine the data. So SET @a=<1st part>; SET @b=<2nd part>; BINLOG @a, @b; can extend the limit to around 1.5 GB considering the overhead of base64. 3. Using something like SET @a= <1st part>; SET @a= CONCAT(@a, <2nd part>); ... ; SET @b=(<last part>); BINLOG @a, @b; can extend the limit to around 4 GB (2GB for Update_rows_log_event with before and after image), the hard limit for the length of an event. This feels a bit too hackish though, probably has a O(N**2) overhead on the size of the event. 4. Even if the old slave can work with an event the new master fragmented using BINLOG like in (3), the old slave will still write its own binlog in a way that will break a next-level slave. To me, these problems make the approach of fragmenting using BINLOG not worth it. I don't think it is sufficiently important to make old slave work with >1GB blob to implement this trickery, even if it was possible somehow. And as soon as we need to make any change to a slave to make it work, we are better off implementing the clean solution with a real fragmented event. It is good to consider the possibility though, of course.

            Yes, while I meant to use BINLOG as a generic way to wrap events, the purpose was, of course. to pass longer events through using 2) from the above. I'd think it'd be an ok solution for the corner case that it is.

            But if you're right about 4) then it'd be a deal breaker, of course. It doesn't make sense to try to support old slaves if it'll break replication nevertheless. Are you sure they'll replicate the original large row event and not SQL BINLOG statement they've received from the master?

            serg Sergei Golubchik added a comment - Yes, while I meant to use BINLOG as a generic way to wrap events, the purpose was, of course. to pass longer events through using 2) from the above. I'd think it'd be an ok solution for the corner case that it is. But if you're right about 4) then it'd be a deal breaker, of course. It doesn't make sense to try to support old slaves if it'll break replication nevertheless. Are you sure they'll replicate the original large row event and not SQL BINLOG statement they've received from the master?

            I'm not sure if it's even possible for the slave to replicate a BINLOG statement, I don't think a BINLOG statement can ever be binlogged on the master in current code, so it's probably not tested, and the code is using specially crafted THD::rli_fake and THD::rgi_fake to apply the event instead of the real slave's rli and rgi.

            But yes, the BINLOG statement is binlogged not as a BINLOG query event, but as separate row events corresponding to the row events in the BINLOG base64 data. Test case:

            --source include/have_innodb.inc
            --source include/have_binlog_format_row.inc
             
            CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
             
            BINLOG '
            lJlBZw8BAAAA/AAAAAABAAAAAAQAMTEuNC4yLU1hcmlhREItZGVidWctbG9nAAAAAAAAAAAAAAAA
            AAAAAAAAAAAAAAAAAACUmUFnEzgNAAgAEgAEBAQEEgAA5AAEGggAAAAICAgCAAAACgoKAAAAAAAA
            AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
            AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
            AAAAAAAAAAAEEwQADQgICAoKCgEmgElp
            ';
            START TRANSACTION;
            BINLOG '
            lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY
            lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A=
            ';
            COMMIT/*!*/;
            SELECT * FROM t1 ORDER BY a;
             
            DROP TABLE t1;
            

            The resulting binlog:

            # at 496
            #241123 10:00:04 server id 1  end_log_pos 538 CRC32 0x4b2c2bf2 	GTID 0-1-2 trans
            /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
            START TRANSACTION
            /*!*/;
            # at 538
            # at 685
            #241123 10:00:04 server id 1  end_log_pos 0 CRC32 0xd3b110c6 	Annotate_rows:
            #Q> BINLOG '
            #Q> lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY
            #Q> lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A=
            #241123 10:00:04 server id 1  end_log_pos 0 CRC32 0x58316858 	Table_map: `test`.`t1` mapped to number 32
            # at 730
            #241123 10:00:04 server id 1  end_log_pos 0 CRC32 0xb02b8747 	Write_rows: table id 32 flags: STMT_END_F
             
            BINLOG '
            lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY
            lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A=
            '/*!*/;
            # Number of rows: 1
            # at 768
            #241123 10:00:04 server id 1  end_log_pos 799 CRC32 0xa2ba3881 	Xid = 29
            COMMIT/*!*/;
            

            knielsen Kristian Nielsen added a comment - I'm not sure if it's even possible for the slave to replicate a BINLOG statement, I don't think a BINLOG statement can ever be binlogged on the master in current code, so it's probably not tested, and the code is using specially crafted THD::rli_fake and THD::rgi_fake to apply the event instead of the real slave's rli and rgi. But yes, the BINLOG statement is binlogged not as a BINLOG query event, but as separate row events corresponding to the row events in the BINLOG base64 data. Test case: --source include/have_innodb.inc --source include/have_binlog_format_row.inc   CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;   BINLOG ' lJlBZw8BAAAA/AAAAAABAAAAAAQAMTEuNC4yLU1hcmlhREItZGVidWctbG9nAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAACUmUFnEzgNAAgAEgAEBAQEEgAA5AAEGggAAAAICAgCAAAACgoKAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAEEwQADQgICAoKCgEmgElp '; START TRANSACTION; BINLOG ' lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A= '; COMMIT/*!*/; SELECT * FROM t1 ORDER BY a;   DROP TABLE t1; The resulting binlog: # at 496 #241123 10:00:04 server id 1 end_log_pos 538 CRC32 0x4b2c2bf2 GTID 0-1-2 trans /*!100001 SET @@session.gtid_seq_no=2*//*!*/; START TRANSACTION /*!*/; # at 538 # at 685 #241123 10:00:04 server id 1 end_log_pos 0 CRC32 0xd3b110c6 Annotate_rows: #Q> BINLOG ' #Q> lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY #Q> lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A= #241123 10:00:04 server id 1 end_log_pos 0 CRC32 0x58316858 Table_map: `test`.`t1` mapped to number 32 # at 730 #241123 10:00:04 server id 1 end_log_pos 0 CRC32 0xb02b8747 Write_rows: table id 32 flags: STMT_END_F   BINLOG ' lJlBZxMBAAAALQAAAAAAAAAAACAAAAAAAAEABHRlc3QAAnQxAAEDAABYaDFY lJlBZxcBAAAAJgAAAAAAAAAAACAAAAAAAAEAAf/+AQAAAEeHK7A= '/*!*/; # Number of rows: 1 # at 768 #241123 10:00:04 server id 1 end_log_pos 799 CRC32 0xa2ba3881 Xid = 29 COMMIT/*!*/;

            People

              bnestere Brandon Nesterenko
              rpizzi Rick Pizzi (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.