Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12038

Replication is stuck on a wrongly formated Binlog

Details

    Description

      Sql thread is running on the slave but seems to be blocked on a specific position.

      when looking at that position on the master , the format seems to be broken (overlap between #at and end_log_pos lines:

      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 10.193.9.31
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.001976
                Read_Master_Log_Pos: 2042964
                     Relay_Log_File: slave-relay.000004
                      Relay_Log_Pos: 66666208
              Relay_Master_Log_File: mysql-bin.001965
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 0
                         Last_Error:
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 66665920
      

      On the master:

       mysqlbinlog -v mysql-bin.001965
       
      # at 66665893
      #170208 17:51:24 server id 1  end_log_pos 66665920      Xid = 4949069
      COMMIT/*!*/;
      # at 66665920
      #170208 17:51:21 server id 1  end_log_pos 66665958      GTID 0-1-21861609 trans
      /*!100001 SET @@session.gtid_seq_no=21861609*//*!*/;
      BEGIN
      /*!*/;
      # at 66665958
      #170208 17:51:21 server id 1  end_log_pos 66666035      Table_map: `customers`.`pmig_contracts_contractperiod` mapped to number 832
      # at 66666035
      # at 66667051
      # at 66668067
      # at 66669083
      # at 66670099
      # at 66671115
      # at 66672131
      # at 66673147
      # at 66674163
      # at 66675179
      # at 66676195
      # at 66677211
      # at 66678227
      # at 66679243
      # at 66680259
      # at 66681275
      # at 66682291
      # at 66683307
      ...
      # at 93191763
      # at 93192779
      # at 93193795
      # at 93194811
      # at 93195827
      # at 93196843
      # at 93197859
      #170208 17:51:21 server id 1  end_log_pos 66667051      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66668067      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66669083      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66670099      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66671115      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66672131      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66673147      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66674163      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 66675179      Update_rows: table id 832
      ...
      #170208 17:51:21 server id 1  end_log_pos 93196843      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 93197859      Update_rows: table id 832
      #170208 17:51:21 server id 1  end_log_pos 93198671      Update_rows: table id 832 flags: STMT_END_F
      ...
      ### UPDATE `customers`.`pmig_contracts_contractperiod`
      ### WHERE
      ###   @1=5726323
      ###   @2=99.00
      ###   @3=0.00
      ###   @4=12
      ### SET
      ###   @1=5726323
      ###   @2=99.00
      ###   @3=8.25
      ###   @4=12
      ### UPDATE `customers`.`pmig_contracts_contractperiod`
      ### WHERE
      ###   @1=5726325
      ###   @2=49.00
      ###   @3=0.00
      ###   @4=24
      ### SET
      ###   @1=5726325
      ###   @2=49.00
      ###   @3=2.04
      ###   @4=24
      # at 93198671
      #170208 17:51:21 server id 1  end_log_pos 93198698      Xid = 4947268
      COMMIT/*!*/;
      

      => Everything seems to be in the file, but not in the correct order. Slave is not complaining but does nothing

      MariaDB [(none)]> show processlist;
      +-----+-------------+-------------------+------+-------------+-------+--------------------------------------------------------------------------------+------------------+----------+
      | Id  | User        | Host              | db   | Command     | Time  | State                                                                          | Info             | Progress |
      +-----+-------------+-------------------+------+-------------+-------+--------------------------------------------------------------------------------+------------------+----------+
      |   8 | repl        | 10.193.9.23:51741 | NULL | Binlog Dump | 96271 | Master has sent all binlog to slave; waiting for binlog to be updated          | NULL             |    0.000 |
      | 142 | system user |                   | NULL | Connect     | 87110 | Waiting for master to send event                                               | NULL             |    0.000 |
      | 340 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 341 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 342 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 343 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 344 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 345 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 346 | system user |                   | NULL | Connect     | 54207 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      | 347 | system user |                   | NULL | Connect     |     5 | Update_rows_log_event::find_row(-1)                                            | NULL             |    0.000 |
      

      Attachments

        Issue Links

          Activity

            pmoiroux,

            Sorry, could you please elaborate on "overlap between #at and end_log_pos lines"? I don't see what you mean at the first glance, maybe I'm missing something obvious.

            In fact, I don't see any problem at all from the paste in the description.
            Unfortunately, the excerpt from SHOW SLAVE STATUS is missing Seconds_Behind_Master, which is really important here. How long the server has spent on this position?
            The event is updating over 1,5 mln rows, and since it's a row event, it basically means 1,5+ mln individual updates, so it's bound to take quite long. How long – it depends. Do you have a primary key on pmig_contracts_contractperiod? If not, it can be insanely long.

            elenst Elena Stepanova added a comment - pmoiroux , Sorry, could you please elaborate on "overlap between #at and end_log_pos lines"? I don't see what you mean at the first glance, maybe I'm missing something obvious. In fact, I don't see any problem at all from the paste in the description. Unfortunately, the excerpt from SHOW SLAVE STATUS is missing Seconds_Behind_Master , which is really important here. How long the server has spent on this position? The event is updating over 1,5 mln rows, and since it's a row event, it basically means 1,5+ mln individual updates, so it's bound to take quite long. How long – it depends. Do you have a primary key on pmig_contracts_contractperiod ? If not, it can be insanely long.

            Hi Elena,

            Thank you for your reply. I checked the code of our developer and reproduced the scenario, and this is the current status:

            Seconds_Behind_Master: 48170

            This is what is done on the master:

            drop table if exists pmig_contracts_contractperiod;

            create table pmig_contracts_contractperiod as select id, reference_recurring_charge, reference_recurring_charge_monthly, period from contracts_contractperiod; – backup the original table

            update pmig_contracts_contractperiod set reference_recurring_charge_monthly = FLOOR(reference_recurring_charge/period*100)/100;

            CREATE INDEX pmig_contracts_contractperiod_idx ON pmig_contracts_contractperiod (id);

            And on the slave indeed it will be 921954 individual updates like this before creating the index:

                1. UPDATE `customers`.`pmig_contracts_contractperiod`
                2. WHERE
                3. @1=1002
                4. @2=60.00
                5. @3=0.00
                6. @4=24
                7. SET
                8. @1=1002
                9. @2=60.00
                10. @3=2.50
                11. @4=24

            Maybe it will be enough to create the INDEX before the update ? Any other suggestion? Thanks again for your time.

            pmoiroux Patrick Moiroux added a comment - Hi Elena, Thank you for your reply. I checked the code of our developer and reproduced the scenario, and this is the current status: Seconds_Behind_Master: 48170 This is what is done on the master: drop table if exists pmig_contracts_contractperiod; create table pmig_contracts_contractperiod as select id, reference_recurring_charge, reference_recurring_charge_monthly, period from contracts_contractperiod; – backup the original table update pmig_contracts_contractperiod set reference_recurring_charge_monthly = FLOOR(reference_recurring_charge/period*100)/100; CREATE INDEX pmig_contracts_contractperiod_idx ON pmig_contracts_contractperiod (id); And on the slave indeed it will be 921954 individual updates like this before creating the index: UPDATE `customers`.`pmig_contracts_contractperiod` WHERE @1=1002 @2=60.00 @3=0.00 @4=24 SET @1=1002 @2=60.00 @3=2.50 @4=24 Maybe it will be enough to create the INDEX before the update ? Any other suggestion? Thanks again for your time.

            pmoiroux,

            I didn't see the structure of the table, but in any case – whenever you use binlog_format=row|mixed, it's crucially important to have a PK/unique index on every table of a considerable size which receives updates, otherwise you will have horrible performance problems. In unfortunate circumstances it can indeed be many hours per update.
            That's an infamous problem of row-based replication. You can find more about it, for example, in this upstream bug report. Even though it appears to be fixed, in fact the "fix" was just addition of a warning in the error log, while the problem itself still exists (to my knowledge, in all popular flavors of the server).

            So yes, please try to add the PK or unique index and check if it helps.

            elenst Elena Stepanova added a comment - pmoiroux , I didn't see the structure of the table, but in any case – whenever you use binlog_format=row|mixed , it's crucially important to have a PK/unique index on every table of a considerable size which receives updates, otherwise you will have horrible performance problems. In unfortunate circumstances it can indeed be many hours per update. That's an infamous problem of row-based replication. You can find more about it, for example, in this upstream bug report . Even though it appears to be fixed, in fact the "fix" was just addition of a warning in the error log, while the problem itself still exists (to my knowledge, in all popular flavors of the server). So yes, please try to add the PK or unique index and check if it helps.

            If you have further information on this issue, please comment and the report will be re-opened if needed.

            elenst Elena Stepanova added a comment - If you have further information on this issue, please comment and the report will be re-opened if needed.

            People

              Unassigned Unassigned
              pmoiroux Patrick Moiroux
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.