Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16787

optimistic parallel replication fails on spider

    XMLWordPrintable

    Details

      Description

      Parallel optimitic replication to a spider engine can fail on timeout.

      (XA is enabled on the spider head node, spider_internal_xa)

      When two transactions from the binary log (1 and 2) are running in parallel on tables using spider engine, the second transaction may succeed before the first, resulting in being in a XA prepared state holding locks that the first transaction needs. But since the second transaction cannot commit until after the first is committed and the first cannot get locks on the row (innodb row lock on the datanode) to complete its changes the replication fails with error 1205 'Lock wait timeout exceeded; try restarting transaction'.

      Since there is an issue the first transaction (lock wait timeout), why is not the second transaction rolled back before retrying?

      Binary log consists of:

      # at 1590965
      BEGIN
      /*!*/;
      # at 1591178
      ### UPDATE `db`.`t1`
      ### WHERE
      ###   @1=313
      ###   ...
      ### SET
      ###   @1=313
      ###   ...
      # Number of rows: 1
      # at 1592269
      ### UPDATE `db`.`t2`
      ### WHERE
      ###   @1=313
      ###   ...
      ### SET
      ###   @1=313
      ###   ...
      # Number of rows: 1
      # at 1592980
      ### UPDATE `db`.`t3`
      ### WHERE
      ###   @1=313
      ###   @2=7502
      ###   ...
      ### SET
      ###   @1=313
      ###   @2=7502
      ###   ...
      # Number of rows: 1
      # at 1594723
      ### UPDATE `db`.`t3`
      ### WHERE
      ###   @1=313
      ###   @2=7502
      ###   ...
      ### SET
      ###   @1=313
      ###   @2=7502
      ###   ...
      # Number of rows: 1
      # at 1596345
      COMMIT/*!*/;
      # at 1596376
      BEGIN
      /*!*/;
      # at 1596537
      ### UPDATE `db`.`t2`
      ### WHERE
      ###   @1=313
      ### ...
      ### SET
      ###   @1=313
      ### ...
      # Number of rows: 1
      # at 1597073
      COMMIT/*!*/;
      

      General log including spider entries (478743 matches the first transaction in the binary log, 478744 matches the second/later transaction in the binary log):

      180717 17:50:12	478743 Query	BEGIN
      		478744 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478744 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653138,0x62353663393630,1
      		478744 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478744 Query	mysql datanode4.example.com update `db`.`t2` set `id` = 1017,...,`t1_id` = 313,... where `id` = 1017 and `t1_id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      		478744 Query	COMMIT /* implicit, from Xid_log_event */
      		478744 Query	mysql datanode4.example.com xa end 0x3734653138,0x62353663393630,1
      		478744 Query	mysql datanode4.example.com xa prepare 0x3734653138,0x62353663393630,1
      180717 17:51:03	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:51:54	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:52:45	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:53:36	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:54:27	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:55:18	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:56:09	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:57:00	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:57:51	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:58:42	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478743 Query	BEGIN
      		478743 Query	mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
      		478743 Query	mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
      		478743 Query	mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      180717 17:59:33	478743 Query	mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
      		478743 Query	mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
      		478744 Query	mysql datanode4.example.com xa rollback 0x3734653138,0x62353663393630,1
      

      error log on spider head node (slave):

      20180717 17:51:03 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206940404:  sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      20180717 17:51:03 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
      ...
      20180717 17:51:54 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206940581:  sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      20180717 17:51:54 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
      ...
      20180717 17:59:33 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206942165:  sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
      20180717 17:59:33 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
      2018-07-17 17:59:33 478743 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
      2018-07-17 17:59:33 478743 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 0-190237024-207448702, Internal MariaDB error code: 1205
      2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
      2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      ...
      2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590
      965
      2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      ...
      2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590
      965
      2018-07-17 17:59:33 478742 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.003265' at position 1590965
      

      Also why so many duplicated lines?

        Attachments

          Activity

            People

            Assignee:
            Kentoku Kentoku Shiba
            Reporter:
            mattiasjonsson Mattias Jonsson
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: