Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.3(EOL)
-
None
-
linux
Description
Parallel optimitic replication to a spider engine can fail on timeout.
(XA is enabled on the spider head node, spider_internal_xa)
When two transactions from the binary log (1 and 2) are running in parallel on tables using spider engine, the second transaction may succeed before the first, resulting in being in a XA prepared state holding locks that the first transaction needs. But since the second transaction cannot commit until after the first is committed and the first cannot get locks on the row (innodb row lock on the datanode) to complete its changes the replication fails with error 1205 'Lock wait timeout exceeded; try restarting transaction'.
Since there is an issue the first transaction (lock wait timeout), why is not the second transaction rolled back before retrying?
Binary log consists of:
# at 1590965
|
BEGIN
|
/*!*/;
|
# at 1591178
|
### UPDATE `db`.`t1`
|
### WHERE
|
### @1=313
|
### ...
|
### SET
|
### @1=313
|
### ...
|
# Number of rows: 1
|
# at 1592269
|
### UPDATE `db`.`t2`
|
### WHERE
|
### @1=313
|
### ...
|
### SET
|
### @1=313
|
### ...
|
# Number of rows: 1
|
# at 1592980
|
### UPDATE `db`.`t3`
|
### WHERE
|
### @1=313
|
### @2=7502
|
### ...
|
### SET
|
### @1=313
|
### @2=7502
|
### ...
|
# Number of rows: 1
|
# at 1594723
|
### UPDATE `db`.`t3`
|
### WHERE
|
### @1=313
|
### @2=7502
|
### ...
|
### SET
|
### @1=313
|
### @2=7502
|
### ...
|
# Number of rows: 1
|
# at 1596345
|
COMMIT/*!*/;
|
# at 1596376
|
BEGIN
|
/*!*/;
|
# at 1596537
|
### UPDATE `db`.`t2`
|
### WHERE
|
### @1=313
|
### ...
|
### SET
|
### @1=313
|
### ...
|
# Number of rows: 1
|
# at 1597073
|
COMMIT/*!*/;
|
General log including spider entries (478743 matches the first transaction in the binary log, 478744 matches the second/later transaction in the binary log):
180717 17:50:12 478743 Query BEGIN
|
478744 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478744 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653138,0x62353663393630,1
|
478744 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478744 Query mysql datanode4.example.com update `db`.`t2` set `id` = 1017,...,`t1_id` = 313,... where `id` = 1017 and `t1_id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
478744 Query COMMIT /* implicit, from Xid_log_event */
|
478744 Query mysql datanode4.example.com xa end 0x3734653138,0x62353663393630,1
|
478744 Query mysql datanode4.example.com xa prepare 0x3734653138,0x62353663393630,1
|
180717 17:51:03 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:51:54 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:52:45 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:53:36 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:54:27 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:55:18 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:56:09 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:57:00 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:57:51 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:58:42 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478743 Query BEGIN
|
478743 Query mysql datanode4.example.com set session transaction isolation level repeatable read;set session autocommit = 1;xa start 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com select `id`, ... from `db`.`t1` where `id` = 313 for update
|
478743 Query mysql datanode4.example.com update `db`.`t1` set `id` = 313 ... where `id` = 313 and ... limit 1
|
478743 Query mysql datanode4.example.com select `id`,...,`t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
180717 17:59:33 478743 Query mysql datanode4.example.com xa end 0x3734653137,0x62353663393630,1
|
478743 Query mysql datanode4.example.com xa rollback 0x3734653137,0x62353663393630,1
|
478744 Query mysql datanode4.example.com xa rollback 0x3734653138,0x62353663393630,1
|
error log on spider head node (slave):
20180717 17:51:03 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206940404: sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
20180717 17:51:03 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
|
...
|
20180717 17:51:54 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206940581: sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
20180717 17:51:54 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
|
...
|
20180717 17:59:33 [SEND SPIDER SQL] from 478743 to [datanode4.example.com] 206942165: sql: select `id``t1_id`,... from `db`.`t2` where `t1_id` = 313 for update
|
20180717 17:59:33 [ERROR SPIDER RESULT] to 478743: 1205 Lock wait timeout exceeded; try restarting transaction
|
2018-07-17 17:59:33 478743 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
|
2018-07-17 17:59:33 478743 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 0-190237024-207448702, Internal MariaDB error code: 1205
|
2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590965
|
2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
...
|
2018-07-17 17:59:33 478744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590
|
965
|
2018-07-17 17:59:33 478743 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
...
|
2018-07-17 17:59:33 478743 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2018-07-17 17:59:33 478743 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.003265' position 1590
|
965
|
2018-07-17 17:59:33 478742 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.003265' at position 1590965
|
Also why so many duplicated lines?