Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Duplicate
-
10.2.27
-
None
Description
With optimistic mode of parallel replication, if a query fails, then you can end up with log messages like this:
2019-09-17 19:27:07 140190015801088 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2019-09-17 19:27:07 140190015801088 [Warning] Slave: Duplicate entry '...' for key 'name' Error_code: 1062
|
2019-09-17 19:27:07 140190015801088 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
|
2019-09-17 19:27:07 140190015801088 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2019-09-17 19:27:07 140190015801088 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
|
2019-09-17 19:27:07 140190315325184 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2019-09-17 19:27:07 140190315325184 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
|
2019-09-17 19:27:07 140190315325184 [Warning] Slave: Connection was killed Error_code: 1927
|
2019-09-17 19:27:07 140190315325184 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
|
2019-09-17 19:27:07 140190315325184 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
2019-09-17 19:27:07 140190315325184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
|
These messages are printed in slave_output_error_info. See here:
https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L4756
With optimistic mode of parallel replication, the slave_output_error_info function can be called when a transaction has failed because another transaction that it depends on has failed. See here:
https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1221
Some errors are considered "temporary errors". The only temporary errors are currently just deadlocks and timeouts. See here:
https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L3381
In the case of a "temporary error", the event group can actually be retried. See here:
https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1305
These retries happen if slave_transaction_retries is greater than 0.
In the example shown above, the errors were deadlocks, so the transactions were indeed retried. Notice that the following message was still included in the above log snippet:
2019-09-17 19:27:07 140190315325184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
This message is not quite correct in this case.
- The SQL thread was not aborted.
- The SQL thread does not need to be manually restarted.
It seems like this error message in slave_output_error_info should be slightly different if the SQL thread encounters a temporary error. In that case, maybe it should be a Warning instead of an Error, and maybe it should say something like this instead:
2019-09-17 19:27:07 140190315325184 [Warning] Temporary error running query. The slave SQL thread will retry the query. We encountered the temporary error at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
Attachments
Issue Links
- is blocked by
-
MDEV-33303 slave_parallel_mode=optimistic should not report the mode's specific temporary errors
- Closed
- relates to
-
MDEV-13577 slave_parallel_mode=optimistic should not report the mode's specific temporary errors
- Closed
-
MDEV-30010 Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
- Closed
-
MDEV-20604 Duplicate key value is silently truncated to 64 characters in print_keydup_error
- Closed
-
MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
- Closed