[MDEV-20628] If temporary error occurs with optimistic mode of parallel replication, error message has false information Created: 2019-09-19 Updated: 2024-01-24 Resolved: 2020-10-09 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.2.27 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Sachin Setiya (Inactive) |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
With optimistic mode of parallel replication, if a query fails, then you can end up with log messages like this:
These messages are printed by slave_output_error_info. See here: https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L4756
With optimistic mode of parallel replication, slave_output_error_info can be called when a transaction has failed because another transaction that it depends on has failed. See here: https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1221
Some errors are considered "temporary errors". Currently, the only temporary errors are deadlocks and timeouts. See here: https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L3381
In the case of a "temporary error", the event group can actually be retried. See here: https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1305
These retries happen if slave_transaction_retries is greater than 0. In the example shown above, the errors were deadlocks, so the transactions were indeed retried. Notice that the following message was still included in the above log snippet:
This message is not quite correct in this case.
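To make the retry behaviour described above concrete, here is a minimal Python model of it. This is only a sketch and not the server's C++ code: the names is_temporary_error and apply_event_group are hypothetical, the retry accounting is an assumption about how slave_transaction_retries bounds the retries, and the only facts taken from the server are the error numbers (1213 for a deadlock, 1205 for a lock wait timeout, 1062 for a duplicate key).

```python
# Simplified model for illustration only; not MariaDB source code.
# Function names are hypothetical. Error numbers are the standard server codes.
ER_DUP_ENTRY = 1062          # duplicate key: a "real" error
ER_LOCK_WAIT_TIMEOUT = 1205  # lock wait timeout: temporary
ER_LOCK_DEADLOCK = 1213      # deadlock: temporary

TEMPORARY_ERRORS = {ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT}


def is_temporary_error(errno):
    """Return True for errors that may go away if the event group is retried."""
    return errno in TEMPORARY_ERRORS


def apply_event_group(attempt_results, slave_transaction_retries):
    """Replay a sequence of per-attempt error codes (0 means success).

    Returns (succeeded, attempts_made, last_errno). Assumes at most
    slave_transaction_retries retries after the first attempt.
    """
    attempts = 0
    retries_used = 0
    last_errno = 0
    for errno in attempt_results:
        attempts += 1
        last_errno = errno
        if errno == 0:
            return True, attempts, 0
        if not is_temporary_error(errno):
            # A non-temporary error is never retried.
            return False, attempts, errno
        if retries_used >= slave_transaction_retries:
            # Temporary error, but the retry budget is exhausted.
            return False, attempts, errno
        retries_used += 1
    return False, attempts, last_errno


# Two deadlocks followed by success: the group is eventually applied.
print(apply_event_group([ER_LOCK_DEADLOCK, ER_LOCK_DEADLOCK, 0], 10))
```

In this model, a deadlock that is later retried successfully never stops replication, which is why an [ERROR]-level message for the intermediate failures can be misleading.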
It seems like this error message in slave_output_error_info should be slightly different if the SQL thread encounters a temporary error. In that case, maybe it should be a Warning instead of an Error, and maybe it should say something like this instead:
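The level selection suggested here could look roughly like the following Python sketch. It is an illustration of the proposal only, not a patch: log_level_for is a hypothetical helper, and the real decision would live in the C++ function slave_output_error_info. No message wording is proposed in the sketch.

```python
# Illustration of the proposed behaviour only; not MariaDB source code.
# log_level_for is a hypothetical helper name.
ER_DUP_ENTRY = 1062          # duplicate key: a "real" error
ER_LOCK_WAIT_TIMEOUT = 1205  # lock wait timeout: temporary
ER_LOCK_DEADLOCK = 1213      # deadlock: temporary


def log_level_for(errno):
    """Pick the log level for a failed event group under the proposal."""
    if errno in (ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT):
        # Temporary errors can be retried, so report them as warnings.
        return "Warning"
    return "Error"


for code in (ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT, ER_DUP_ENTRY):
    print(code, log_level_for(code))
```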
|
| Comments |
| Comment by Andrei Elkin [ 2019-09-19 ] | ||
|
GeoffMontee: Thanks for looking into this! When a temporary error is encountered, the message should be at the [Warning] level. I think that includes the last unsuccessful retry, because running out of retries already produces its own conclusive error. | ||
| Comment by Sachin Setiya (Inactive) [ 2020-08-18 ] | ||
|
Hi GeoffMontee, I tried to reproduce the error log mentioned above, but for me the slave stopped. One more thing: if this condition is true | ||
| Comment by Geoff Montee (Inactive) [ 2020-08-18 ] | ||
Why would the slave stop? Aren't the errors in the log snippet "temporary errors", so they should be automatically retried? Or is the "duplicate key error" in the log snippet a "real error" and not a "temporary error"? By the way, that "duplicate key error" was probably caused by | ||
| Comment by Sachin Setiya (Inactive) [ 2020-08-20 ] | ||
|
Hi GeoffMontee! I will try to simulate the
| ||
| Comment by Geoff Montee (Inactive) [ 2020-08-20 ] | ||
|
Yes, that is correct. In the case of For this specific issue, I don't remember how I verified that the slave threads did not stop. I probably encountered this problem while reproducing Maybe in some edge cases, the problems that lead to | ||
| Comment by Julien Fritsch [ 2020-10-09 ] | ||
|
Duplicate of | ||
| Comment by Andrei Elkin [ 2023-01-27 ] | ||
|
rob.schwyzer@mariadb.com, the reason I blamed |