[MDEV-31177] SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart Created: 2023-05-03 Updated: 2023-11-01 Resolved: 2023-09-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Tests |
| Affects Version/s: | 10.5, 10.6, 10.8, 10.9, 10.10, 10.11, 11.0 |
| Fix Version/s: | 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Angelique Sklavounos (Inactive) | Assignee: | Brandon Nesterenko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
From knielsen:
This can be observed by the following test failure with rpl.rpl_xa_prepare_gtid_fail: https://buildbot.mariadb.org/#/builders/208/builds/12453
The slave error log shows an abnormal server shutdown with "initiated by: unknown":
|
| Comments |
| Comment by Brandon Nesterenko [ 2023-08-21 ] | ||||||||||||
|
Prioritized to Critical because this test failure is seen so frequently. | ||||||||||||
| Comment by Kristian Nielsen [ 2023-08-24 ] | ||||||||||||
|
This appears to be a real (if small) bug in the code. There is a small window between the point where the SQL thread reports itself as running and the point where it clears any error left by a previous error stop. A race therefore exists in which include/rpl_end.inc can see the previous error still set, which causes it to fail the test. The failure can be reproduced reliably with this small patch:
| ||||||||||||
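Editorial illustration (this is not the reproducing patch mentioned above, which is not included in this export): the window described in the preceding comment can be modelled with a small single-threaded sketch. Everything here is hypothetical: `SqlThreadStatus`, both `start_*` functions, and the errno value are invented for illustration and are not MariaDB server code. The second ordering is just one way such a window could be closed; this ticket does not describe how the actual fix works.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SqlThreadStatus:
    """Hypothetical stand-in for the two fields a status check would read."""
    running: bool = False
    last_sql_errno: int = 0


Observer = Callable[[SqlThreadStatus], None]


def start_clear_after_running(status: SqlThreadStatus, observe: Observer) -> None:
    """Ordering described in the comment: report running, then clear the old error."""
    status.running = True
    observe(status)            # a status check landing here falls inside the window
    status.last_sql_errno = 0


def start_clear_before_running(status: SqlThreadStatus, observe: Observer) -> None:
    """Assumed alternative ordering: clear the old error, then report running."""
    status.last_sql_errno = 0
    status.running = True
    observe(status)            # the same check now sees a clean error state


def errno_seen_once_running(start: Callable[[SqlThreadStatus, Observer], None],
                            stale_errno: int) -> int:
    """Return the errno a checker sees the first time it finds the thread running."""
    status = SqlThreadStatus(running=False, last_sql_errno=stale_errno)
    seen: List[int] = []

    def observe(s: SqlThreadStatus) -> None:
        if s.running:
            seen.append(s.last_sql_errno)

    start(status, observe)
    return seen[0]


STALE_ERRNO = 1062  # arbitrary example value left over from a previous error stop
```

With the first ordering, `errno_seen_once_running(start_clear_after_running, STALE_ERRNO)` returns the stale value, which corresponds to a check such as the one in include/rpl_end.inc seeing the old error on a thread that already reports itself as running. With the second ordering the same call returns 0.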
| Comment by Brandon Nesterenko [ 2023-08-28 ] | ||||||||||||
|
Hey Andrei! This is ready for review: PR-2741 | ||||||||||||
| Comment by Andrei Elkin [ 2023-08-29 ] | ||||||||||||
|
Review done on GH. | ||||||||||||
| Comment by Brandon Nesterenko [ 2023-09-13 ] | ||||||||||||
|
Pushed into 10.4 as 1407f9996. No merge conflicts or test failures were observed when cherry-picking into 11.3. |