[MDEV-25590] Deadlock during ongoing transaction and RSU Created: 2021-05-04 Updated: 2022-02-14 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | wsrep |
| Affects Version/s: | 10.5.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Kamil Holubicki | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | None | ||
| Environment: |
||
| Description |
|
Test case

Galera provider loaded; one cluster node is enough.

session_1:
session_2: use test;
session_1: commit;

Result:
Expected result:

I investigated it a little and here are my findings:

I also see that commit 37deed3f37561f264f65e162146bbc2ad35fb1a2 introduced Galera 4. With Galera 3, when we called wsrep_to_isolation_begin(), we set thd->wsrep_exec_mode = TOTAL_ORDER regardless of whether the method was TOI or RSU. Then, when session 2 detects the deadlock, the abort action is performed if thd->wsrep_exec_mode == TOTAL_ORDER, so the abort happens for both TOI and RSU. With Galera 4, wsrep_exec_mode no longer exists. |
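The Galera 3 behaviour described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not MariaDB source code: the types `Session`, `OsuMethod` and `ExecMode` and the functions `to_isolation_begin` and `should_abort_lock_holder` are hypothetical names that stand in for `thd`, `wsrep_osu_method`, `thd->wsrep_exec_mode` and the abort check mentioned in the report. It does not attempt to model what Galera 4 does.

```cpp
// Hypothetical, simplified model of the Galera 3 logic described in the
// report. None of these names exist in the MariaDB code base.

enum class OsuMethod { TOI, RSU };               // stands in for wsrep_osu_method
enum class ExecMode { LOCAL_STATE, TOTAL_ORDER }; // stands in for wsrep_exec_mode

struct Session {                                  // stands in for THD
    ExecMode exec_mode = ExecMode::LOCAL_STATE;
};

// Models wsrep_to_isolation_begin() as described for Galera 3:
// the exec mode becomes TOTAL_ORDER regardless of TOI or RSU.
void to_isolation_begin(Session& ddl_session, OsuMethod /*method*/) {
    ddl_session.exec_mode = ExecMode::TOTAL_ORDER;
}

// Models the deadlock handling described for Galera 3: the abort action
// is taken whenever the DDL session runs in TOTAL_ORDER mode.
bool should_abort_lock_holder(const Session& ddl_session) {
    return ddl_session.exec_mode == ExecMode::TOTAL_ORDER;
}
```

In this model, a session that enters isolation with either `OsuMethod::TOI` or `OsuMethod::RSU` causes `should_abort_lock_holder` to return `true`, which matches the statement that Galera 3 performed the abort for both methods. A session that never entered isolation keeps `LOCAL_STATE`, and no abort is triggered.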
| Comments |
| Comment by Arthur van Kleef [ 2021-05-28 ] | |||||||||||||||||
|
I experience the same issue in 10.4.14 and 10.4.19. It's easy to reproduce using the metadata locking example and setting wsrep_osu_method=RSU in the session running the DDL. Metadata locks after starting the transaction in session 1:
And after starting the DDL operation in session 2 and doing COMMIT in session 1:
Now both sessions wait forever, until I abort the DDL (ctrl+c) in session 2. Then the COMMIT in session 1 succeeds immediately. | |||||||||||||||||
| Comment by Arthur van Kleef [ 2022-01-19 ] | |||||||||||||||||
|
It looks like this issue was fixed downstream in Percona. Would it be possible to apply the same fix (PXC-3645) here? | |||||||||||||||||
| Comment by Nanne Huiges [ 2022-02-14 ] | |||||||||||||||||
|
This bug comes up from time to time when running migrations, and it is quite a serious problem. It would be good to get the Percona fix in! |