[MDEV-31755] Replica's DML event deadlocks with online alter table Created: 2023-07-20 Updated: 2023-08-16 Resolved: 2023-08-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table |
| Affects Version/s: | None |
| Fix Version/s: | 11.2.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Nikita Malyavin | Assignee: | Nikita Malyavin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
On the parallel slave, a two-phase binlogged "START ALTER" deadlocks with the DML statement that follows it in binlog order.
|
| Comments |
| Comment by Kristian Nielsen [ 2023-07-20 ] |
|
How can I view the details of this issue? The Google drive link doesn't seem to work. |
| Comment by Elena Stepanova [ 2023-07-20 ] |
|
The Google Drive link should never have been there. I hope nikitamalyavin will write a proper description soon enough. I cannot, however, attach the actual binlog, which I guess is the most important part, if you need it, |
| Comment by Kristian Nielsen [ 2023-07-21 ] |
|
Thanks, Elena, Nikita. I was a bit confused at first, having only threads_slave_full to look into, as the GTIDs and sub_ids are a bit strange. But it's possible that they are just incorrect values shown by GDB. It makes sense that different locking for the ALTER on master and slave can cause this kind of hang. It's something that worries me in general: the START ALTER feels fragile, since it can hang if any kind of different lock conflict occurs against a later query. It feels like we need, for metadata locks, the lock wait report and kill that we currently have for InnoDB row locks. (But that's probably a separate issue from this particular one.) - Kristian. |
| Comment by Nikita Malyavin [ 2023-07-21 ] |
|
knielsen I have counted 6 locking systems in the server. Some of them (like InnoDB's row/table locks, my_safe_mutex, or MDL) have their own deadlock detection systems; others have none at all. It would be nice to generalize it all into one deadlock detection module and extend it to the locks not covered. I'm not sure, though, whether we can do it here. In general, condvars can't be deadlock-detected (we don't know who else can signal), but here there is only one producer, and it is known to us, so we should be able to.
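
To illustrate the idea of a single deadlock detection module, here is a minimal sketch of a wait-for graph shared by every kind of lock. It is not MariaDB code: the class `WaitForGraph`, its methods, and the thread names are hypothetical. The assumption is that each blocking primitive can name the thread it is waiting on, which for a condvar only holds when the sole producer is known, as described above.

```python
from typing import Dict, Hashable, List, Optional, Set


class WaitForGraph:
    """Hypothetical unified wait-for graph (illustration only, not MariaDB code).

    An edge waiter -> holder means "waiter cannot proceed until holder acts".
    The same edge type is used for a row lock, a metadata lock, or a condvar
    whose only possible signaller is known.
    """

    def __init__(self) -> None:
        self._edges: Dict[Hashable, Set[Hashable]] = {}

    def add_wait(self, waiter: Hashable, holder: Hashable) -> None:
        """Record that `waiter` is blocked on something `holder` must release or signal."""
        self._edges.setdefault(waiter, set()).add(holder)

    def remove_wait(self, waiter: Hashable, holder: Hashable) -> None:
        """Drop the edge once the wait ends (lock granted, signal received, or kill)."""
        self._edges.get(waiter, set()).discard(holder)

    def find_cycle(self, start: Hashable) -> Optional[List[Hashable]]:
        """Return a path start -> ... -> start if `start` is part of a deadlock, else None."""
        path: List[Hashable] = []
        on_path: Set[Hashable] = set()
        dead_ends: Set[Hashable] = set()  # fully explored, cannot reach `start`

        def dfs(node: Hashable) -> bool:
            path.append(node)
            on_path.add(node)
            # Sort only to make the reported path deterministic.
            for nxt in sorted(self._edges.get(node, ()), key=repr):
                if nxt == start:
                    return True
                if nxt not in on_path and nxt not in dead_ends and dfs(nxt):
                    return True
            path.pop()
            on_path.discard(node)
            dead_ends.add(node)
            return False

        return path + [start] if dfs(start) else None


if __name__ == "__main__":
    graph = WaitForGraph()
    # Assumed scenario: the ALTER worker waits on a lock held by the DML worker,
    # while the DML worker waits on a condvar that only the ALTER worker signals.
    graph.add_wait("worker_alter", "worker_dml")
    graph.add_wait("worker_dml", "worker_alter")
    print(graph.find_cycle("worker_alter"))
```

With every wait registered in one graph, a cycle that spans two different locking systems becomes visible, and a victim can then be chosen and killed, much like the existing handling of InnoDB row locks mentioned earlier in the thread.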