Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 10.6.5, 10.6.8
None
Environment: Debian 11, XCP-ng 8.2.1
Description
Hello,
The first 10.6 release we used was 10.6.5. Since upgrading to 10.6.5, we sometimes see InnoDB stop responding at some point after a live VM migration. We also see this behaviour in 10.6.8.
We perform live VM migrations on XCP-ng (XenServer). We have migrated about 300 VMs this way, and about 40 of them had the issue. The hang appeared anywhere from a few minutes to about two weeks after the migration; on less busy servers it usually appeared later.
Queries get stuck in states such as Updating, Sending data, Statistics, Filling schema table, and Commit, and never complete.
Running SHOW ENGINE INNODB STATUS hangs indefinitely and never returns any output.
Restarting mysql does not work either; we have to kill -9 the process before mysql can be restarted. After that, it works again.
We have not seen this on MariaDB 10.5 or 10.4.
We have seen it on standalone instances as well as on instances running Galera.
I made a gcore of one of the affected instances (almost 5 GB). It might be useful, but I'm not sure what to do with it.
Any ideas on what is wrong here?
Issue Links
- duplicates: MDEV-28665 aio_uring::thread_routine terminates prematurely, causing InnoDB to hang (Closed)
Could this be related to https://github.com/MariaDB/server/commit/db0fde3f24b37cfac9a4125ce888f1650a20db7b ?
I've been trying to pin this down to the specific commit where it starts, but I'm having a hard time using git bisect on the MariaDB repo. I keep ending up on the wrong branch (for example, suddenly building 10.5 despite being on the 10.6 branch). I don't use bisect often, so perhaps I'm just using it wrong.
Either way, so far I cannot reproduce this on the latest commit of the 10.6 branch. I also cannot reproduce it after checking out commit db0fde3f24b37cfac9a4125ce888f1650a20db7b. Looking at https://github.com/MariaDB/server/commits/10.6?after=654236c06d231461c66e2f3c5c4fd3b35cba3869+139&branch=10.6&qualified_name=refs%2Fheads%2F10.6, it seems that commit a0e4853eff028fa9db9ba0421309e2bd1124ab26 comes just before db0fde3f24b37cfac9a4125ce888f1650a20db7b, but it builds as MariaDB 10.5, so I'm probably doing something wrong there.
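A likely explanation for landing on 10.5 trees, assuming MariaDB's usual practice of committing fixes to the oldest affected branch and merging that branch forward (10.5 into 10.6): the 10.6 history then contains 10.5 commits reachable through the second parent of each merge commit. By default git bisect may test any commit reachable from the bad commit and not from the good one, so it can check out a commit authored on 10.5, which builds as 10.5. The same applies to the GitHub commit list, which is ordered by date, so the entry shown just before db0fde3f24b is not necessarily its parent. The toy model here uses invented commit names (`m1`, `a1`, `b2`, and so on) to compare the two candidate sets. On a real checkout, Git 2.29 and later offer `git bisect start --first-parent <bad> <good>`, and `git log --first-parent` lists only the mainline.

```python
# Toy model of a history where an older branch (10.5) is merged forward
# into a newer one (10.6). All commit names are hypothetical.
from collections import deque

# Commit -> parents. For a merge, the first parent is the branch that was
# checked out when merging (10.6 here); the second parent is the 10.5 side.
PARENTS = {
    "m1": [],            # common base
    "a1": ["m1"],        # 10.5 commit
    "a2": ["a1"],        # 10.5 commit
    "b1": ["m1"],        # 10.6 commit
    "b2": ["b1", "a2"],  # merge of 10.5 into 10.6
    "b3": ["b2"],        # 10.6 commit
}


def all_ancestors(commit):
    """Every commit reachable from `commit`, following all parents."""
    seen, queue = set(), deque([commit])
    while queue:
        c = queue.popleft()
        if c in seen:
            continue
        seen.add(c)
        queue.extend(PARENTS[c])
    return seen


def first_parent_chain(commit):
    """Only the mainline of `commit`, following first parents."""
    chain = []
    while commit is not None:
        chain.append(commit)
        parents = PARENTS[commit]
        commit = parents[0] if parents else None
    return chain


def candidates(bad, good, walk):
    """Commits reachable from `bad` via `walk` but not reachable from `good`."""
    return set(walk(bad)) - all_ancestors(good)


print(sorted(candidates("b3", "m1", all_ancestors)))       # ['a1', 'a2', 'b1', 'b2', 'b3']
print(sorted(candidates("b3", "m1", first_parent_chain)))  # ['b1', 'b2', 'b3']
```

In the full walk, the 10.5 commits `a1` and `a2` are valid bisect targets; restricting to first parents leaves only 10.6 mainline commits.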
I can reproduce it on commit 57e66dc7e60, which seems to be close to commit db0fde3f24b, where I have not been able to reproduce it so far.
I'm curious whether you think the commit I linked is related to this issue.
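Because 57e66dc7e60 reproduces the hang while db0fde3f24b and the current 10.6 head do not, the search here is for the commit that fixed the problem rather than the one that introduced it. Assuming 57e66dc7e60 is an ancestor of db0fde3f24b (which `git merge-base --is-ancestor 57e66dc7e60 db0fde3f24b` can confirm through its exit status), git bisect can search in that direction with custom terms, for example `git bisect start --term-old=broken --term-new=fixed`. This sketch shows the underlying binary search over a linear, first-parent list of commits; `first_fixed` and the `is_fixed` predicate are hypothetical stand-ins for building a commit and testing it.

```python
# Sketch: find the earliest commit at which the hang no longer reproduces.
# Commit IDs and the is_fixed predicate are hypothetical placeholders.
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is True.

    Assumes commits are ordered oldest -> newest, commits[0] still shows
    the hang, commits[-1] does not, and the property is monotone
    (once fixed, every later commit stays fixed).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]


# Example with ten fake commits where the fix lands at c6.
history = [f"c{i}" for i in range(10)]
print(first_fixed(history, lambda c: int(c[1:]) >= 6))  # c6
```

One caveat: the hang sometimes took up to two weeks to appear, so a commit that does not reproduce it during a short test is only weak evidence of a fix. A single false "fixed" verdict steers the search to the wrong commit, so each such verdict probably needs a long or repeated test.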