[MDEV-29349] I/O from MariaDB causes FIFREEZE ioctl system call to hang on NVMe devices Created: 2022-08-22 Updated: 2023-09-19
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.8 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Thomas Deutschmann | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: |
nspawn container running the mariadb.org binary distribution 10.6.8-MariaDB-1:10.6.8+maria~bullseye. Host system details were collected with `cat /etc/debian_version`, `uname -r`, `lscpu` and `nvme list` (command output not included). |
| Description |
|
Hi, while trying to back up a Dell R7525 system (AMD EPYC 7002 series, codename "Rome") using LVM snapshots, I noticed that the system will sometimes (not every time) 'freeze' when creating the snapshot. At first I thought this was related to LVM, so I started https://listman.redhat.com/archives/linux-lvm/2022-July/026228.html. Long story short: I was even able to reproduce it with fsfreeze, see the last strace lines.
So I started to bisect the kernel and found a bad commit. After reverting that commit (and the follow-up commit 0f9650bd838efe5c52f7e5f40c3204ad59f1964d), v5.18.15 and v5.19 worked for me again. However, I am seeing the same problem when using the NVMe device directly, i.e. when no mdraid is involved. After I reported this upstream to the kernel mailing list, I was asked to run the bisect again against the single NVMe device. I tried that, but I am failing: the bisect always ends with the same result...
...but this doesn't make any sense, right? The latest kernel is working for me, yet for some reason SUBLEVEL = 99 causes the failure again, even for 5.15... I am currently out of ideas. I was asked to find a different reproducer, because maybe mysqld is doing something that depends on $KV, but I have not been able to reproduce it with fio yet. However, using MariaDB will always trigger the problem.
The problem also occurs when I stop mysqld before issuing the FIFREEZE ioctl system call. I hope someone has an idea or can help me create a reproducer that does not depend on mysqld. |
| Comments |
| Comment by Marko Mäkelä [ 2022-08-22 ] | |||
|
I don’t remember us doing any extensive testing with file system snapshots. Would this be reproducible if you disable the use of O_DIRECT, e.g., set innodb_flush_method=fsync to disable the effect of the default O_DIRECT flush method? | |||
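The suggestion above maps to a server configuration change along these lines (a sketch; the config file path is distribution-dependent). innodb_flush_method=fsync makes InnoDB use buffered writes plus fsync() instead of O_DIRECT:

```ini
# Example only; the config file location varies by distribution,
# e.g. /etc/mysql/mariadb.conf.d/50-server.cnf on Debian.
[mysqld]
# Buffered writes + fsync() instead of O_DIRECT (the 10.6 default):
innodb_flush_method = fsync
# Optionally also rule out asynchronous I/O (io_uring/libaio) entirely:
# innodb_use_native_aio = 0
```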
| Comment by Thomas Deutschmann [ 2022-08-22 ] | |||
|
Yes. This helped me to find https://github.com/axboe/fio/issues/1195, so I now have a working reproducer for fio. | |||
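The actual fio job file is not quoted in this ticket. A hypothetical job exercising the io_uring engine with O_DIRECT, the combination relevant to fio issue #1195, might look roughly like this; every name and value here is illustrative, not the reporter's:

```ini
; hypothetical-reproducer.fio -- illustrative only
[global]
ioengine=io_uring   ; the engine that fails once io_uring is broken
direct=1            ; O_DIRECT, as InnoDB uses by default
rw=randwrite
bs=16k              ; matches the InnoDB default page size
time_based
runtime=60

[writer]
filename=/mnt/test/fio.dat   ; place on the filesystem to be frozen
size=1g
iodepth=32
```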
| Comment by Daniel Black [ 2022-08-23 ] | |||
|
FYI, I've managed to provide container-based reproducers to kernel folks before. They allow the entire userspace to be contained, so that a kernel can be tested with a very small and generic runtime component. Sorry for the MariaDB runtime check disrupting a productive kernel bisect. | |||
| Comment by Thomas Deutschmann [ 2022-08-28 ] | |||
|
We found the kernel issue, and a patch has already been merged into mainline: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e053aaf4da56cbf0afb33a0fda4a62188e2c0637 The backport for 5.15 and 5.19 is available but still pending: https://www.spinics.net/lists/stable/msg588671.html

One thing I noticed that could perhaps be improved in MariaDB: after finding the bad commits, I reverted them, which left io_uring no longer available/working on my system. I noticed this when fio failed to use the io_uring engine. However, mysqld didn't tell me anything about it: it started normally, showing "[Note] InnoDB: Using liburing" in the log, but it was no longer using asynchronous I/O, as evidenced by the system no longer freezing. Maybe MariaDB could detect non-working liburing and log that, which would be helpful when you expect asynchronous I/O to be in use but it isn't for some reason. | |||
| Comment by Daniel Black [ 2022-08-29 ] | |||
|
There is a message when liburing fails to initialize; that seems to be the only case where AIO is disabled. There are also some synchronous I/O parts in InnoDB. Are you after:
Detecting non-working uring, i.e. an end-to-end test of io_uring after initialization, like the case-sensitive-filesystems test? I don't think it would catch your FIFREEZE case, but it might still catch a filesystem that doesn't fully implement io_uring. Which cases the kernel folks aren't testing is the hard question. | |||
| Comment by Thomas Deutschmann [ 2022-09-01 ] | |||
|
> Detecting non-working uring, like an end-to-end test of uring after initialization like the case sensitive filesystems test?

Yes, something like that.

> It wouldn't catch your FIFREEZE case I think

Right, but I think that is impossible anyway: AIO was working and in use, and the system only failed after that. I was surprised that fio failed once I "broke" AIO (the io_uring engine was no longer usable for fio), while MariaDB started, told me "Yep, I am using native AIO" and ran my reproducer without any comment. If I hadn't known that AIO was broken due to my changes to the kernel (and, to be honest, I didn't know that the NOWAIT part was critical and that removing it would break AIO), I can imagine scenarios where you would never notice and just wonder why system A doesn't behave like system B... |