For the tables that report checksum failures, it would be interesting to check whether the first 4 bytes of each corrupted page match the pattern that was observed on the first page of some corrupted files.
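A minimal sketch of how those bytes could be collected, assuming the default innodb_page_size of 16384 bytes (pass a different page_size if the instance uses another value). The function name first_bytes_per_page is hypothetical and not part of any MariaDB tool. It only prints the first 4 bytes of each page, so no row data ends up in the output:

```python
PAGE_SIZE = 16384  # assumption: default innodb_page_size


def first_bytes_per_page(path, page_size=PAGE_SIZE, count=4):
    """Yield (page_number, first `count` bytes) for every full page in the file."""
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                # End of file, or a trailing partial page that is ignored here.
                break
            yield page_no, page[:count]
            page_no += 1


def dump(paths, page_size=PAGE_SIZE):
    """Print the first 4 bytes of each page of each file in hexadecimal."""
    for path in paths:
        for page_no, head in first_bytes_per_page(path, page_size):
            print(f"{path} page {page_no}: {head.hex()}")
```

For example, dump(["matomo_log_visit.ibd"]) (a hypothetical file name) would print one line per page. The output can then be filtered for the page numbers that innochecksum reports as corrupted.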
I have one more hypothesis regarding what could cause a corruption. In MDEV-24854 (MariaDB Server 10.6) we enabled the use of O_DIRECT access to InnoDB data files by default. In Linux, man 2 open mentions the following:
O_DIRECT I/Os should never be run concurrently with the fork(2) system call, if the memory buffer is a private mapping (i.e., any mapping created with the mmap(2) MAP_PRIVATE flag; this includes memory allocated on the heap and statically allocated buffers). Any such I/Os, whether submitted via an asynchronous I/O interface or from another thread in the process, should be completed before fork(2) is called. Failure to do so can result in data corruption and undefined behavior in parent and child processes.
The InnoDB buffer pool is a MAP_PRIVATE mapping. The built-in crash handler of MariaDB Server, which is enabled by default, attempts to create a stack trace of the current thread. As the first step, it would invoke fork(2), without waiting for any pending O_DIRECT writes to complete. I tracked down the history of this man page passage to the following commit in https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/:
commit 1847167b8b7d85a4d52acb86f4cb3755a4abcebd
Author:     Nick Piggin
AuthorDate: Wed May 9 17:50:54 2012 +1200
Commit:     Michael Kerrisk
CommitDate: Wed May 9 19:18:43 2012 +1200

    open.2: Describe race of direct I/O and fork()

    Rework 04cd7f64, which didn't capture the details correctly.
    See the April/May 2012 linux-man@ mail thread "[PATCH]
    Describe race of direct read and fork for unaligned buffers"
    http://thread.gmane.org/gmane.linux.kernel.mm/77571

    Acked-by: KOSAKI Motohiro
    Cowritten-by: Jan Kara
    Cowritten-by: Hugh Dickins
    Signed-off-by: Michael Kerrisk
I don’t know if there is an archive of that mailing list available. The scenario that was described in this change was an O_DIRECT read that would run concurrently with a fork(). It was claimed that the result of the read could be split between the parent and child processes. I would imagine that under this kind of a scenario, InnoDB would "do the right thing" and refuse access to a corrupted page.
In MDEV-35886, stephen.hames reported that a hang of the server (due to a bug in a Debian maintained version of the Linux kernel) would lead to data corruption like this. xan@biblionix.com did a great job of tracking down that hang. I’m not at all familiar with the kernel internals, but I got concerned that we could get data corruption due to a race between an O_DIRECT asynchronous write and fork(2).
The way I read the current Linux man 2 open, it would seem to be unsafe to invoke fork(2) in any multi-threaded program that may access files that have been opened with O_DIRECT. InnoDB is opening such files with the O_CLOEXEC flag, so one might assume that any race condition between O_DIRECT file access and fork(2) would be limited to the point of time where the execve(2) system call has not been invoked yet (and the memory mappings of the parent process have not been destroyed). The fork(2) call was introduced in MariaDB Server in March 2012 (2 months before the above mentioned documentation change) and revised in 2018. In any case, even if the built-in stack trace reporter weren’t behind this corruption, it is known to hang depending on when it is being triggered (MDEV-21010).
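To illustrate the window mentioned above, here is a small sketch (Linux only, with a hypothetical helper name) showing that a descriptor opened with O_CLOEXEC is still usable in a forked child until execve(2) is called. It does not use O_DIRECT and does not reproduce any corruption; it only demonstrates that O_CLOEXEC takes effect at exec time, not at fork time:

```python
import os


def fd_visible_in_forked_child(path):
    """Return True if a child created by fork() can still use an O_CLOEXEC descriptor."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        pid = os.fork()
        if pid == 0:
            # Child process: no execve() has happened yet, so the descriptor
            # inherited from the parent is expected to be valid here.
            code = 1
            try:
                os.fstat(fd)
                code = 0
            finally:
                os._exit(code)
        # Parent process: collect the child's exit status.
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)
```

The same reasoning applies to the memory mappings of the parent: the child shares them copy-on-write from fork(2) until it calls execve(2) or exits.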
Hi Marko,
I may be able to get back the original file from the last snapshot of the VM, but the affected table was matomo_log_visits, which contains the data most relevant to data protection.
In any case, the corruption may already have happened before the issues with wrong checksums at the beginning of the files.
The databases are now clean and I checked all files (after shutdown) with: ls *.ibd | xargs -L1 -t innochecksum (this originally printed corruption for 3 tables, all very large and all privacy-sensitive).
The MediaWiki on the other server was clean, so it looks like the issues may have had other causes.
I had previously created an SQL dump of the whole database. As that worked without errors, the optimize should not have caused issues.
Yes that's the Ubuntu LTS version of MariaDB. The problem with Ubuntu is that for LTS releases they never ever change the version number of any package and just patch bugs. The Debian changelog shows some bugfixes, but there were no recent updates of the package. So the bug could be there. I will maybe change to an official PPA. Thanks for the warning!