[MDEV-28143] Data table corruption/crashing on btrfs Created: 2022-03-21 Updated: 2023-09-25 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.7, 10.7.3 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | K | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
debian/devuan |
||
| Description |
|
Creating a database from a SQL dump (or reading an existing table) returns various errors: 1877, "cannot open table", or an outright segfault when a CHECK TABLE command is issued. This has happened on multiple versions of MariaDB and on multiple machines, including during a SQL dump/restore, in which case the offending table was EMPTY on both the source and destination machines. Attached are a stack trace and a log dump that may be of use. |
| Comments |
| Comment by K [ 2022-03-22 ] |
|
Another crash loading a SQL dump. |
| Comment by Marko Mäkelä [ 2022-03-22 ] |
|
Which file system and Linux kernel version are you using? The failed page reads might be a duplicate of |
| Comment by K [ 2022-03-22 ] |
|
btrfs mounted w/ nodatacow |
| Comment by K [ 2022-03-22 ] |
|
I've been made aware that a fix for this issue was pushed into 10.7.4, so disregard my prior comment. |
| Comment by K [ 2022-03-23 ] |
|
I'm not sure that |
| Comment by Marko Mäkelä [ 2022-03-23 ] |
|
Would setting innodb_flush_method=fsync work around the problem? If that is not enough, please set also innodb_use_native_aio=OFF. Both options may reduce performance. |
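As a rough sketch, the two settings suggested above could be placed in a server option file like this. The `[mysqld]` section is the usual place for server options; the file location is an assumption that varies by distribution (on Debian-based systems it is typically a file under /etc/mysql/). Neither variable can be changed at runtime, so the server has to be restarted for them to take effect.

```ini
[mysqld]
# First work-around to try: buffered file I/O with fsync() instead of the
# default O_DIRECT.
innodb_flush_method = fsync

# Only if the above is not enough: do not use the kernel's native
# asynchronous I/O interface.
innodb_use_native_aio = OFF
```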
| Comment by K [ 2022-03-24 ] |
|
Okay, loading the SQL dump failed the first time. So, to eliminate btrfs as a confounding factor, I formatted the drive to XFS and attempted to load the SQL dump again into 10.7.3. It ran for several hours before hanging in a non-killable state (it did not respond to repeated kill -9 calls) while emitting the following kernel error:
INFO: task mysql:12211 blocked for more than 1208 seconds.
Please note that while /var/lib/mysql was formatted as XFS, the volume that housed the SQL dump (which I imported via source /path/to/dump.sql) was btrfs, so this apparently happened while reading the dump, not while saving it to the DB data directory.
So I formatted the drive to XFS again, reinstalled the DB, and re-imported the dump, this time with the following settings enabled:
This permitted the dump to import successfully. As far as I can tell, the data loaded correctly, although I still need to run CHECK TABLE on the database; with nearly 700 tables, that will take some time. |
| Comment by Marko Mäkelä [ 2022-03-24 ] |
|
vector_gorgoth, thank you. I think that danblack has the best knowledge of the Linux kernel bugs and changes in this area, and of whether the default innodb_flush_method=O_DIRECT could cause trouble on XFS (which, like btrfs, supports file system snapshots and copy-on-write). It definitely does cause trouble on btrfs and reiserfs (MDEV-28100).
Disabling the change buffer is a good idea in any case; see
One more work-around might be innodb_page_size=4k, if your schema is compatible with that.
I do not know for sure, but I would expect that if your drive has a physical block size of 4096 bytes, or if it is an SSD (whose flash translation layer internally performs copy-on-write), then the InnoDB doublewrite buffer could be safely disabled.
I’d like to know whether XFS works for you with both asynchronous I/O and O_DIRECT enabled. We mostly use ext4 in our internal testing, without any problems. The O_DIRECT trouble took me by surprise. |
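For illustration only, the additional work-arounds mentioned above might look like this in an option file. The comment does not name the variables for the change buffer or the doublewrite buffer; `innodb_change_buffering` and `innodb_doublewrite` are assumed here to be the ones meant. Note that `innodb_page_size` can only be chosen when the data directory is initialized, so trying it means re-creating the data directory and reloading the dump.

```ini
[mysqld]
# Disable the InnoDB change buffer (assumed variable name, see above).
innodb_change_buffering = none

# Smaller page size. Must be set before the data directory is initialized,
# and only works if the schema fits: the maximum row size is lower with
# 4k pages than with the default 16k.
innodb_page_size = 4k

# Left commented out on purpose: disabling the doublewrite buffer removes
# protection against torn pages, and is only plausible if the storage
# really writes 4096-byte blocks atomically, which is unconfirmed above.
# innodb_doublewrite = OFF
```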
| Comment by K [ 2022-03-25 ] |
|
For various reasons I attempted another dump/restore. As before, the volume containing the SQL dump is btrfs, the mysql data volume is XFS, and the server version is 10.7.3 on a 5.16.4 kernel. The 3 settings I had enabled before are still enabled:
But this time I got another hang, as before:
INFO: task mysql:43584 blocked for more than 1208 seconds.
I'm going to try again with innodb_page_size=4k enabled, just for the sake of experimentation. In the meantime, it appears that something is seriously wrong either with the btrfs driver (the volume itself was freshly created immediately prior to placing the SQL dump on it, on a brand new EBS volume) or with the way MariaDB reads data when parsing dumps. |
| Comment by Marko Mäkelä [ 2022-06-22 ] |
|
vector_gorgoth, thank you for the updates. How did the experiment with innodb_page_size=4k work out? I wonder if a recent development snapshot of 10.6 (or any of the 10.10 preview releases), which contain a fix of
When it comes to the root cause of this, I would shift the blame to the btrfs implementation in the Linux kernel that you are using. It is possible that the bug has been fixed in a newer kernel.
I am reassigning this to danblack, who is our operating system ‘liaison officer’. |
| Comment by K [ 2022-06-22 ] |
|
No combination of config options helped reliably - eventually I simply moved everything to XFS. |