[MDEV-18349] InnoDB file size changes are not safe when file system crashes Created: 2019-01-23  Updated: 2019-01-25  Resolved: 2019-01-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0.11, 10.1.0, 10.2.0, 5.5.55, 10.3.0, 10.4.0
Fix Version/s: 10.4.2, 10.1.38, 5.5.63, 10.0.38, 10.2.22, 10.3.13

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: recovery

Issue Links:
Blocks
blocks MDEV-18338 Merge new release of InnoDB 5.7.25 to... Closed
Relates
relates to MDEV-4338 Support FusionIO/directFS atomic writes. Closed
relates to MDEV-11520 Extending an InnoDB data file unneces... Closed

 Description   

When InnoDB invokes posix_fallocate() to extend a data file, it does not follow up with a call to fsync() to make the updated file system metadata durable. If the file system crashes and needs recovery, the file size could be incorrect afterwards.

Furthermore, when the setting innodb_flush_method=O_DIRECT_NO_FSYNC that was introduced in MariaDB 10.0.11 (and MySQL 5.6) is enabled, InnoDB would stop calling fsync() after extending files.

This report is motivated by the MySQL 5.7.25 change "Bug#27309336 Backport to 5.7", which restores the fsync() call. We will fix the bug differently; it does not seem to be a good idea to hold the already contentious fil_system->mutex while executing a system call.
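
As a rough illustration of the missing step described above, the following is a minimal sketch of extending a file with posix_fallocate() and then calling fsync() so that the new size survives a file system crash. The helper name extend_and_sync() is hypothetical and is not an InnoDB function; the actual fix lives in the InnoDB file I/O code and may look quite different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of InnoDB): grow the file behind fd to at
   least new_size bytes, then fsync() so that the size change is durable.
   Returns 0 on success, or an errno-style error number on failure. */
static int extend_and_sync(int fd, off_t new_size)
{
    struct stat st;
    int err;

    assert(fd >= 0);

    if (fstat(fd, &st) != 0) {
        return errno;
    }
    if (st.st_size >= new_size) {
        return 0; /* already large enough; nothing to extend */
    }

    /* posix_fallocate() returns the error number directly and does not
       set errno. Retry if the call was interrupted by a signal. */
    do {
        err = posix_fallocate(fd, st.st_size, new_size - st.st_size);
    } while (err == EINTR);

    if (err != 0) {
        return err; /* do not ignore allocation failures */
    }

    /* Without this call, the new file size may exist only in memory and
       could be lost if the file system needs recovery. */
    if (fsync(fd) != 0) {
        return errno;
    }
    return 0;
}
```

A caller would invoke this as extend_and_sync(fd, desired_size) without holding any contended mutex, and treat a nonzero return value as a failed extension.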



 Comments   
Comment by Marko Mäkelä [ 2019-01-23 ]

Related to this, there appears to have been a bad merge to MariaDB 10.0.31 and 10.1.24 that caused XtraDB to ignore errors from posix_fallocate().

Comment by Eugene Kosov (Inactive) [ 2019-01-23 ]

> We will fix the bug differently; it does not seem to be a good idea to hold the already contentious fil_system->mutex while executing a system call.

But os_file_flush() inside fil_flush() is already called outside of a mutex.

Comment by Marko Mäkelä [ 2019-01-23 ]

kevg, you are right, the system call in fil_flush() is not covered by a mutex. My mistake.
My solution is not optimal, because in the worst case it can invoke fsync() twice in succession: first from os_file_flush() after posix_fallocate(), and then possibly from fil_flush(), because the first call does not update the bookkeeping. As always, improvements are welcome.
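
To make the redundant fsync() concrete, here is a small single-threaded sketch. Everything in it is an assumption made for illustration: struct tracked_file, its dirty flag, and the tracked_* functions are hypothetical stand-ins for the bookkeeping mentioned above, not the actual fil_flush() or os_file_flush() code. It only shows that a later flush can be skipped once the extension path records that it has already synced the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one file (not an InnoDB structure). */
struct tracked_file {
    int  fd;
    bool dirty;       /* true if changes may not be durable yet */
    int  fsync_calls; /* counts fsync() calls, for illustration only */
};

static int tracked_fsync(struct tracked_file *f)
{
    f->fsync_calls++;
    return fsync(f->fd) == 0 ? 0 : errno;
}

/* Extend the file by len bytes starting at offset and fsync() it.
   If update_bookkeeping is false, the dirty flag stays set even though
   the file was just synced, which mimics the non-optimal behaviour. */
static int tracked_extend(struct tracked_file *f, off_t offset, off_t len,
                          bool update_bookkeeping)
{
    int err;

    assert(f != NULL && f->fd >= 0);

    f->dirty = true;
    err = posix_fallocate(f->fd, offset, len);
    if (err != 0) {
        return err;
    }
    err = tracked_fsync(f);
    if (err == 0 && update_bookkeeping) {
        f->dirty = false;
    }
    return err;
}

/* Flush only if the bookkeeping says there is something to flush. */
static int tracked_flush(struct tracked_file *f)
{
    int err;

    if (!f->dirty) {
        return 0; /* already durable; skip the system call */
    }
    err = tracked_fsync(f);
    if (err == 0) {
        f->dirty = false;
    }
    return err;
}
```

With update_bookkeeping set to false, an extension followed by tracked_flush() issues two fsync() calls; with it set to true, only one. A real implementation would also need to synchronize the flag against concurrent writers, which this sketch does not attempt.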
