[MDEV-27322] Test innodb.doublewrite crashes when using innodb_flush_method=O_DIRECT Created: 2021-12-20  Updated: 2021-12-21  Resolved: 2021-12-21

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: N/A
Fix Version/s: 10.8.0, 10.6.6, 10.7.2

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: crash, recovery, regression

Issue Links:
Problem/Incident
is caused by MDEV-27014 InnoDB fails to restore page 0 from t... Closed

 Description   

The test innodb.doublewrite fails if O_DIRECT is supported. (To reproduce, do not use the --mem option, or make mysql-test/var a link to a directory on a persistent file system.) This problem appears to be related to MDEV-24626.

The invocations

./mtr --mysqld=--innodb-flush-method=O_DIRECT --parallel=auto innodb.doublewrite
./mtr --mysqld=--innodb-flush-method=fsync --parallel=auto innodb.doublewrite

work fine on 10.5. MDEV-24854 made O_DIRECT the default in 10.6, so from that version on the test fails even without the extra parameter. Here is sample failure output from a 10.8-based branch:

innodb.doublewrite '16k,innodb,strict_crc32' w4 [ fail ]
2021-12-20 16:32:34 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=48833
2021-12-20 16:32:34 0 [ERROR] InnoDB: Inconsistent tablespace ID in ./test/t1.ibd
2021-12-20 16:32:34 0 [Warning] InnoDB: Retry attempts for writing partial data failed.
2021-12-20 16:32:34 0 [ERROR] InnoDB: Write to file ./test/t1.ibd failed at offset 0, 1 bytes should have been written, only 0 were written. Operating system error number 22. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2021-12-20 16:32:34 0 [ERROR] InnoDB: Error number 22 means 'Invalid argument'
2021-12-20 16:32:34 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
2021-12-20 16:32:34 0 [ERROR] InnoDB: Trying to add tablespace with id 5 to the cache, but tablespace './test/t1.ibd' already exists in the cache!
mariadbd: /mariadb/10.8/storage/innobase/log/log0recv.cc:795: static fil_space_t*<unnamed struct>::create(const const_iterator&, const string&, uint32_t, fil_space_crypt_t*, uint32_t): Assertion `space' failed.
#8  0x0000564359a78b97 in create (size=6, crypt_data=<optimized out>, flags=33, name="./test/t1.ibd", it=<optimized out>) at /mariadb/10.8/storage/innobase/log/log0recv.cc:795
#9  recv_sys_t::recover_deferred (this=this@entry=0x56435a7994c0 <recv_sys>, p={first = {m_id = 21474836481}, second = {state = page_recv_t::RECV_WILL_NOT_READ, last_offset = 1, log = {head = 0x7f1d180385f8, tail = 0x7f1d18038618}}}, name="./test/t1.ibd", free_block=@0x7ffcadc33528: 0x0) at /mariadb/10.8/storage/innobase/log/log0recv.cc:900

I think that we should attempt to write a complete page at a page size aligned offset, similar to MDEV-26040.
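As a minimal sketch of that idea (a hypothetical helper, not the actual fix and not InnoDB code): the failed write in the log above was 1 byte at offset 0, which O_DIRECT typically rejects with EINVAL because neither the length nor, in general, the buffer is suitably aligned. Widening the request to whole pages could look roughly like this; the names aligned_range and align_to_page are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of InnoDB: the page-aligned region that
// must be written in order to update `len` bytes starting at `offset`.
struct aligned_range {
  uint64_t offset;  // start of the write, a multiple of page_size
  uint64_t length;  // size of the write, a multiple of page_size
};

// Widen [offset, offset + len) to whole pages.
// page_size must be a nonzero power of two, as InnoDB page sizes are.
inline aligned_range align_to_page(uint64_t offset, uint64_t len,
                                   uint64_t page_size)
{
  assert(page_size && !(page_size & (page_size - 1)));
  assert(len > 0);
  const uint64_t mask = page_size - 1;
  const uint64_t start = offset & ~mask;              // round down
  const uint64_t end = (offset + len + mask) & ~mask; // round up
  return {start, end - start};
}
```

With a 16k page size, the 1-byte write at offset 0 becomes a write of the complete first page (offset 0, length 16384). The buffer passed to the write would also have to be suitably aligned and hold the full page contents; that part is not shown here.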

Furthermore, any data files for ROW_FORMAT=COMPRESSED with a page size smaller than 4096 bytes must be opened with OS_DATA_FILE_NO_O_DIRECT. (That should not be the issue here, though, because my block device supports 512-byte O_DIRECT reads and writes.)
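The rule about small compressed page sizes can be sketched as a predicate. This is only an illustration under stated assumptions: may_use_o_direct and its min_direct_io_size parameter are hypothetical names, and the default of 4096 is simply the threshold mentioned above, not a value taken from the InnoDB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate, not InnoDB code: may a data file with the given
// physical page size be opened with O_DIRECT?
// min_direct_io_size is the smallest I/O size assumed to be safe for
// O_DIRECT; 4096 is the threshold named in the description.
inline bool may_use_o_direct(uint32_t physical_page_size,
                             uint32_t min_direct_io_size = 4096)
{
  // Page sizes are expected to be nonzero powers of two.
  assert(physical_page_size &&
         !(physical_page_size & (physical_page_size - 1)));
  return physical_page_size >= min_direct_io_size;
}
```

Under these assumptions, 1k and 2k compressed pages would fall back to buffered I/O (the OS_DATA_FILE_NO_O_DIRECT case), while 4k and larger pages could use O_DIRECT. Passing 512 as the second argument models a block device like the one described above, which accepts 512-byte O_DIRECT requests.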



 Comments   
Comment by Marko Mäkelä [ 2021-12-21 ]

The fix to the os_file_write() call looks OK to me. It will prevent the EINVAL error (22) for a write that violates the O_DIRECT constraints.

But can you please temporarily revert that fix and fix the subsequent crash that occurs when the deferred_spaces.deferred_dblwr() call fails?

Comment by Marko Mäkelä [ 2021-12-21 ]

I verified that this failure was introduced by the fix of MDEV-27014. The bug is not present in any release (except the 10.8 preview releases; at least the one for MDEV-14425).
