Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Version/s: 10.5.7, 10.5.8
Description
elenst reports that her tests are hanging.
The reason seems to be the following combination of parameters:
- innodb_write_io_threads=1
- innodb_use_native_aio=0
- innodb_doublewrite=1
To add to the confusion, InnoDB will claim that it is using native AIO (because innodb_use_native_aio=1 by default), but a little later it will silently switch to simulated asynchronous I/O (innodb_use_native_aio=0) because io_setup() failed.
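The gap between the configured and the effective setting can be pictured with a small sketch. This is not InnoDB code: `choose_aio_backend` and `failing_setup` are hypothetical names, and a Python `OSError` merely stands in for an io_setup() failure.

```python
def choose_aio_backend(use_native_aio, native_setup):
    """Return the backend actually used, which may differ from the request.

    `native_setup` is a hypothetical callable standing in for io_setup();
    it raises OSError when native AIO cannot be initialised.
    """
    if not use_native_aio:
        return "simulated"
    try:
        native_setup()
    except OSError:
        # Silent fallback: the configured value stays 1, the effective one is 0.
        return "simulated"
    return "native"


def failing_setup():
    # Hypothetical failure, standing in for io_setup() returning an error.
    raise OSError("io_setup() failed")


configured = True  # innodb_use_native_aio=1, the default
effective = choose_aio_backend(configured, failing_setup)
print("configured native AIO:", configured, "| effective backend:", effective)
```

The point of the sketch is only that a reader of the configuration sees one value while the running server may behave according to another.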
The hang is possible because, starting with MDEV-23855, the doublewrite buffer issues asynchronous writes. In buf_dblwr_t::flush_buffered_writes_completed(), up to 128 page writes may be submitted, but there may already be 128 outstanding page writes. The thread would then hang here, because the maximum number of outstanding requests per thread is 256:
mariadb-10.5.8
#14 0x000055c699979745 in io_slots::acquire (this=0x55c69c565d50) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/os/os0file.cc:103
#15 0x000055c699977bfe in os_aio (type=..., buf=0x7f9c7847c000, offset=4128768, n=16384) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/os/os0file.cc:4211
#16 0x000055c699c14ce0 in fil_space_t::io (this=0x55c69cd565e8, type=..., offset=4128768, len=16384, buf=0x7f9c7847c000, bpage=0x7f9c78009ab0) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/fil/fil0fil.cc:3431
#17 0x000055c699b9113c in buf_dblwr_t::flush_buffered_writes_completed (this=0x55c69b62b960 <buf_dblwr>, request=...) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/buf/buf0dblwr.cc:678
#18 0x000055c699c14fb9 in fil_aio_callback (request=...) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/fil/fil0fil.cc:3466
#19 0x000055c699976827 in io_callback (cb=0x55c69c565f40) at /home/mariadb/semaphore1/10.5-enterprise/storage/innobase/os/os0file.cc:3918
#20 0x000055c699ca75dd in tpool::simulated_aio::simulated_aio_callback (param=0x55c69c565f40) at /home/mariadb/semaphore1/10.5-enterprise/tpool/aio_simulated.cc:162
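The pattern in this stack trace, a completion callback that tries to acquire more I/O slots while running on the only thread that could free them, can be modelled with a toy simulation. Everything here is an assumption made for illustration: `simulate` is a hypothetical function, the numbers are deliberately tiny and do not correspond to InnoDB's real limits, and the model assumes that only threads other than the one stuck in the callback can complete older writes.

```python
def simulate(io_threads, capacity, outstanding, batch):
    """Return True if the callback submits its whole batch, False if it is stuck.

    io_threads  -- threads that run completion callbacks (one is in the callback)
    capacity    -- total number of I/O slots (toy value)
    outstanding -- writes already holding a slot when the callback starts
    batch       -- writes the callback wants to submit
    """
    free = capacity - outstanding
    helpers = io_threads - 1  # threads not occupied by the callback itself
    submitted = 0
    while submitted < batch:
        if free > 0:
            # The callback takes a slot for one more page write.
            free -= 1
            outstanding += 1
            submitted += 1
        elif helpers > 0 and outstanding > 0:
            # Other threads complete older writes and release their slots.
            done = min(helpers, outstanding)
            outstanding -= done
            free += done
        else:
            # Nobody is left to release a slot; real code would block here.
            return False
    return True


print("1 thread :", simulate(io_threads=1, capacity=4, outstanding=3, batch=2))
print("2 threads:", simulate(io_threads=2, capacity=4, outstanding=3, batch=2))
```

In this model a single thread gets stuck as soon as the batch exceeds the free slots, while a second thread is enough to keep slots flowing.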
A possible workaround could be to make innodb_write_io_threads=2 the minimum value.
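A minimal sketch of what such a lower bound could look like, assuming the configured value is simply clamped. `MIN_WRITE_IO_THREADS` and `effective_write_io_threads` are hypothetical names and say nothing about how the server actually validates the variable.

```python
MIN_WRITE_IO_THREADS = 2  # assumed floor, taken from the workaround suggested above


def effective_write_io_threads(configured):
    """Clamp a configured thread count to the assumed minimum."""
    return max(configured, MIN_WRITE_IO_THREADS)


print(effective_write_io_threads(1))
print(effective_write_io_threads(4))
```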
Thanks to wlad for participating in the debugging.
Issue Links
- is caused by
  - MDEV-16264 Implement a common work queue for InnoDB background tasks (Closed)
  - MDEV-23855 InnoDB log checkpointing causes regression for write-heavy OLTP (Closed)
- relates to
  - MDEV-25953 Tpool - prevent potential deadlock in simulated AIO (Closed)
I want to emphasize for posterity: innodb_use_native_aio=0 mentioned in the description is NOT a parameter used by the test. The test runs the server with the default innodb_use_native_aio=1, and all necessary packages are installed on the machines where it is run. Whatever makes InnoDB ignore it happens internally and is not configured by the test.
This is an important distinction, because stating that the hang depends on the innodb_use_native_aio=0 setting would lead people to dismiss it as a possible cause of similar problems, since they are not using such an option. That would be wrong: the server can be run with the default innodb_use_native_aio and still be affected.
As for the other two parameters listed in the description, innodb_doublewrite=1 is the default value, and the test does indeed set innodb_write_io_threads=1.
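To make the distinction concrete, here is a hedged sketch of a check that looks at the effective AIO mode rather than the configured one. `may_be_affected` is a hypothetical helper that only encodes the combination listed in the description; it is not a diagnostic provided by the server.

```python
def may_be_affected(write_io_threads, doublewrite, effective_native_aio):
    """Encode the combination from the description.

    effective_native_aio is what the server ended up using after any silent
    fallback, not the value written in the configuration.
    """
    return write_io_threads == 1 and doublewrite and not effective_native_aio


# Test setup from this comment: innodb_use_native_aio is left at its default
# of 1, but the server fell back to simulated AIO internally.
configured_native_aio = True
effective_native_aio = False

print("judged by configured value:", may_be_affected(1, True, configured_native_aio))
print("judged by effective value :", may_be_affected(1, True, effective_native_aio))
```

Judging by the configured value alone would wrongly rule the hang out, which is the mistake this comment warns against.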