|
axel reproduced one more hang related to innodb_undo_log_truncate=ON, similar to MDEV-30180. Here is a description of a hang that was reproduced with innodb_use_native_aio=0:
- trx_purge_truncate_history() writes the message "InnoDB: Truncating" and is about to truncate an undo log tablespace.
- trx_purge_truncate_history() is busy-looping in a scan of buf_pool.flush_list because one of the pages belonging to the undo tablespace is write-fixed.
- While trx_purge_truncate_history() releases and re-acquires buf_pool.flush_list_mutex, other threads that are waiting for the mutex fail to acquire it with this version of GNU libc. This is similar to MDEV-30180, which could only be reproduced in the same particular environment.
- buf_dblwr_t::flush_buffered_writes_completed() was waiting for log_sys.mutex in log_write_up_to(), while trying to write the block that trx_purge_truncate_history() is trying to lock.
- log_sys.mutex was held by buf_flush_page_cleaner(), which is waiting for buf_pool.flush_list_mutex.
A possible fix would be that trx_purge_truncate_history() buffer-fixes the block, releases buf_pool.flush_list_mutex, waits for an exclusive latch on the block and finally re-acquires buf_pool.flush_list_mutex. In that way, the blocking of other threads is minimized. The buffer-fix will prevent the eviction or relocation of the block in the buffer pool while no mutex is held by trx_purge_truncate_history().
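
The order of operations in the possible fix can be sketched with standard C++ primitives. This is only an illustration, not InnoDB code: Block, fix_count, latch, flush_list_mutex and latch_block_without_spinning() are hypothetical stand-ins for the buffer pool block, its buffer-fix count, its page latch, buf_pool.flush_list_mutex and the relevant step of trx_purge_truncate_history().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for a buffer pool block (not InnoDB's real type).
struct Block {
  // Buffer-fix count: while nonzero, the block must not be evicted or relocated.
  std::atomic<unsigned> fix_count{0};
  // Page latch: held exclusively by whoever is writing the page.
  std::shared_mutex latch;
};

// Hypothetical stand-in for buf_pool.flush_list_mutex.
std::mutex flush_list_mutex;

// Precondition: the caller holds flush_list_mutex through `lk`.
// Postcondition: the caller holds flush_list_mutex and the exclusive latch
// on `b`, and `b` is still buffer-fixed (the caller must release both later).
void latch_block_without_spinning(std::unique_lock<std::mutex>& lk, Block& b) {
  assert(lk.owns_lock());
  b.fix_count.fetch_add(1);  // 1. buffer-fix: pin the block in the buffer pool
  lk.unlock();               // 2. release the mutex so that waiters can proceed
  b.latch.lock();            // 3. sleep until the pending write releases the latch
  lk.lock();                 // 4. re-acquire the mutex and resume the scan
}
```

Instead of repeatedly releasing and re-acquiring the mutex in a busy loop, the caller sleeps on the block latch in step 3 while holding no mutex, so the thread that completes the write can make progress. After processing the block, the caller would release the latch and decrement fix_count.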
|