Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Versions: 10.6, 10.11, 11.1(EOL), 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 11.0(EOL)
Description
axel reproduced another hang related to innodb_undo_log_truncate=ON, similar to MDEV-30180. The following describes a hang that was reproduced with innodb_use_native_aio=0:
- trx_purge_truncate_history() writes the message "InnoDB: Truncating" and is about to truncate an undo log tablespace.
- trx_purge_truncate_history() is busy-looping in a scan of buf_pool.flush_list because one of the pages belonging to the undo tablespace is write-fixed.
- While trx_purge_truncate_history() repeatedly releases and re-acquires buf_pool.flush_list_mutex, the other threads that are waiting for that mutex never get to acquire it with the GNU libc version used in this environment. This is similar to MDEV-30180, which could only be reproduced in the same particular environment.
- buf_dblwr_t::flush_buffered_writes_completed() was waiting for log_sys.mutex in log_write_up_to() while trying to write the block that trx_purge_truncate_history() is trying to lock.
- log_sys.mutex was held by buf_flush_page_cleaner(), which was waiting for buf_pool.flush_list_mutex.
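The four conditions above form a wait-for cycle among three threads. The sketch below models that cycle as a small wait-for graph and finds it with a depth-first search. The graph encoding, the edge labels, and the helpers `hang_model()`, `fixed_model()` and `find_cycle()` are hypothetical illustrations, not InnoDB code; `fixed_model()` assumes that the fix proposed next removes the page cleaner's dependency on trx_purge_truncate_history(), because the mutex is no longer held while waiting for the page.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: an edge "A -> B" means "A cannot make progress until B does".
using WaitForGraph = std::map<std::string, std::vector<std::string>>;

WaitForGraph hang_model() {
  return {
      // Busy-loops on buf_pool.flush_list until the write-fixed undo page is written.
      {"trx_purge_truncate_history", {"buf_dblwr_t::flush_buffered_writes_completed"}},
      // Needs log_sys.mutex (in log_write_up_to()) to finish writing that page.
      {"buf_dblwr_t::flush_buffered_writes_completed", {"buf_flush_page_cleaner"}},
      // Holds log_sys.mutex but never wins buf_pool.flush_list_mutex from the purge thread.
      {"buf_flush_page_cleaner", {"trx_purge_truncate_history"}},
  };
}

// Assumed effect of the proposed fix: the truncating thread waits for the page
// latch without holding buf_pool.flush_list_mutex, so the page cleaner is unblocked.
WaitForGraph fixed_model() {
  WaitForGraph g = hang_model();
  g["buf_flush_page_cleaner"].clear();
  return g;
}

// Depth-first search; returns true and fills `cycle` when a back edge is found.
static bool dfs(const WaitForGraph& g, const std::string& node,
                std::set<std::string>& on_stack, std::set<std::string>& done,
                std::vector<std::string>& path, std::vector<std::string>& cycle) {
  on_stack.insert(node);
  path.push_back(node);
  auto it = g.find(node);
  if (it != g.end()) {
    for (const std::string& next : it->second) {
      if (on_stack.count(next)) {
        // The cycle is the suffix of the current path that starts at `next`.
        cycle.assign(std::find(path.begin(), path.end(), next), path.end());
        return true;
      }
      if (!done.count(next) && dfs(g, next, on_stack, done, path, cycle)) return true;
    }
  }
  on_stack.erase(node);
  path.pop_back();
  done.insert(node);
  return false;
}

// Returns the threads forming a wait-for cycle, or an empty vector if there is none.
std::vector<std::string> find_cycle(const WaitForGraph& g) {
  std::set<std::string> on_stack, done;
  std::vector<std::string> path, cycle;
  for (const auto& entry : g)
    if (!done.count(entry.first) && dfs(g, entry.first, on_stack, done, path, cycle)) break;
  return cycle;
}
```

With `hang_model()` the search reports all three threads as one cycle; with `fixed_model()` it reports none, which is the property the fix aims for.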
A possible fix would be for trx_purge_truncate_history() to buffer-fix the block, release buf_pool.flush_list_mutex, wait for an exclusive latch on the block, and finally reacquire buf_pool.flush_list_mutex. This minimizes the blocking of other threads. The buffer-fix prevents the block from being evicted or relocated in the buffer pool while trx_purge_truncate_history() holds no mutex.
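That latching order can be sketched with standard-library primitives as stand-ins. Everything here is hypothetical: `Block`, `latch_block_for_truncate()`, `release_block()` and the demo function are not InnoDB names, `std::mutex` only approximates buf_pool.flush_list_mutex, `std::shared_mutex` only approximates the page latch, and holding the latch in shared mode is an assumed stand-in for a page write in progress.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for a buffer pool block descriptor.
struct Block {
  std::atomic<uint32_t> buf_fix_count{0};  // while > 0, the block must not be evicted or moved
  std::shared_mutex latch;                 // page latch; lock() = exclusive
};

// Precondition: the caller holds flush_list_lock.
// Postcondition: the caller holds flush_list_lock again, plus a buffer-fix and
// an exclusive latch on `b`, both of which it must release later.
void latch_block_for_truncate(std::unique_lock<std::mutex>& flush_list_lock, Block& b) {
  assert(flush_list_lock.owns_lock());
  b.buf_fix_count.fetch_add(1);  // pin the block before dropping the mutex
  flush_list_lock.unlock();      // let other threads (e.g. the page cleaner) proceed
  b.latch.lock();                // block, instead of spinning, until the write is done
  flush_list_lock.lock();        // resume the flush_list scan
}

void release_block(Block& b) {
  b.latch.unlock();
  b.buf_fix_count.fetch_sub(1);
}

// Demo: while the truncating thread waits for the page latch, another thread
// must still be able to acquire the flush-list mutex.
bool other_thread_gets_mutex_while_waiting() {
  std::mutex flush_list_mutex;
  Block b;
  std::atomic<bool> cleaner_progressed{false};

  b.latch.lock_shared();  // simulate a page write in progress (assumption)

  std::thread purge([&] {
    std::unique_lock<std::mutex> lk(flush_list_mutex);
    latch_block_for_truncate(lk, b);
    release_block(b);  // the block would be processed before this point
  });

  // Wait until the purge thread has taken the mutex and buffer-fixed the block.
  while (b.buf_fix_count.load() == 0) std::this_thread::yield();

  std::thread cleaner([&] {
    std::lock_guard<std::mutex> g(flush_list_mutex);  // succeeds once purge drops it
    cleaner_progressed = true;
  });
  cleaner.join();

  b.latch.unlock_shared();  // the simulated write completes
  purge.join();
  return cleaner_progressed && b.buf_fix_count.load() == 0;
}
```

After reacquiring the mutex, a real implementation would likely also need to re-validate its position in the flush list, because the list may have changed while the mutex was released; that step is omitted from this sketch.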
Issue Links
relates to:
- MDEV-33009 Server hangs for a long time with innodb_undo_log_truncate=ON (Closed)
- MDEV-27058 Buffer page descriptors are too large (Closed)
- MDEV-27414 Server may hang when innodb_undo_log_truncate=ON (Closed)
- MDEV-30180 Server hang with innodb_undo_log_truncate=ON (Closed)
- MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason (Closed)