Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL)
Description
axel reproduced one more hang related to innodb_undo_log_truncate=ON, similar to MDEV-30180. Here is a description of a hang that was reproduced with innodb_use_native_aio=0:
- trx_purge_truncate_history() writes the message InnoDB: Truncating and is about to truncate an undo log tablespace.
- trx_purge_truncate_history() is busy-looping in a scan of buf_pool.flush_list because one of the pages belonging to the undo tablespace is write-fixed.
- In the short window in which trx_purge_truncate_history() releases and re-acquires buf_pool.flush_list_mutex, other threads that are waiting for the mutex cannot acquire it with this version of GNU libc. This is similar to MDEV-30180, which could only be reproduced in the same particular environment.
- buf_dblwr_t::flush_buffered_writes_completed() was waiting for log_sys.mutex in log_write_up_to(), while trying to write the block that trx_purge_truncate_history() is trying to lock.
- log_sys.mutex was held by buf_flush_page_cleaner(), which was waiting for buf_pool.flush_list_mutex.
A possible fix would be for trx_purge_truncate_history() to buffer-fix the block, release buf_pool.flush_list_mutex, wait for an exclusive latch on the block, and finally reacquire buf_pool.flush_list_mutex. In that way, the blocking of other threads is minimized. The buffer-fix will prevent the eviction or relocation of the block in the buffer pool while no mutex is held by trx_purge_truncate_history().
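The proposed locking order can be sketched in standalone C++. This is only an illustration under stated assumptions, not the actual InnoDB patch: `Block`, `Pool`, `lock_block_for_truncate()` and `demo_truncate_wait()` are hypothetical stand-ins, the page latch is modelled with a plain `std::mutex`, and dropping the buffer-fix once the mutex is reacquired is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-ins; these are NOT the real InnoDB buf_page_t / buf_pool.
struct Block {
  std::atomic<unsigned> fix_count{0};  // nonzero = buffer-fixed (pinned in the pool)
  std::mutex latch;                    // models the exclusive page latch
};

struct Pool {
  std::mutex flush_list_mutex;         // models buf_pool.flush_list_mutex
};

// Precondition: lk owns pool.flush_list_mutex.
// Postcondition: lk owns it again and block.latch is held by the caller.
void lock_block_for_truncate(std::unique_lock<std::mutex>& lk, Block& block) {
  assert(lk.owns_lock());
  block.fix_count.fetch_add(1);  // 1. buffer-fix: block cannot be evicted or relocated
  lk.unlock();                   // 2. release the mutex so other threads can progress
  block.latch.lock();            // 3. sleep until the exclusive latch is available
  lk.lock();                     // 4. reacquire the mutex
  block.fix_count.fetch_sub(1);  // assumption: the latch now protects the block
}

// A writer thread holds the latch ("write-fixed" page) and needs the mutex
// before it can finish, which is the shape of the dependency in the report.
bool demo_truncate_wait() {
  Pool pool;
  Block block;
  std::promise<void> latched;
  std::atomic<bool> write_done{false};

  std::unique_lock<std::mutex> lk(pool.flush_list_mutex);  // truncating thread owns the mutex

  std::thread writer([&] {
    block.latch.lock();          // page write in progress
    latched.set_value();
    {
      std::lock_guard<std::mutex> g(pool.flush_list_mutex);  // blocks until step 2 above
      write_done = true;
    }
    block.latch.unlock();        // write completed
  });

  latched.get_future().wait();   // the writer now holds the latch
  lock_block_for_truncate(lk, block);

  bool ok = write_done.load() && lk.owns_lock() && block.fix_count.load() == 0;
  block.latch.unlock();
  lk.unlock();
  writer.join();
  return ok;
}
```

Calling `demo_truncate_wait()` from `main()` exercises the sequence. The truncating thread sleeps on the latch instead of repeatedly releasing and re-acquiring the mutex, so the writer gets an uncontended chance to take the mutex and finish.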
Issue Links
- relates to
  - MDEV-33009 Server hangs for a long time with innodb_undo_log_truncate=ON (Closed)
  - MDEV-27058 Buffer page descriptors are too large (Closed)
  - MDEV-27414 Server may hang when innodb_undo_log_truncate=ON (Closed)
  - MDEV-30180 Server hang with innodb_undo_log_truncate=ON (Closed)
  - MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | | This issue relates to |
Link | | This issue relates to |
Link | | This issue relates to |
Link | | This issue relates to |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | In Testing [ 10301 ] |
Assignee | Marko Mäkelä [ marko ] | Axel Schwenke [ axel ] |
Attachment | timeseries_tpcc_64.png [ 70286 ] |
Attachment | timeseries_tpcc_64.png [ 70286 ] |
Attachment | timeseries_tpcc_64.png [ 70287 ] |
Assignee | Axel Schwenke [ axel ] | Matthias Leich [ mleich ] |
issue.field.resolutiondate | 2023-05-26 14:00:06.0 | 2023-05-26 14:00:06.429 |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.9.7 [ 28916 ] | |
Fix Version/s | 10.10.5 [ 28917 ] | |
Fix Version/s | 10.11.4 [ 28918 ] | |
Fix Version/s | 11.0.3 [ 28920 ] | |
Fix Version/s | 11.1.2 [ 28921 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.1 [ 28549 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Testing [ 10301 ] | Closed [ 6 ] |
Fix Version/s | 10.6.15 [ 29013 ] | |
Fix Version/s | 10.9.8 [ 29015 ] | |
Fix Version/s | 10.10.6 [ 29017 ] | |
Fix Version/s | 10.11.5 [ 29019 ] | |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.9.7 [ 28916 ] | |
Fix Version/s | 10.10.5 [ 28917 ] | |
Fix Version/s | 10.11.4 [ 28918 ] |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.9.7 [ 28916 ] | |
Fix Version/s | 10.10.5 [ 28917 ] | |
Fix Version/s | 10.11.4 [ 28918 ] | |
Fix Version/s | 11.0.2 [ 28706 ] | |
Fix Version/s | 11.1.1 [ 28704 ] | |
Fix Version/s | 11.0.3 [ 28920 ] | |
Fix Version/s | 11.1.2 [ 28921 ] | |
Fix Version/s | 10.6.15 [ 29013 ] | |
Fix Version/s | 10.9.8 [ 29015 ] | |
Fix Version/s | 10.10.6 [ 29017 ] | |
Fix Version/s | 10.11.5 [ 29019 ] |
Link | | This issue relates to |
Commit f410444a76b from the bb-10.6-MDEV-31343 branch survived a 1 hour run of sysbench-tpcc. So the fix is most probably complete (without it the server hung within a few minutes). It still has a severe impact on performance. Enabling innodb_undo_log_truncate=ON led to a 50% performance loss (~3000 tps vs. ~6000 tps) in my benchmark: