MariaDB Server / MDEV-34265

Possible hang during IO burst with innodb_flush_sync enabled

Details

    Description

      The issue was found while analyzing intermittent CI failures; the tests would fail with a timeout.
      CI failure: innodb.insert_into_empty
      CI failure: encryption.innochecksum
      [ fail ] timeout after 900 seconds

      Below are the relevant stack traces from two threads showing the issue. A user session thread and the purge thread get blocked, waiting for the page cleaner to free pages in the buffer pool (BP).

      The page cleaner thread has entered the "furious flush" stage and is busy flushing pages to pull the checkpoint LSN forward. It fails to progress because some dirty pages are already latched by mini-transactions (mtr) in session thread(s). This creates a deadlock.

      Thread 9 (Thread 0xffff85552fc0 (LWP 29458)):
      #0  0x0000ffff9d2b28cc in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
      No symbol table info available.
      #1  0x0000ffff9d2b5c60 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6
      No symbol table info available.
      #2  0x0000aaaab58536c8 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0lru.cc:452
      #3  0x0000aaaab5853bf8 in buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 800}, zip_size=zip_size@entry=0, unzip=<optimized out>) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0rea.cc:121
      #4  0x0000aaaab58543bc in buf_read_page_low (space=0xaaaadee50008, sync=sync@entry=true, mode=mode@entry=132, page_id={m_id = 800}, zip_size=zip_size@entry=0, unzip=unzip@entry=false) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0rea.cc:295
      #5  0x0000aaaab5855124 in buf_read_page (page_id=page_id@entry={m_id = 800}, zip_size=zip_size@entry=0) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0rea.cc:464
      #6  0x0000aaaab5843320 in buf_page_get_low (page_id={m_id = 800}, zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=1, guess=0x0, mode=16, mtr=0xffff855520d8, err=0x0, allow_ibuf_merge=false) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0buf.cc:2831
      #7  0x0000aaaab5843f2c in buf_page_get_gen (page_id=<optimized out>, zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=1, guess=guess@entry=0x0, mode=mode@entry=16, mtr=mtr@entry=0xffff855520d8, err=err@entry=0x0, allow_ibuf_merge=allow_ibuf_merge@entry=false) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0buf.cc:3246
      #8  0x0000aaaab57fd7c8 in purge_sys_t::get_page (this=this@entry=0xaaaab6bfdf80 <purge_sys>, id=id@entry={m_id = 800}) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/trx/trx0purge.cc:787
      #9  0x0000aaaab57fdba0 in purge_sys_t::choose_next_log (this=this@entry=0xaaaab6bfdf80 <purge_sys>) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/trx/trx0purge.cc:909
      #10 0x0000aaaab57fe6ec in purge_sys_t::fetch_next_rec (this=0xaaaab6bfdf80 <purge_sys>) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/trx/trx0purge.cc:1022
      #11 trx_purge_attach_undo_recs (n_purge_threads=4, thd=0xaaaadf1c51d8) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/trx/trx0purge.cc:1281
      #12 trx_purge (n_tasks=n_tasks@entry=4, history_size=<optimized out>) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/trx/trx0purge.cc:1406
       
      Thread 2 (Thread 0xffff95516fc0 (LWP 29374)):
      #3  0x0000aaaab5697a50 in pfs_end_mutex_wait_v1 (locker=<optimized out>, rc=0) at /home/buildbot/aarch64-fedora-40/build/storage/perfschema/pfs.cc:4006
      #4  0x0000aaaab5194cb8 in psi_mutex_lock (that=<optimized out>, file=<optimized out>, line=<optimized out>) at /home/buildbot/aarch64-fedora-40/build/mysys/my_thr_init.c:494
      #5  0x0000aaaab58476c4 in inline_mysql_mutex_lock (that=0xaaaab6c06760 <buf_dblwr+16>, src_file=0xaaaab5a9cba0 "/home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0dblwr.cc", src_line=727) at /home/buildbot/aarch64-fedora-40/build/include/mysql/psi/mysql_thread.h:746
      #6  buf_dblwr_t::flush_buffered_writes (this=0xaaaab6c06750 <buf_dblwr>) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0dblwr.cc:727
      #7  0x0000aaaab5190d24 in buf_flush_list (max_n=<optimized out>, lsn=lsn@entry=9419484) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0flu.cc:1576
      #8  0x0000aaaab584f574 in buf_flush_sync_for_checkpoint (lsn=9419484) at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0flu.cc:2187
      #9  buf_flush_page_cleaner () at /home/buildbot/aarch64-fedora-40/build/storage/innobase/buf/buf0flu.cc:2458
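
      When diagnosing a suspected occurrence of this hang on a live server, the buffer pool counters can show the telltale state: no free pages left while many pages remain dirty. A minimal diagnostic sketch, using the standard Innodb_buffer_pool_pages_* status variables:

          -- Free pages exhausted while dirty pages pile up is the state in
          -- which threads block inside buf_LRU_get_free_block().
          SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';
          SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
          -- The LOG section of the full engine status shows how far the
          -- checkpoint is lagging behind the current LSN.
          SHOW ENGINE INNODB STATUS\G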
      

          Activity

            Debarun Banerjee added a comment (edited)

            The issue can be reproduced on 10.11 with the following debug patch and the attached test.

            commit 3ee6f69d49a58422f994f51a0bd7a0cb1242a3dd (HEAD -> 10.11, origin/10.11)
            Date: Thu May 2 22:14:19 2024 +0200

            1. The source needs to be rebuilt after applying the patch for the debug checks/sleeps.
            2. The test needs to be run with a 4k page size.

                ./mtr --mem --mysqld=--innodb_page_size=4k  innodb.insert_into_empty
            

            • The test reproduces the hang in roughly every other run.
            • Don't use --repeat, as it reuses the same server and the LSN adjustments come out differently. If it doesn't reproduce on the first run, just retry.

            Debarun Banerjee added a comment

            When the checkpoint age goes beyond the sync flush threshold and buf_flush_sync_lsn is set, the page cleaner enters the "furious flush" stage to aggressively flush dirty pages from the flush list and pull the checkpoint LSN above the safe margin. In this stage, the page cleaner skips LRU flushing and eviction.

            In 10.6, all other threads rely entirely on the page cleaner to generate free pages. If free pages run out while the page cleaner is busy in the "furious flush" stage, a session thread can end up waiting for a free page in the middle of a mini-transaction (mtr) while holding latches on other pages.

            This, in turn, can prevent the page cleaner from flushing those very pages, stopping the checkpoint LSN from moving forward and creating a deadlock. Even short of a deadlock, it can create a stall and hang-like situation for a large BP with plenty of dirty pages to flush before the stage can finish.
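
            Whether a server is in, or approaching, this sync-flush condition can be estimated from the checkpoint age. A minimal check, assuming the Innodb_checkpoint_age and Innodb_checkpoint_max_age status variables reported by MariaDB's InnoDB:

                -- Checkpoint age = amount of log written since the latest checkpoint.
                -- Sync flushing kicks in once the age crosses the sync-flush
                -- threshold derived from the redo log size.
                SHOW GLOBAL STATUS LIKE 'Innodb_checkpoint%';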

            Fix: During furious flush, check and evict LRU pages after each flush iteration.

            I could reproduce the issue on 10.6 as well. The repeatability is about 1/2 in 10 runs, a little less frequent than on 10.11. With the fix, the test runs fine in 100 consecutive runs.
            https://github.com/MariaDB/server/pull/3298


            Marko Mäkelä added a comment

            Thank you. I think that the general idea is correct. I posted some suggestions.


            Debarun Banerjee added a comment

            While analyzing this problem, I found an independent issue that makes the "furious flush" stage unnecessarily long. This is more of a resource (I/O) usage issue, which could in turn affect performance. Currently the flush target is set to the current system LSN, which is far too aggressive compared to the 5.7 design.

            I will file a separate issue for it, which should ideally be fixed from 10.5 onwards.


            Michael Widenius added a comment

            Can this bug cause a performance degradation for the user?
            Can this be seen as a performance degradation from an earlier 10.6 version? If yes, in which version was the problem introduced?


            Debarun Banerjee added a comment

            monty The independent issue of the long "furious flush" stage could, in theory, impact performance for an I/O-bound workload when system I/O is under stress. I would say it is not a general case. It comes from earlier versions and is present in 10.5. I will investigate to find out at what point it was introduced and add all the details to the new bug I need to file. Thanks for the reminder.


            Michael Widenius added a comment

            A workaround for this bug is to do:

            SET GLOBAL innodb_flush_sync=OFF;

            At the same time, you should ensure that innodb_max_dirty_pages_pct_lwm is not 0, so that the InnoDB checkpoint age (the amount of log written since the latest checkpoint) remains low. If we get close to the limit, then all SQL write operations will be blocked until the checkpoint has been advanced and it is crash-safe to write more log. With innodb_flush_sync=OFF, page writing activity during this time will be throttled by innodb_io_capacity, which means that the server could be extremely slow during such a period.
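
            Put together, the workaround could look like the sketch below. The innodb_max_dirty_pages_pct_lwm and innodb_io_capacity values are illustrative assumptions, not recommendations; tune them to the workload and storage.

                -- Disable synchronous ("furious") flushing on checkpoint pressure.
                SET GLOBAL innodb_flush_sync=OFF;
                -- Keep low-water-mark background flushing active so the checkpoint
                -- age stays low; 0 would disable it. 10 (percent) is illustrative.
                SET GLOBAL innodb_max_dirty_pages_pct_lwm=10;
                -- With innodb_flush_sync=OFF, flushing is capped by this rate;
                -- raise it only if the storage can sustain the extra IOPS.
                SET GLOBAL innodb_io_capacity=1000;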


            People

              Assignee: Debarun Banerjee
              Reporter: Debarun Banerjee
              Votes: 0
              Watchers: 5

