MariaDB Server / MDEV-27499

tps regression with 10.8 (w.r.t. 10.6.5)

Details

    Description

      • while benchmarking a 10.8 work-in-progress build (commit 4b14874c2809a8b0f2d04a969132e062fb22b145), I observed a regression with the update-non-index, uniform workload (1024 scalability).
      • tps often drops to 0.

          Activity

            Marko,

            1. the said patch doesn't help.
            2. The condition remains constant until buf_flush_sync_lsn is reset to 0; lsn is not changing.
            3. I am still confused about why we need that condition there.

            krunalbauskar Krunal Bauskar added a comment

            krunalbauskar, in case you wonder why there are two wait loops in the function buf_flush_wait(): The first one waits for the minimum oldest_modification in the buffer pool to be large enough. That is the prerequisite for a log checkpoint. The oldest_modification of each modified page would be cleared in buf_page_write_complete() in I/O callback threads.

            The final loop waits for the checkpoint to actually be advanced by buf_flush_page_cleaner(). That condition currently makes the caller wait for the page cleaner to completely exit the ‘furious flushing’ mode before returning.

            I now realize that my initially suggested change cannot possibly help, because whenever buf_flush_sync_lsn is nonzero, it would be guaranteed to be at least lsn, by design. What we really might need is to wait for the log checkpoint to have advanced enough. Something like this:

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index dec0292a088..7480e0cf673 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -1799,7 +1799,7 @@ static void buf_flush_wait(lsn_t lsn)
               }
             
               /* Wait for the checkpoint. */
            -  while (buf_flush_sync_lsn)
            +  while (buf_flush_sync_lsn && log_sys.last_checkpoint_lsn < lsn)
                 my_cond_wait(&buf_pool.done_flush_list,
                              &buf_pool.flush_list_mutex.m_mutex);
             }
            

            I will have to review whether that second loop is needed at all. It should be unnecessary for the correctness of log_checkpoint_margin(), but I have to consider other callers as well.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            I believe that the second loop of buf_flush_wait() (waiting for the checkpoint) that was added in MDEV-27416 might only be needed for the buf_flush_sync() call in srv_prepare_to_delete_redo_log_file(). Other calls of buf_flush_sync() are in debug code, so it is fine for them to wait for the checkpoint. But, the calls in log_checkpoint_margin() definitely do not need to wait for a checkpoint; the checkpoint age will explicitly be checked after buf_flush_wait_flushed().

            Edit: I think that the loop is completely unnecessary. For log file resizing, we explicitly disable checkpoints, so we should only care that the buffer pool is clean, not whether the page cleaner exited the ‘furious flushing’ mode. I have submitted the removal of the loop for testing.

            Thank you for catching this, krunalbauskar. For some reason, the condition failed to reproduce in our performance tests. Perhaps axel did not test with a small innodb_log_file_size lately? I think that I did run some performance tests like that during the MDEV-14425 development, but I did not notice this.


            The tree
            origin/bb-10.5-MDEV-27499 6c66c66ef6c6a6e4a99102fcc465877174e44757 2022-01-17T16:56:07+02:00
            behaved well in RQG testing.
            The bad effects observed are already known, i.e. they occur in other trees too.

            mleich Matthias Leich added a comment

            The second wait loop turns out to be necessary for log file resizing after all. We got a few failures of the test innodb.log_file_size, like this:

            innodb.log_file_size '8k,innodb'         w5 [ fail ]
            …/srv/srv0start.cc:854: lsn_t srv_prepare_to_delete_redo_log_file(): Assertion `!buf_pool.any_io_pending()' failed.
            

            I am unable to reproduce this failure locally.

            It might be acceptable to replace this assertion with a similar one (!buf_pool.get_oldest_modification(0)), but then we might get into a race condition where the page cleaner would attempt to write a checkpoint to the newly created log file. I will add that wait loop to buf_flush_sync(), like I originally planned to. Other callers of buf_flush_sync() are in debug code.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              krunalbauskar Krunal Bauskar