MariaDB Server / MDEV-27499

tps regression with 10.8 (w.r.t. 10.6.5)

Details

    Description

      • while benchmarking a 10.8 work-in-progress build (commit 4b14874c2809a8b0f2d04a969132e062fb22b145), I observed a regression with the update-non-index, uniform workload (1024 scalability).
      • tps often drops to 0.

          Activity

            Marko,

            1. the said patch doesn't help.
            2. The condition remains constant until buf_flush_sync_lsn is reset to 0; lsn is not changing.
            3. I am still confused about why we need that condition there.

            krunalbauskar Krunal Bauskar added a comment

            krunalbauskar, in case you wonder why there are two wait loops in the function buf_flush_wait(): The first one waits for the minimum oldest_modification in the buffer pool to be large enough. That is the prerequisite for a log checkpoint. The oldest_modification of each modified page would be cleared in buf_page_write_complete() in I/O callback threads.

            The final loop waits for the checkpoint to actually be advanced by buf_flush_page_cleaner(). That condition currently makes the caller wait for the page cleaner to completely exit the ‘furious flushing’ mode before returning.

            I now realize that my initially suggested change cannot possibly help, because whenever buf_flush_sync_lsn is nonzero, it would be guaranteed to be at least lsn, by design. What we really might need is to wait for the log checkpoint to have advanced enough. Something like this:

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index dec0292a088..7480e0cf673 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -1799,7 +1799,7 @@ static void buf_flush_wait(lsn_t lsn)
               }
             
               /* Wait for the checkpoint. */
            -  while (buf_flush_sync_lsn)
            +  while (buf_flush_sync_lsn && log_sys.last_checkpoint_lsn < lsn)
                 my_cond_wait(&buf_pool.done_flush_list,
                              &buf_pool.flush_list_mutex.m_mutex);
             }
            

            I will have to review whether that second loop is needed at all. It should be unnecessary for the correctness of log_checkpoint_margin(), but I have to consider other callers as well.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            I believe that the second loop of buf_flush_wait() (waiting for the checkpoint) that was added in MDEV-27416 might only be needed for the buf_flush_sync() call in srv_prepare_to_delete_redo_log_file(). Other calls of buf_flush_sync() are in debug code, so it is fine for them to wait for the checkpoint. But, the calls in log_checkpoint_margin() definitely do not need to wait for a checkpoint; the checkpoint age will explicitly be checked after buf_flush_wait_flushed().

            Edit: I think that the loop is completely unnecessary. For log file resizing, we explicitly disable checkpoints, so we should only care that the buffer pool is clean, not whether the page cleaner exited the ‘furious flushing’ mode. I have submitted the removal of the loop for testing.

            Thank you for catching this, krunalbauskar. For some reason, the condition failed to reproduce in our performance tests. Perhaps axel did not test with a small innodb_log_file_size lately? I think that I did run some performance tests like that during the MDEV-14425 development, but I did not notice this.


            The tree
            origin/bb-10.5-MDEV-27499 6c66c66ef6c6a6e4a99102fcc465877174e44757 2022-01-17T16:56:07+02:00
            behaved well in RQG testing.
            The bad effects observed are already known, i.e. they occur in other trees too.

            mleich Matthias Leich added a comment

            The second wait loop turns out to be necessary for log file resizing after all. We got a few failures of the test innodb.log_file_size, like this:

            innodb.log_file_size '8k,innodb'         w5 [ fail ]
            …/srv/srv0start.cc:854: lsn_t srv_prepare_to_delete_redo_log_file(): Assertion `!buf_pool.any_io_pending()' failed.
            

            I am unable to reproduce this failure locally.

            It might be acceptable to replace this assertion with a similar one (!buf_pool.get_oldest_modification(0)), but then we might get into a race condition where the page cleaner would attempt to write a checkpoint to the newly created log file. I will add that wait loop to buf_flush_sync(), like I originally planned to. Other callers of buf_flush_sync() are in debug code.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              krunalbauskar Krunal Bauskar