Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32374

log_sys.lsn_lock is a performance hog

Details

    Description

      MDEV-27774 made it possible to for multiple mtr_t::commit() to write to log_sys.buf concurrently, by replacing the InnoDB log_sys.mutex with log_sys.latch and log_sys.lsn_lock. The latter appears to be a performance bottleneck.

      Because we need to keep multiple log_sys data members consistent with each other, we cannot simply invoke std::atomic::fetch_add and std::atomic::compare_exchange_weak on log_sys.buf_free to achieve a similar result. But, we could try to fit more data members in the same single cache line with log_sys.latch.

      Attachments

        Issue Links

          Activity

            steve.shaw@intel.com Steve Shaw added a comment -

            Have tested as attached on an Intel Cascade Lake 8280L server and at 24 virtual users the performance gain is over 5%. The gain is seen up to 60 virtual users at 627,393NOPM which is 1,458,929 MariaDB TPM which is likely to exceed the majority of installed systems. The balance is its slightly lower beyond the peak point however not many systems will be tested on bare metal servers past this point.

            steve.shaw@intel.com Steve Shaw added a comment - Have tested as attached on an Intel Cascade Lake 8280L server and at 24 virtual users the performance gain is over 5%. The gain is seen up to 60 virtual users at 627,393NOPM which is 1,458,929 MariaDB TPM which is likely to exceed the majority of installed systems. The balance is its slightly lower beyond the peak point however not many systems will be tested on bare metal servers past this point.

            wlad, can you please review this version that the graph b58e1ce.png is for?

            For the record, initially we got some very mixed results until steve.shaw@intel.com revised HammerDB so that it would clean up the server between each workload batch. Now that it will wait for the history to be purged and the modified pages to be written back to the file system, each test step will be more deterministic, with no garbage from previous steps lying around.

            marko Marko Mäkelä added a comment - wlad , can you please review this version that the graph b58e1ce.png is for? For the record, initially we got some very mixed results until steve.shaw@intel.com revised HammerDB so that it would clean up the server between each workload batch. Now that it will wait for the history to be purged and the modified pages to be written back to the file system, each test step will be more deterministic, with no garbage from previous steps lying around.
            wlad Vladislav Vaintroub added a comment - - edited

            marko, I suggest following improvement to this patch - do not call log_write_up_to if lsn is 0.
            It is going to be a no-op anyway, but quite a more expensive no-op than just an "if"

            diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
            index af802e9b2fb..9f39b303964 100644
            --- a/storage/innobase/log/log0log.cc
            +++ b/storage/innobase/log/log0log.cc
            @@ -932,6 +932,7 @@ void log_write_up_to(lsn_t lsn, bool durable,
             {
               ut_ad(!srv_read_only_mode || (log_sys.buf_free < log_sys.max_buf_free));
               ut_ad(lsn != LSN_MAX);
            +  ut_ad(lsn != 0);
             
               if (UNIV_UNLIKELY(recv_no_ibuf_operations))
               {
            diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
            index 40c5cfe1eb8..d3f963beab9 100644
            --- a/storage/innobase/mtr/mtr0mtr.cc
            +++ b/storage/innobase/mtr/mtr0mtr.cc
            @@ -469,7 +469,8 @@ void mtr_t::commit()
                 if (UNIV_UNLIKELY(lsns.second != PAGE_FLUSH_NO))
                   buf_flush_ahead(m_commit_lsn, lsns.second == PAGE_FLUSH_SYNC);
             
            -    log_write_up_to(write_lsn, false);
            +    if (write_lsn)
            +      log_write_up_to(write_lsn, false);
               }
               else
               {
            

            wlad Vladislav Vaintroub added a comment - - edited marko , I suggest following improvement to this patch - do not call log_write_up_to if lsn is 0. It is going to be a no-op anyway, but quite a more expensive no-op than just an "if" diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index af802e9b2fb..9f39b303964 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -932,6 +932,7 @@ void log_write_up_to(lsn_t lsn, bool durable, { ut_ad(!srv_read_only_mode || (log_sys.buf_free < log_sys.max_buf_free)); ut_ad(lsn != LSN_MAX); + ut_ad(lsn != 0); if (UNIV_UNLIKELY(recv_no_ibuf_operations)) { diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc index 40c5cfe1eb8..d3f963beab9 100644 --- a/storage/innobase/mtr/mtr0mtr.cc +++ b/storage/innobase/mtr/mtr0mtr.cc @@ -469,7 +469,8 @@ void mtr_t::commit() if (UNIV_UNLIKELY(lsns.second != PAGE_FLUSH_NO)) buf_flush_ahead(m_commit_lsn, lsns.second == PAGE_FLUSH_SYNC); - log_write_up_to(write_lsn, false); + if (write_lsn) + log_write_up_to(write_lsn, false); } else {

            wlad, thank you for the suggestion; I will apply it.

            A call log_write_up_to(0, false, nullptr) will (unless PMEM is used) cause a call to write_lock.acquire(0, nullptr), which in turn will cause a relaxed load of write_lock.m_value. Other than that, it’s just procedure calls and evaluating conditions based on something that could be passed in registers. Evaluating conditions can be expensive. I agree, this write_lsn = log_sys.get_write_target() would most of the time be 0, and it should be more efficient to add one more condition to avoid an unlikely procedure call and avoid evaluating several conditions inside that procedure.

            marko Marko Mäkelä added a comment - wlad , thank you for the suggestion; I will apply it. A call log_write_up_to(0, false, nullptr) will (unless PMEM is used) cause a call to write_lock.acquire(0, nullptr) , which in turn will cause a relaxed load of write_lock.m_value . Other than that, it’s just procedure calls and evaluating conditions based on something that could be passed in registers. Evaluating conditions can be expensive. I agree, this write_lsn = log_sys.get_write_target() would most of the time be 0, and it should be more efficient to add one more condition to avoid an unlikely procedure call and avoid evaluating several conditions inside that procedure.

            Up to 10.11, but not in 11.x branches anymore, there is also additional code that accesses global variable `recv_no_ibuf_operations` . inside `log_write_up_to`

             if (UNIV_UNLIKELY(recv_no_ibuf_operations))
              {
                /* A non-final batch of recovery is active no writes to the log
                are allowed yet. */
                ut_a(!callback);
                return;
              }
            

            Plus, on Linux, as you mentioned if PMEM is compiled in (it usually is?), log.is_pmem, which reads log.flush_buf

            log_write_up_to is not very inefficient when it is used with lsn=0, but I'd better avoid it in mtr_t::commit. I tried it on normal oltp_update_index, when it was called approx 1 million times per second, and lsn was not 0 only about every 2 seconds, giving 99.9999% probability of a dummy invocation.

            wlad Vladislav Vaintroub added a comment - Up to 10.11, but not in 11.x branches anymore, there is also additional code that accesses global variable `recv_no_ibuf_operations` . inside `log_write_up_to` if (UNIV_UNLIKELY(recv_no_ibuf_operations)) { /* A non-final batch of recovery is active no writes to the log are allowed yet. */ ut_a(!callback); return ; } Plus, on Linux, as you mentioned if PMEM is compiled in (it usually is?), log.is_pmem, which reads log.flush_buf log_write_up_to is not very inefficient when it is used with lsn=0, but I'd better avoid it in mtr_t::commit. I tried it on normal oltp_update_index, when it was called approx 1 million times per second, and lsn was not 0 only about every 2 seconds, giving 99.9999% probability of a dummy invocation.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.