[MDEV-32374] log_sys.lsn_lock is a performance hog - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
Fix Version/s: 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Storage Engine - InnoDB
Labels:
- performance

Description

~~MDEV-27774~~ made it possible for multiple mtr_t::commit() calls to write to log_sys.buf concurrently, by replacing the InnoDB log_sys.mutex with log_sys.latch and log_sys.lsn_lock. The latter appears to be a performance bottleneck.

Because we need to keep multiple log_sys data members consistent with each other, we cannot simply invoke std::atomic::fetch_add and std::atomic::compare_exchange_weak on log_sys.buf_free to achieve a similar result. However, we could try to fit more data members into the same cache line as log_sys.latch.
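
To illustrate the constraint described above, here is a minimal sketch, not the actual log_sys layout. It assumes a 64-byte cache line and uses hypothetical names (ToyLog, locked, reserve). Two fields have to advance together, so a single fetch_add on one of them could not keep the pair consistent; instead a tiny spin lock sits in the same cache line as the data it protects.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the log bookkeeping; NOT the real log_sys layout.
// Assumption: cache lines are 64 bytes wide.
struct alignas(64) ToyLog {
  std::atomic<bool> locked{false};  // tiny spin lock, same cache line as the data
  std::uint64_t lsn = 0;            // logical end-of-log position
  std::size_t buf_free = 0;         // first free byte of a hypothetical log buffer

  // Reserve len bytes; returns the (start LSN, start offset) pair for the caller.
  std::pair<std::uint64_t, std::size_t> reserve(std::size_t len) {
    assert(len > 0);
    while (locked.exchange(true, std::memory_order_acquire)) {
      // Spin until the previous holder releases the lock.
    }
    std::pair<std::uint64_t, std::size_t> start{lsn, buf_free};
    lsn += len;       // both members are updated under the same lock,
    buf_free += len;  // so no other thread can observe them out of step
    locked.store(false, std::memory_order_release);
    return start;
  }
};

// The lock and both protected members fit into one aligned cache line.
static_assert(alignof(ToyLog) == 64, "ToyLog must start on a cache line boundary");
static_assert(sizeof(ToyLog) == 64, "ToyLog must occupy exactly one cache line");
```

Whether such a layout actually helps depends on contention and on the real set of members that must stay consistent, which this sketch does not model.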

Attachments


b58e1ce.png
2023-11-17 11:50
31 kB
Steve Shaw

Issue Links

is caused by

MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit()

Closed

relates to

MDEV-27866 Switching log_sys.latch to use spin based variant

Closed

MDEV-33515 log_sys.lsn_lock causes excessive context switching

Closed

Activity


Steve Shaw added a comment - 2023-11-17 12:05

I have tested this as attached on an Intel Cascade Lake 8280L server. At 24 virtual users the performance gain is over 5%. The gain persists up to 60 virtual users, at 627,393 NOPM (1,458,929 MariaDB TPM), which is likely to exceed the throughput of the majority of installed systems. The trade-off is slightly lower performance beyond the peak point; however, not many systems will be tested on bare-metal servers past this point.


Marko Mäkelä added a comment - 2023-11-17 12:20

wlad, can you please review this version that the graph b58e1ce.png is for?

For the record, initially we got some very mixed results until steve.shaw@intel.com revised HammerDB so that it would clean up the server between each workload batch. Now that it will wait for the history to be purged and the modified pages to be written back to the file system, each test step will be more deterministic, with no garbage from previous steps lying around.


Vladislav Vaintroub added a comment - 2023-11-17 16:16 - edited

marko, I suggest the following improvement to this patch: do not call log_write_up_to if lsn is 0.
It is going to be a no-op anyway, but a considerably more expensive no-op than a plain "if".

diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index af802e9b2fb..9f39b303964 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -932,6 +932,7 @@ void log_write_up_to(lsn_t lsn, bool durable,
 {
   ut_ad(!srv_read_only_mode || (log_sys.buf_free < log_sys.max_buf_free));
   ut_ad(lsn != LSN_MAX);
+  ut_ad(lsn != 0);
   if (UNIV_UNLIKELY(recv_no_ibuf_operations))
   {
diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
index 40c5cfe1eb8..d3f963beab9 100644
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -469,7 +469,8 @@ void mtr_t::commit()
     if (UNIV_UNLIKELY(lsns.second != PAGE_FLUSH_NO))
       buf_flush_ahead(m_commit_lsn, lsns.second == PAGE_FLUSH_SYNC);
-    log_write_up_to(write_lsn, false);
+    if (write_lsn)
+      log_write_up_to(write_lsn, false);
   }
   else
   {


Marko Mäkelä added a comment - 2023-11-18 20:29

wlad, thank you for the suggestion; I will apply it.

A call to log_write_up_to(0, false, nullptr) will (unless PMEM is used) cause a call to write_lock.acquire(0, nullptr), which in turn will cause a relaxed load of write_lock.m_value. Other than that, it’s just procedure calls and evaluating conditions based on something that could be passed in registers. Evaluating conditions can be expensive. I agree: write_lsn = log_sys.get_write_target() would be 0 most of the time, and it should be more efficient to add one more condition that avoids an unlikely procedure call and the evaluation of several conditions inside that procedure.
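
The shape of the agreed change can be sketched in isolation. This is an illustration only: ToyWriter, write_up_to and commit_tail are hypothetical names, and the counter stands in for the cost of entering the real log_write_up_to. The caller tests the target LSN first, and the callee may then assert that it is never handed 0.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for the log writer; it only counts how often the
// comparatively expensive procedure is entered.
struct ToyWriter {
  unsigned calls = 0;     // number of times write_up_to() was entered
  lsn_t written_to = 0;   // highest LSN requested so far

  void write_up_to(lsn_t lsn) {
    assert(lsn != 0);     // mirrors the ut_ad(lsn != 0) added by the patch
    ++calls;
    if (lsn > written_to)
      written_to = lsn;
  }
};

// Mirrors the guarded call in the patch: one cheap test in the caller
// replaces a procedure call that would almost always do nothing.
inline void commit_tail(ToyWriter& writer, lsn_t write_lsn) {
  if (write_lsn)
    writer.write_up_to(write_lsn);
}
```

With this shape, a commit whose write target is 0 pays for a single comparison and never enters the writer.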


Vladislav Vaintroub added a comment - 2023-11-19 07:56

Up to 10.11, but no longer in the 11.x branches, there is also additional code inside `log_write_up_to` that accesses the global variable `recv_no_ibuf_operations`:

  if (UNIV_UNLIKELY(recv_no_ibuf_operations))
  {
    /* A non-final batch of recovery is active no writes to the log
    are allowed yet. */
    ut_a(!callback);
    return;
  }

Plus, on Linux, as you mentioned, if PMEM is compiled in (it usually is?), there is a check of log.is_pmem, which reads log.flush_buf.

log_write_up_to is not very inefficient when it is called with lsn=0, but I would rather avoid it in mtr_t::commit. I tried it on a normal oltp_update_index workload, where it was called approximately 1 million times per second and lsn was nonzero only about once every 2 seconds, giving a 99.9999% probability of a dummy invocation.
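
The quoted probability follows from simple arithmetic, sketched here with a hypothetical helper (dummy_fraction is not part of any MariaDB code). At roughly 1 million calls per second and one nonzero lsn every 2 seconds, about 1 call in 2 million does real work, so the dummy fraction is 1 - 1/2,000,000 = 0.9999995, in line with the rounded figure above.

```cpp
#include <cassert>

// Hypothetical helper: fraction of calls that would be no-ops, given the call
// rate and the average interval between calls that carry a nonzero lsn.
inline double dummy_fraction(double calls_per_second,
                             double seconds_between_nonzero) {
  assert(calls_per_second > 0 && seconds_between_nonzero > 0);
  // Exactly one call per interval is assumed to do real work.
  return 1.0 - 1.0 / (calls_per_second * seconds_between_nonzero);
}
```

Both inputs are the approximate figures reported in the comment, so the result is an estimate, not a measurement.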

