Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4
- GNU/Linux, NUMA on Intel Xeon
Description
steve.shaw@intel.com reports that write-intensive workloads on a NUMA system end up spending a lot of time in native_queued_spin_lock_slowpath.part.0 in the Linux kernel. He has provided a patch that adds a user-space spinlock around the calls to mtr_t::do_write(), which significantly improves throughput at larger numbers of concurrent connections in his test environment.
As far as I can tell, that patch would only allow one mtr_t::do_write() call to proceed at a time, and thus make waits on log_sys.latch extremely unlikely. But that would also seem to ruin part of what MDEV-27774 achieved.
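To make the concern concrete, here is a minimal sketch of what a user-space spinlock around a call site amounts to. It is not the submitted patch: user_spinlock, cpu_relax() and max_concurrency_under_spinlock() are hypothetical names, and the guarded section is only a stand-in for mtr_t::do_write(). The helper records how many threads were ever inside the guarded section at once.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for MY_RELAX_CPU(): a CPU pause hint where available.
static inline void cpu_relax() noexcept
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::this_thread::yield();
#endif
}

// Hypothetical test-and-set spinlock; not the lock used in the actual patch.
class user_spinlock
{
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
  void lock() noexcept
  {
    // Busy-wait instead of sleeping in the kernel.
    while (flag_.test_and_set(std::memory_order_acquire))
      cpu_relax();
  }
  void unlock() noexcept { flag_.clear(std::memory_order_release); }
};

// Each worker enters the guarded section `rounds` times. Returns the largest
// number of workers ever observed inside the section at the same time.
int max_concurrency_under_spinlock(int threads, int rounds)
{
  user_spinlock guard;
  std::atomic<int> inside{0};
  std::atomic<int> max_inside{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; t++)
    workers.emplace_back([&] {
      for (int i = 0; i < rounds; i++)
      {
        guard.lock();
        const int now = inside.fetch_add(1) + 1; // stand-in for mtr_t::do_write()
        int seen = max_inside.load();
        while (seen < now && !max_inside.compare_exchange_weak(seen, now)) {}
        inside.fetch_sub(1);
        guard.unlock();
      }
    });
  for (auto &w : workers)
    w.join();
  return max_inside.load();
}
```

With any number of workers the observed maximum stays at 1, which is the serialization described above: contention on the inner latch disappears, but so does any parallelism inside the guarded call.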
If I understood it correctly, the idea would be better implemented at a slightly lower level, to allow maximum concurrency:
diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
index b819022fec6..884bb5af5c1 100644
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -1052,7 +1052,7 @@ std::pair<lsn_t,mtr_t::page_flush_ahead> mtr_t::do_write()
   }
 
   if (!m_latch_ex)
-    log_sys.latch.rd_lock(SRW_LOCK_CALL);
+    log_sys.latch.rd_spin_lock();
 
   if (UNIV_UNLIKELY(m_user_space && !m_user_space->max_lsn &&
                     !is_predefined_tablespace(m_user_space->id)))
The to-be-written member function rd_spin_lock() would avoid invoking futex_wait(), and instead keep invoking MY_RELAX_CPU() in the spin loop.
An exclusive log_sys.latch will be acquired rarely and held for a rather short time: during DDL operations, during undo tablespace truncation, and around log checkpoints.
Some experimentation will be needed to find something that scales well across the board (from embedded systems to high-end servers).
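As a rough illustration of the spinning shared acquisition discussed above, the following sketch shows a reader-writer latch whose read path never sleeps and only retries with a pause hint. It is not InnoDB's srw_lock: spin_rw_latch, cpu_relax() (standing in for MY_RELAX_CPU()) and run_latch_demo() are hypothetical, the writer path also spins purely to keep the example short, and the every-64th-round exclusive section is an arbitrary stand-in for the rare exclusive holders.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for MY_RELAX_CPU(): a CPU pause hint where available.
static inline void cpu_relax() noexcept
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::this_thread::yield();
#endif
}

// Hypothetical reader-writer spin latch; not InnoDB's srw_lock.
class spin_rw_latch
{
  static constexpr uint32_t WRITER = 1U << 31;
  std::atomic<uint32_t> word_{0}; // WRITER flag plus the number of readers
public:
  bool rd_try_lock() noexcept
  {
    uint32_t w = word_.load(std::memory_order_relaxed);
    while (!(w & WRITER))
      if (word_.compare_exchange_weak(w, w + 1, std::memory_order_acquire,
                                      std::memory_order_relaxed))
        return true;
    return false; // a writer holds or is waiting for the latch
  }
  void rd_spin_lock() noexcept
  {
    // Never sleep: retry with a pause hint until no writer is present.
    while (!rd_try_lock())
      cpu_relax();
  }
  void rd_unlock() noexcept { word_.fetch_sub(1, std::memory_order_release); }
  void wr_lock() noexcept
  {
    // Set the WRITER flag so that no new readers enter.
    while (word_.fetch_or(WRITER, std::memory_order_acquire) & WRITER)
      cpu_relax();
    // Wait for the readers that were already inside to leave.
    while (word_.load(std::memory_order_acquire) != WRITER)
      cpu_relax();
  }
  void wr_unlock() noexcept { word_.store(0, std::memory_order_release); }
};

// Writers keep `a` and `b` equal under wr_lock(); readers verify that under
// rd_spin_lock(). Returns the final `a`, or -1 if a reader ever saw a != b.
long run_latch_demo(int threads, int rounds)
{
  spin_rw_latch latch;
  long a = 0, b = 0;
  std::atomic<bool> torn{false};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; t++)
    workers.emplace_back([&] {
      for (int i = 0; i < rounds; i++)
      {
        if (i % 64 == 0) // rare exclusive section
        {
          latch.wr_lock();
          a++;
          b++;
          latch.wr_unlock();
        }
        else // common shared section
        {
          latch.rd_spin_lock();
          if (a != b)
            torn = true;
          latch.rd_unlock();
        }
      }
    });
  for (auto &w : workers)
    w.join();
  return torn ? -1 : a;
}
```

Spinning on the read path is only reasonable under the stated assumption that exclusive holders are rare and brief; a long exclusive hold would burn CPU on every waiting reader, which is why the trade-off needs measuring on both small and large machines.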
Issue Links
- causes
  - MDEV-34422 InnoDB writes corrupted log on macOS and AIX due to uninitialized log_sys.lsn_lock (Closed)
- relates to
  - MDEV-27866 Switching log_sys.latch to use spin based variant (Closed)
  - MDEV-32374 log_sys.lsn_lock is a performance hog (Closed)
  - MDEV-21923 LSN allocation is a bottleneck (In Progress)
- blocks
  - PERF-407
Activity
Field | Original Value | New Value |
---|---|---|
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue relates to |
Link | This issue relates to |
Remote Link | This issue links to "PERF-407 (Jira)" [ 36639 ] |
Summary | log_sys.latch causes excessive context switching on NUMA systems | log_sys.lsn_lock causes excessive context switching |
Attachment | update_index_256threads_x_10tables_x_1mio_rows.svg [ 73295 ] |
Attachment | baseline.svg [ 73296 ] | |
Attachment | spinflag.svg [ 73297 ] |
Assignee | Marko Mäkelä [ marko ] | Vladislav Vaintroub [ wlad ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Attachment | 1socket.png [ 73312 ] | |
Attachment | 2socket.png [ 73313 ] |
Attachment | 1socket-1.png [ 73314 ] |
Attachment | 1socket-1.png [ 73314 ] |
Attachment | mariadbtpm.png [ 73315 ] |
Assignee | Vladislav Vaintroub [ wlad ] | Marko Mäkelä [ marko ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
issue.field.resolutiondate | 2024-03-26 16:06:52.0 | 2024-03-26 16:06:52.18 |
Fix Version/s | 10.11.8 [ 29630 ] | |
Fix Version/s | 11.0.6 [ 29628 ] | |
Fix Version/s | 11.1.5 [ 29629 ] | |
Fix Version/s | 11.2.4 [ 29631 ] | |
Fix Version/s | 11.4.2 [ 29633 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.4 [ 29301 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link | This issue relates to MDEV-21923 [ MDEV-21923 ] |
Link | This issue causes |
Come to think of it, the actual issue could be the lock_lsn() call in log_t::append_prepare(). Maybe we simply need a spin loop around that. Perhaps the existing implementation would work?
diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h
index 54851ca0a65..1a6bde20cad 100644
--- a/storage/innobase/include/log0log.h
+++ b/storage/innobase/include/log0log.h
@@ -182,7 +182,7 @@ struct log_t
   typedef pthread_mutex_wrapper<true> log_lsn_lock;
 #else
   typedef srw_lock log_rwlock;
-  typedef srw_mutex log_lsn_lock;
+  typedef srw_spin_mutex log_lsn_lock;
 #endif
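For readers unfamiliar with the idea of putting a spin loop around a mutex acquisition, here is a generic sketch of the technique: try the lock a bounded number of times with a pause hint, and only then fall back to a blocking acquisition. This is an assumption about the general approach, not the srw_spin_mutex implementation; spin_then_block_mutex, cpu_relax(), count_with_mutex() and the default of 30 spin rounds are all hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for MY_RELAX_CPU(): a CPU pause hint where available.
static inline void cpu_relax() noexcept
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::this_thread::yield();
#endif
}

// Hypothetical mutex that spins briefly before blocking.
class spin_then_block_mutex
{
  std::mutex m_;
  const unsigned spin_rounds_;
public:
  // 30 is an arbitrary illustrative default, not a tuned value.
  explicit spin_then_block_mutex(unsigned spin_rounds = 30)
    : spin_rounds_(spin_rounds) {}
  void lock()
  {
    for (unsigned i = 0; i < spin_rounds_; i++)
    {
      if (m_.try_lock())
        return; // acquired without blocking
      cpu_relax();
    }
    m_.lock(); // give up spinning; this may sleep in the kernel
  }
  void unlock() { m_.unlock(); }
};

// Increments a plain counter from several threads under the mutex and
// returns the final value.
long count_with_mutex(int threads, int rounds)
{
  spin_then_block_mutex lsn_lock; // name borrowed for illustration only
  long lsn = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; t++)
    workers.emplace_back([&] {
      for (int i = 0; i < rounds; i++)
      {
        std::lock_guard<spin_then_block_mutex> g(lsn_lock);
        lsn++; // very short critical section, as in LSN allocation
      }
    });
  for (auto &w : workers)
    w.join();
  return lsn;
}
```

The bounded spin pays off when the critical section is much shorter than a sleep and wake-up through the kernel; the right number of rounds depends on the hardware and would have to be measured.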
Another alternative would be the pthread_mutex_wrapper<true>, which we are using on ARM. That template parameter spinloop=true refers to MY_MUTEX_INIT_FAST a.k.a. PTHREAD_MUTEX_ADAPTIVE_NP, which enables a spin loop in pthread_mutex_lock() in the GNU libc.
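For reference, requesting the adaptive mutex type from the GNU libc looks roughly like the sketch below. It is not the pthread_mutex_wrapper code: init_spinning_mutex() is a hypothetical helper, and the feature test on PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is an assumption about how to detect the non-portable type at compile time. On platforms without it, the helper falls back to a default mutex.

```cpp
#include <pthread.h>

// Hypothetical helper: initializes *m as an adaptive (spin, then sleep) mutex
// where the GNU libc offers that type, else as a default mutex.
// Returns 0 on success or a pthread error number.
int init_spinning_mutex(pthread_mutex_t *m)
{
#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
  pthread_mutexattr_t attr;
  int err = pthread_mutexattr_init(&attr);
  if (err)
    return err;
  // Non-portable GNU libc mutex type that spins briefly before blocking.
  err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
  if (!err)
    err = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return err;
#else
  return pthread_mutex_init(m, nullptr);
#endif
}
```

The mutex is then used with the ordinary pthread_mutex_lock() and pthread_mutex_unlock() calls; only the initialization differs.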