[MDEV-26779] reduce lock_sys.wait_mutex contention by using spinloop construct Created: 2021-10-07 Updated: 2021-10-27 Resolved: 2021-10-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Krunal Bauskar | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance | ||
| Environment: |
GNU/Linux on ARMv8 (Aarch64) |
||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
reduce lock_sys.wait_mutex contention by using spinloop construct
|
| Comments |
| Comment by Krunal Bauskar [ 2021-10-07 ] |
|
PR created: https://github.com/MariaDB/server/pull/1923 |
| Comment by Krunal Bauskar [ 2021-10-07 ] |
|
Please check the attached graph. 1. For zipfian (contention-based workload) patch has shown significant improvement (70-100% for all architecture x86 and arm). (test-case: CPU bound, 300 sec (test) + 20 sec (sleep) ... repeated 5 times) |
| Comment by Krunal Bauskar [ 2021-10-14 ] |
|
1. As part of some other issue ( Given the inbuilt construct, it was decided to try the said approach to use MY_MUTEX_INIT_FAST for wait_mutex. testing seems to suggest that the said pthread inbuilt construct is less effective. Please check the attached graphs. |
| Comment by Krunal Bauskar [ 2021-10-14 ] |
|
Adding graph for x86 that too re-proves the fact that custom spin loop is performing better than pthread based inherent spin loop. |
| Comment by Mark Callaghan [ 2021-10-14 ] |
|
A feature of the custom mutex code provided by upstream MySQL/InnoDB was explicit control over spinning (how long to spin before going to sleep). While Pthread mutex has a variant that spins, the spin duration is much less than the duration for the custom InnoDB mutex. This is an interesting problem. The mutex behavior, whether from pthread mutex or custom InnoDB mutex, is global (all instances of the mutex behave mostly the same). But it is likely that some mutex instances benefit from longer spinning durations, while others do not. |
| Comment by Marko Mäkelä [ 2021-10-15 ] |
|
mdcallag, indeed, we did not check how long PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP would spin. In MariaDB 10.6, the lock_sys was changed quite a bit, When the lock_wait() function was introduced in the fix of axel is yet to report his numbers. For the one-liner change to replace nullptr with MY_MUTEX_INIT_FAST in the mysql_mutex_init() call, he observed serious regressions. I would guess that the adaptive logic in the GNU libc would be badly misguided due to the two fundamentally different scenarios where pthread_mutex_lock() is acquired. If we are holding exclusive lock_sys.latch, other threads cannot create any locks, and therefore the contention on lock_sys.wait_mutex should disappear after the current holder releases it. In MariaDB, the spinning logic has evolved over time. A long time ago in While I believe that spinloops are mostly working around a bad design, this lock_sys.wait_mutex is a special case that needs special measures. I did not come up with any good idea how to split or partition that mutex. I think that the only badly contended InnoDB mutex in MariaDB 10.6 is log_sys.mutex, which I hope to fix with a redo log block format change in It might be interesting to check whether we would be better off without any MY_MUTEX_INIT_FAST in our version of InnoDB. We are using it since 10.5 ( |
| Comment by Axel Schwenke [ 2021-10-15 ] |
|
Added sysbench.pdf |
| Comment by Krunal Bauskar [ 2021-10-18 ] |
|
@Axel, Seems like there are a lot of un-related branches that is creating some confusion here including a variant of the patch. Can you test it once maybe a quick test with write-only-workload (please try both uniform and Zipfian) and share the result? If the original patch helps then more optimization as being proposed to non-inline function, one-liner could be explored. |
| Comment by Krunal Bauskar [ 2021-10-18 ] |
|
please check https://jira.mariadb.org/secure/attachment/59788/x86-wait-mutex-run.png. |
| Comment by Krunal Bauskar [ 2021-10-19 ] |
|
Axel, |
| Comment by Vladislav Vaintroub [ 2021-10-19 ] |
|
I think 2% can be sacrificed, for the sake of getting accurate profiles. We did not know that ut_delay() accounts for 40% to 70% in 10.4, in some sysbench, until I uninlined it, to get the accurate picture |
| Comment by Vladislav Vaintroub [ 2021-10-19 ] |
|
In general, spinning tends to hide true scalability bugs. I believe if at all, spinlocks must be last measure , exception, not a rule. I think for example, that it is possible to decrease contention on log_sys.mutex, and simultaneous copying to redo log buffer without holding a mutex is doable . However once contention is hidden with spins, there will be less motivation to fix bottlenecks. |
| Comment by Mark Callaghan [ 2021-10-19 ] |
|
I get your point but there will always be contention and it helps to have some mutexes that can tolerate it via some spinning before they go to sleep. There used to be a MySQL storage engine written by someone who didn't like spin locks. That engine had a custom mutex that included busy-wait spinning, alas the code had a simple bug and the compiler optimized away the spinning. By fixing that, and bringing back spinning, I contributed a large speedup (2X) to Falcon. |
| Comment by Krunal Bauskar [ 2021-10-25 ] |
|
After folding of the multiple changes related to buf_pool mutex optimization I re-evaluated the said patch. 1. For uniform (as we observed before) there is no change in performance. |
| Comment by Axel Schwenke [ 2021-10-26 ] |
|
MDEV-26779-1.pdf |
| Comment by Axel Schwenke [ 2021-10-27 ] |
|
MDEV-26779-2.pdf |
| Comment by Marko Mäkelä [ 2021-10-27 ] |
|
I think that for now, we can apply a simple ARMv8-specific change of initializing lock_sys.wait_mutex with MY_MUTEX_INIT_FAST a.k.a. PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP, similar to how we did to the log_sys mutexes in I would not change other platforms than ARMv8, because our own tests on AMD64 do not show any significant improvement. Spinning is basically a hack to work around contention. The lock_sys.wait_mutex must be split in some way to properly fix this, in a future task. |
| Comment by Marko Mäkelä [ 2021-10-27 ] |
|
I think that before attempting to split lock_sys.wait_mutex, we should implement the following and re-evaluate the situation:
|