[MDEV-21452] Use condition variables and normal mutexes instead of InnoDB os_event and mutex Created: 2020-01-10 Updated: 2023-09-01 Resolved: 2020-12-16
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking |
| Affects Version/s: | 10.4.10 |
| Fix Version/s: | 10.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Investigation suggested by marko on Zulip after reading http://smalldatum.blogspot.com/2020/01/it-is-all-about-constant-factors.html. No patches - built straight from the 10.4.10 release tag, with cmake -DMUTEXTYPE=$type -DCMAKE_PREFIX_INSTALL=/scratch/mariadb-10.4.10-$type $HOME/mariadb-10.4.10. TPCCRunner test:
Note: while futex was run for much less time, innodb_buffer_pool_dump_pct=100 was in effect from the last run, and throughput over the 30 minutes was consistent.
|
| Comments |
| Comment by Marko Mäkelä [ 2020-01-10 ] |
|
svoj, can you please test
on different instruction set architectures and possibly several recent microarchitectures of AMD64 implementations (such as Intel Skylake and its ‘temporal neighbours’)? If the system mutex is not significantly slower than the specialized ones on any platform (GNU/Linux on AMD64, Aarch64; Windows on AMD64), I think that we should remove the specialized code. danblack conducted his tests on POWER 8.
| Comment by Axel Schwenke [ 2020-03-23 ] |
|
On my 16-core, 32-thread Intel machines I see no significant difference between the different mutex implementations. See the attached spreadsheet MDEV-21452.ods.
| Comment by Sergey Vojtovich [ 2020-03-23 ] |
|
You won't see any difference for sure while backup locks remain the largest bottleneck. With backup locks disabled, I can immediately see a 20% throughput decline in the oltp_update_index benchmark. Backup locks were disabled with this patch:
Results for event vs sys mutex:
I seem to remember that I saw a larger difference in 10.2. In 10.3 we have trx_sys.mutex eliminated from the hot path, so it matters less. FWIU, in 10.5 we have the log_sys mutexes eliminated from the hot path, so it matters even less. But still, 20% on not-that-recent hardware is a lot. All in all, the event mutex seems to give better throughput than the pthread mutex under certain loads. Implementing our own locking primitives is a dead end for sure, but until we fix the contention, it would probably be wise to keep them.
| Comment by Marko Mäkelä [ 2020-03-24 ] |
|
The tests innodb.innodb_sys_semaphore_waits and sys_vars.innodb_fatal_semaphore_wait_threshold depend on MUTEXTYPE=event.
| Comment by Axel Schwenke [ 2020-03-24 ] |
|
New results are in MDEV-21452-nbl.ods. The server was now patched to disable backup locks. I also added the update_index test from sysbench OLTP (updating an indexed column with autocommit). Now I see a clear performance impact for 'sys' mutexes.
| Comment by Krunal Bauskar [ 2020-03-26 ] |
|
We decided to evaluate the performance of the said mutex change on all architectures. I evaluated it for ARM. Here are the results. We didn't observe any major regression or performance gain arising from switching to the sys mutex or futex. So, for now, the recommendation would be to continue with the existing default (event). To find out more about how we executed the test, you can check out
| Comment by Marko Mäkelä [ 2020-06-01 ] |
|
I think that before we can proceed on this, we must remove some already known InnoDB mutex contention points, mainly in lock_sys and log_sys.
| Comment by Daniel Black [ 2020-06-02 ] |
|
Both futex and sys provide performance improvements over event on POWER only, even with the current lock_sys/log_sys contention. Maybe I can try to pick one of them, making an assumption about the potential gains from the lock_sys/log_sys work; however, the choice obviously doesn't have to be permanent. I assume you have significant confidence in the futex and sys implementations despite their non-default status?
| Comment by Marko Mäkelä [ 2020-06-15 ] |
|
One more case of underlying contention was filed in
| Comment by Marko Mäkelä [ 2020-06-15 ] |
|
danblack, the current default seems to work fine on AMD64 and ARM. I am OK to change the mutex type on POWER for some or all InnoDB mutexes in 10.5 or 10.6, if it can be demonstrated to improve performance. I believe that we must also deal with
| Comment by Marko Mäkelä [ 2020-08-22 ] |
|
While working on

Removing the InnoDB mutex implementation would also remove some instrumentation that is currently available either via SHOW ENGINE INNODB MUTEX or INFORMATION_SCHEMA.innodb_metrics (MDEV-15706). I do not think that we can replace all InnoDB rw_lock_t with mysql_rwlock_t. Emulating the custom SX mode with two rw-locks would most likely incur some performance overhead.
| Comment by Marko Mäkelä [ 2020-09-27 ] |
|
I started work to replace os_event_t and ib_mutex_t with mysql_cond_t and mysql_mutex_t. So far, this includes simplifying the InnoDB rw_lock_t to use a mutex and two condition variables, instead of using two os_event_t. We still have 10 os_event_t and 45 ib_mutex_t left.
| Comment by Marko Mäkelä [ 2020-09-28 ] |
|
We have 8 os_event_create() and 37 mutex_create() calls left. In fts_cache_t, two rw_lock_t can be replaced with an ordinary mutex. Some mutexes were orphaned.
| Comment by Marko Mäkelä [ 2020-09-29 ] |
|
In lock_wait_suspend_thread() we seem to have some low-hanging fruit:
We should actually protect trx->lock.wait_lock with trx_t::mutex to reduce contention on lock_sys.mutex.
| Comment by Marko Mäkelä [ 2020-10-01 ] |
|
SHOW ENGINE INNODB MUTEX as well as INFORMATION_SCHEMA.INNODB_MUTEXES would no longer display any information about InnoDB mutexes. They may still output information about the InnoDB homebrew rw_lock_t.
| Comment by Matthias Leich [ 2020-11-05 ] |
|
The corresponding source tree behaved well during RQG testing.
| Comment by Marko Mäkelä [ 2020-12-04 ] |
|
INFORMATION_SCHEMA.INNODB_MUTEXES ( The command SHOW ENGINE INNODB MUTEX will return no output, because no InnoDB-specific instrumentation of rw-locks or mutexes will exist. Also removed will be the InnoDB latching order checks and the debug parameter innodb_sync_debug, as well as the view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS.

In the current state of development, the parameter innodb_fatal_semaphore_wait_threshold will not be enforced at all, and there will be no "Long semaphore wait" messages. I will have to implement something special (probably timeout-based acquisition) for dict_sys.mutex and lock_sys.mutex. Those should cover most hangs.
| Comment by Matthias Leich [ 2020-12-09 ] |
|
commit 9159383f32d8350dfa91bb62c825c64b1dc091d1 (HEAD, origin/bb-10.6-
| Comment by Marko Mäkelä [ 2020-12-11 ] |
|
I implemented special enforcement of innodb_fatal_semaphore_wait_threshold for dict_sys.mutex and lock_sys.mutex. Due to an observed performance regression at high concurrency, I removed the lock_sys.mutex instrumentation and retained only the one on dict_sys.mutex. If pthread_mutex_trylock() fails, then the current thread would compare-and-swap 0 with its current time before waiting in pthread_mutex_lock(). Either the srv_monitor_task() or a subsequent thread that attempts to acquire dict_sys.mutex would then enforce the innodb_fatal_semaphore_wait_threshold and kill the process if necessary.

While rewriting the test sys_vars.innodb_fatal_semaphore_wait_threshold accordingly, I noticed that not all hangs would be caught even in the data dictionary cache. For example, if a DDL operation hung while holding both dict_sys.latch and dict_sys.mutex, a subsequent DDL operation would hang while waiting for dict_sys.latch, before even starting the wait for dict_sys.mutex. But DML threads that are trying to open a table would acquire dict_sys.mutex and be subject to the watchdog. Hopefully this type of watchdog testing will be adequate. We could of course add more instrumentation to debug builds.
| Comment by Marko Mäkelä [ 2020-12-16 ] |
|
The main reason for having the homebrew mutexes was that their built-in spin loops could lead to better performance than the native implementation on contended mutexes. Some performance regression was observed for larger thread counts (exceeding the CPU core count) when updating non-indexed columns. I suspect that the culprit is contention on lock_sys.mutex, and I believe that implementing

Also, log_sys.mutex is known to be a source of contention, but it was changed to a native mutex already in
| Comment by Marko Mäkelä [ 2020-12-16 ] |
|
We observed frequent timeouts and extremely slow execution of the test mariabackup.xb_compressed_encrypted, especially on Microsoft Windows builders. Those machines have 4 processor cores, and they run 4 client/server process pairs in parallel. (Our Linux builders have many more processor cores.) The test used to specify innodb_encryption_threads=4. That is, there was one page cleaner thread doing the actual work of writing data pages, and 4 ‘manager’ threads that fight each other to see who gets to wield the shovel and add more dirt to the pile that the page cleaner is trying to transport away.

Changing the test to use innodb_encryption_threads=1 seems to have fixed the problem. With the previous setting, the test timed out on win32-debug on two successive runs; with the lower setting innodb_encryption_threads=1 it passed (at least once), consuming 13, 14, and 41 seconds on win32-debug and 14, 22, and 27 seconds on win64-debug. On a previous run with innodb_encryption_threads=4, the execution time was more than 500 seconds on win64-debug, and for 2 of the 3 innodb_page_size values, the execution time exceeded 900 seconds on win32-debug. Thanks to wlad for making the observation that the encryption threads were conflicting with each other. In

This highlights a benefit of the homebrew mutexes that we removed: spinning may yield a little better throughput when there is a lot of contention. I agree with the opinion that svoj has stated earlier: it is better to fix the underlying contention than to implement workarounds. I am confident that with
| Comment by Marko Mäkelä [ 2020-12-17 ] |
|
The problem with the constantly sleeping and waking encryption threads was partially addressed in