MariaDB Server / MDEV-21452

Use condition variables and normal mutexes instead of InnoDB os_event and mutex

Details

    Description

      Investigation suggested by Marko on Zulip after reading http://smalldatum.blogspot.com/2020/01/it-is-all-about-constant-factors.html

      No patches; built straight from the 10.4.10 release tag with cmake -DMUTEXTYPE=$type -DCMAKE_PREFIX_INSTALL=/scratch/mariadb-10.4.10-$type $HOME/mariadb-10.4.10
      Distro: Ubuntu 18.04, distro compiler.

      TPCCRunner test:

      POWER8, altivec supported - 20 core, 8 thread/core

      $ tail  fullrun-master-fstn4-mariadb-10.4.10-futex-28444.txt   fullrun-master-fstn4-mariadb-10.4.10-event-48215.txt  fullrun-master-fstn4-mariadb-10.4.10-sys-60112.txt
      ==> fullrun-master-fstn4-mariadb-10.4.10-futex-28444.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  2519939.03       50.01         687       50.00         687
       
         All phase Transactions: 100508512
      Warmup phase Transactions: 24910341
         Run phase Transactions: 75598171
       
      Waiting slaves to terminate users.
      All slaves disconnected.
       
      ==> fullrun-master-fstn4-mariadb-10.4.10-event-48215.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  1944470.28       63.97         782       63.96         782
       
         All phase Transactions: 466885487
      Warmup phase Transactions: 350217270
         Run phase Transactions: 116668217
       
      Waiting slaves to terminate users.
      All slaves disconnected.
       
      ==> fullrun-master-fstn4-mariadb-10.4.10-sys-60112.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  2412875.70       51.72         846       51.71         846
       
         All phase Transactions: 579124495
      Warmup phase Transactions: 434351953
         Run phase Transactions: 144772542
       
      Waiting slaves to terminate users.
      All slaves disconnected.
      

      Note: although the futex build was run for much less time, innodb_buffer_pool_dump_pct=100 had been carried over from the previous run, and throughput was consistent over the 30-minute window.

      Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz - 22 core, 4 thread/core

      ==> fullrun-master-ka4-mariadb-10.4.10-event-rr-50070.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  3354020.40       34.35         487       34.32         487
       
         All phase Transactions: 132952152
      Warmup phase Transactions: 32331540
         Run phase Transactions: 100620612
       
      Waiting slaves to terminate users.
      All slaves disconnected.
       
      ==> fullrun-master-ka4-mariadb-10.4.10-futex-rr-63543.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  3362135.83       33.50         604       33.48         604
       
         All phase Transactions: 131218680
      Warmup phase Transactions: 30354605
         Run phase Transactions: 100864075
       
      Waiting slaves to terminate users.
      All slaves disconnected.
       
      ==> fullrun-master-ka4-mariadb-10.4.10-sys-rr-56865.txt <==
       
                    timestamp          tpm      avg_rt      max_rt   avg_db_rt   max_db_rt
                      average  3363324.87       34.13         996       34.11         996
       
         All phase Transactions: 132642637
      Warmup phase Transactions: 31742891
         Run phase Transactions: 100899746
       
      Waiting slaves to terminate users.
      All slaves disconnected.
      

      Attachments

        1. master.properties.60
          0.8 kB
          Daniel Black
        2. MDEV-21452.ods
          80 kB
          Daniel Black
        3. MDEV-21452.ods
          60 kB
          Axel Schwenke
        4. MDEV-21452-nbl.ods
          66 kB
          Axel Schwenke
        5. my.cnf
          2 kB
          Daniel Black
        6. Screenshot from 2020-03-26 12-08-54.png
          88 kB
          Krunal Bauskar

          Activity

            mleich Matthias Leich added a comment:

            commit 9159383f32d8350dfa91bb62c825c64b1dc091d1 (HEAD, origin/bb-10.6-MDEV-21452)
            behaved well during RQG testing.
            The bad effects that were observed also occur in MariaDB versions without MDEV-21452.

            marko Marko Mäkelä added a comment:

            I implemented special enforcement of innodb_fatal_semaphore_wait_threshold for dict_sys.mutex and lock_sys.mutex. Due to an observed performance regression at high concurrency, I removed the lock_sys.mutex instrumentation and retained only the one on dict_sys.mutex. If pthread_mutex_trylock() fails, then the current thread would compare-and-swap 0 with its current time before waiting in pthread_mutex_lock(). Either the srv_monitor_task() or a subsequent thread that attempts to acquire dict_sys.mutex would then enforce the innodb_fatal_semaphore_wait_threshold and kill the process if necessary.

            While rewriting the test sys_vars.innodb_fatal_semaphore_wait_threshold accordingly, I noticed that not all hangs would be caught, even in the data dictionary cache. For example, if a DDL operation hung while holding both dict_sys.latch and dict_sys.mutex, a subsequent DDL operation would hang while waiting for dict_sys.latch, before even starting the wait for dict_sys.mutex. But DML threads that are trying to open a table would acquire dict_sys.mutex and be subject to the watchdog. Hopefully this type of watchdog testing will be adequate. We could of course add more instrumentation to debug builds.
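
            To illustrate the idea, here is a minimal sketch of the trylock-plus-timestamp scheme described above. The names (watched_mutex, wait_start) and the inline enforcement are made up for illustration; this is not the actual MariaDB code, and it deliberately does not track multiple concurrent waiters.

              #include <pthread.h>
              #include <atomic>
              #include <ctime>
              #include <cstdlib>

              // Hypothetical sketch of the dict_sys.mutex watchdog idea.
              class watched_mutex
              {
                pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
                std::atomic<time_t> wait_start{0};   // 0 = nobody is recorded as waiting

              public:
                void lock(time_t fatal_threshold)
                {
                  if (pthread_mutex_trylock(&mutex) == 0)
                    return;                          // uncontended: no bookkeeping needed

                  // Record the start of the wait, but only if no other waiter did so yet.
                  time_t zero = 0;
                  wait_start.compare_exchange_strong(zero, time(nullptr));

                  // A watchdog (a monitor task, or the next thread that reaches this
                  // point) compares wait_start with the current time and aborts the
                  // process if the threshold is exceeded.
                  if (time_t started = wait_start.load())
                    if (time(nullptr) - started > fatal_threshold)
                      abort();                       // the "kill the process" step

                  pthread_mutex_lock(&mutex);        // normal blocking wait
                  wait_start.store(0);               // simplified: winner resets the clock
                }

                void unlock() { pthread_mutex_unlock(&mutex); }
              };

            Note that in this sketch only the monitor task or a newly arriving waiter can enforce the threshold; a thread already blocked inside pthread_mutex_lock() cannot, which matches the enforcement path described above.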

            marko Marko Mäkelä added a comment:

            The main reason for having the homebrew mutexes was that their built-in spin loops could lead to better performance than the native implementation on contended mutexes.

            Some performance regression was observed for larger thread counts (exceeding the CPU core count) when updating non-indexed columns. I suspect that the culprit is contention on lock_sys.mutex, and I believe that implementing MDEV-20612 will address that.

            Also, log_sys.mutex is known to be a source of contention, but it was already changed to a native mutex in MDEV-23855. MDEV-23855 also removed some contention on fil_system.mutex, but kept it as a homebrew mutex. Contention on these mutexes should be reduced further in MDEV-14425.
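
            For context, a spin-then-block acquisition follows roughly the pattern sketched below. This is illustrative only; the spin count and the PAUSE hint are assumptions for the sketch, not InnoDB's actual tuning.

              #include <pthread.h>
              #if defined __x86_64__ || defined __i386__
              # include <immintrin.h>                 // _mm_pause()
              #endif

              // Illustrative spin-then-block lock: try a bounded busy-wait loop first,
              // and only fall back to the (potentially sleeping) native lock afterwards.
              static void spin_then_lock(pthread_mutex_t *m)
              {
                constexpr unsigned SPIN_ROUNDS = 30;  // arbitrary value for the sketch

                for (unsigned i = 0; i < SPIN_ROUNDS; i++)
                {
                  if (pthread_mutex_trylock(m) == 0)
                    return;                           // won the lock without sleeping
              #if defined __x86_64__ || defined __i386__
                  _mm_pause();                        // be polite to the sibling hyperthread
              #endif
                }

                // Contended for "too long": let the kernel put the thread to sleep.
                pthread_mutex_lock(m);
              }

            Under heavy contention the bounded busy-wait can win the lock without a system call, which is the benefit mentioned above; the downside is the CPU time wasted whenever the spin does not succeed.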

            marko Marko Mäkelä added a comment:

            We observed frequent timeouts and extremely slow execution of the test mariabackup.xb_compressed_encrypted, especially on Microsoft Windows builders. Those machines have 4 processor cores, and they run 4 client/server process pairs in parallel. (Our Linux builders have a lot more processor cores.) The test used to specify innodb_encryption_threads=4. That is, there was one page cleaner thread doing the actual work of writing data pages, and 4 ‘manager’ threads that fight each other to see who gets to wield the shovel and add more dirt to the pile that the page cleaner is trying to transport away. Changing the test to use innodb_encryption_threads=1 seems to have fixed the problem.

            With the previous setting, the test timed out on win32-debug on two successive runs; with the lower setting innodb_encryption_threads=1 it passed (at least once), consuming 13, 14, and 41 seconds on win32-debug and 14, 22, and 27 seconds on win64-debug. On a previous run with innodb_encryption_threads=4, the execution time was more than 500 seconds on win64-debug, and for 2 of the 3 innodb_page_size values, the execution time exceeded 900 seconds on win32-debug.

            Thanks to wlad for making the observation that the encryption threads were conflicting with each other. In MDEV-22258 we did experiment with different settings, and back then (still with the homebrew mutexes) there seemed to be some benefit to having multiple encryption (page-dirtying) threads.

            This highlights a benefit of the homebrew mutexes that we removed: spinning may yield a little better throughput when there is a lot of contention. I agree with the opinion that svoj has stated earlier: it is better to fix the underlying contention than to implement workarounds. I am confident that with MDEV-14425 and MDEV-20612 we will regain some scalability when the number of concurrent connections exceeds the number of processor cores. We already reduced buf_pool.mutex contention in MDEV-15053 and MDEV-23399 et al., and fil_system.mutex contention in MDEV-23855.

            marko Marko Mäkelä added a comment:

            The problem with the constantly sleeping and waking encryption threads was partially addressed in MDEV-24426. On GNU/Linux, with the native mutexes and condition variables, the CPU usage was low, but with the homebrew mutexes and events all threads seemed to be spinning constantly. Maybe on Microsoft Windows the flood of sleeps and wakeups performs worse?
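
            For comparison, a plain condition-variable wait sleeps until it is signalled instead of polling. A generic sketch of that pattern (made-up names, not the actual encryption-thread code):

              #include <pthread.h>

              // Generic sketch: a worker sleeps on a condition variable until there
              // is work, instead of repeatedly waking up to poll a flag.
              struct work_queue
              {
                pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
                pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
                int pending = 0;                     // the "work available" predicate

                void wait_for_work()
                {
                  pthread_mutex_lock(&mutex);
                  while (pending == 0)               // guards against spurious wakeups
                    pthread_cond_wait(&cond, &mutex);// releases mutex and sleeps; no spinning
                  pending--;
                  pthread_mutex_unlock(&mutex);
                }

                void submit_work()
                {
                  pthread_mutex_lock(&mutex);
                  pending++;
                  pthread_cond_signal(&cond);        // wake exactly one sleeping worker
                  pthread_mutex_unlock(&mutex);
                }
              };

            pthread_cond_wait() atomically releases the mutex and blocks, so an idle worker consumes no CPU until submit_work() signals it.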

            People

              marko Marko Mäkelä
              danblack Daniel Black