[MDEV-21534] improve locking/waiting in log_write_up_to Created: 2020-01-20 Updated: 2022-06-17 Resolved: 2020-03-02 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.5.2 |
| Type: | Task | Priority: | Major |
| Reporter: | Vladislav Vaintroub | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
On multithreaded write-intensive benchmarks with innodb_flush_log_at_trx_commit (e.g. sysbench update_index), log_write_up_to() is one of the hottest functions, and log_sys.mutex is one of the hottest mutexes. This is partly due to the way log_sys.flush_event is used. The situation can be improved, e.g. by using a custom synchronization primitive instead of an InnoDB event. The synchronization primitive could have 2 operations, set(set_lsn) and wait(wait_lsn),
where wait blocks the current thread until set is called with set_lsn >= wait_lsn. |
| Comments |
| Comment by Vladislav Vaintroub [ 2020-01-20 ] |
|
Quick benchmark, Windows: sysbench --test=oltp --oltp-table-size=1000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql
|
| Comment by Vladislav Vaintroub [ 2020-01-20 ] |
|
Quick benchmark, Linux (Hyper-V), same box as above
== my.cnf ==
Sysbench 1.0 script == update_index.sh == run whole benchmark |
| Comment by Marko Mäkelä [ 2020-01-20 ] |
|
I tested the changes on my 2-socket system (Intel® Xeon® CPU E5-2630 v4 @ 2.20GHz), that is, with 2×10 CPU cores (40 with hyperthreading). I used sysbench version 1.0.18+ds-1 on Debian GNU/Linux unstable and the following update_index.sh script:
I created the database by running ./mtr innodb.101_compatibility,16k and then copying mysql-test/var/install.db to the root file system (ext4fs on an Intel Optane PCIe NVMe device SSDPED1D960GAY).
Sum of latency:
Note: I used the following kinds of commands to construct the above tables:
|
| Comment by Axel Schwenke [ 2020-01-28 ] |
|
I tested a build from bb-10.5- And, maybe even more seriously, it was impossible to shut down the server gracefully. It would just print a lot of warnings to the error log (warning: thread xxx did not stop) but not go down. I had to resort to SIGKILL to get rid of it. |
| Comment by Vladislav Vaintroub [ 2020-01-28 ] |
|
Axel, can you describe how to reproduce? Nobody but you has seen the "odd behaviour". If there is no way to reproduce, please share the call stacks of all threads. |
| Comment by Vladislav Vaintroub [ 2020-01-28 ] |
|
The repository contains several commits for the |
| Comment by Axel Schwenke [ 2020-01-28 ] |
|
There is one easy way to cause a server hang: use parallel_prepare.lua (distributed with sysbench-mariadb) to load multiple tables in parallel. I just tried with 16 tables and it didn't even reach the "loading rows" stage. Files: pprepare.stack.dump. I remember that when I spotted this, I reconfigured the sysbench script to load tables sequentially, but then got a hang during a benchmark run. I cannot reproduce it right now, but will keep trying. As for which commit it was - I cannot say. The build finished on Jan 21st 15:11 UTC and I pulled some minutes before. Wait, it looks like I did not pull in this repo after that. It's at commit fcb5d008 now, so it probably was at that when I built. And yes, that is before that deadlock fix. |
| Comment by Marko Mäkelä [ 2020-01-28 ] |
|
I think that a merge of (or rebase to) the latest 10.5 is needed for the development branch. Some hangs were introduced in 10.5 by |
| Comment by Axel Schwenke [ 2020-01-28 ] |
|
OK, with commit 09fa2894d9e I did not see any anomalies from branch bb-10.5- |