[MDEV-23635] Add notional delay (using existing ut_delay) while spinning for registering reader in rw-locks Created: 2020-09-01 Updated: 2021-08-02
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Krunal Bauskar | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ARM, ARMv8, performance |
| Attachments: | |
| Issue Links: | |
| Description |

| Comments |
| Comment by Krunal Bauskar [ 2020-09-01 ] |

Said task is blocked by
| Comment by Krunal Bauskar [ 2020-09-01 ] |

The attached performance graphs are with ut_delay (the ARM optimization) changed to use a compiler barrier (vs. CAS), plus a delay while acquiring latches.
| Comment by Krunal Bauskar [ 2020-09-08 ] |
| Comment by Marko Mäkelä [ 2020-09-09 ] |

Back in May, I tried a similar idea to reduce contention on the latches that protect buf_pool.page_hash. In the end, in

Before implementing

Side note: in rw_lock_lock_word_decr() we could avoid the initial read of lock->lock_word by starting with the constant value
and let the while loop take care of correcting our ‘guess’.

I would rather fix the S-latch acquisition to be a simple std::atomic::fetch_add(), with no possibility of ‘soft’ conflicts at all. This would require adjusting all of the rw_lock_t code so that the X-latch and SX-latch holders would tolerate concurrent ‘probes’ made by readers, similar to how the
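The side note above can be sketched as follows. This is a hedged illustration only, not InnoDB code: `UNLOCKED_GUESS` and `s_lock_try()` are made-up names standing in for the real unlocked constant and for the decrement logic in `rw_lock_lock_word_decr()`; the point is that the first CAS starts from an optimistic guess instead of a preliminary load, and a failed CAS hands back the actual value so the loop can correct itself.

```cpp
#include <atomic>
#include <cstdint>

// Assumed stand-in for the real "fully unlocked" lock_word constant.
constexpr int32_t UNLOCKED_GUESS = 0x20000000;

// Try to register a reader by decrementing lock_word, starting from a
// constant guess rather than an initial read of the variable.
bool s_lock_try(std::atomic<int32_t>& lock_word)
{
  int32_t expected = UNLOCKED_GUESS;  // optimistic guess, no initial load
  while (!lock_word.compare_exchange_strong(expected, expected - 1,
                                            std::memory_order_acquire))
  {
    // On failure, compare_exchange_strong stored the actual value in
    // 'expected', correcting our guess for the next attempt.
    if (expected <= 0)
      return false;  // write-locked: the caller would spin or wait
  }
  return true;       // lock_word was decremented by one reader
}
```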
| Comment by Marko Mäkelä [ 2020-09-29 ] |

I think that we should first try replacing the two os_event_t objects in rw_lock_t with a mutex and two condition variables, and see if that would help. Yes, that code is outside the spin loop, but it could affect some memory bus traffic. Also, perhaps we should try to ensure that some of the fields are either in the same or in a different cache line.
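A minimal sketch of that proposal, under the assumption that the two events map to two condition variables sharing one mutex (the struct and member names here are illustrative, not the actual rw_lock_t layout):

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical replacement for the two os_event_t members: one mutex
// guarding the wait state, and two condition variables — one for ordinary
// waiters, one reserved for an exclusive (wait_ex) waiter.
struct rw_lock_wait
{
  std::mutex mtx;
  std::condition_variable wait_ev;     // stands in for rw_lock_t::event
  std::condition_variable wait_ex_ev;  // stands in for rw_lock_t::wait_ex_event
  bool ready = false;                  // the condition being waited for

  void wait()
  {
    std::unique_lock<std::mutex> lk(mtx);
    // Predicate form handles spurious wakeups.
    wait_ev.wait(lk, [this] { return ready; });
  }

  void signal_all()
  {
    {
      std::lock_guard<std::mutex> lk(mtx);
      ready = true;
    }
    wait_ev.notify_all();  // wake every waiter, as os_event_set() would
  }
};
```

One design point: with a shared mutex and condition variables, the wait state and the waking logic live in one place, whereas os_event_t carries its own internal mutex per event.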
| Comment by Marko Mäkelä [ 2021-01-20 ] |

krunalbauskar, in

My idea to make S-latch acquisition a simple fetch_add() had to be abandoned, because it caused writer starvation (
| Comment by Krunal Bauskar [ 2021-01-20 ] |

@marko Not yet. The first thing to try is a plain 10.6 benchmark, and then explore based on the findings.
| Comment by Marko Mäkelä [ 2021-01-20 ] |

I think that it should be fine to add an ARM-specific optimization to older releases, guarded by an ARM-specific #ifdef.
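Such a guard could look like the sketch below. This is an assumption-laden illustration, not the actual MariaDB patch: the `isb`-based ARM delay and the iteration multiplier are placeholders, and only the architecture #ifdef structure is the point.

```cpp
#include <cstddef>

// One spin-wait hint per architecture; the ARM branch is the part that an
// ARM-specific #ifdef would confine to ARM builds.
static inline void cpu_relax()
{
#if defined(__aarch64__)
  __asm__ __volatile__("isb" ::: "memory");  // assumed ARMv8 spin-wait hint
#elif defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("pause");             // x86 PAUSE, as in existing code
#else
  __asm__ __volatile__("" ::: "memory");     // fallback: compiler barrier only
#endif
}

// Simplified ut_delay-style busy loop; the multiplier is illustrative.
void ut_delay_sketch(size_t delay)
{
  for (size_t i = 0; i < delay * 50; i++)
    cpu_relax();
}
```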
| Comment by Marko Mäkelä [ 2021-07-30 ] |

Somewhat related to this (and possibly improving the 10.6 implementation), https://rigtorp.se/spinlock/ discusses spinlocks (mutexes) and claims that spinning on a read-modify-write instruction is less efficient than spinning on a plain read instruction. It is not immediately obvious to me how that could be applied to rw-locks, but maybe you could experiment with that, krunalbauskar?
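For reference, the pattern the linked article advocates is test-and-test-and-set: wait on a plain load, which keeps the cache line in a shared state, and issue the read-modify-write only when the lock looks free. A minimal sketch (not the article's exact code):

```cpp
#include <atomic>

struct SpinLock
{
  std::atomic<bool> locked{false};

  void lock()
  {
    for (;;)
    {
      // RMW attempt: exchange returns the previous value, so 'false'
      // means we just acquired the lock.
      if (!locked.exchange(true, std::memory_order_acquire))
        return;
      // Read-only spin while held: no cache-line invalidation traffic,
      // unlike spinning on the exchange itself.
      while (locked.load(std::memory_order_relaxed))
      { /* a cpu-relax/pause hint would go here */ }
    }
  }

  void unlock() { locked.store(false, std::memory_order_release); }
};
```

Applying this to an rw-lock is less direct because the lock word is a counter rather than a flag, which is presumably why the comment calls for experimentation.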
| Comment by Krunal Bauskar [ 2021-08-02 ] |

@Marko, I went over the article. It suggests checking the variable before attempting to write to it. Fortunately, our server (like most new-generation systems) uses compare_exchange_strong there, thereby simulating the same behavior, but more efficiently at the processor level. Just to add a note for a wider audience: compare_exchange can be looked upon as 2 ops:
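The two-ops view of compare_exchange can be made explicit with plain loads, as in this illustrative helper (the function name is made up; real code would just call compare_exchange_strong directly):

```cpp
#include <atomic>

// compare_exchange seen as two ops:
//   (1) read the current value and compare it with 'expected' ("test"),
//   (2) only on a match, atomically write 'desired' ("set");
// on a mismatch, the current value is reported back through 'expected'.
template <typename T>
bool cas_as_two_ops(std::atomic<T>& a, T& expected, T desired)
{
  T current = a.load();    // op 1: plain read, no cache-line ownership needed
  if (current != expected)
  {
    expected = current;    // failure path: correct the caller's expectation
    return false;          // no write attempted at all
  }
  // op 2: the actual atomic read-modify-write
  return a.compare_exchange_strong(expected, desired);
}
```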