Starting with the Intel Skylake microarchitecture, the latency of the PAUSE instruction is about 140 clock cycles, up from about 10 on earlier microarchitectures. Another reference suggests that on AMD the latency could be 10 or 50 clock cycles, depending on the microarchitecture.
Because of this wide range of latencies, the assumptions under which spin loops were written may no longer hold, which suggests that some redesign may be needed.
I tried to find out how to detect the microarchitecture with the CPUID instruction, but did not find an up-to-date reference for that. It might require a lookup table that has to be updated every time new processor models are released.
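To illustrate why such a lookup table is a maintenance burden, here is a minimal sketch of decoding the family and model from the EAX value of CPUID leaf 1 (on GCC or Clang that value can be obtained with `__get_cpuid(1, ...)` from `<cpuid.h>`). The names `CpuSignature`, `decode_cpu_signature` and `looks_like_slow_pause` are hypothetical, and the model list is deliberately incomplete: it names only the three Skylake model numbers I am sure of, and it does not check the vendor string from leaf 0, which a real implementation would need.

```cpp
#include <cstdint>

// Family and model decoded from the EAX value returned by CPUID leaf 1.
struct CpuSignature {
  unsigned family;
  unsigned model;
};

// Combine the base and extended fields of CPUID.01H:EAX.
// The extended family is added only when the base family is 0xF; the
// extended model is used only when the base family is 0x6 or 0xF.
inline CpuSignature decode_cpu_signature(std::uint32_t eax) {
  const unsigned base_family = (eax >> 8) & 0xF;
  const unsigned base_model = (eax >> 4) & 0xF;
  const unsigned ext_model = (eax >> 16) & 0xF;
  const unsigned ext_family = (eax >> 20) & 0xFF;

  CpuSignature sig;
  sig.family = (base_family == 0xF) ? base_family + ext_family : base_family;
  sig.model = (base_family == 0x6 || base_family == 0xF)
                  ? ((ext_model << 4) | base_model)
                  : base_model;
  return sig;
}

// Hypothetical, incomplete lookup table: every new processor model that
// inherits the long PAUSE latency would need another case here.
// Assumes the caller has already verified that the vendor is Intel.
inline bool looks_like_slow_pause(CpuSignature sig) {
  if (sig.family != 6) return false;
  switch (sig.model) {
    case 0x4E:  // Skylake client (mobile)
    case 0x5E:  // Skylake client (desktop)
    case 0x55:  // Skylake server
      return true;
    default:
      return false;
  }
}
```

Any model missing from the switch silently falls back to the "fast PAUSE" assumption, which is exactly the failure mode that makes this approach unattractive.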
Intel’s article on this includes some code that claims to implement ‘exponential backoff’, but it does not look like that to me. It looks like a constant number of loops around the PAUSE instruction, similar to LF_BACKOFF() in MariaDB.
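For contrast, this is roughly what I would expect genuine exponential backoff to look like: the number of PAUSE iterations doubles after every failed attempt, up to a cap. It is only a sketch under my own assumptions, not the code from Intel's article or from MariaDB; `cpu_relax`, `ExponentialBackoff`, `spin_lock`, `spin_unlock` and the limits 1 and 1024 are hypothetical.

```cpp
#include <atomic>

// Stand-in for the x86 PAUSE instruction; a no-op on other targets.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#endif
}

// Doubles the wait after every failed attempt, capped at max.
// Assumes 1 <= initial <= max.
class ExponentialBackoff {
 public:
  ExponentialBackoff(unsigned initial, unsigned max)
      : spins_(initial), max_(max) {}

  // Wait for the current number of iterations, then double the count for
  // the next call. Returns the number of iterations just executed.
  unsigned pause_and_grow() {
    const unsigned n = spins_;
    for (unsigned i = 0; i < n; i++) cpu_relax();
    spins_ = (n > max_ / 2) ? max_ : n * 2;
    return n;
  }

 private:
  unsigned spins_;
  unsigned max_;
};

// Test-and-set spin lock that backs off exponentially while contended.
inline void spin_lock(std::atomic<bool>& locked) {
  ExponentialBackoff backoff(1, 1024);
  while (locked.exchange(true, std::memory_order_acquire)) {
    backoff.pause_and_grow();
  }
}

inline void spin_unlock(std::atomic<bool>& locked) {
  locked.store(false, std::memory_order_release);
}
```

A constant-count loop would correspond to `pause_and_grow()` never changing `spins_`, which is what the code in the article and LF_BACKOFF() appear to do.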
Sergey Vojtovich used ut_delay(1) in the lock-free TRX_SYS refactoring in MariaDB Server 10.3. Other invocations of this function seem to pass the value of innodb_spin_wait_delay. It is only 4 by default, because there is an internal multiplier in ut_delay(). The current range is inadequate for making a 14-fold adjustment (140 cycles instead of 10). Even if we lowered the value from 4 to 1, each delay would still take 3.5 times as long as before (14/4), because the loop count would shrink only 4-fold while each PAUSE takes 14 times as long.
In MDEV-16168, Daniel Black pointed to a mutex implementation that appears to store a spin count in each mutex and to use it for subsequent acquisitions. Something like: ‘try 2× as many spins as the previous time, but no more than a given maximum’. However, storing the spin loop count in each mutex would increase the memory usage. If we tried to aggregate the counts (for the likes of buf_block_t::lock), updating the aggregated values could easily become a contention point.
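My reading of that idea, as a rough sketch rather than the implementation referenced in MDEV-16168: each mutex remembers how many spins the last acquisition used and allows up to twice that many next time, capped at a maximum. The class name `AdaptiveSpinMutex`, the cap of 1024 and the use of `std::this_thread::yield()` as a stand-in for real blocking are all assumptions made for illustration.

```cpp
#include <atomic>
#include <thread>

class AdaptiveSpinMutex {
 public:
  static constexpr unsigned kMaxSpins = 1024;  // hypothetical cap

  void lock() {
    // Budget: twice the spins used last time, at least 1, at most the cap.
    unsigned budget = last_spins_.load(std::memory_order_relaxed) * 2;
    if (budget < 1) budget = 1;
    if (budget > kMaxSpins) budget = kMaxSpins;

    unsigned used = 0;
    while (locked_.exchange(true, std::memory_order_acquire)) {
      if (used < budget) {
        ++used;  // a PAUSE instruction would be issued here
      } else {
        std::this_thread::yield();  // stand-in for blocking on the mutex
      }
    }
    // Remember how many spins this acquisition needed.
    last_spins_.store(used, std::memory_order_relaxed);
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

  unsigned last_spins() const {
    return last_spins_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<bool> locked_{false};
  std::atomic<unsigned> last_spins_{0};  // the extra per-mutex state
};

// The memory cost mentioned above: the counter makes every mutex larger
// than the bare lock word.
static_assert(sizeof(AdaptiveSpinMutex) > sizeof(std::atomic<bool>),
              "per-mutex spin count increases the object size");
```

The `static_assert` makes the trade-off concrete: the extra field is paid for by every instance, which matters for objects that exist in very large numbers.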
An easy change for MariaDB would seem to be to replace both ut_delay() and LF_BACKOFF() with something that takes a new parameter for the loop count. The old parameter innodb_spin_wait_delay could be deprecated and have no effect.
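A minimal sketch of what such a replacement might look like, assuming a caller-supplied loop count with no hidden multiplier. The names `spin_pause` and `loops_for_budget` are hypothetical, and the budget of 2000 cycles is an arbitrary number chosen only to make the arithmetic visible; the PAUSE latencies of 10 and 140 cycles are the ones quoted above.

```cpp
// Stand-in for the x86 PAUSE instruction; a no-op on other targets,
// where the loop below may be optimized away entirely.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#endif
}

// Hypothetical replacement for ut_delay() and LF_BACKOFF(): the caller
// passes the exact number of PAUSE iterations.
inline void spin_pause(unsigned loops) {
  for (unsigned i = 0; i < loops; i++) cpu_relax();
}

// Hypothetical helper: how many PAUSE iterations fit into a cycle budget
// for a given PAUSE latency. Always returns at least 1.
constexpr unsigned loops_for_budget(unsigned budget_cycles,
                                    unsigned pause_cycles) {
  return (pause_cycles == 0 || budget_cycles / pause_cycles == 0)
             ? 1
             : budget_cycles / pause_cycles;
}

// The same budget needs about 14 times fewer iterations when PAUSE takes
// 140 cycles instead of 10.
static_assert(loops_for_budget(2000, 10) == 200, "10-cycle PAUSE");
static_assert(loops_for_budget(2000, 140) == 14, "140-cycle PAUSE");
```

With a plain loop count, a 14-fold adjustment is simply a different argument, for example `spin_pause(loops_for_budget(2000, 140))`, rather than something that has to be squeezed into the small range of innodb_spin_wait_delay.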