[MDEV-19845] Adaptive spin loops

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.3.17, 10.4.7, 10.5.0
Component/s: Locking, Server, Storage Engine - Aria, Storage Engine - InnoDB
Labels:
- performance
- threads

Description

Starting with the Intel Skylake microarchitecture, the latency of the PAUSE instruction is about 140 clock cycles, up from about 10 on earlier microarchitectures. Another reference suggests that on AMD the latency could be 10 or 50 clock cycles, depending on the microarchitecture.

Because the latency varies this widely, the assumptions under which the spin loops were written may no longer hold, so some redesign may be needed.

I tried to find out how to detect the microarchitecture with the CPUID instruction, but did not find an up-to-date reference for that. It might require a lookup table that would have to be updated continually as new processor models appear.

Intel’s article on this includes some code that claims to implement ‘exponential backoff’, but it does not look like that to me. It looks like a constant number of loops around the PAUSE instruction, similar to LF_BACKOFF() in MariaDB.
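To make the distinction concrete, here is a minimal sketch contrasting a constant loop count with a capped exponential backoff. The function names, the base of 4 and the cap of 64 are hypothetical and are not taken from Intel's article or from LF_BACKOFF().

```cpp
#include <cstdint>

// Constant backoff: every retry executes the same number of PAUSE loops,
// regardless of how many attempts have already failed.
constexpr std::uint32_t constant_backoff(std::uint32_t /*attempt*/,
                                         std::uint32_t loops)
{
  return loops;
}

// Exponential backoff: the loop count doubles with each failed attempt
// (base, 2*base, 4*base, ...) until it reaches the cap.
constexpr std::uint32_t exponential_backoff(std::uint32_t attempt,
                                            std::uint32_t base,
                                            std::uint32_t cap)
{
  // Compute in 64 bits first so that the shift cannot overflow.
  return (attempt >= 31 || (std::uint64_t{base} << attempt) > cap)
    ? cap
    : base << attempt;
}

// With base 4 and cap 64 the sequence is 4, 8, 16, 32, 64, 64, ...
static_assert(exponential_backoff(0, 4, 64) == 4, "first attempt");
static_assert(exponential_backoff(3, 4, 64) == 32, "fourth attempt");
static_assert(exponential_backoff(9, 4, 64) == 64, "capped");
```

A loop that always runs the same number of PAUSE instructions corresponds to constant_backoff() here, whatever it is called.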

svoj used ut_delay(1) in the lock-free TRX_SYS refactoring in MariaDB Server 10.3. Other invocations of this function seem to pass the value of innodb_spin_wait_delay, which is only 4 by default because ut_delay() applies an internal multiplier. That range is too narrow to compensate for a 14-fold increase in PAUSE latency (from about 10 to about 140 clock cycles): even if we changed the value from 4 to 1, the resulting delay would still be 3.5 times as long as before (14/4).
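The arithmetic can be checked with a small model. This is only a sketch: delay_cycles() and the multiplier value are hypothetical placeholders rather than the actual ut_delay() internals, and the cycle counts are the approximate figures quoted in this description.

```cpp
#include <cstdint>

// Hypothetical model, not MariaDB code: approximate cycles spent in one
// delay call that executes delay * multiplier PAUSE instructions.
constexpr std::uint64_t delay_cycles(std::uint64_t delay,
                                     std::uint64_t multiplier,
                                     std::uint64_t pause_cycles)
{
  return delay * multiplier * pause_cycles;
}

// Placeholder multiplier; the ratio checked below does not depend on it.
constexpr std::uint64_t kMultiplier = 50;

// Setting 4 with a PAUSE of about 10 cycles.
constexpr std::uint64_t kOld = delay_cycles(4, kMultiplier, 10);
// Setting 1 with a PAUSE of about 140 cycles.
constexpr std::uint64_t kNew = delay_cycles(1, kMultiplier, 140);

// kNew / kOld == 3.5, written without floating point.
static_assert(kNew * 10 == kOld * 35, "delay is still 3.5 times as long");
```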

In MDEV-16168, danblack pointed to a mutex implementation that appears to store a spin count in each mutex and to use it for subsequent lock attempts. Something like: ‘try 2× as many spins as the previous time, but no more than a given maximum’. However, storing the spin count in each mutex would increase memory usage. If we tried to aggregate the counts (for the likes of buf_block_t::lock), updating the aggregated values could easily become a contention point.
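The per-mutex idea could look roughly like the following. This is a sketch under stated assumptions, not the implementation danblack referred to and not MariaDB code: the class AdaptiveSpinLock, the helpers next_spin_limit() and cpu_relax(), and the constants are all hypothetical, and yielding stands in for properly blocking the thread.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical upper bound on the number of spins per lock attempt.
constexpr std::uint32_t kMaxSpins = 1024;

// 'Try 2x as many spins as the previous time, but no more than a maximum.'
constexpr std::uint32_t next_spin_limit(std::uint32_t previous,
                                        std::uint32_t max_spins)
{
  return previous >= max_spins / 2 ? max_spins
       : previous == 0             ? 1
                                   : previous * 2;
}

// One spin-loop iteration: PAUSE on x86 with GCC/Clang, otherwise a yield.
inline void cpu_relax()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::this_thread::yield();
#endif
}

class AdaptiveSpinLock
{
public:
  void lock()
  {
    const std::uint32_t limit =
      next_spin_limit(spins_.load(std::memory_order_relaxed), kMaxSpins);
    std::uint32_t used = 0;
    while (locked_.exchange(true, std::memory_order_acquire)) {
      if (used < limit) {
        ++used;
        cpu_relax();
      } else {
        std::this_thread::yield();  // stand-in for blocking the thread
      }
    }
    // Remember how many spins this acquisition needed.
    spins_.store(used, std::memory_order_relaxed);
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

  std::uint32_t last_spins() const
  {
    return spins_.load(std::memory_order_relaxed);
  }

private:
  std::atomic<bool> locked_{false};
  std::atomic<std::uint32_t> spins_{4};  // the extra per-lock memory
};
```

The spins_ member is the per-mutex memory cost mentioned above. Sharing one counter among many locks would save that memory, but every acquisition would then write to the same cache line.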

An easy change for MariaDB would seem to be to replace both ut_delay() and LF_BACKOFF() with something that takes a new parameter for the loop count. The old parameter innodb_spin_wait_delay could be deprecated and have no effect.
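A replacement along these lines might be as small as the sketch below. All names are hypothetical: spin_delay() stands for the common replacement of ut_delay() and LF_BACKOFF(), and spin_pause_loops stands for the new loop-count parameter, with an arbitrary default.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical tunable that would take over from innodb_spin_wait_delay.
// The default of 50 is arbitrary.
std::atomic<std::uint32_t> spin_pause_loops{50};

// One spin-loop iteration: PAUSE on x86 with GCC/Clang, otherwise a yield.
inline void cpu_relax()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::this_thread::yield();
#endif
}

// Executes exactly 'loops' relax iterations, with no hidden multiplier,
// and returns the number of iterations executed.
inline std::uint32_t spin_delay(std::uint32_t loops)
{
  std::uint32_t done = 0;
  for (; done < loops; ++done)
    cpu_relax();
  return done;
}
```

A caller would pass spin_pause_loops.load(std::memory_order_relaxed), so that a single setting can be scaled to the PAUSE latency of the processor.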

Issue Links

causes

MDEV-20233 Linking issue on Redhat 6 (Closed)

MDEV-23249 __builtin_readcyclecounter in include/my_rdtsc.h causes SIGILL on ARM (Closed)

relates to

MDEV-19929 Add a startup message about PAUSE instruction timing (In Review)

MDEV-16168 Performance regression on sysbench write benchmarks from 10.2 to 10.3 (Closed)

Activity

Transition	From Status	To Status	Time In Source Status
Marko Mäkelä made transition - 2019-06-26 14:37	Open	In Progress	2d 5h 41m
Marko Mäkelä made transition - 2019-06-26 14:45	In Progress	In Review	8m 1s
Sergey Vojtovich made transition - 2019-06-26 16:34	In Review	Stalled	1h 48m
Marko Mäkelä made transition - 2019-06-27 08:05	Stalled	Closed	15h 31m

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2019-06-24 08:56

Updated: 2020-08-08 07:47

Resolved: 2019-06-27 08:05
