Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Fixed
Description
MariaDB developers:
Here's a simple performance improvement I found in MariaDB (v5.5.31) while analyzing sysbench on my 4-node NUMA system.
It improves the sysbench oltp test by 3% to 17%, depending on the number of threads specified (and I'm sure there's some noise).
The patch is attached to this message. It reduces memory accesses to the "spins" and "rng_state" fields of the my_pthread_fastmutex_t struct:
typedef struct st_my_pthread_fastmutex_t
{
  pthread_mutex_t mutex;
  uint spins;
  uint rng_state;
} my_pthread_fastmutex_t;
As I'm sure you know, the mutex in that struct is very hot. Since it is accessed by CPUs on all nodes, a lot of time is wasted bouncing its cache line back and forth between NUMA nodes.
I noticed the code repeatedly accesses the "spins" and "rng_state" fields while looping to acquire the mutex. Since those fields reside in the same cache line as the mutex, and since they are accessed from all CPUs on all NUMA nodes, they make the mutex slower by increasing cache-to-cache contention between nodes.
My change simply keeps the values of "spins" and "rng_state" in local variables (ideally held in registers) for as long as possible and only updates them in memory when necessary. I didn't change anything in the algorithm.
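A rough sketch of the idea follows. It is not the attached patch; it assumes a lock loop shaped like MariaDB's fast-mutex lock path (try the lock up to "spins" times with a randomized, growing back-off, then block), and the helper names (fastmutex_init, fastmutex_lock_cached, rng_step, busy_wait, FASTMUTEX_DELAY) and the Park-Miller-style generator constants are illustrative assumptions. What it shows: because pthread_mutex_trylock() is an opaque call that receives a pointer into the struct, the compiler must assume it may modify *mp, so a loop that tests "i < mp->spins" or steps mp->rng_state directly has to reload the contended cache line on every iteration (and, for rng_state, store to it). Copying both fields into locals once removes those accesses, and the PRNG state is written back at most once. The relaxed __atomic builtins are GCC/Clang-specific; they keep the unsynchronized accesses well-defined and compile to ordinary loads and stores on common targets.

```c
#define _XOPEN_SOURCE 700 /* exposes pthread_mutexattr_settype in strict modes */
#include <errno.h>
#include <pthread.h>

/* Layout as described above: the mutex and both tuning fields share a cache line. */
typedef struct st_my_pthread_fastmutex_t
{
  pthread_mutex_t mutex;
  unsigned int spins;     /* trylock attempts before falling back to blocking */
  unsigned int rng_state; /* per-mutex PRNG state used to randomize back-off */
} my_pthread_fastmutex_t;

/* Hypothetical base back-off, in busy-loop iterations. */
#define FASTMUTEX_DELAY 4

/* One Park-Miller-style step on a LOCAL copy of the PRNG state (constants assumed). */
static double rng_step(unsigned int *state)
{
  *state = (unsigned int)(((unsigned long long)*state * 279470273ULL)
                          % 4294967291ULL);
  return *state / 2147483647.0;
}

/* Spin for a while without touching any shared memory. */
static void busy_wait(unsigned int iterations)
{
  for (volatile unsigned int k = 0; k < iterations; k++)
    ;
}

/* Hypothetical init helper; seed must be in [1, 4294967290]. */
static int fastmutex_init(my_pthread_fastmutex_t *mp,
                          const pthread_mutexattr_t *attr,
                          unsigned int spins, unsigned int seed)
{
  mp->spins = spins;
  mp->rng_state = seed;
  return pthread_mutex_init(&mp->mutex, attr);
}

static int fastmutex_lock_cached(my_pthread_fastmutex_t *mp)
{
  /* Read each shared field exactly once; the loop below only touches locals. */
  const unsigned int spins = __atomic_load_n(&mp->spins, __ATOMIC_RELAXED);
  const unsigned int rng_start = __atomic_load_n(&mp->rng_state, __ATOMIC_RELAXED);
  unsigned int rng = rng_start;
  unsigned int maxdelay = FASTMUTEX_DELAY;
  int res;

  for (unsigned int i = 0; i < spins; i++)
  {
    res = pthread_mutex_trylock(&mp->mutex);
    if (res != EBUSY) /* 0: acquired; anything else: a real error */
      goto done;
    busy_wait(maxdelay);
    maxdelay += (unsigned int)(rng_step(&rng) * FASTMUTEX_DELAY) + 1;
  }
  res = pthread_mutex_lock(&mp->mutex); /* stop spinning and block */

done:
  /* Single write-back, and only if the PRNG actually advanced. */
  if (rng != rng_start)
    __atomic_store_n(&mp->rng_state, rng, __ATOMIC_RELAXED);
  return res;
}
```

Storing rng_state back after the lock is acquired means the write normally lands on a cache line this CPU already owns, since it has just taken the mutex in that same line. An error-checking mutex gives a deterministic single-threaded check of the spin path: re-locking a mutex the caller already holds makes every trylock return EBUSY, and the final pthread_mutex_lock() returns EDEADLK instead of hanging.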
The rest of this message shows the improvement in sysbench transaction throughput for different thread counts.
Let me know if you have any questions. Since I'm not on the mailing list, please cc me on any reply.
Joe Mario
- sysbench --test=oltp --num-threads=12 --max-requests=1000000 --max-time=100 run
                5.5.31-MariaDB               5.5.31-MariaDB-Modified
                --------------               -----------------------
Thread cnt: 12
transactions:   572694 (5726.83 per sec.)    589543 (5895.34 per sec.)     2.94% speedup
transactions:   564215 (5642.05 per sec.)    582254 (5822.43 per sec.)     3.20% speedup
transactions:   565231 (5652.21 per sec.)    583228 (5832.19 per sec.)     3.18% speedup
Thread cnt: 20
transactions:   507300 (5072.82 per sec.)    580229 (5802.09 per sec.)    14.38% speedup
transactions:   509373 (5093.60 per sec.)    585629 (5856.09 per sec.)    14.97% speedup
transactions:   497711 (4976.89 per sec.)    583506 (5834.94 per sec.)    17.24% speedup
Thread cnt: 30
transactions:   369979 (3699.66 per sec.)    410698 (4106.74 per sec.)    11.01% speedup
transactions:   372194 (3721.70 per sec.)    412884 (4128.65 per sec.)    10.93% speedup
Thread cnt: 40
transactions:   366285 (3662.60 per sec.)    401050 (4010.23 per sec.)     9.49% speedup
transactions:   369626 (3696.02 per sec.)    401913 (4018.88 per sec.)     8.74% speedup
Thread cnt: 50
transactions:   357529 (3574.99 per sec.)    389759 (3897.25 per sec.)     9.01% speedup
transactions:   357116 (3570.83 per sec.)    387115 (3870.80 per sec.)     8.40% speedup
Thread cnt: 60
transactions:   335427 (3353.88 per sec.)    375134 (3750.91 per sec.)    11.84% speedup
transactions:   334128 (3340.90 per sec.)    359116 (3590.78 per sec.)     7.48% speedup
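The speedup figures above are consistent with (modified - original) / original computed on the transaction totals and expressed as a percentage; for example, 589543 vs. 572694 gives 16849 / 572694, about 2.94%. A small illustrative helper (the function names are hypothetical, not from the report) that rounds to hundredths of a percent so results can be compared against the table:

```c
/* Speedup of the modified build over the stock build, in percent. */
static double speedup_pct(long original, long modified)
{
  return 100.0 * (double)(modified - original) / (double)original;
}

/* The same value rounded to hundredths of a percent, e.g. 2.94% -> 294. */
static long speedup_hundredths(long original, long modified)
{
  return (long)(speedup_pct(original, modified) * 100.0 + 0.5);
}
```

Using the per-second rates instead of the totals gives the same percentages to within rounding.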
I've attached the patch, since it got mangled when I tried to insert it here.
Joe