[MDEV-27889] Make table cache mutex contention threshold configurable Created: 2022-02-18  Updated: 2023-10-07  Resolved: 2023-10-07

Status: Closed
Project: MariaDB Server
Component/s: Configuration
Fix Version/s: N/A

Type: Task Priority: Minor
Reporter: Joseph Peterson Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None


 Description   

Currently there is a check for table cache mutex contention that considers an instance contested if more than 20,000 mutex acquisitions cannot be serviced immediately before 80,000 are serviced immediately. The comments in the code indicate these values are based on an estimated maximum of 100K queries per second. They are hard-coded based on a 2-socket / 20-core / 40-thread Intel Broadwell system, with a comment noting that the numbers may need to be adjusted for other systems. Broadwell was released in 2015, so these numbers are likely quite dated.

In our scalability testing we found that with the 20% contention threshold the system sometimes took a couple of test runs to reach steady state; until then, performance was lower. This means people running benchmarks may see artificially low numbers on initial tests compared with later runs. When we changed the hard-coded values to 10K misses before 90K hits, our systems ramped up to the configured table_open_cache_instances value much more quickly, removing this bottleneck in the first test.

We suggest making this value configurable so it can be adjusted for the system being used.
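For reference, the counter-based heuristic described above can be sketched roughly as follows. This is an illustrative C++ sketch, not the server's actual code; all identifiers and the exact reset behavior are assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch (hypothetical names, not MariaDB's actual ones) of
// the counter-based contention check described above. An instance is
// flagged as contested once more than MISS_THRESHOLD lock attempts had
// to wait before HIT_THRESHOLD attempts succeeded without waiting,
// i.e. more than 20% waits over a ~100K-attempt window.
struct TableCacheInstanceStats
{
  uint32_t hits= 0;    // mutex acquired without waiting
  uint32_t misses= 0;  // mutex acquisition had to wait

  // Hard-coded values from the description: 20000 misses / 80000 hits.
  static const uint32_t MISS_THRESHOLD= 20000;
  static const uint32_t HIT_THRESHOLD= 80000;

  // Returns true when contention is detected; counters reset once a
  // window completes either way.
  bool record_lock_attempt(bool had_to_wait)
  {
    if (had_to_wait)
      misses++;
    else
      hits++;

    if (misses > MISS_THRESHOLD)  // contested: caller activates an instance
    {
      hits= misses= 0;
      return true;
    }
    if (hits >= HIT_THRESHOLD)    // window passed without contention
      hits= misses= 0;
    return false;
  }
};
```

Making MISS_THRESHOLD/HIT_THRESHOLD (or the ratio they encode) configurable is exactly the change this task proposes.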



 Comments   
Comment by Joseph Peterson [ 2022-02-18 ]

I have a patch that I'm cleaning up that can address this, if this is accepted.

Comment by Sergei Golubchik [ 2022-02-22 ]

How fast does the performance stabilize in your tests with and without the patch? Are we talking hours or milliseconds?

Comment by Joseph Peterson [ 2022-02-23 ]

Somewhere in the middle: we're talking minutes. When Steve Shaw was doing his testing, he indicated that the first run of the HammerDB suite showed significantly worse benchmark results. After digging in, we found that the server was slow to notice the lock contention and open new instances. It took almost 16 minutes to reach steady state. By setting these values to 10000/90000, the system ramped to steady state in 8 seconds.

For example, it took 16 minutes to add a third table cache instance (from a Cascade Lake system):
2022-02-22 11:29:42 39 [Note] Detected table cache mutex contention at instance 1: 21% waits. Additional table cache instance activated. Number of instances after activation: 2.
2022-02-22 11:45:29 119 [Note] Detected table cache mutex contention at instance 2: 21% waits. Additional table cache instance activated. Number of instances after activation: 3.

vs. 5 seconds:

2022-02-04 6:06:53 6 [Note] Detected table cache mutex contention at instance 1: 10% waits. Additional table cache instance activated. Number of instances after activation: 2.
2022-02-04 6:06:58 7 [Note] Detected table cache mutex contention at instance 2: 10% waits. Additional table cache instance activated. Number of instances after activation: 3.

Once all the table cache instances are activated, test results are consistent. You get similar results by changing the estimated queries per second, so my patch allows you to set both. I'll submit it for feedback; I'm not sure the names are ideal.

Comment by Joseph Peterson [ 2022-02-23 ]

Looks like the PR build fails. I will dig into why.

Edit: It was a missing commit. I pushed that commit.

Comment by Daniel Black [ 2022-02-24 ]

table_cache_contention_threshold as a percentage looks like a very good move.

table_open_cache_k_queries_per_sec sounds like a variable that would be hard to tune, or to maintain a sensible default for over time.

As an alternative, I wrote this exceptionally crude example (it has known flaws) that checks at a 10-second interval. Too slow? Should the interval be configurable? Better or worse?

https://github.com/MariaDB/server/compare/10.9...grooverdan:MDEV-27889?expand=1
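The idea behind a time-based check, as opposed to the fixed counter window, could be sketched like this. This is a hypothetical illustration of the approach, not the code in the linked branch; the identifiers and details are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical sketch of a time-interval contention check: instead of
// waiting for a fixed number of lock attempts (tied to an assumed qps),
// evaluate the wait ratio once every CHECK_INTERVAL_SEC seconds.
struct TimedContentionCheck
{
  static const time_t CHECK_INTERVAL_SEC= 10;   // interval from the comment above
  static const uint32_t WAIT_PCT_THRESHOLD= 20; // flag at >20% waits

  uint64_t hits= 0;
  uint64_t misses= 0;
  time_t last_check;

  explicit TimedContentionCheck(time_t now) : last_check(now) {}

  // 'now' is passed in so the caller can use whatever clock it already has.
  bool record_lock_attempt(bool had_to_wait, time_t now)
  {
    had_to_wait ? misses++ : hits++;
    if (now < last_check + CHECK_INTERVAL_SEC)
      return false;                              // interval not elapsed yet

    uint64_t total= hits + misses;
    bool contested= total && misses * 100 > total * WAIT_PCT_THRESHOLD;
    hits= misses= 0;                             // start a fresh interval
    last_check= now;
    return contested;
  }
};
```

The appeal is that detection latency is bounded by wall-clock time rather than by how fast the workload accumulates 100K lock attempts, which removes the hardware/qps assumption at the cost of a clock read on the check path.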

Comment by Joseph Peterson [ 2022-02-24 ]

True, maintaining the kqps value is likely tricky, but the hard-coded 100000 already existed and was probably 5 years out of date, so I extracted it into a variable... using a timer is probably better if there is no performance impact.

Comment by Daniel Black [ 2022-02-25 ]

I updated the branch to be cleaner and to work. The sysbench workload I was using (threads=32, tables=90) didn't seem to trigger cache expansion on an 8-CPU i7-10510U, not without removing the now > (t_last + 2) check. jeepeterson, if you have time to test on a large system, that would be appreciated.

serg, svoj, what are your thoughts on a time based approach? or other strategies to avoiding hardware assumptions.

Comment by Sergey Vojtovich [ 2022-02-25 ]

danblack, technically there are no hardware assumptions in the original approach. I'd say a time-based solution is going to be even more tied to hardware.

What the comment says about Broadwell is more like "it was tested with" rather than "it was optimised for".

Could it be that the algorithm turned out to be not sensitive enough for the HammerDB load, rather than for specific hardware?

To sum up: I have no good answer without deeper analysis.

Comment by Sergei Golubchik [ 2022-02-25 ]

Table cache auto-adjustment to the load was implemented precisely to avoid adding a variable for the number of table cache instances. Such a variable is good for benchmarks, but it doesn't adapt to real-world use cases and makes MariaDB even more difficult to configure.

Adding a new variable to manually control the automatic behavior goes directly against this ease-of-use concept; if we had wanted a new variable, we would have added one to specify the number of table cache instances.

16 minutes to adjust could be a lot. What other warm-up times do you see? How long does it take to load the filesystem caches? The InnoDB buffer pool? How big is it, and what is your dataset size? How many tables?

Anecdotal evidence suggests it can take up to a few hours to warm up a big InnoDB buffer pool.

Comment by Joseph Peterson [ 2022-03-01 ]

So what is the guidance here? It sounds like a resounding "no" to this in concept. I can certainly test out the updated patch. Allowing this adjustment doesn't remove the auto-adjustment, right? It just changes how quickly it adjusts.

Sergei, I am confused by this statement: "if we'd wanted a new variable, we'd added a variable to specify the number of table cache instances."
There is such a variable: table_open_cache_instances

Comment by Joseph Peterson [ 2022-03-02 ]

Daniel, the patch you provided that uses time instead of a counter showed a 2% performance boost with HammerDB. It appears to ramp up quickly even without changing the threshold to 10%. I suspect the default 10s timer lets it ramp up more quickly than waiting for 20K misses out of 100K attempts.

Comment by Sergei Golubchik [ 2022-03-10 ]

table_open_cache_instances controls the maximum number of instances. I meant that we'd need another variable to specify the current (or initial) number of cache instances.

I don't think it's a resounding "no". Ideally, I believe, all data structures should warm up in about the same time. That's why I asked about what you are seeing regarding filesystem caches and InnoDB buffer pool.

If the table cache is an outlier — if it adjusts too slowly — we definitely should fix it.

Generated at Thu Feb 08 09:56:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.