[MDEV-31759] Large grain of dict_sys lock by table creation affects performance Created: 2023-07-21  Updated: 2023-11-14  Resolved: 2023-10-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: N/A

Type: Bug Priority: Trivial
Reporter: Fan Lyu Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: performance

Issue Links:
Relates
relates to MDEV-24258 Merge dict_sys.mutex into dict_sys.latch Closed

 Description   

Our test team created tables from 500 concurrent sessions (200 tables per session), which caused slow queries.
I used pstack to inspect the stacks and found that most threads were waiting for the mutex of the global variable dict_sys.

I reviewed the related code in the method
ha_innobase::create
and think the scope of the lock is too large.

row_mysql_lock_data_dictionary and row_mysql_unlock_data_dictionary are called around
error = info.create_table(own_trx)

In create_table_info_t::create_table, many steps allocate objects or simply set members of those objects before attaching them to the cache of the global variable dict_sys. IMO those steps don't require the global dict_sys mutex.

Would it be better to make the dict_sys mutex fine-grained instead of holding it for the whole of create_table_info_t::create_table?
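
The idea above can be sketched roughly as follows. This is only an illustration of narrowing a critical section, not the InnoDB code: Table, DictCache, create_coarse, create_fine and run_sessions are hypothetical names invented for this example, and the sketch ignores everything else that the real create_table does under the dictionary lock (for example, the transactional writes to the data dictionary tables), which may well need the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary table object (not dict_table_t).
struct Table {
    std::string name;
    std::vector<std::string> columns;
};

// Hypothetical stand-in for a global dictionary cache (not dict_sys).
class DictCache {
public:
    // Coarse-grained: the mutex is held while the object is built.
    void create_coarse(const std::string& name, int n_cols) {
        std::lock_guard<std::mutex> guard(mutex_);
        std::unique_ptr<Table> table = build(name, n_cols);
        tables_.emplace(name, std::move(table));
    }

    // Fine-grained: allocate and fill the object without the mutex,
    // then take the mutex only to attach the object to the cache.
    void create_fine(const std::string& name, int n_cols) {
        std::unique_ptr<Table> table = build(name, n_cols);
        std::lock_guard<std::mutex> guard(mutex_);
        tables_.emplace(name, std::move(table));
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(mutex_);
        return tables_.size();
    }

private:
    // Touches only thread-local data, so it needs no shared lock.
    static std::unique_ptr<Table> build(const std::string& name, int n_cols) {
        std::unique_ptr<Table> table(new Table);
        table->name = name;
        for (int i = 0; i < n_cols; ++i) {
            table->columns.push_back("c" + std::to_string(i));
        }
        return table;
    }

    std::mutex mutex_;
    std::unordered_map<std::string, std::unique_ptr<Table>> tables_;
};

// Creates tables from several concurrent "sessions" using the fine-grained
// path and returns the number of cached tables afterwards.
std::size_t run_sessions(DictCache& cache, int sessions, int tables_per_session) {
    assert(sessions > 0 && tables_per_session > 0);
    std::vector<std::thread> workers;
    for (int s = 0; s < sessions; ++s) {
        workers.emplace_back([&cache, s, tables_per_session] {
            for (int t = 0; t < tables_per_session; ++t) {
                cache.create_fine(
                    "t_" + std::to_string(s) + "_" + std::to_string(t), 4);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return cache.size();
}
```

Both variants leave the cache in the same state; the difference is only how long each session holds the shared mutex. Whether this would help in 10.5 depends on how much of the time under the lock is spent on such local work.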



 Comments   
Comment by Marko Mäkelä [ 2023-07-22 ]

Which MariaDB Server version is this about? I do not think that dict_sys.latch should be a bottleneck in MariaDB Server 10.6 or later. The dict_sys.mutex was removed in MDEV-24258.

Comment by Fan Lyu [ 2023-07-24 ]

Hello Marko, I am indeed using 10.5.
In fact, I had a discussion with Monty last weekend regarding the Linux mutex "jumping the queue" behaviour.

Comment by Marko Mäkelä [ 2023-07-25 ]

Thank you, lyufangabriel. I have not analyzed such "jumping the queue" behaviour myself, but I believe that it could be dependent on the Linux kernel version, possibly some scheduling parameters, and on the hardware architecture (SMP vs. NUMA). In MariaDB Server 10.6, MDEV-21452, MDEV-27058 and many other changes should improve multi-threaded performance (while also making the code easier to debug).

Can you test in a staging environment if 10.6 would perform better for you?

Comment by Marko Mäkelä [ 2023-11-14 ]

MDEV-31095 should have addressed some of the "jumping the queue" behaviour in MariaDB Server 10.6.16 and later major versions.

Generated at Thu Feb 08 10:26:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.