[MDEV-15135] 'show global status' can cause lock contention on LOCK_thread_count Created: 2018-01-30  Updated: 2020-08-25  Resolved: 2019-01-29

Status: Closed
Project: MariaDB Server
Component/s: Server
Fix Version/s: 10.4.2

Type: Task Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 3
Labels: upstream-fixed

Issue Links:
Blocks
blocks MDEV-13983 Mariadb becomes unresponsive Closed
Relates
relates to MDEV-17251 SHOW STATUS unnecessary calls calc_su... Closed
relates to MDEV-13983 Mariadb becomes unresponsive Closed
relates to MDEV-20547 mysqld crashes: Semaphore wait has la... Closed

 Description   

calc_sum_of_all_status() holds the LOCK_thread_count mutex for the whole duration of its calculations, which can take a while when there are many active connections, thereby blocking other threads waiting on the same mutex.

In MySQL 5.7 this is fixed by first taking copies of the relevant THD object data (while holding LOCK_thread_count), and only then doing the actual calculations, after the mutex has been released again.

See MySQL bug #42930 and changeset 034c40f4baeff1a1d2dd637f6a1f2c2720a4d26f

https://bugs.mysql.com/bug.php?id=42930

https://github.com/mysql/mysql-server/commit/034c40f4baeff1a1d2dd637f6a1f2c2720a4d26f
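The copy-then-compute pattern described above can be sketched roughly as follows. This is an illustrative, simplified model, not MariaDB's actual code: the names (Session, Totals, session_list) and the idea of copying counter data rather than THD pointers are assumptions for the sake of a compact example.

```cpp
// Sketch: keep the critical section short by snapshotting per-session data
// under the lock, then aggregating after the lock is released.
#include <cstdint>
#include <mutex>
#include <vector>

struct Session { uint64_t bytes_sent; uint64_t questions; };  // stand-in for THD

static std::mutex lock_thread_count;        // stands in for LOCK_thread_count
static std::vector<Session*> session_list;  // stands in for the global THD list

struct Totals { uint64_t bytes_sent = 0; uint64_t questions = 0; };

Totals calc_sum_of_all_status_copy_first() {
    // 1) Short critical section: only snapshot the per-session counters.
    std::vector<Session> snapshot;
    {
        std::lock_guard<std::mutex> g(lock_thread_count);
        snapshot.reserve(session_list.size());
        for (Session* s : session_list)
            snapshot.push_back(*s);  // cheap copy, no computation under lock
    }
    // 2) The expensive aggregation runs with the mutex already released.
    Totals t;
    for (const Session& s : snapshot) {
        t.bytes_sent += s.bytes_sent;
        t.questions  += s.questions;
    }
    return t;
}
```

The trade-off debated in the comments below is visible here: the snapshot shortens the lock hold time, but the copy itself (and any allocation it triggers) still happens under the mutex.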



 Comments   
Comment by Hartmut Holzgraefe [ 2018-01-30 ]

MDEV-13983 may be related

Comment by Vladislav Vaintroub [ 2018-02-02 ]

Seems like an rwlock instead of a mutex could improve some things here. calc_sum_of_all_status() would only need a read lock (it only needs to prevent modification of the threads list).

Comment by Sergey Vojtovich [ 2018-02-07 ]

I agree it is a problem; however, a good fix is a lot more complex.

I don't know how MySQL can get any benefit from 034c40f4baeff1a1d2dd637f6a1f2c2720a4d26f. FWICS they made the problem even worse: now they have to do memory allocations under LOCK_thread_count, which is a lot more expensive than the rather cheap calculations. I can imagine they didn't even benchmark this.

Comment by Vladislav Vaintroub [ 2018-02-07 ]

It depends. Maybe std::set already preallocates some memory in its constructor; I imagine they benchmarked it before pushing. Also, a linked list like we have now may have a problem of bad memory locality, where every element of the list is far away from the other elements, and traversing the list has detrimental effects on CPU caches.

Comment by Sergey Vojtovich [ 2018-02-07 ]

1. According to http://en.cppreference.com/w/cpp/container/set, std::set doesn't allow pre-allocation.
2. std::set is an rb-tree, which suffers from poor data locality as well.

Comment by Vladislav Vaintroub [ 2018-02-07 ]

I'm not saying that MySQL does everything right, however:
1. The STL does allow custom allocators, so the set might well be placed inside MEM_ROOT memory.
2. The data locality problem is not exactly the same for std::set. std::set stores pointers (8 bytes) plus some auxiliary data, maybe 24 or so bytes per node. If the rb-tree nodes are allocated at the same time (which they are when copying), they are close together in a spatial sense, i.e. to get the next element one may only need to read current_address + 24 bytes, which can well be within the same cache line. To get the next element in the THD list is a jump of at least sizeof(THD) == 19576 bytes, and that definitely won't be in the same cache line.

Comment by Sergey Vojtovich [ 2018-02-07 ]

1. I wonder how we can use MEM_ROOT for this purpose: won't it keep growing until OOM?
2. Why does copying matter here? I don't think we ever copy to global_thread_list, but rather insert/delete elements, which in turn is malloc+free. The chances for different elements to land on the same cache line are negligible, right?

Comment by Vladislav Vaintroub [ 2018-02-07 ]

A MEM_ROOT for even 10K pointers would still be a small memroot.

The actual calculation of sum_of_all_status is an expensive operation in terms of memory accesses: all THDs' session variables need to be loaded into memory, and there is a gap of ~15K between each THD, so the access pattern is not great when there are a lot of THDs. If you hold the lock for the entire operation, it is also potentially a long lock. Copying a small amount of memory is fast, and releasing the lock lets other threads run the same thing in parallel. Moreover, not holding LOCK_thread_count allows new logins while the "show" is ongoing.

Comment by Sergey Vojtovich [ 2018-03-09 ]

According to the original stack trace there is a non-trivial deadlock between InnoDB, replication, and SELECT ... FROM INFORMATION_SCHEMA.PROCESSLIST.
SHOW GLOBAL STATUS is itself a victim: it is stuck waiting for LOCK_thread_count, which is held by the I_S.PROCESSLIST query.

That is, fixing SHOW GLOBAL STATUS won't affect this deadlock in any way. But it is still a valid feature request.

hholzgra, should we create new issue for this deadlock and lower priority/retarget current one?

Comment by Sergei Golubchik [ 2018-09-20 ]

There might be a rather simple fix that does not solve the problem from the issue summary, only alleviates it. But it completely solves the customer problem that caused this issue to be created.

There is no need to call calc_sum_of_all_status() when SHOW STATUS would only return variables that aren't global values aggregated from session values. Calling calc_sum_of_all_status() in such a case is a complete waste.

A way to do it might be to postpone calc_sum_of_all_status() until show_status_array() gets to the first variable that needs aggregation.
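The postponement idea can be sketched as follows. This is a hedged, self-contained model: Var, needs_aggregation, and the callback parameter are invented for illustration and do not mirror the real show_status_array() signature.

```cpp
// Sketch: run the expensive aggregation lazily, and at most once, only when
// the output loop reaches the first variable that actually needs it.
#include <functional>
#include <string>
#include <vector>

struct Var {
    std::string name;
    bool needs_aggregation;  // true for globals summed over all sessions
    long global_value;       // used directly when no aggregation is needed
};

// Returns how many times the expensive aggregation actually ran.
int show_status_array(const std::vector<Var>& vars,
                      const std::function<long()>& calc_sum_of_all_status) {
    int calc_calls = 0;
    bool aggregated_ready = false;
    long aggregated = 0;
    for (const Var& v : vars) {
        long value = v.global_value;
        if (v.needs_aggregation) {
            if (!aggregated_ready) {  // postponed until first such variable
                aggregated = calc_sum_of_all_status();
                ++calc_calls;
                aggregated_ready = true;
            }
            value = aggregated;
        }
        (void)value;  // in the server, the value would be sent to the client
    }
    return calc_calls;
}
```

A query touching only non-aggregated variables then never pays for the aggregation at all, which is the optimization split out into MDEV-17251.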

Comment by Sergei Golubchik [ 2018-09-20 ]

I've created a separate issue, MDEV-17251, for the above optimization.

Comment by Vladislav Vaintroub [ 2018-10-30 ]

One way to reduce contention would be to give the aggregated values some "time to live". If aggregation is lock- and CPU-intensive, maybe we can cache the result for, say, a millisecond, and give 2999 threads out of 3000 a slightly dated copy of the aggregated values.
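The time-to-live idea sketched above could look roughly like this; CachedStatus and its members are hypothetical names, and the real server would need to consider what staleness is acceptable.

```cpp
// Sketch: cache the aggregated result for a short TTL so that concurrent
// SHOW GLOBAL STATUS calls reuse a slightly stale copy instead of all
// recomputing (and all contending on the thread-list lock).
#include <chrono>
#include <functional>
#include <mutex>

using Clock = std::chrono::steady_clock;

struct CachedStatus {
    std::mutex m;                      // protects the cached value
    long value = 0;
    Clock::time_point computed_at{};   // default epoch means "cache empty"
    std::chrono::milliseconds ttl{1};  // "say, a millisecond"

    long get(const std::function<long()>& expensive_aggregate) {
        std::lock_guard<std::mutex> g(m);
        auto now = Clock::now();
        if (computed_at == Clock::time_point{} || now - computed_at > ttl) {
            value = expensive_aggregate();  // only one caller per TTL window pays
            computed_at = now;
        }
        return value;  // everyone else gets the slightly dated copy
    }
};
```

Within one TTL window only the first caller performs the aggregation; the other 2999 out of 3000 threads return the cached value immediately.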

Comment by Ralf Gebhardt [ 2018-11-12 ]

serg ratzpo Do we have anybody who can do this without already being blocked by too many other tasks for 10.4? It may first need a rough estimate. Otherwise we should remove it from 10.4.

Comment by Vladislav Vaintroub [ 2018-11-12 ]

ralf.gebhardt@mariadb.com, you can assign it to me, if you don't mind. To get an estimate, some experimentation (with the different ideas listed here) might be necessary.

Comment by Ralf Gebhardt [ 2018-11-13 ]

wlad done, thanks! serg It is not completely clear to me how this task and MDEV-17251 are related. Are both needed?

Comment by Sergei Golubchik [ 2018-11-14 ]

I think yes. MDEV-17251 is about not doing an expensive piece of work when it's not needed at all. This issue is about reducing mutex contention during that "expensive piece of work" (when it still has to be done).

Comment by Vladislav Vaintroub [ 2018-11-14 ]

They are related, though not completely overlapping. This issue is about contention, which could be solved even by just making the mutex an rwlock, so that there is no contention. MDEV-17251 is about making the expensive operation cheaper in some cases, but it is not universally applicable; it can reduce contention (in some cases, not universally) even if we do not do anything with the mutex.

Comment by Vladislav Vaintroub [ 2019-01-10 ]

Benchmark description : mysqlslap -uroot --query="show global status" --create-schema=mysql --concurrency=50 --verbose --iterations=3 --number-of-queries=100000

Generated at Thu Feb 08 08:18:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.