[MDEV-26823] provide a way to monitor thread_pool_max_threads status Created: 2021-09-28 Updated: 2024-02-05 |
|
| Status: | Needs Feedback |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | N/A |
| Type: | New Feature | Priority: | Major |
| Reporter: | Allen Lee (Inactive) | Assignee: | Ben Stillman |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
A customer is requesting a way to monitor the usage of thread_pool_max_threads.
|
| Comments |
| Comment by Vladislav Vaintroub [ 2021-10-27 ] | ||||||
|
First, thread_pool_max_threads does not grow. It is a system variable, and its value does not change unless a user with sufficient privileges changes it with "SET GLOBAL thread_pool_max_threads". In what follows I assume that the variable in question is "threadpool_threads", and that people want to know how many threads there are. The answer is: you monitor it like any other global status variable. To find out whether the maximum number of threads has been reached, compare the status variable "threadpool_threads" with the system variable "thread_pool_max_threads".
The 'statistic' variable is called "threadpool_threads". There is no forecast, because under contention the number of threads will grow rather rapidly (this was necessary to fix [...]). I suggest describing the actual problem people have; it is hard to make sense of the request otherwise. | ||||||
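The comparison described in the comment above could be sketched roughly as follows. This is only an illustration, not part of the server: the function name `pool_headroom` and the returned keys are hypothetical, and the two input mappings are assumed to hold the rows returned by `SHOW GLOBAL STATUS LIKE 'Threadpool_threads'` and `SHOW GLOBAL VARIABLES LIKE 'thread_pool_max_threads'`, fetched with whatever client library is in use.

```python
from typing import Mapping


def pool_headroom(status: Mapping[str, str], variables: Mapping[str, str]) -> dict:
    """Compare the Threadpool_threads status variable with the
    thread_pool_max_threads system variable.

    `status` is assumed to come from SHOW GLOBAL STATUS and `variables`
    from SHOW GLOBAL VARIABLES (name -> value, values as strings).
    """
    # Names may be reported in different letter case, so normalise them.
    s = {name.lower(): value for name, value in status.items()}
    v = {name.lower(): value for name, value in variables.items()}

    threads = int(s["threadpool_threads"])
    max_threads = int(v["thread_pool_max_threads"])

    return {
        "threads": threads,
        "max_threads": max_threads,
        "headroom": max_threads - threads,       # threads left before the limit
        "utilization": threads / max_threads,    # fraction of the limit in use
        "limit_reached": threads >= max_threads,
    }


if __name__ == "__main__":
    # Values taken from this ticket: 1044 threads observed, limit set to 4096.
    report = pool_headroom(
        {"Threadpool_threads": "1044"},
        {"thread_pool_max_threads": "4096"},
    )
    print(report)
```

A single reading like this only describes the moment it was taken; it says nothing about how quickly the count may change afterwards.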
| Comment by Chris Calender (Inactive) [ 2021-10-28 ] | ||||||
|
Hi wlad! Many thanks for the feedback! Yes, my apologies about thread_pool_max_threads; that is the system variable they set. Your assumption is correct: they want to monitor a status variable showing the max threads being used at any point in time.
While threadpool_threads sounds like it should do what they want, they are not finding that to be the case. Perhaps we simply do not know how the calculation is performed, or maybe it is not tracked 100% correctly. I say that because the customer has the following threadpool-related variables set: thread_handling=pool-of-threads
While the server is running, they actively monitor both threadpool_threads and threadpool_idle_threads. Just before the exhaustion error:
2021-09-01 0:00:23 0 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation.
they saw reported values of (~1 minute before the crash): threadpool_threads: 1044
So we're unsure how these 2 variables relate to the 4096 they have set for thread_pool_max_threads. I would expect threadpool_threads to be close to 4096, or at least threadpool_threads + threadpool_idle_threads = 4096. Perhaps even slightly higher, based on the docs: "Number of threads in the thread pool. In rare cases, this can be slightly higher than thread_pool_max_threads, because each thread group needs at least two threads (i.e. at least one worker thread and at least one listener thread) to prevent deadlocks."
Am I missing something obvious in the calculation? If I take ((threadpool_threads + threadpool_idle_threads) * 2), then I approach 4096, but I think I'm just grasping at straws with that idea...
It would be terrific news if they can indeed simply monitor threadpool_threads and threadpool_idle_threads... In the end, they just want to be able to monitor it, so they can try to avoid threadpool exhaustion on their side.
Hope this helps explain. | ||||||
| Comment by Vladislav Vaintroub [ 2021-10-28 ] | ||||||
|
I think the calculations are incorrect. Thing is
In addition
The assumption that the threadpool can't create 3K threads in a minute is incorrect. In your case it did, and it can create 3K threads in a couple of milliseconds if all workers block on something. I guess all your workers are blocking on something; a wild guess is that something like FLUSH TABLES WITH READ LOCK is running. You could try a couple of thousand "SELECT SLEEP(1)" in parallel to experience a surge in thread counts, too.
So, you can monitor both status variables, and in addition you can monitor the "Threads_running" status variable, which tells you how many queries are currently executing (they could block though, contributing to the "idle" thread count). Just don't assume that throttling in thread creation always applies. These rules usually apply, unless there is contention on some global resource, or something that makes all threads block (as I explained, SLEEP(1) will do that too).
BTW, unless their box is something like 128 CPUs large, thread_pool_size=512 is an exaggeration. It should usually work OK with default settings. If it is a workaround for something, I'd like to know for what. If you want to avoid those messages, do not set thread_pool_max_threads to 4096. Set it larger, or much larger.
You mentioned some crash. If there was a crash, it is a different thing than this (often harmless) "thread pool blocked" message. I have now changed the severity of the "thread pool blocked" message, so it emits a warning rather than [ERROR]. Nothing is blocked as long as queries are executed. Even if no queries are executed, it could also be just a consequence of global locks.
In the end, I think people should start with default settings for the threadpool, and if they observe a surge in the number of threads, they could correlate it with contention. They could try to ignore the warning; it does not flood the log, and it is written at most once since startup.
And of course, they could use the admin connection via --extra-port to modify thread_pool_max_threads or "recover from this situation". The admin connection does not participate in the threadpool, so it won't experience queuing or anything like that. | ||||||
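One way to correlate a surge in thread count with what the server was doing, as suggested in the comment above, is to look at the growth rate between consecutive samples of the status variables. The following is a minimal sketch under stated assumptions: the `Sample` type, the `find_surges` function and the default threshold of 100 threads per second are hypothetical and chosen only for illustration, and the samples are assumed to have been collected elsewhere by polling Threadpool_threads, Threadpool_idle_threads and Threads_running. The sample values in the usage part are made up.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    t: float       # time of the sample, in seconds
    threads: int   # Threadpool_threads
    idle: int      # Threadpool_idle_threads
    running: int   # Threads_running


def find_surges(
    samples: List[Sample],
    min_growth_per_sec: float = 100.0,  # hypothetical threshold, tune per workload
) -> List[Tuple[Sample, Sample, float]]:
    """Return (previous, current, rate) for every pair of consecutive
    samples in which Threadpool_threads grew at least
    `min_growth_per_sec` threads per second."""
    surges = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        rate = (cur.threads - prev.threads) / dt
        if rate >= min_growth_per_sec:
            surges.append((prev, cur, rate))
    return surges


if __name__ == "__main__":
    # Made-up samples: slow growth, then a jump to the configured limit.
    history = [
        Sample(t=0.0, threads=100, idle=90, running=10),
        Sample(t=10.0, threads=110, idle=95, running=15),
        Sample(t=20.0, threads=4096, idle=0, running=4000),
    ]
    for prev, cur, rate in find_surges(history):
        print(f"surge between t={prev.t} and t={cur.t}: "
              f"{rate:.1f} threads/s, Threads_running={cur.running}")
```

Because the pool can create thousands of threads within milliseconds when all workers block, a polling interval of seconds or minutes will often show only the state before and after a surge, not the build-up itself.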
| Comment by Vladislav Vaintroub [ 2021-10-28 ] | ||||||
|
BTW, what should probably be done in the threadpool to avoid surges in thread counts is the ability "not" to yield a worker thread if the current thread holds some lock (FTWRL, backup lock, table lock, user lock, or is even inside a transaction, possibly holding row locks), plus a minimal throttling interval between thread creations. In many cases, immediate creation of threads is done in the hope that a newly created thread will serve a query that releases some global lock that everyone is waiting for, e.g. UNLOCK TABLES. |