[MDEV-26873] Partial server hang when using many threads Created: 2021-10-21  Updated: 2021-10-29  Resolved: 2021-10-29

Status: Closed
Project: MariaDB Server
Component/s: Locking
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Vladislav Vaintroub
Resolution: Won't Do Votes: 0
Labels: hang

Attachments: MDEV-26873.cc, MDEV-26873_GDB_10.6_OPT.txt.tar.xz, MDEV-26873_GDB_FULL_10.6_OPT.txt.tar.xz, MDEV-26873_SHOW_FULL_PROCESSLIST_10.6_OPT.txt, hang.sql, show_full_processlist.txt
Issue Links:
Issue split
split from MDEV-26381 InnoDB: Failing assertion: fsize != o... Open
Relates
relates to MDEV-26935 Improve MDL to be scalable with many ... Open

 Description   

Split from MDEV-26381. Logging a simplified overview here, with an easy reproduction, though there are likely more aspects to these hangs, some of which are described in that ticket.

Execute the attached hang.sql (identical to [MDEV-26381_OTHER_1.sql] from MDEV-26381), using 10k threads, with all threads replaying in random order (against test db).

After a few minutes, even on optimized builds, partial hang issues will start to show. A SHOW FULL PROCESSLIST capture is attached as show_full_processlist.txt as a 10.7 example of such an occurrence. The issue is very easy to reproduce.

When logging errors (like ERROR 1146 (42S02) at line 1: Table 'test.t2' doesn't exist) to the screen, it is easy to see when the server starts locking up after 1-5 minutes, as the error rate either stops abruptly or slows down significantly. It then stays in that semi-hung state for 30+ minutes, sometimes unlocking partially, with some threads continuing to process transactions whilst others remain hung.
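
For concreteness, the replay driver can be sketched along these lines (a hypothetical Python sketch — the actual reproduction used CLI clients driven by a Bash script; `execute` is a placeholder for whatever sends one statement to the server):

```python
import random
import threading

def replay(statements, execute, n_threads=10000, iterations=100):
    """Spawn n_threads workers; each one continually replays the
    statement list in its own random order, as in the reproduction."""
    def worker():
        stmts = list(statements)       # private copy per thread
        for _ in range(iterations):
            random.shuffle(stmts)      # random replay order each pass
            for stmt in stmts:
                execute(stmt)          # errors like ERROR 1146 are expected
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With 10k threads, Python's GIL makes this a poor load generator for real testing; it only illustrates the shape of the workload.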

Machine is not OOM, nor OOS, nor busy (nothing else running), not challenged by the 10k threads (low load average in htop). IOW, this is not server hardware/capability related in any way afaict.

Tested version/revision was 10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)



 Comments   
Comment by Roel Van de Paar [ 2021-10-21 ]

MDEV-26873_SHOW_FULL_PROCESSLIST_10.6_OPT.txt shows a full process list on 10.6 optimized exhibiting the same issue (produced in the same way), captured just before a gdb break-in (gdb --pid=x). A backtrace of all threads is attached as MDEV-26873_GDB_10.6_OPT.txt.tar.xz, and a full backtrace of all threads (with set print frame-arguments all) as MDEV-26873_GDB_FULL_10.6_OPT.txt.tar.xz. Tested version was 10.6.5 ebd52051206620a0944ffc084582532c0c394523 (Optimized).

Comment by Roel Van de Paar [ 2021-10-21 ]

10.5 also looks to be affected (queries queuing up), though it is not clear whether it hangs as consistently as 10.6-10.7.

Comment by Roel Van de Paar [ 2021-10-21 ]

10.2-10.4 also affected.

When shutting down the server, we see:

10.4.22 a75813d46739f1e3aea9821b163c0ff1332a017e (Optimized)

2021-10-22  5:34:56 0 [Warning] /test/MD131021-mariadb-10.4.22-linux-x86_64-opt/bin/mysqld: Thread 19 (user : 'root') did not exit
2021-10-22  5:34:56 0 [Warning] /test/MD131021-mariadb-10.4.22-linux-x86_64-opt/bin/mysqld: Thread 17 (user : 'root') did not exit
2021-10-22  5:34:56 0 [Warning] /test/MD131021-mariadb-10.4.22-linux-x86_64-opt/bin/mysqld: Thread 15 (user : 'root') did not exit
2021-10-22  5:34:56 0 [Warning] /test/MD131021-mariadb-10.4.22-linux-x86_64-opt/bin/mysqld: Thread 13 (user : 'root') did not exit

Comment by Marko Mäkelä [ 2021-10-22 ]

In MDEV-26873_GDB_10.6_OPT.txt.tar.xz the only InnoDB thread is the idle page cleaner:

Thread 5 (Thread 0x1511ecd56700 (LWP 47143)):
#0  futex_wait_cancelable (private=..., expected=..., futex_word=...) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=..., clockid=..., mutex=..., cond=...) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=..., mutex=...) at pthread_cond_wait.c:638
#3  0x000055ee6114407b in buf_flush_page_cleaner () at /test/10.6_opt/storage/innobase/buf/buf0flu.cc:2203
#4  0x00001512022a5de4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00001512023c0609 in start_thread (arg=...) at pthread_create.c:477
#6  0x0000151201fae293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

InnoDB cannot possibly be the culprit here. I see no occurrence of "part" in that file either, so partitioning is unlikely to be the cause as well.

Of the 9979 threads, 3767 are executing MDL_context::try_acquire_lock_impl() and 8694 are executing open_table_get_mdl_lock(). That is why I set the component to Locking.

Comment by Vladislav Vaintroub [ 2021-10-22 ]

I removed "Critical". It can't be critical. It is probably also not "Major".

Comment by Vladislav Vaintroub [ 2021-10-22 ]

marko comments that there is no partitioning. So, could you please verify that, and file a non-partitioned, maximally simplified example, ideally with 2-3 threads, or with as few threads as possible.

Comment by Vladislav Vaintroub [ 2021-10-22 ]

Roel, also, what is a "partial hang"? It is either a hang or not a hang. There can't be a partial hang, just like there is no way to be only a little bit pregnant.

Comment by Roel Van de Paar [ 2021-10-22 ]

wlad Please feel free to call or describe it as you please. The best term I could find for the behavior I am seeing is a partial server hang: some operations (like CLI client access, executing SHOW PROCESSLIST, etc.) are still possible, while at the same time many of the threads are in some form of long-duration (30 min+) hung or locked state, with some threads at times continuing to process only a few transactions. That, to me, seems most accurately described as a partial server hang, as opposed to a full server hang where, for example, nothing is possible (no CLI access and no thread interaction of any kind once the hang is established), or a deadlock situation which times out, etc.

Also, IMHO, any issue which has the potential to partially hang the server in release builds (and that with simple SQL) is definitely critical.

Comment by Roel Van de Paar [ 2021-10-22 ]

Confirmed issue is reproducible without partitioning. Updated title, removed tag.

Comment by Roel Van de Paar [ 2021-10-22 ]

The issue is not reproducible with only 100, 500, or 2000 threads.

Sidenote: A 4 to 5 second pause in output flow (e.g. of error messages) is regularly observed. It is possible that some mechanism or background process/thread (with approximately this period) is firing and that this may be related to the issue.

Comment by Roel Van de Paar [ 2021-10-22 ]

The issue can be produced without even creating a single table. Only DROP TABLE IF EXISTS, INSERT, and SELECT are necessary across 10k threads.

At least two table names are necessary for those commands (even though such tables never exist). With a single table name, the issue does not reproduce.

Comment by Roel Van de Paar [ 2021-10-22 ]

The issue does not reproduce with only 13 INSERT statements per (nonexistent) table; 23 is sufficient. It is likely that more INSERT statements make the issue easier to reproduce.

Whilst the threads are running, there is a build-up in the number of active threads, and there seem to be two stages to this: first a build-up to around 95-120 active threads quite quickly, then some sort of trigger event (5-60 seconds later), followed by a quick but gradual build-up to 6-10k active threads in SHOW PROCESSLIST.

Comment by Roel Van de Paar [ 2021-10-22 ]

Improved and final testcase. Run it, continually looping, in random order, across 10k threads.

DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
INSERT INTO t1 VALUES (0);
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
INSERT INTO t1 VALUES (3);
INSERT INTO t1 VALUES (4);
INSERT INTO t1 VALUES (5);
INSERT INTO t1 VALUES (6);
INSERT INTO t1 VALUES (7);
INSERT INTO t1 VALUES (8);
INSERT INTO t1 VALUES (9);
INSERT INTO t1 VALUES (10);
INSERT INTO t1 VALUES (11);
INSERT INTO t1 VALUES (12);
INSERT INTO t1 VALUES (13);
INSERT INTO t1 VALUES (14);
INSERT INTO t1 VALUES (15);
INSERT INTO t1 VALUES (16);
INSERT INTO t1 VALUES (17);
INSERT INTO t1 VALUES (18);
INSERT INTO t1 VALUES (19);
INSERT INTO t1 VALUES (20);
INSERT INTO t1 VALUES (21);
INSERT INTO t1 VALUES (23);
INSERT INTO t2 VALUES (1);
INSERT INTO t2 VALUES (2);
INSERT INTO t2 VALUES (3);
INSERT INTO t2 VALUES (4);
INSERT INTO t2 VALUES (5);
INSERT INTO t2 VALUES (6);
INSERT INTO t2 VALUES (7);
INSERT INTO t2 VALUES (8);
INSERT INTO t2 VALUES (9);
INSERT INTO t2 VALUES (10);
INSERT INTO t2 VALUES (11);
INSERT INTO t2 VALUES (12);
INSERT INTO t2 VALUES (13);
INSERT INTO t2 VALUES (14);
INSERT INTO t2 VALUES (15);
INSERT INTO t2 VALUES (16);
INSERT INTO t2 VALUES (17);
INSERT INTO t2 VALUES (18);
INSERT INTO t2 VALUES (19);
INSERT INTO t2 VALUES (20);
INSERT INTO t2 VALUES (21);
INSERT INTO t2 VALUES (23);
SELECT * FROM t1;
SELECT * FROM t2;

I consider this issue critical.

Comment by Roel Van de Paar [ 2021-10-22 ]

Note: whilst the testcase in the last comment is able to reproduce the issue, for example:

10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

+-------+------+-----------+------+---------+------+---------------------------------+---------------------------------------+----------+
| Id    | User | Host      | db   | Command | Time | State                           | Info                                  | Progress |
+-------+------+-----------+------+---------+------+---------------------------------+---------------------------------------+----------+
...
|  9117 | root | localhost | test | Query   |  741 | closing tables                  | SELECT * FROM t1           |    0.000 |

Adding the following to the SQL mix:

CREATE TABLE t1 (a INT) ENGINE=InnoDB;
CREATE TABLE t2 (a INT) ENGINE=InnoDB;

produces the issue more clearly, and faster.

Comment by Vladislav Vaintroub [ 2021-10-25 ]

I set its priority to "Minor"; I consider that fair.
First, nobody in their right mind will run 10K connections without the threadpool; this is our C10K thing. Second, "easy to reproduce" required writing a Connector/C program with 10K threads, which none of the Linuxoid systems I know will allow out of the box, and the server won't allow it out of the box either. Third, there is no such thing as a "partial hang". Fourth, neither SHOW PROCESSLIST nor gdb takes a point-in-time snapshot of threads; they walk the list of connections resp. threads one by one, without suspending the other connections/threads, and this makes gdb useless in this form. There is some setting in gdb which will allow you to take a snapshot of threads, though (I have forgotten its name by now). Fifth, the test case is too artificial; much of the time it issues DML statements against non-existing tables.

There are some problems with this bug report, especially "how to reproduce". There is no user bug associated with it. Not everything can be Critical.

Comment by Vladislav Vaintroub [ 2021-10-25 ]

The gdb setting that allows suspending threads, to take a point-in-time "thread apply all bt", is:

set scheduler-locking on

according to https://stackoverflow.com/a/5697181/547065

It seems to be the only viable way to get an accurate picture of what's happening.

Comment by Roel Van de Paar [ 2021-10-26 ]

Responding to each numbered point:
1) Understood. Will test with the threadpool also, to see if it makes a difference.
2) The issue is easy to reproduce with just CLI-based connections, using a Bash script that randomly sorts and loops the statements, and setting the server to allow 10K connections. If you like, I can provide information on how to set this up, or provide you with a server where the reproduction environment is already set up for you to debug.
3) I respectfully disagree, for the reasons already stated.
4) Understood. Will look into the 'set scheduler-locking on' gdb command.
5) When the tables are created, the issue reproduces more easily. Yes, there will still be some DML against non-existing tables.
I remain of the opinion that this is critical, but I am not going to keep flipping it back and forth; let's meet in the middle (Major) for the time being. Sooner or later a customer will run into this, but they will likely just restart the server, and then it will work again for a while. There is a serious locking issue in the server, and the 10k threads are just what makes it easy to see.

Comment by Roel Van de Paar [ 2021-10-26 ]

Re: 4) It seems the 'set scheduler-locking on' gdb command is not directly useful (for QA) without stepping/debugging.
https://www.oreilly.com/library/view/gdb-pocket-reference/9780596100278/re43.html
Are you confident that 'thread apply all bt' does not render an accurate point-in-time snapshot when one breaks in with gdb?

Comment by Vladislav Vaintroub [ 2021-10-26 ]

I'm quite positive about "gdb thread apply all bt" not stopping the threads. And we have seen obviously "impossible" stacktraces in the past, like code under a mutex seemingly executing in multiple threads concurrently, although I have already forgotten which bug. Maybe marko can remember, as I also recall his proposal to first issue "kill -SIGSTOP" or similar prior to the gdb invocation.

The gdb documentation says that a breakpoint will stop all threads, but otherwise it won't; that is the situation we have here, as there are no breakpoints. There is some "non-stop mode" and "all-stop mode" (https://sourceware.org/gdb/current/onlinedocs/gdb/All_002dStop-Mode.html), although I can't extract much sense from that documentation, being only an occasional user of that debugger.

Comment by Marko Mäkelä [ 2021-10-26 ]

wlad, I think that you are referring to MDEV-13983 or one of the linked tickets. In that case, the root cause was perfectly visible in the thread apply all backtrace output.

I would think that the main purpose of set scheduler-locking is to control commands like step, stepi, next, nexti, continue, finish. I typically do not use the scheduler-locking, so I do not know for sure.

I would expect that when the process is paused in a debugger, all threads will be paused. And I would expect that to be the case when the debugger is ready to execute commands, such as thread apply all backtrace. So, I do not believe that thread apply all backtrace would display an impossible snapshot of thread stacks.

In any case, I think that it would be much better to try to repeat this with fewer threads (say, fewer than 100 or 50). I have a hard time imagining that a deadlock or livelock (partial hang) would involve more than 5 threads. Usually the root cause of any concurrency problem involves 2 or 3 threads. (Some more threads may be necessary to set up the right conditions.)

My suggestion to Roel: study the stack traces and inject some code around the MDL acquisition (sleeps or sched_yield()) to make the hang more likely with fewer threads. Also, try some other tricks, such as compiling a RelWithDebInfo build with ASAN and UBSAN to affect the timing.

Comment by Vladislav Vaintroub [ 2021-10-26 ]

There is no hang in there. Whenever I execute the sequence, 1 CPU is always busy, doing MDL deadlock detection, or something in the table definition cache. So, a snapshot that would show that everything is locked up, when not everything is locked up, is fake. I trust gdb to produce fake snapshots; that's my experience.

Comment by Marko Mäkelä [ 2021-10-26 ]

I did not investigate whether any of the attached stack traces in this ticket expose genuine hangs. I only had the objective to prove that no storage engine is the culprit.

I agree that one snapshot of a running server is not sufficient to prove a hang. It would require multiple samples (say, multiple thread apply all backtrace) and some (preferably automated) analysis to highlight which threads do not seem to have progressed. (If the backtrace or backtrace full of a thread is identical between samples, or differs at most by some timestamp values, then the thread has likely not made any progress. But we could get an identical stack trace if the exact same command is being executed over and over again.)

With a larger number of threads, it is more plausible (although I think improbable) that the operating system scheduler is unfair and some theoretically runnable thread has been starved of CPU resources. We have witnessed that many times with rr replay traces that show a 'fake' hang.
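
The sample-comparison analysis suggested above could be automated along these lines (a hypothetical sketch; per-thread stacks are assumed to be pre-extracted into strings, and, as noted, an unchanged stack suggests but does not prove lack of progress):

```python
def stuck_threads(samples):
    """Given a list of {thread_id: stack_string} samples taken over time,
    return the ids whose stack never changed between consecutive samples.
    Caveat: a thread re-executing the exact same command repeatedly can
    also yield identical stacks, so this flags candidates, not proof."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    candidates = set(samples[0])
    for prev, cur in zip(samples, samples[1:]):
        candidates = {tid for tid in candidates
                      if tid in cur and cur[tid] == prev.get(tid)}
    return candidates
```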

Comment by Vladislav Vaintroub [ 2021-10-26 ]

So, as perf reports, there is no hang, instead MDL subsystem is busy trying to find a deadlock, in a system with 10K participants.

73.53% mariadbd [.] MDL_lock::visit_subgraph
20.37% mariadbd [.] MDL_ticket::is_incompatible_when_waiting

Comment by Vladislav Vaintroub [ 2021-10-26 ]

I attached a quick and dirty program which demonstrates the allegedly "very easy to reproduce" issue.
I think Roel should maybe write instructions on how to increase both the number of files and the number of threads, so that different Linuxoid systems do not bark; it takes me almost half an hour every time to find out where those settings are hidden, except for /etc/security/limits.conf.

Comment by Vladislav Vaintroub [ 2021-10-26 ]

So, the resolution to this problem is currently to add:

thread-handling=pool-of-threads
thread-pool-max-threads=1000 (or another large number that makes sense; I have no idea why I changed the default from 1000 to 64K in 10.2)

There is currently a cosmetic problem: the threadpool will reach its max (it does so when contention is detected) and then write the following into the error log:

2021-10-26 22:44:59 5172 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation.
 If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show processlist' or 'show engine innodb status' can give additional hints.
2021-10-26 22:44:59 5172 [Note] Threadpool has been blocked for 30 seconds

[ERROR] should definitely be a warning; I'll change that back in 10.2. Maybe it should be a [Note], I dunno.
This will improve the throughput tremendously for the given case. Whether or not MDL deadlock detection can be made safe for 10K concurrent threads, I can't tell at the moment; maybe it can't. But there is a threadpool, use it. There is no reason to run 10K OS threads in the server anyway.
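
Collected as a my.cnf fragment, the suggested workaround would look roughly like this (a sketch; the section name and the exact cap are assumptions to be tuned per system):

```ini
[mariadbd]
thread_handling = pool-of-threads
# Cap worker threads; 1000 per the suggestion above -- raise if needed.
thread_pool_max_threads = 1000
```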

Comment by Roel Van de Paar [ 2021-10-29 ]

Thank you wlad for the analysis & input thus far, as well as for writing a test script from your end. Very cool.

Observations from my side:

  • Using --thread_handling=pool-of-threads without limiting the number of threads produces the issue in much the same way, albeit it comes on more slowly, and it seems to resolve somewhat faster (350+ to 400+ second waiting threads rather than 700+ second waiting threads). I assume you saw this too, as you recommend using --thread_pool_max_threads=1000.

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    Id	User	Host	db	Command	Time	State	Info	Progress
    24839	root	localhost	test	Query	414	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    24993	root	localhost	test	Query	423	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25077	root	localhost	test	Query	423	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25155	root	localhost	test	Query	427	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25249	root	localhost	test	Query	421	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25305	root	localhost	test	Query	429	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25363	root	localhost	test	Query	407	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25493	root	localhost	test	Query	402	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25423	root	localhost	test	Query	414	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    25511	root	localhost	test	Query	425	Waiting for table metadata lock	CREATE TABLE t2 (a INT) ENGINE=InnoDB	0.000
    ...
    

  • Further to the above, using --thread_handling=pool-of-threads with --thread_pool_max_threads=10000 shows the issue in much the same way:

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    Id	User	Host	db	Command	Time	State	Info	Progress
    4676	root	localhost	test	Query	361	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5026	root	localhost	test	Query	360	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5086	root	localhost	test	Query	354	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5098	root	localhost	test	Query	359	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5154	root	localhost	test	Query	354	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5198	root	localhost	test	Query	362	Commit implicit	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5206	root	localhost	test	Query	359	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5220	root	localhost	test	Query	359	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5212	root	localhost	test	Query	350	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5240	root	localhost	test	Query	357	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5298	root	localhost	test	Query	354	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    5312	root	localhost	test	Query	355	Waiting for table metadata lock	CREATE TABLE t1 (a INT) ENGINE=InnoDB	0.000
    ...
    

  • Using --thread_pool_max_threads=1000 - whilst it does not produce the issue - just masks the issue, as the number of threads is limited to 1000 in that case. 1000 threads would not be sufficient for customers with busy websites etc. In summary, I do not believe that the threadpool is relevant to this issue, nor a solution/resolution for it. At best, it may be an improvement. How much it improves things overall remains to be seen, as there is apparently an altogether new symptom showing up when using the threadpool (with or without the 1000-max-threads limitation): there are times (lasting at least several seconds) when SHOW PROCESSLIST is completely empty.
  • Using --thread_handling=pool-of-threads with --thread_pool_max_threads=1000 also produces repeated additional "Threadpool blocked" notes:

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    2021-10-29 16:42:39 49198 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation. If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show processlist' or 'show engine innodb status' can give additional hints.
    2021-10-29 16:42:39 49198 [Note] Threadpool has been blocked for 30 seconds
     
    2021-10-29 16:42:39 46404 [Note] Threadpool has been blocked for 30 seconds
    

    This issue is also visible when for example using --thread_pool_max_threads=2000

Question:

  • Why does the server need to create 5 table cache instances when only two tables are being used?

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    2021-10-29 16:51:22 3588 [Note] Detected table cache mutex contention at instance 1: 24% waits. Additional table cache instance activated. Number of instances after activation: 5.
    

Notes:

  • Re: "I'm quite positive about "gdb thread apply all bt" not stopping the threads": if one breaks in with gdb --pid=x (in contrast to running the server under gdb), then the full program is halted, in full, at that exact point in time. So the threads are already fully stopped, and they will only continue once gdb detaches (due to the user quitting gdb). IOW, the backtrace of threads should be a 100% accurate state of the server at the exact point in time of the break-in.
  • Additionally, the SHOW FULL PROCESSLIST output clearly shows threads being stuck for 400+ seconds (and 700+ seconds without the threadpool). Even with many threads, this would seem unlikely to be "normal".
  • On test servers we set all limits to unlimited and/or as high as possible by default. Customers who would tune for many connections would also tune their OS settings higher. See https://github.com/mariadb-corporation/mariadb-qa/blob/master/setup_server.sh

Comment by Vladislav Vaintroub [ 2021-10-29 ]

"Using --thread_pool_max_threads=1000 - whilst it does not produce the issue - is just masking the issue as the number of threads is limited to 1000 in that case. 1000 threads would not be sufficient for customers with busy websites etc". This is one-thread-per-connection thinking. You do not need as many threads as there are connections.

The thread pool's idea is to mask those issues, to fix the OS scheduler problems which Linux has, and to reduce contention. If I run my program on a different OS, the scheduler there turns out to be smarter, as seen from the output of my program. Run it on Windows 10 with a quad-core, and it runs much faster than on a 32-core Linux box, even with its non-default one-thread-per-connection. So if you like, maybe you can tune your Linux scheduler. I can't do that; I have no clue.

And yes, if one sets thread_pool_max_threads to 10K, the number of threads will grow until it degenerates in basically one-thread-per-connection, because high contention is a bad scenario for threadpool (and for everyone else). Once contention is over, the additional threads disappear.

Now, take a look at how many threads, say, SQL Server would run: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option?view=sql-server-ver15. They have at most 16 per core on machines with fewer than 64 cores, and 32 on machines with more than 64 cores. To reach 10K OS threads, as you suggest, one would need 300 or so cores.

To know how much improvement a setting like "thread-handling=pool-of-threads" and "thread-pool-max-threads" gives, measure QPS. With that, it is no longer "remains to be seen". My program, which I shared, measures QPS. You can script sysbench otherwise, and if you manage to run it with 10K connections (because I can't), it gives a better overview not only of QPS, but also of latency.

The table cache automagically opens a new instance if it measures contention. There is contention on the table cache (no surprise). Therefore there are many instances.

I would only trust gdb to be point-in-time if it works on a crash dump. This is my experience from analyzing several thousands-of-threads dumps over the years; I'm not really new to this.

So we assume that gdb "thread apply all" is not really point-in-time (why else would it have all these all-stop/non-stop modes, really?), or that there are some problems with the Linux scheduler, and what not, that make it appear as if it were not point-in-time. There is a workaround for this.

Use the wonderful PMP tool, take several probes, and it will be at least statistically correct. And the abbreviated stack traces in PMP are much, much better to read than dumps with all variables and function parameters; those dumps take longer to read than both volumes of War and Peace.
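
The PMP idea — collapse each thread's stack to a signature and count how many threads share it — is essentially this (a sketch under the assumption that the stacks have already been parsed into lists of frame names):

```python
from collections import Counter

def aggregate_stacks(stacks):
    """Collapse each per-thread stack (a list of function names) into one
    comma-joined signature and count how many threads share it,
    most common first -- the core of the PMP approach."""
    return Counter(",".join(frames) for frames in stacks).most_common()
```

Taking several such probes over time and comparing the dominant signatures gives the "statistically correct" picture the comment refers to.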

As for SHOW PROCESSLIST, I happen to know it is not point-in-time, because I understand its implementation.

Anyway, I suggest closing this one, because there is a workaround with the threadpool, "more or less", which is much better than without. I'm closing it in favor of the newly created task MDEV-26935. I doubt we can do anything for already-released versions.

Comment by Vladislav Vaintroub [ 2021-10-29 ]

Closing, as the MDL deadlock detector cannot be fixed in already-released versions to scale to 10K OS threads. There is a workaround with the threadpool and capping thread_pool_max_threads to a reasonable value. If you find 1K unreasonable, try 2K or 4K.

I created MDEV-26935 as a task to try to make MDL scale to 10K concurrent OS threads. It is a task, not a bug, and it will require significant time even for investigation and experimentation alone.

Generated at Thu Feb 08 09:48:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.