[MDEV-7588] [PATCH] Slow connections with thread pool and replication Created: 2015-02-14 Updated: 2015-11-24 Resolved: 2015-11-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER, Replication |
| Affects Version/s: | 5.5, 10.0 |
| Fix Version/s: | 5.5.47, 10.0.23, 10.1.10 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Peter Nørlund | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | verified | ||
| Environment: |
Debian Linux 7, Linux 3.10, 4 cores / 8 threads, official MariaDB packages |
||
| Attachments: |
|
| Sprint: | 10.1.9-3 |
| Description |
|
If MariaDB is set up with pool-of-threads and a replication slave connects and issues the COM_BINLOG_DUMP command, every nth connection (in my setup every 8th) can take as much as 1 second to establish. That is, the TCP connection is completed quickly, but it takes a long time (relatively speaking) before the server greeting is sent. The reason is apparently that the binlog dump thread doesn't notify the thread pool that it is waiting, so active_thread_count remains > 0 and queue_put() does nothing. Instead, the thread group is activated by the thread timer (thread_pool_stall_limit), which with the default value of 500ms gives us a worst case close to 1 sec. I only have a superficial understanding of the MariaDB core, but it seems that mysql_binlog_send() ought to call thd_wait_begin. Either something like that, or the active_thread_count check should be removed from queue_put(). |
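The mechanism described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the server's actual code: `ToyThreadGroup`, `_wake_worker`, `timer_tick` and `served_before_timer` are hypothetical names, and only `queue_put` and `active_thread_count` are taken from the report. It shows a newly queued connection being left alone while the group still looks busy, and being picked up only when the timer fires.

```python
from collections import deque


class ToyThreadGroup:
    """Simplified, hypothetical model of one thread-pool group."""

    def __init__(self, active_thread_count=0):
        # Number of threads the pool believes are doing work right now.
        self.active_thread_count = active_thread_count
        self.queue = deque()
        self.served = []

    def _wake_worker(self):
        # Hand the oldest queued connection to a worker, if there is one.
        if self.queue:
            self.served.append(self.queue.popleft())

    def queue_put(self, conn):
        # A worker is only woken when no thread in the group looks active.
        self.queue.append(conn)
        if self.active_thread_count == 0:
            self._wake_worker()

    def timer_tick(self):
        # Stand-in for the periodic check driven by thread_pool_stall_limit.
        self._wake_worker()


def served_before_timer(active_thread_count):
    """Return (served right after queue_put, served after one timer tick)."""
    group = ToyThreadGroup(active_thread_count)
    group.queue_put("new connection")
    immediately = list(group.served)
    group.timer_tick()
    return immediately, list(group.served)


if __name__ == "__main__":
    # A waiting binlog dump thread still counted as active: served late.
    print(served_before_timer(1))
    # No thread counted as active: served immediately.
    print(served_before_timer(0))
```

In this model the only difference between the two runs is whether the waiting thread is still counted in `active_thread_count`, which is the point the report makes about the binlog dump thread.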
| Comments |
| Comment by Peter Nørlund [ 2015-02-17 ] |
|
I've tried adding thd_wait_begin/thd_wait_end to MYSQL_BIN_LOG::wait_for_update_bin_log, where the thread waits on a condition variable. This had an immediate effect on the reaction time. (Attached patch yield_wait_for_update_bin_log.diff) I couldn't compile the Debian package without commenting out the tests (and I actually had to backport two patches to even compile in the first place), but all the failed tests seem to be related to SSL, and at least one of the failures was caused by an expired certificate. Anyway, the patch is currently running on the internal databases at my workplace, to see if it has any unwanted side effects. |
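The shape of that change, bracketing a condition-variable wait with begin/end notifications, can be sketched as follows. This is an illustration only and not the attached patch: `FakeThd`, `pool_wait` and `wait_for_update` are hypothetical helpers, and the two methods merely borrow the names `thd_wait_begin` and `thd_wait_end` to adjust a counter.

```python
import threading
from contextlib import contextmanager


class FakeThd:
    """Hypothetical stand-in for a connection's thread-pool bookkeeping."""

    def __init__(self):
        self.active_thread_count = 1  # this thread is currently running
        self.events = []

    def thd_wait_begin(self):
        # Tell the (pretend) pool that this thread is about to block.
        self.active_thread_count -= 1
        self.events.append("begin")

    def thd_wait_end(self):
        # Tell the (pretend) pool that this thread is running again.
        self.active_thread_count += 1
        self.events.append("end")


@contextmanager
def pool_wait(thd):
    """Bracket a blocking section so the end call happens even on error."""
    thd.thd_wait_begin()
    try:
        yield
    finally:
        thd.thd_wait_end()


def wait_for_update(thd, cond, has_update, timeout=None):
    """Block on cond until has_update() is true, marked as idle meanwhile."""
    with cond:
        with pool_wait(thd):
            return cond.wait_for(has_update, timeout=timeout)
```

While the thread is inside `wait_for_update`, the counter no longer includes it, so in a model like this other work would not have to wait for a timer to be scheduled.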
| Comment by Jean Weisbuch [ 2015-02-17 ] |
|
The SSL certificate issues are due to the fact that the certs used for the test suite have expired; it has been solved since on
About your issue, I had issues on a slave running pool-of-threads that could be related :
Do you think this bug could produce that kind of issue? |
| Comment by Peter Nørlund [ 2015-02-17 ] |
|
I doubt that it is related to |
| Comment by Elena Stepanova [ 2015-02-18 ] |
|
Thanks for the report and the patch. Below is a simple MTR test case to confirm the problem (not to be included in the regression suite). Results with and without the patch on the current 5.5 tree, debug build:
So the patch does help, although I don't know if it's otherwise correct.
|
| Comment by Patryk Pomykalski [ 2015-02-26 ] |
|
We have the same problem on servers with thread pool. |
| Comment by Vladislav Vaintroub [ 2015-11-24 ] |
|
Fixed, thanks for the patch, Peter. That long wait without any indication to the thread pool was indeed the problem. |