Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 5.5 (EOL), 10.0 (EOL)
- Environment: Debian Linux 7, Linux 3.10, 4 cores / 8 threads, official MariaDB packages 10.1.9-3
Description
If MariaDB is set up with pool-of-threads and a replication slave connects and issues the COM_BINLOG_DUMP command, every nth connection (in my setup every 8th) can take as much as 1 second to establish. That is, the TCP connection completes quickly, but it takes a relatively long time before the server greeting is sent.
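For what it's worth, the delay is easy to measure from the client side. Below is a rough reproducer sketch (not one of this report's attachments) that simply times mysql_real_connect() in a loop against the master while a slave is connected; the host, port, credentials and iteration count are placeholders, not values from my setup.

{code:cpp}
// Rough reproducer sketch: time repeated connects against a pool-of-threads
// master while a binlog dump thread (slave) is connected. Roughly every Nth
// connect (N ~ thread_pool_size) should show the ~0.5-1 s delay described above.
// Build with: g++ -std=c++11 repro.cc $(mysql_config --cflags --libs)
#include <mysql.h>
#include <chrono>
#include <cstdio>

int main()
{
  for (int i= 0; i < 32; i++)
  {
    MYSQL *con= mysql_init(NULL);
    auto start= std::chrono::steady_clock::now();
    // Placeholder host/credentials; adjust for the server under test.
    if (!mysql_real_connect(con, "127.0.0.1", "root", "", NULL, 3306, NULL, 0))
    {
      fprintf(stderr, "connect failed: %s\n", mysql_error(con));
      return 1;
    }
    auto ms= std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
    printf("connect %2d: %lld ms\n", i, (long long) ms);
    mysql_close(con);
  }
  return 0;
}
{code}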
The reason is apparently that the binlog dump thread doesn't notify the thread pool that it is waiting, so active_thread_count remains > 0 and queue_put() will not do anything. Instead, the thread group is only activated by the pool's timer thread (thread_pool_stall_limit), which with the default value of 500 ms gives a worst case close to 1 second.
I only have a superficial understanding of the MariaDB core, but it seems that mysql_binlog_send() ought to call thd_wait_begin(). Either something like that should be done, or the active_thread_count check in queue_put() should be removed.
I've tried adding thd_wait_begin/thd_wait_end to MYSQL_BIN_LOG::wait_for_update_bin_log, where the thread waits on a condition variable. This had an immediate effect on the reaction time. (See the attached patch yield_wait_for_update_bin_log.diff.)
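To make the intent concrete, here is roughly what the change looks like; this is only a sketch of the idea, not the attached diff itself, and the exact function body and wait type (I assume THD_WAIT_BINLOG here) may differ between versions. thd_wait_begin()/thd_wait_end() come from include/mysql/service_thd_wait.h.

{code:cpp}
/* sql/log.cc -- sketch: wrap the binlog wait in thd_wait_begin/thd_wait_end
   so the thread pool knows this worker is blocked. */

int MYSQL_BIN_LOG::wait_for_update_bin_log(THD *thd,
                                           const struct timespec *timeout)
{
  int ret= 0;
  DBUG_ENTER("wait_for_update_bin_log");

  /* Tell the thread pool this thread is about to block, so the group can
     wake another worker immediately instead of waiting for the stall timer. */
  thd_wait_begin(thd, THD_WAIT_BINLOG);
  if (!timeout)
    mysql_cond_wait(&update_cond, &LOCK_log);
  else
    ret= mysql_cond_timedwait(&update_cond, &LOCK_log,
                              const_cast<struct timespec *>(timeout));
  thd_wait_end(thd);
  DBUG_RETURN(ret);
}
{code}

If I understand the threadpool code correctly, thd_wait_begin() ends up in tp_wait_begin(), which decrements the group's active_thread_count and wakes or creates another worker, so new connections no longer have to wait for the stall timer.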
I couldn't compile the Debian package without commenting out the tests (and I actually had to backport two patches to even compile in the first place), but all the failed tests seem to be related to SSL, and at least one of the failures was caused by an expired certificate.
Anyway, the patch is currently running on the internal databases at my workplace, to see if it has any unwanted side effects.