[MDEV-10100] main.pool_of_threads fails sporadically in buildbot Created: 2016-05-22  Updated: 2017-01-04  Resolved: 2017-01-04

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0, 10.1, 10.2
Fix Version/s: 5.5.55, 10.0.29, 10.1.21, 10.2.4

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
Duplicate
duplicates MDEV-11669 main.pool_of_threads fails in buildbo... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/1030/steps/test/logs/stdio

main.pool_of_threads                     w2 [ fail ]
        Test ended at 2016-05-21 16:38:02
 
CURRENT_TEST: main.pool_of_threads
--- /var/lib/buildbot/maria-slave/power8-vlp04-bintar/build/mysql-test/r/pool_of_threads.result	2016-05-18 09:35:44.821968095 -0400
+++ /var/lib/buildbot/maria-slave/power8-vlp04-bintar/build/mysql-test/r/pool_of_threads.reject	2016-05-21 16:38:01.580946588 -0400
@@ -2162,7 +2162,7 @@
 connect con2,localhost,root,,;
 connection con2;
 SELECT sleep(5);
-# -- Success: more than --thread_pool_max_threads normal connections not possible
+# -- Error: managed to establish more than --thread_pool_max_threads connections
 connection default;
 sleep(5.5)
 0



 Comments   
Comment by Elena Stepanova [ 2016-12-31 ]

Copied from MDEV-11669.
Another failure happens on the valgrind builder and can be reproduced locally with a high enough --parallel value:

main.pool_of_threads                     w22 [ fail ]
        Test ended at 2016-12-26 15:28:25
 
CURRENT_TEST: main.pool_of_threads
 
 
Could not execute 'check-testcase' before testcase 'main.pool_of_threads' (res: 1):
mysqltest: Logging to '/dev/shm/var/22/tmp/check-mysqld_1.log'.
mysqltest: Results saved in '/dev/shm/var/22/tmp/check-mysqld_1.result'.
mysqltest: Connecting to server localhost:16460 (socket /dev/shm/var/tmp/22/mysqld.1.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: Could not open connection 'default': 2013 Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 110
not ok
mysqltest failed but provided no output

It is a different problem from the one in the description, but they have to be solved together.

Comment by Elena Stepanova [ 2016-12-31 ]

The failure in the comment above happens on very slow builders and/or under conditions that make them even slower (such as parallel runs on a valgrind builder). The reason is that the test config has this:

[client]
connect-timeout=  2 

If everything is really slow, client connections are too, and MTR simply cannot establish the initial connection.

The failure in the description also happens on slow builders, but here the slowness plays the opposite role.

This part of the test does the following:
a) sends select sleep(5.5) in one connection;
b) waits a little bit;
c) establishes another connection;
d) sends select sleep(5) in the second connection;
e) waits a little bit;
f) attempts to establish a third connection.
Since the test is configured with thread_pool_max_threads=2, the third connection is supposed to fail. However, on a slow builder the connecting and waiting in steps (a)-(e) can take too long (more than 3 seconds). In that case, by the time the third connection is attempted, the 2-second connect_timeout is long enough for one of the sleeps to finish and free a pool thread, so the connection succeeds instead of failing.
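
The race can be illustrated with a small timing model. This is only a sketch, not part of the test or the patch: the function name and its parameters are hypothetical, and it assumes the first query starts at time 0, that each sleep holds one of the two pool threads for its full duration, and that the third connection succeeds as soon as any pool thread is freed before its connect timeout expires.

```python
def third_connection_succeeds(second_query_start, third_connect_start,
                              connect_timeout=2.0,
                              first_sleep=5.5, second_sleep=5.0):
    """Simplified model of the race (all times in seconds).

    Hypothetical helper for illustration only. The first query,
    sleep(5.5), is assumed to start at t=0; the second, sleep(5),
    starts at second_query_start.
    """
    # Moment at which each pool thread becomes free again.
    first_thread_free = first_sleep
    second_thread_free = second_query_start + second_sleep
    earliest_free = min(first_thread_free, second_thread_free)

    # The third client gives up once its connect timeout has expired.
    deadline = third_connect_start + connect_timeout

    # The connection gets through only if a thread is freed in time.
    return earliest_free < deadline


# Fast builder: the third attempt starts early and times out (expected).
print(third_connection_succeeds(second_query_start=0.5, third_connect_start=1.0))
# Slow builder: steps (a)-(e) took 4 seconds, so a sleep finishes in time.
print(third_connection_succeeds(second_query_start=2.0, third_connect_start=4.0))
```

Under these assumptions the first call returns False (the test passes) and the second returns True (the test fails with "managed to establish more than --thread_pool_max_threads connections").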

So, in one failure the immediate problem is a connect_timeout that is too short, while in the other it is a connect_timeout that is too long. That's why they need to be solved together.

Comment by Elena Stepanova [ 2017-01-01 ]

wlad, please review the patch.

https://github.com/MariaDB/server/commit/b1165b0dd35a1dbd24337056b6b074448fa4e046

The commit comment is bigger than the patch itself, but I thought it needed to be explained. The change passed a buildbot round on my tree, but I'd like you to check that I haven't lost any of the initially designed test logic by re-arranging the fragments.

Comment by Vladislav Vaintroub [ 2017-01-01 ]

Looks reasonable, OK to push. I made a tiny suggestion to fix a comment in the .test file.

Comment by Elena Stepanova [ 2017-01-01 ]

Thanks.
Pushed into 10.0:
https://github.com/MariaDB/server/commit/3871477c40efc826805f4c4e35b006c2c233dd26

Comment by Elena Stepanova [ 2017-01-04 ]

The second problem happens on 5.5 too, so the fix needs to be backported there as well:

https://github.com/MariaDB/server/commit/e5d7fc967ede53407a65bfde3faec3181e35f19f
