  MariaDB Server / MDEV-10100

main.pool_of_threads fails sporadically in buildbot

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/1030/steps/test/logs/stdio

      main.pool_of_threads                     w2 [ fail ]
              Test ended at 2016-05-21 16:38:02
       
      CURRENT_TEST: main.pool_of_threads
      --- /var/lib/buildbot/maria-slave/power8-vlp04-bintar/build/mysql-test/r/pool_of_threads.result	2016-05-18 09:35:44.821968095 -0400
      +++ /var/lib/buildbot/maria-slave/power8-vlp04-bintar/build/mysql-test/r/pool_of_threads.reject	2016-05-21 16:38:01.580946588 -0400
      @@ -2162,7 +2162,7 @@
       connect con2,localhost,root,,;
       connection con2;
       SELECT sleep(5);
      -# -- Success: more than --thread_pool_max_threads normal connections not possible
      +# -- Error: managed to establish more than --thread_pool_max_threads connections
       connection default;
       sleep(5.5)
       0
      

          Activity

            elenst Elena Stepanova added a comment (edited)

            Copied from MDEV-11669.
            Another failure; it happens on the valgrind builder and can be reproduced locally with a high enough --parallel value:

            main.pool_of_threads                     w22 [ fail ]
                    Test ended at 2016-12-26 15:28:25
             
            CURRENT_TEST: main.pool_of_threads
             
             
            Could not execute 'check-testcase' before testcase 'main.pool_of_threads' (res: 1):
            mysqltest: Logging to '/dev/shm/var/22/tmp/check-mysqld_1.log'.
            mysqltest: Results saved in '/dev/shm/var/22/tmp/check-mysqld_1.result'.
            mysqltest: Connecting to server localhost:16460 (socket /dev/shm/var/tmp/22/mysqld.1.sock) as 'root', connection 'default', attempt 0 ...
            mysqltest: Could not open connection 'default': 2013 Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 110
            not ok
            mysqltest failed but provided no output
            

            It is a different problem from the one in the description, but they have to be solved together.
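            The "system error: 110" at the end of the connection error above is an OS errno value. On Linux, errno 110 is ETIMEDOUT ("Connection timed out"), which is consistent with a client that gave up waiting for the server's initial handshake packet. A minimal Python sketch for decoding such a number; errno numbering differs between operating systems, so the lookup is only meaningful when run on the same OS as the builder (Linux here):

```python
import errno
import os
import sys

# The log line ends with "system error: 110". Look the number up in this
# platform's errno table; the numbering is OS-specific, so the result only
# matches the builder's meaning when run on the same OS (Linux).
code = 110
name = errno.errorcode.get(code, "unknown")
print(f"errno {code} on {sys.platform}: {name} ({os.strerror(code)})")
```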

            elenst Elena Stepanova added a comment

            The failure in the comment above happens on very slow builders and/or under conditions that make them even slower (such as parallel runs on a valgrind builder). The reason is that the test config contains this:

            [client]
            connect-timeout=2

            If everything is really slow, client connections are slow too, and MTR cannot establish even the initial connection within the 2-second timeout.
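            A minimal Python sketch of this failure mode, under simplifying assumptions: the hypothetical slow_server() stands in for an overloaded mysqld that accepts the TCP connection but is slow to send its first packet, and a single socket timeout plays the role of connect-timeout for both the TCP connect and the wait for that first packet. Times are scaled down so the sketch runs quickly; none of this is MTR or MariaDB client code.

```python
import socket
import threading
import time


def slow_server(listener: socket.socket, greeting_delay: float) -> None:
    """Hypothetical stand-in for an overloaded server: accept one client,
    then wait greeting_delay seconds before sending the first packet."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(greeting_delay)
        try:
            conn.sendall(b"greeting")
        except OSError:
            pass  # the client may already have given up and closed


def initial_connect(greeting_delay: float, connect_timeout: float) -> bool:
    """Return True if the server's first packet arrives within connect_timeout."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=slow_server, args=(listener, greeting_delay))
    server.start()
    try:
        with socket.create_connection(("127.0.0.1", port),
                                      timeout=connect_timeout) as client:
            # Assumption: the same timeout bounds the wait for the first packet.
            return len(client.recv(16)) > 0
    except socket.timeout:
        return False
    finally:
        server.join()
        listener.close()


# Scaled down: a 0.5 s greeting delay against a 0.2 s timeout fails,
# while the same delay against a 2 s timeout succeeds.
print(initial_connect(greeting_delay=0.5, connect_timeout=0.2))
print(initial_connect(greeting_delay=0.5, connect_timeout=2.0))
```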

            The failure in the description also happens on slow builders, but there the slowness plays the opposite role.

            This part of the test does the following:
            a) sends SELECT sleep(5.5) in the first connection;
            b) waits a little bit;
            c) establishes a second connection;
            d) sends SELECT sleep(5) in the second connection;
            e) waits a little bit;
            f) attempts to establish a third connection.
            Since the test is configured with thread_pool_max_threads=2, the third connection is supposed to fail. However, on a slow builder, steps (a)-(e) may take too long in total (more than 3 seconds). In that case, while the third connection is waiting for a pool thread, the 2-second connect_timeout is long enough for one of the sleeps to finish and release a pool thread, so the connection succeeds instead of failing.
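            The race in steps (a)-(f) can be modeled in a few lines of Python. This is a simplified sketch, not the test's actual logic: it assumes the first sleep starts at t=0, both pool threads stay busy until their sleeps end, and a waiting connection succeeds only if a thread is released before its connect_timeout expires. The function third_connection_succeeds() and its parameters are hypothetical.

```python
def third_connection_succeeds(second_start: float, third_attempt: float,
                              connect_timeout: float = 2.0) -> bool:
    """Hypothetical, simplified model of steps (a)-(f) with
    thread_pool_max_threads=2.

    All times are seconds since step (a) sent SELECT sleep(5.5).
    second_start:  when step (d) sends SELECT sleep(5)
    third_attempt: when step (f) starts connecting
    """
    first_free = 0.0 + 5.5            # first pool thread released
    second_free = second_start + 5.0  # second pool thread released
    earliest_free = min(first_free, second_free)
    # The third client waits for a free pool thread but gives up once
    # connect_timeout expires; it gets in only if a thread frees up first.
    return earliest_free <= third_attempt + connect_timeout


# Fast builder: steps (a)-(e) take about 1 s, so the third connection fails as intended.
print(third_connection_succeeds(second_start=0.5, third_attempt=1.0))
# Slow builder: steps (a)-(e) take 4 s, so sleep(5.5) ends before the 2 s timeout.
print(third_connection_succeeds(second_start=2.0, third_attempt=4.0))
```

            In this model, only the time left on the earliest sleep matters, which is why the outcome flips once the setup steps eat into that remaining time.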

            So in one failure the immediate problem is a connect_timeout that is too short, while in the other it is a connect_timeout that is too long. That is why they need to be solved together.
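            One way to see why the two failures are coupled, again in a simplified model with hypothetical names (times in seconds, first pool thread released when sleep(5.5) ends), is to compute the range of connect_timeout values that satisfies both constraints: the timeout must exceed the time a normal connection takes on the builder, and it must be shorter than the time left until the first pool thread is released when step (f) starts. On a slow enough builder the range is empty, so in this model no single connect_timeout value fixes both failures.

```python
from typing import Optional, Tuple


def connect_timeout_window(connect_time: float, third_attempt: float,
                           earliest_free: float = 5.5) -> Optional[Tuple[float, float]]:
    """Hypothetical model: return the open interval (low, high) of usable
    connect_timeout values, or None if no value satisfies both constraints.

    connect_time:  how long a normal connection takes on this builder
    third_attempt: when step (f) starts, measured from step (a)
    earliest_free: when the first pool thread is released
    """
    low = connect_time                    # any shorter: normal connections time out
    high = earliest_free - third_attempt  # any longer: the third connection gets in
    return (low, high) if low < high else None


# Fast builder: any timeout between 0.05 s and 4.5 s works, including 2 s.
print(connect_timeout_window(connect_time=0.05, third_attempt=1.0))
# Slow builder: connecting takes 2.5 s and step (f) starts at 4.0 s.
print(connect_timeout_window(connect_time=2.5, third_attempt=4.0))
```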

            elenst Elena Stepanova added a comment

            wlad, please review the patch.

            https://github.com/MariaDB/server/commit/b1165b0dd35a1dbd24337056b6b074448fa4e046

            The commit message is longer than the patch itself, but I thought the change needed explaining. The patch passed a buildbot round on my tree, but I'd like you to check that rearranging the fragments did not lose any of the originally designed test logic.

            wlad Vladislav Vaintroub added a comment

            Looks reasonable, ok to push. I made a tiny suggestion to fix a comment in the .test file.
            elenst Elena Stepanova added a comment

            Thanks. Pushed into 10.0: https://github.com/MariaDB/server/commit/3871477c40efc826805f4c4e35b006c2c233dd26

            elenst Elena Stepanova added a comment (edited)

            The second problem happens on 5.5 too, so the fix needs to be backported there as well:

            https://github.com/MariaDB/server/commit/e5d7fc967ede53407a65bfde3faec3181e35f19f


            People

              elenst Elena Stepanova
              elenst Elena Stepanova
              Votes: 0
              Watchers: 3
