  1. MariaDB Server
  2. MDEV-10255

Test timeouts on p8-trusty-bintar and p8-trusty-bintar-debug

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL)
    • Fix Version/s: N/A
    • Component/s: Tests
    • Labels: None

    Description

      Various tests fail with timeouts on p8 builders.
      http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/1071

      rpl.rpl_mixed_drop_create_temp_table 'innodb_plugin,mix' w1 [ fail ]  timeout after 900 seconds
              Test ended at 2016-06-17 17:14:16
       
      Test case timeout after 900 seconds
       
      == /var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/1/log/rpl_mixed_drop_create_temp_table.log == 
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      master-bin.000001	#	Query	#	#	BEGIN
      master-bin.000001	#	Query	#	#	use `test`; INSERT INTO tt_xx_1() VALUES (1)
      master-bin.000001	#	Xid	#	#	COMMIT /* XID */
      master-bin.000001	#	Query	#	#	use `test`; DROP TABLE IF EXISTS `xx_1` /* generated by server */
      -e-e-e-e-e-e-e-e-e-e-e- >> B T Drop-If-Xe << -e-e-e-e-e-e-e-e-e-e-e-
       
      SET @commands= 'B T Drop-TXe';
      BEGIN;
      INSERT INTO tt_xx_1() VALUES (1);
      DROP TABLE tt_2, xx_1;
      ERROR 42S02: Unknown table 'xx_1'
      -b-b-b-b-b-b-b-b-b-b-b- >> B T Drop-TXe << -b-b-b-b-b-b-b-b-b-b-b-
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      master-bin.000001	#	Query	#	#	BEGIN
      master-bin.000001	#	Query	#	#	use `test`; INSERT INTO tt_xx_1() VALUES (1)
      master-bin.000001	#	Xid	#	#	COMMIT /* XID */
      master-bin.000001	#	Query	#	#	use `test`; DROP TABLE `tt_2`,`xx_1` /* generated by server */
      -e-e-e-e-e-e-e-e-e-e-e- >> B T Drop-TXe << -e-e-e-e-e-e-e-e-e-e-e-
       
       
       == /var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err ==
      mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/tmp/1/mysqld.1.sock' (111)
       
       == /var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.2.err ==
      mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/tmp/1/mysqld.2.sock' (111)
      

      parts.partition_alter1_1_innodb 'innodb_plugin' w1 [ fail ]  timeout after 900 seconds
              Test ended at 2016-06-17 20:12:50
       
      Test case timeout after 900 seconds
       
      == /var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/1/log/partition_alter1_1_innodb.log == 
      CAST(f_int1 AS CHAR), 'just inserted' FROM t0_template
      WHERE f_int1 BETWEEN @max_row_div2 - 1 AND @max_row_div2 + 1
      ORDER BY f_int1;
      DROP TRIGGER trg_3;
      	
      # check trigger-12 success: 	1
      DELETE FROM t1
      WHERE f_int1 <> CAST(f_char1 AS SIGNED INT)
      AND f_int2 <> CAST(f_char1 AS SIGNED INT)
      AND f_charbig = '####updated per insert trigger####';
      ANALYZE  TABLE t1;
      Table	Op	Msg_type	Msg_text
      test.t1	analyze	status	OK
      CHECK    TABLE t1 EXTENDED;
      Table	Op	Msg_type	Msg_text
      test.t1	check	status	OK
      CHECKSUM TABLE t1 EXTENDED;
      Table	Checksum
      test.t1	<some_value>
      OPTIMIZE TABLE t1;
       
       == /var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err ==
      mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/var/lib/buildbot/maria-slave/p8-trusty-bintar/build/mysql-test/var/tmp/1/mysqld.1.sock' (111)
      

      etc.

      Another flavor:
      http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/1069/steps/test/logs/stdio

      innodb.innodb_autoinc_lock_mode_zero 'innodb_plugin' w3 [ fail ]
              Test ended at 2016-06-17 04:28:09
       
      CURRENT_TEST: innodb.innodb_autoinc_lock_mode_zero
       
       
      Failed to start mysqld.1
      mysqltest failed but provided no output
      

          Activity

            elenst Elena Stepanova added a comment

            Many varieties, e.g. http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/9/steps/test/logs/stdio

            Failing test(s): multi_source.gtid rpl.rpl_mdev6020 rpl.rpl_row_drop_create_temp_table rpl.rpl_row_img_blobs rpl.rpl_row_img_eng_min

            They can fail directly with "test timeout" or indirectly because some part of the test failed due to a timeout.

            svoj Sergey Vojtovich added a comment (edited)

            P8 builders (not only trusty) seem to have rather slow disks, according to hdparm: Timing buffered disk reads: 44 MB in 3.34 seconds = 13.17 MB/sec

            Since we run tests with --parallel=4, two or more I/O-hungry tests may run at the same time and load the disks heavily.

            Some options:

            • use --mem, but it may not be suitable for big tests, and we'll have to make sure /dev/shm is clean after every test run
            • reduce --parallel, but that may not give the desired effect (unless reduced to 1) because 2 heavy ddl_innodb tests may still run concurrently
            • remove xtra-big tests, but I don't completely like reducing coverage this way
            • increase timeouts
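
For illustration only, the options above could be expressed as mysql-test-run.pl command lines roughly as follows. This is a sketch, not the buildbot configuration: the helper names `mtr_command` and `per_worker_throughput` are hypothetical, the timeout value of 30 minutes is an arbitrary example, and it assumes `--testcase-timeout` is given in minutes (the 900-second timeout in the logs would then correspond to 15 minutes).

```python
# Hypothetical helpers; names and example values are not taken from the
# buildbot configuration discussed in this issue.

def mtr_command(parallel=4, use_mem=False, testcase_timeout_min=None):
    """Build an argv list for mysql-test-run.pl with the chosen mitigations."""
    argv = ["perl", "mysql-test-run.pl", f"--parallel={parallel}"]
    if use_mem:
        # Run the test var directory on a memory-backed filesystem.
        argv.append("--mem")
    if testcase_timeout_min is not None:
        # Assumed to be in minutes.
        argv.append(f"--testcase-timeout={testcase_timeout_min}")
    return argv


def per_worker_throughput(mb_read, seconds, workers):
    """Disk throughput per worker if all workers read at the same time."""
    return mb_read / seconds / workers


if __name__ == "__main__":
    # Current setup as described in the comment: 4 parallel workers.
    print(" ".join(mtr_command()))
    # Option: use --mem.
    print(" ".join(mtr_command(use_mem=True)))
    # Option: reduce parallelism down to 1.
    print(" ".join(mtr_command(parallel=1)))
    # Option: increase the per-test timeout (30 is an arbitrary example).
    print(" ".join(mtr_command(testcase_timeout_min=30)))
    # hdparm figure from the comment, split across 4 busy workers.
    print(f"{per_worker_throughput(44, 3.34, 4):.2f} MB/sec per worker")
```

The throughput helper only divides the hdparm figure evenly across workers; real contention between I/O-heavy tests is unlikely to be that uniform.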

            elenst Elena Stepanova added a comment

            It seems that the situation might have been resolved already. Around July 3-4 both builders were switched from the p8-trusty-bb slave to power8-vlp04. Since then there have been 11 builds on both builders together, and not a single timeout. Before that, p8-trusty-bintar-debug was failing pretty much all the time, and p8-trusty-bintar all the time, so 11 successful builds in a row is a statistically significant number.

            So, I won't change the buildbot config just yet; let's observe it for some more time and see how it goes.

            p8-trusty-bb is still on the list, but as of the time of this comment it's offline; hopefully it stays that way.
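
As a rough illustration of why 11 consecutive clean builds is persuasive: if builds were independent and still timed out at the old rate, a streak of 11 passes would be very unlikely. The sketch below assumes independence between builds, and the failure rates are hypothetical placeholders, since the comment only says the builders failed "pretty much all the time". The function name `prob_all_pass` is likewise made up for this example.

```python
def prob_all_pass(failure_rate, builds):
    """Probability that `builds` independent builds all pass,
    given a per-build failure probability of `failure_rate`."""
    return (1.0 - failure_rate) ** builds


if __name__ == "__main__":
    # Hypothetical prior failure rates; the issue gives no exact figure.
    for rate in (0.5, 0.9):
        p = prob_all_pass(rate, 11)
        print(f"failure rate {rate:.0%}: P(11 passes in a row) = {p:.3g}")
```

Even with a failure rate as low as 50%, the chance of 11 passes in a row under these assumptions is below 0.1%, which supports the conclusion that the slave change made the difference.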

            elenst Elena Stepanova added a comment

            There have been no timeouts for 2 weeks, since the tests were switched to the new slave power8-vlp04, so I think it's safe to assume the problem was fixed by the upgrade.

            People

              dbart Daniel Bartholomew
              elenst Elena Stepanova