Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7236

rpl.rpl_gtid_basic failed in buildbot with wait_condition timeout

Details

    Description

      Happened once so far (according to the cross-reference report):
      http://buildbot.askmonty.org/buildbot/builders/bld-dan-release/builds/1867/steps/test/logs/stdio

      rpl.rpl_gtid_basic 'innodb_plugin,stmt'  w3 [ fail ]
              Test ended at 2014-11-28 19:50:11
       
      CURRENT_TEST: rpl.rpl_gtid_basic
      mysqltest: In included file "./include/wait_for_slave_param.inc": 
      included from ./include/wait_for_slave_sql_error.inc at line 41:
      included from /opt/buildbot-slave/mariadb/dan_demeter2/build/mysql-test/suite/rpl/t/rpl_gtid_basic.test at line 471:
      At line 115: Timeout in include/wait_for_slave_param.inc
       
      The result from queries just before the failure was:
      < snip >
      master-bin.000001	2984	Query	1	3073	use `test`; INSERT INTO t1 VALUES (17)
      master-bin.000001	3073	Xid	1	3100	COMMIT /* xid=399 */
      master-bin.000001	3100	Gtid	1	3138	BEGIN GTID 2-1-1031
      master-bin.000001	3138	Query	1	3227	use `test`; INSERT INTO t1 VALUES (18)
      master-bin.000001	3227	Xid	1	3254	COMMIT /* xid=400 */
      master-bin.000001	3254	Gtid	1	3292	BEGIN GTID 2-1-1032
      master-bin.000001	3292	Query	1	3381	use `test`; INSERT INTO t1 VALUES (19)
      master-bin.000001	3381	Xid	1	3408	COMMIT /* xid=401 */
      master-bin.000001	3408	Gtid	1	3446	GTID 3-1-100
      master-bin.000001	3446	Query	1	3544	use `test`; CREATE TABLE t2 (a INT PRIMARY KEY)
      master-bin.000001	3544	Gtid	1	3582	GTID 3-1-101
      master-bin.000001	3582	Query	1	3686	use `test`; DROP TABLE `t2` /* generated by server */
      master-bin.000001	3686	Gtid	1	3724	BEGIN GTID 2-1-1040
      master-bin.000001	3724	Query	1	3813	use `test`; INSERT INTO t1 VALUES (20)
      master-bin.000001	3813	Xid	1	3840	COMMIT /* xid=421 */
       
      **** SHOW RELAYLOG EVENTS on server_1 ****
      relaylog_name = 'No such row'
      SHOW RELAYLOG EVENTS IN 'No such row';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info

      Attachments

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - - edited

            So I checked the error log from the failing test.

            What we see is that START SLAVE starts the SQL thread, but the I/O thread is
            not started. This causes the wait to time out, as no events arrive from the
            master.

            I notice that just before this point, the test uses START SLAVE UNTIL, then
            waits for the SQL thread to stop. But there is no wait for the IO thread to
            stop. It seems plausible that the IO thread is still shutting down due to
            UNTIL when the test reaches START SLAVE. Thus, START SLAVE does not start the
            IO thread (it is not running), and a split-second after, the IO thread then
            stops due to the UNTIL condition. This would leave to a failure like what is
            seen.

            EDIT: confirmed, I was able to reproduce with a sleep just before the I/O
            thread marks itself as stopped.

            knielsen Kristian Nielsen added a comment - - edited So I checked the error log from the failing test. What we see is that START SLAVE starts the SQL thread, but the I/O thread is not started. This causes the wait to time out, as no events arrive from the master. I notice that just before this point, the test uses START SLAVE UNTIL, then waits for the SQL thread to stop. But there is no wait for the IO thread to stop. It seems plausible that the IO thread is still shutting down due to UNTIL when the test reaches START SLAVE. Thus, START SLAVE does not start the IO thread (it is not running), and a split-second after, the IO thread then stops due to the UNTIL condition. This would leave to a failure like what is seen. EDIT: confirmed, I was able to reproduce with a sleep just before the I/O thread marks itself as stopped.
            knielsen Kristian Nielsen added a comment - Pushed to 10.0.16: http://lists.askmonty.org/pipermail/commits/2014-December/007140.html

            People

              knielsen Kristian Nielsen
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.