MariaDB Server / MDEV-10632

rpl.rpl_parallel fails in buildbot, Failed to sync with master

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 10.0(EOL)
    • N/A
    • Tests
    • None
    • 5.5.55

    Description

      http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/95/steps/test/logs/stdio

      rpl.rpl_parallel 'innodb_plugin,mix'     w1 [ fail ]
              Test ended at 2016-08-14 05:19:15
       
      CURRENT_TEST: rpl.rpl_parallel
      mysqltest: In included file "./include/sync_with_master_gtid.inc": 
      included from /var/lib/buildbot/maria-slave/p8-trusty-bintar-debug/build/mysql-test/suite/rpl/t/rpl_parallel.test at line 2371:
      At line 44: Failed to sync with master
       
      The result from queries just before the failure was:
      < snip >
      SET @old_dbug= @@SESSION.debug_dbug;
      SET @commit_id= 20000;
      SET SESSION debug_dbug="+d,binlog_force_commit_id";
      SET SESSION debug_dbug=@old_dbug;
      SELECT * FROM t7 ORDER BY a;
      a	b
      1	1
      2	2
      3	86
      4	4
      5	5
      100	5
      101	1
      102	2
      103	3
      104	4
      include/save_master_gtid.inc
      include/start_slave.inc
      include/sync_with_master_gtid.inc
      Timeout in master_gtid_wait('1-1-3,0-1-1454,3-1-1,2-1-2', 120), current slave GTID position is: 1-1-3,0-1-1254,3-1-1,2-1-2.
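
      To see how far behind the slave was when the wait timed out, the two GTID positions in the timeout message can be compared per replication domain. The following is a minimal sketch, not part of MTR: the helper names parse_gtid_pos and gtid_lag are hypothetical, and it assumes each GTID has the form domain_id-server_id-seq_no and that the seq_no difference within a domain roughly approximates the number of outstanding event groups.

```python
def parse_gtid_pos(pos):
    """Map domain_id -> (server_id, seq_no) for a GTID position string.

    Assumes a comma-separated list of domain_id-server_id-seq_no triples.
    """
    result = {}
    for gtid in pos.split(","):
        domain, server, seq = (int(part) for part in gtid.strip().split("-"))
        result[domain] = (server, seq)
    return result


def gtid_lag(target, current):
    """Return {domain_id: seq_no gap} for domains where current is behind target."""
    tgt = parse_gtid_pos(target)
    cur = parse_gtid_pos(current)
    lag = {}
    for domain, (_, seq) in tgt.items():
        # A domain missing from the current position is treated as seq_no 0.
        cur_seq = cur.get(domain, (None, 0))[1]
        if cur_seq < seq:
            lag[domain] = seq - cur_seq
    return lag


# Positions copied from the timeout message above.
target = "1-1-3,0-1-1454,3-1-1,2-1-2"
current = "1-1-3,0-1-1254,3-1-1,2-1-2"
print(gtid_lag(target, current))  # {0: 200}
```

      For the positions in this log, only domain 0 is behind, with a seq_no gap of 200 (1454 versus 1254), which is consistent with a slave that was still applying events rather than one that had stopped.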
      

          Activity

            This failure was observed on two builders:

            • p8-trusty-bintar-debug, until mid-August 2016, around the time the extremely slow slave p8-trusty-bb was replaced by a less slow one, power8-vlp04;
            • currently on xenial-amd64-valgrind, a valgrind builder which runs tests with a high --parallel value (--parallel=20 at the moment).

            The problem is apparently a simple timing issue: on a really slow builder, the default 120-second timeout in sync_with_master_gtid.inc was not enough for the slave to catch up with the master. Since we no longer observe it on non-valgrind builders, let's assume for now that p8 was indeed extremely slow and set that case aside; for valgrind runs, it makes sense to increase the timeout, as has already been done in some other places in MTR.

            https://github.com/MariaDB/server/commit/80a4525b3a9f234043419ce2217880516fd3195b

            elenst Elena Stepanova added a comment

            It happened again recently (Feb 17, 2017) on p8-trusty-bintar-debug:
            http://askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/560

            Later, the P8 builders were switched to using --mem. After that, the average execution time for this test on this builder went down from ~150 sec to ~25 sec, so hopefully the timing issue is solved now (and, as mentioned above, the timeout was extended for valgrind builds).

            elenst Elena Stepanova added a comment

            People

              elenst Elena Stepanova
              elenst Elena Stepanova