MariaDB Server, MDEV-17745

innodb.innodb_stats_persistent failed in buildbot with wrong result

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest-big/builds/2224

      10.3 573c4db57a9b9fc5998bd2a2f1311873

      innodb.innodb_stats_persistent 'innodb'  w2 [ fail ]
              Test ended at 2018-11-13 22:28:21
       
      CURRENT_TEST: innodb.innodb_stats_persistent
      --- /mnt/buildbot/build/mariadb-10.3.11/mysql-test/suite/innodb/r/innodb_stats_persistent.result	2018-11-13 07:59:56.000000000 -0500
      +++ /mnt/buildbot/build/mariadb-10.3.11/mysql-test/suite/innodb/r/innodb_stats_persistent.reject	2018-11-13 22:28:20.782762345 -0500
      @@ -67,7 +67,7 @@
       1	SIMPLE	t2	ref	val	val	4	const	1	Using index
       SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
       SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
      -InnoDB		0 transactions not purged
      +InnoDB		30 transactions not purged
       SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
       # After COMMIT and purge, the DELETE must show up.
       EXPLAIN SELECT * FROM t1 WHERE val=4;
       
      mysqltest: Result length mismatch
      

          Activity

            marko Marko Mäkelä added a comment:

            We still seem to have the ‘purge fails to run’ problem that was originally filed as MDEV-11802. With MDEV-12288 in 10.3, it can also be observed as a result of INSERT activity.

            thiru, can you try to repeat this and determine what prevents purge from running? I think I have occasionally seen this on buildbot for other tests as well.

            marko Marko Mäkelä added a comment:

            I believe that this particular failure can be attributed to the slowness of the system. The performance optimization in MDEV-18878 should help in other cases, but not in this one, because the table is not being dropped, rebuilt, or discarded.

            elenst Elena Stepanova added a comment:

            Unfortunately, that doesn't help much. Tests should be protected from moderate slowness, because there can always be circumstances in which slow builders become even slower, not only in buildbot but also in the build process performed by distributions (it was a big problem with Debian builds).

            The only exception is when a builder is so unreasonably slow that tests fail en masse, in which case the builder itself needs to be fixed. I don't think that applies here, though.

            marko Marko Mäkelä added a comment:

            This one could help:
            MDEV-16260 Scale the purge effort according to the workload
            marko Marko Mäkelä added a comment (edited):

            I extended the wait_all_purged.inc timeout from 30 to 60 seconds. That should reduce the probability of failures, although it does not eliminate them. A 60-second wait was enough to hide MDEV-18878 on the affected platform.
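
The client-side wait discussed above can be sketched roughly as follows. This is a minimal illustration in Python, not the contents of wait_all_purged.inc: it assumes that the number of unpurged transactions is read from the "History list length" line of SHOW ENGINE INNODB STATUS output, and the names history_list_length, wait_all_purged, and fetch_status are hypothetical. It shows why a slow builder can still end the wait with a nonzero count, as in the "30 transactions not purged" line of the failed result.

```python
import re
import time

# Assumption: the count is taken from the "History list length N" line
# of SHOW ENGINE INNODB STATUS output.
HISTORY_RE = re.compile(r"History list length (\d+)")


def history_list_length(status_text):
    """Extract the number of not-yet-purged transactions from status text."""
    match = HISTORY_RE.search(status_text)
    if match is None:
        raise ValueError("no 'History list length' line found")
    return int(match.group(1))


def wait_all_purged(fetch_status, timeout_s=60, interval_s=1.0, sleep=time.sleep):
    """Poll until the history list is empty or the timeout expires.

    fetch_status is a hypothetical callable returning the status text.
    Returns the number of transactions still not purged (0 on success).
    """
    waited = 0.0
    remaining = history_list_length(fetch_status())
    while remaining > 0 and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        remaining = history_list_length(fetch_status())
    return remaining
```

Any fixed timeout only moves the threshold: if purge needs longer than timeout_s on a loaded machine, the function returns a nonzero value and the recorded test result differs.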

            marko Marko Mäkelä added a comment:

            MDEV-22958 has been filed for the same problem. I think that the solution is to implement a server-side wait for purge by introducing a new SET GLOBAL variable whose update trigger would implement the wait. The wait time can be limited by a statement timeout.
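
The difference between polling from the client and waiting inside the server can be illustrated with a toy model. The sketch below is plain Python and not MariaDB code: PurgeState, purge_one, and wait_until_purged are hypothetical names, and a condition variable stands in for whatever mechanism the server would use. The waiter is woken as soon as the pending count reaches zero, and the timeout_s argument plays the role of the statement timeout.

```python
import threading


class PurgeState:
    """Toy stand-in for the server's purge bookkeeping (hypothetical)."""

    def __init__(self, pending):
        self._pending = pending
        self._cond = threading.Condition()

    def purge_one(self):
        """Simulate the purge thread finishing one transaction."""
        with self._cond:
            if self._pending > 0:
                self._pending -= 1
            if self._pending == 0:
                # Wake every waiter the moment the work is done.
                self._cond.notify_all()

    def wait_until_purged(self, timeout_s):
        """Block until nothing is pending or timeout_s elapses.

        Returns the number of transactions still pending (0 on success).
        """
        with self._cond:
            self._cond.wait_for(lambda: self._pending == 0, timeout=timeout_s)
            return self._pending


if __name__ == "__main__":
    state = PurgeState(pending=3)
    worker = threading.Thread(target=lambda: [state.purge_one() for _ in range(3)])
    worker.start()
    print(state.wait_until_purged(timeout_s=5.0))
    worker.join()
```

Compared with a sleep-and-poll loop, the wait ends exactly when purge completes, so a generous timeout costs nothing on fast machines.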

            marko Marko Mäkelä added a comment:

            This should be fixed by MDEV-16952.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova