Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11722

main.join_cache fails in buildbot on very slow builders

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/xenial-amd64-valgrind/builds/256/steps/test/logs/stdio

      main.join_cache 'xtradb'                 w15 [ fail ]
              Test ended at 2017-01-03 17:17:43
       
      CURRENT_TEST: main.join_cache
      --- /home/buildbot/build/mysql-test/r/join_cache.result	2017-01-02 14:29:31.000000000 -0500
      +++ /home/buildbot/build/mysql-test/r/join_cache.reject	2017-01-03 17:17:42.082081716 -0500
      @@ -5706,7 +5706,7 @@
       id	col1
       select timestampdiff(second, @init_time, now()) <= 1;
       timestampdiff(second, @init_time, now()) <= 1
      -1
      +0
       set join_cache_level=2;
       set @init_time:=now();
       SELECT t.* 
       
      mysqltest: Result content mismatch
      

      This part tests a fix for performance degradation MDEV-6292, and it assumes that the query should take less than a second, which of course is not necessarily true if the builder is really slow.

      Attachments

        Issue Links

          Activity

            Time is measured for this query performed on empty tables:

            SELECT t.* 
            FROM 
              t1 t 
              LEFT JOIN t2 c1 ON c1.parent_id = t.id AND c1.col2 = "val"
              LEFT JOIN t2 c2 ON c2.parent_id = t.id AND c2.col2 = "val"
              LEFT JOIN t2 c3 ON c3.parent_id = t.id AND c3.col2 = "val"
              LEFT JOIN t2 c4 ON c4.parent_id = t.id AND c4.col2 = "val"
              LEFT JOIN t2 c5 ON c5.parent_id = t.id AND c5.col2 = "val"
              LEFT JOIN t2 c6 ON c6.parent_id = t.id AND c6.col2 = "val"
              LEFT JOIN t2 c7 ON c7.parent_id = t.id AND c7.col2 = "val"
              LEFT JOIN t2 c8 ON c8.parent_id = t.id AND c8.col2 = "val"
              LEFT JOIN t2 c9 ON c9.parent_id = t.id AND c9.col2 = "val"
              LEFT JOIN t2 c10 ON c10.parent_id = t.id AND c10.col2 = "val"
              LEFT JOIN t2 c11 ON c11.parent_id = t.id AND c11.col2 = "val"
              LEFT JOIN t2 c12 ON c12.parent_id = t.id AND c12.col2 = "val"
              LEFT JOIN t2 c13 ON c13.parent_id = t.id AND c13.col2 = "val"
              LEFT JOIN t2 c14 ON c14.parent_id = t.id AND c14.col2 = "val"
              LEFT JOIN t2 c15 ON c15.parent_id = t.id AND c15.col2 = "val"
              LEFT JOIN t2 c16 ON c16.parent_id = t.id AND c16.col2 = "val"
              LEFT JOIN t2 c17 ON c17.parent_id = t.id AND c17.col2 = "val"
              LEFT JOIN t2 c18 ON c18.parent_id = t.id AND c18.col2 = "val"
              LEFT JOIN t2 c19 ON c19.parent_id = t.id AND c19.col2 = "val"
              LEFT JOIN t2 c20 ON c20.parent_id = t.id AND c20.col2 = "val"
              LEFT JOIN t2 c21 ON c21.parent_id = t.id AND c21.col2 = "val"
              LEFT JOIN t2 c22 ON c22.parent_id = t.id AND c22.col2 = "val"
              LEFT JOIN t2 c23 ON c23.parent_id = t.id AND c23.col2 = "val"
              LEFT JOIN t2 c24 ON c24.parent_id = t.id AND c24.col2 = "val"
              LEFT JOIN t2 c25 ON c25.parent_id = t.id AND c25.col2 = "val"
            ORDER BY 
              col1;
            

            After the fix MDEV-6292, the query for both values of join_cache_level take some ~2000-4000 microsecond on a non-debug build (the value can vary of course).
            Before the fix, on 10.0.13, the query with join_cache_level=2 on the same machine similar non-debug build takes ~1700000 microseconds. It means that on a really fast machine it can produce false negatives, but i don't think it's extremely important.

            If I add two more LEFT JOINs to the query, execution time for fast ones (with join_cache_level=0 before the fix, with both values on the fixed version) does not change noticeably, it's still around ~4000 microseconds; but for the slow query (join_cache_level=2 before the fix) it grows up to ~6500000 microseconds.

            So, the easy solution is to add this couple of LEFT JOINs and then increase the margin for the error, make it fail, let's say, if the query takes longer than 4 seconds. If there is no bug, it shouldn't matter, and it should help for slow builders to avoid false positives.

            elenst Elena Stepanova added a comment - Time is measured for this query performed on empty tables: SELECT t.* FROM t1 t LEFT JOIN t2 c1 ON c1.parent_id = t.id AND c1.col2 = "val" LEFT JOIN t2 c2 ON c2.parent_id = t.id AND c2.col2 = "val" LEFT JOIN t2 c3 ON c3.parent_id = t.id AND c3.col2 = "val" LEFT JOIN t2 c4 ON c4.parent_id = t.id AND c4.col2 = "val" LEFT JOIN t2 c5 ON c5.parent_id = t.id AND c5.col2 = "val" LEFT JOIN t2 c6 ON c6.parent_id = t.id AND c6.col2 = "val" LEFT JOIN t2 c7 ON c7.parent_id = t.id AND c7.col2 = "val" LEFT JOIN t2 c8 ON c8.parent_id = t.id AND c8.col2 = "val" LEFT JOIN t2 c9 ON c9.parent_id = t.id AND c9.col2 = "val" LEFT JOIN t2 c10 ON c10.parent_id = t.id AND c10.col2 = "val" LEFT JOIN t2 c11 ON c11.parent_id = t.id AND c11.col2 = "val" LEFT JOIN t2 c12 ON c12.parent_id = t.id AND c12.col2 = "val" LEFT JOIN t2 c13 ON c13.parent_id = t.id AND c13.col2 = "val" LEFT JOIN t2 c14 ON c14.parent_id = t.id AND c14.col2 = "val" LEFT JOIN t2 c15 ON c15.parent_id = t.id AND c15.col2 = "val" LEFT JOIN t2 c16 ON c16.parent_id = t.id AND c16.col2 = "val" LEFT JOIN t2 c17 ON c17.parent_id = t.id AND c17.col2 = "val" LEFT JOIN t2 c18 ON c18.parent_id = t.id AND c18.col2 = "val" LEFT JOIN t2 c19 ON c19.parent_id = t.id AND c19.col2 = "val" LEFT JOIN t2 c20 ON c20.parent_id = t.id AND c20.col2 = "val" LEFT JOIN t2 c21 ON c21.parent_id = t.id AND c21.col2 = "val" LEFT JOIN t2 c22 ON c22.parent_id = t.id AND c22.col2 = "val" LEFT JOIN t2 c23 ON c23.parent_id = t.id AND c23.col2 = "val" LEFT JOIN t2 c24 ON c24.parent_id = t.id AND c24.col2 = "val" LEFT JOIN t2 c25 ON c25.parent_id = t.id AND c25.col2 = "val" ORDER BY col1; After the fix MDEV-6292 , the query for both values of join_cache_level take some ~2000-4000 microsecond on a non-debug build (the value can vary of course). Before the fix, on 10.0.13, the query with join_cache_level=2 on the same machine similar non-debug build takes ~1700000 microseconds. It means that on a really fast machine it can produce false negatives, but i don't think it's extremely important. If I add two more LEFT JOINs to the query, execution time for fast ones (with join_cache_level=0 before the fix, with both values on the fixed version) does not change noticeably, it's still around ~4000 microseconds; but for the slow query (join_cache_level=2 before the fix) it grows up to ~6500000 microseconds. So, the easy solution is to add this couple of LEFT JOINs and then increase the margin for the error, make it fail, let's say, if the query takes longer than 4 seconds. If there is no bug, it shouldn't matter, and it should help for slow builders to avoid false positives.
            elenst Elena Stepanova added a comment - https://github.com/MariaDB/server/commit/f1ee011a6cbda1069a6ec9b5e2428451a64861fd

            People

              elenst Elena Stepanova
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.