Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7142

main.index_merge_innodb fails sporadically in buildbot with wrong result or timeout

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • N/A
    • Tests
    • 10.2.1-5

    Description

      http://buildbot.askmonty.org/buildbot/builders/bintar-rhel7-p8/builds/16/steps/test/logs/stdio

      main.index_merge_innodb 'xtradb'         w1 [ fail ]
              Test ended at 2014-11-19 12:27:44
       
      CURRENT_TEST: main.index_merge_innodb
      --- /home/buildbot/maria-slave/power8-vlp03-bintar/build/mysql-test/r/index_merge_innodb.result	2014-11-19 10:43:11.911410000 -0500
      +++ /home/buildbot/maria-slave/power8-vlp03-bintar/build/mysql-test/r/index_merge_innodb.reject	2014-11-19 12:27:43.950083391 -0500
      @@ -313,7 +313,7 @@
       update t1 set key2=key1,key3=key1;
       explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
       id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
      -1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
      +1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
       select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
       key1	key2	key3
       31	31	31
      

      It happens from time to time on different builds.

      Attachments

        Issue Links

          Activity

            I suspect this something that is not a problem of index_merge. index_merge code hasn't been touched in years. Perhaps, innodb started to return different record estimates...

            elenst, can you track when (at least approximately) this issue started to show up?

            psergei Sergei Petrunia added a comment - I suspect this something that is not a problem of index_merge. index_merge code hasn't been touched in years. Perhaps, innodb started to return different record estimates... elenst , can you track when (at least approximately) this issue started to show up?

            Debugging this example, in get_best_disjunct_quick

            The table has 1024 rows.

            costs of merged index scans are:

            (gdb) p (*cur_child)->read_cost
              $7 = 1.6124384296513044
            ...
            (gdb) p (*cur_child)->read_cost
              $8 = 2.2160960741282612
            

            After adding costs merging and of using Unique we construct TRP_INDEX_MERGE
            with this cost:

            (gdb) print imerge_cost
              $20 = 12.928837487575034
            

            There is a huge difference with cost of full scan:

            (gdb) print read_time
              $13 = 207.90000000000001
             
            (gdb) p param->table->stat_records()
              $12 = 1024
            

            So, this failure doesn't look like an "optimizer switches between two query plans with nearly identical costs" failure.

            psergei Sergei Petrunia added a comment - Debugging this example, in get_best_disjunct_quick The table has 1024 rows. costs of merged index scans are: (gdb) p (*cur_child)->read_cost $7 = 1.6124384296513044 ... (gdb) p (*cur_child)->read_cost $8 = 2.2160960741282612 After adding costs merging and of using Unique we construct TRP_INDEX_MERGE with this cost: (gdb) print imerge_cost $20 = 12.928837487575034 There is a huge difference with cost of full scan: (gdb) print read_time $13 = 207.90000000000001   (gdb) p param->table->stat_records() $12 = 1024 So, this failure doesn't look like an "optimizer switches between two query plans with nearly identical costs" failure.

            The only reason for this failure that I can think of is InnoDB's statistics update being delayed. This will cause the optimizer to see a very low stat_records(), and it will pick full scan instead of index_merge.
            Pushed this patch: https://github.com/MariaDB/server/commit/a14f61ef749ad9f9ab2b0f5badf6754ba7443c9e

            elenst, feel free to re-open if the issue still shows up.

            psergei Sergei Petrunia added a comment - The only reason for this failure that I can think of is InnoDB's statistics update being delayed. This will cause the optimizer to see a very low stat_records(), and it will pick full scan instead of index_merge. Pushed this patch: https://github.com/MariaDB/server/commit/a14f61ef749ad9f9ab2b0f5badf6754ba7443c9e elenst , feel free to re-open if the issue still shows up.

            It still happens.
            http://buildbot.askmonty.org/buildbot/builders/kvm-deb-jessie-amd64/builds/1308/steps/test_4/logs/stdio

            main.index_merge_innodb 'xtradb'         w3 [ fail ]
                    Test ended at 2016-11-01 04:24:15
             
            CURRENT_TEST: main.index_merge_innodb
            --- /usr/share/mysql/mysql-test/r/index_merge_innodb.result	2016-11-01 01:56:57.000000000 -0400
            +++ /dev/shm/var/3/log/index_merge_innodb.reject	2016-11-01 04:24:15.250529474 -0400
            @@ -316,7 +316,7 @@
             test.t1	analyze	status	OK
             explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
             id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
            +1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
             select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
             key1	key2	key3
             31	31	31
             
            mysqltest: Result length mismatch
            

            Normally we don't re-open issues after (attempted) bugfixes have been released, but since it's just a test, I suppose there is no big harm.

            elenst Elena Stepanova added a comment - It still happens. http://buildbot.askmonty.org/buildbot/builders/kvm-deb-jessie-amd64/builds/1308/steps/test_4/logs/stdio main.index_merge_innodb 'xtradb' w3 [ fail ] Test ended at 2016-11-01 04:24:15   CURRENT_TEST: main.index_merge_innodb --- /usr/share/mysql/mysql-test/r/index_merge_innodb.result 2016-11-01 01:56:57.000000000 -0400 +++ /dev/shm/var/3/log/index_merge_innodb.reject 2016-11-01 04:24:15.250529474 -0400 @@ -316,7 +316,7 @@ test.t1 analyze status OK explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40); id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 index_merge i2,i3 i3,i2 4,4 NULL REF Using sort_union(i3,i2); Using where +1 SIMPLE t1 ALL i2,i3 NULL NULL NULL REF Using where select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40); key1 key2 key3 31 31 31   mysqltest: Result length mismatch Normally we don't re-open issues after (attempted) bugfixes have been released, but since it's just a test, I suppose there is no big harm.
            elenst Elena Stepanova added a comment - Still happens, now on 10.2: https://internal.askmonty.org/buildbot/builders/win32-packages/builds/2110/steps/test/logs/stdio

            It also fails with timeout on valgrind:
            http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10064/steps/test/logs/stdio

            main.index_merge_innodb 'xtradb'         w1 [ fail ]  timeout after 9000 seconds
                    Test ended at 2017-05-23 02:58:00
             
            Test case timeout after 9000 seconds
             
            == /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/index_merge_innodb.log == 
            key2 int not null,
            pktail1ok  int not null,
            pktail2ok  int not null,
            pktail3bad int not null,
            pktail4bad int not null,
            pktail5bad int not null,
            pk2copy int not null,
            badkey  int not null,
            filler1 char (200),
            filler2 char (200),
            key (key1),
            key (key2),
            /* keys with tails from CPK members */
            key (pktail1ok, pk1),
            key (pktail2ok, pk1, pk2),
            key (pktail3bad, pk2, pk1),
            key (pktail4bad, pk1, pk2copy),
            key (pktail5bad, pk1, pk2, pk2copy),
            primary key (pk1, pk2)
            );
             
             == /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err ==
            mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/tmp/1/mysqld.1.sock' (111 "Connection refused")
             
             
             - skipping '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/main.index_merge_innodb-xtradb/'
            ***Warnings generated in error logs during shutdown after running tests: funcs_1.is_columns_is main.index_merge_innodb
             
            Attempting backtrace. You can use the following information to find out
            

            elenst Elena Stepanova added a comment - It also fails with timeout on valgrind: http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10064/steps/test/logs/stdio main.index_merge_innodb 'xtradb' w1 [ fail ] timeout after 9000 seconds Test ended at 2017-05-23 02:58:00   Test case timeout after 9000 seconds   == /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/index_merge_innodb.log == key2 int not null, pktail1ok int not null, pktail2ok int not null, pktail3bad int not null, pktail4bad int not null, pktail5bad int not null, pk2copy int not null, badkey int not null, filler1 char (200), filler2 char (200), key (key1), key (key2), /* keys with tails from CPK members */ key (pktail1ok, pk1), key (pktail2ok, pk1, pk2), key (pktail3bad, pk2, pk1), key (pktail4bad, pk1, pk2copy), key (pktail5bad, pk1, pk2, pk2copy), primary key (pk1, pk2) );   == /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err == mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/tmp/1/mysqld.1.sock' (111 "Connection refused")     - skipping '/mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/main.index_merge_innodb-xtradb/' ***Warnings generated in error logs during shutdown after running tests: funcs_1.is_columns_is main.index_merge_innodb   Attempting backtrace. You can use the following information to find out

            My attempt to debug this issue (the original one, not the timeout) has resulted in https://bugs.mysql.com/bug.php?id=84366 (InnoDB index dives do not detect concurrent tree changes, return bogus estimates)

            laurynas Laurynas Biveinis added a comment - My attempt to debug this issue (the original one, not the timeout) has resulted in https://bugs.mysql.com/bug.php?id=84366 (InnoDB index dives do not detect concurrent tree changes, return bogus estimates)
            alice Alice Sherepa added a comment -

            it fails on 10.3 too, now with innodb. http://buildbot.askmonty.org/buildbot/builders/kvm-deb-stretch-x86/builds/1183/steps/mtr/logs/stdio

            main.index_merge_innodb 'innodb'         w4 [ fail ]
                    Test ended at 2017-09-05 03:31:47
            CURRENT_TEST: main.index_merge_innodb
            --- /usr/share/mysql/mysql-test/r/index_merge_innodb.result	2017-09-05 02:12:54.000000000 -0400
            +++ /dev/shm/var/4/log/index_merge_innodb.reject	2017-09-05 03:31:47.271347035 -0400
            @@ -316,7 +316,7 @@
             test.t1	analyze	status	OK
             explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
             id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t1	index_merge	i2,i3	i3,i2	4,4	NULL	REF	Using sort_union(i3,i2); Using where
            +1	SIMPLE	t1	ALL	i2,i3	NULL	NULL	NULL	REF	Using where
             select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40);
             key1	key2	key3
             31	31	31
             
            mysqltest: Result length mismatch
            

            alice Alice Sherepa added a comment - it fails on 10.3 too, now with innodb. http://buildbot.askmonty.org/buildbot/builders/kvm-deb-stretch-x86/builds/1183/steps/mtr/logs/stdio main.index_merge_innodb 'innodb' w4 [ fail ] Test ended at 2017-09-05 03:31:47 CURRENT_TEST: main.index_merge_innodb --- /usr/share/mysql/mysql-test/r/index_merge_innodb.result 2017-09-05 02:12:54.000000000 -0400 +++ /dev/shm/var/4/log/index_merge_innodb.reject 2017-09-05 03:31:47.271347035 -0400 @@ -316,7 +316,7 @@ test.t1 analyze status OK explain select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40); id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 index_merge i2,i3 i3,i2 4,4 NULL REF Using sort_union(i3,i2); Using where +1 SIMPLE t1 ALL i2,i3 NULL NULL NULL REF Using where select * from t1 where (key3 > 30 and key3<35) or (key2 >32 and key2 < 40); key1 key2 key3 31 31 31   mysqltest: Result length mismatch

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.