MariaDB Server / MDEV-15284

innodb_gis.rtree_concurrent_srch fails with COUNT(*) mismatch

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2-big/builds/1736/steps/mtr_emb/logs/stdio

      innodb_gis.rtree_concurrent_srch 'innodb' w2 [ fail ]
              Test ended at 2018-02-11 16:32:51
       
      CURRENT_TEST: innodb_gis.rtree_concurrent_srch
      Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/cacert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-cert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-key.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'
      --- /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-02-11 15:34:47.000000000 +0200
      +++ /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-11 16:32:49.000000000 +0200
      @@ -28,7 +28,7 @@
       SET DEBUG_SYNC = 'now SIGNAL go_ahead';
       connection a;
       count(*)
      -576
      +442
       select count(*) from t1 where MBRWithin(t1.c2, @g1);
       count(*)
       1152
       
      mysqltest: Result content mismatch
      

          Activity

            The test executes the SELECT concurrently with ROLLBACK. In MDEV-14059 I put some dubious code back into the debug build, similar to MySQL 5.7, which apparently does this in an attempt to hide a genuine problem: ROLLBACK can corrupt the R-tree cursors of ongoing operations. There is a similar race between purge and DML, and that race may have been ‘solved’ by skipping the purge operation (leaving delete-marked garbage in the spatial index).
            I can occasionally repeat this problem locally. Here is a failure from the non-embedded 10.3 as of commit 5521994ce2fb119946b08a0071e8271181a29b1f:

            CURRENT_TEST: innodb_gis.rtree_concurrent_srch
            --- /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-01-05 22:50:48.860723015 +0200
            +++ /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-19 16:10:08.359421260 +0200
            @@ -28,7 +28,7 @@
             SET DEBUG_SYNC = 'now SIGNAL go_ahead';
             connection a;
             count(*)
            -576
            +438
             select count(*) from t1 where MBRWithin(t1.c2, @g1);
             count(*)
             1152
             
            mysqltest: Result content mismatch
            

            (Comment added by marko, Marko Mäkelä.)

            The same problem also affects CHECK TABLE, which counts the records in each index using a read view.

            CURRENT_TEST: innodb_gis.rtree_recovery
            --- /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.result    2018-01-11 22:44:40.580284621 +0200
            +++ /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.reject    2018-02-21 00:31:21.414434778 +0200
            @@ -19,7 +19,8 @@
             COMMIT;
             check table t1;
             Table  Op      Msg_type        Msg_text
            -test.t1        check   status  OK
            +test.t1        check   Warning InnoDB: Index 'c2' contains 551 entries, should be 367.
            +test.t1        check   error   Corrupt
            

            This test starts a large transaction, then kills and restarts the server, and executes CHECK TABLE.
            The test is nondeterministic:

            # Test level 1 rtree.
            CALL insert_t1(367);
            COMMIT;
             
            --let $shutdown_timeout=0
            --source include/restart_mysqld.inc
             
            # Check table.
            check table t1;
            

            As far as I understand, the COMMIT might not be durable at the time the server is killed (a shutdown timeout of 0 causes a kill), because the test does not execute SET GLOBAL innodb_flush_log_at_trx_commit=1. So, on the subsequent startup, we might actually initiate a rollback of the incomplete transaction, which would run in parallel with the CHECK TABLE. But then again, all 367 records were counted in the clustered index, which would hint that the transaction had been committed.
            This test looks a bit strange, because it is dropping and re-creating the table in the middle. To better catch bugs, the same table should be used as long as possible.
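
            A minimal sketch of how the durability concern could be ruled out, assuming the only change is to set innodb_flush_log_at_trx_commit=1 before the COMMIT. Placing the statement at this point is an assumption; this is not a committed fix, and it would not address the R-tree cursor problem itself:

```
# Test level 1 rtree.
# Assumption: make the COMMIT durable before the server is killed,
# so that no rollback of this transaction can start on restart.
SET GLOBAL innodb_flush_log_at_trx_commit=1;
CALL insert_t1(367);
COMMIT;

# A shutdown timeout of 0 kills the server instead of shutting it down.
--let $shutdown_timeout=0
--source include/restart_mysqld.inc

# Check table.
check table t1;
```

            If CHECK TABLE still reported a mismatch with this change, the rollback-on-startup explanation could be excluded.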

            (Comment added by marko, Marko Mäkelä.)

            See also MDEV-14059.

            (Comment added by marko, Marko Mäkelä.)

            I tried to refactor how transactional locks are created on spatial indexes. The attached 0001-Clean-up-SPATIAL-INDEX-lock-creation.patch is a work in progress (it does not work correctly yet), but I believe that it revealed a possible bug: sel_set_rtr_rec_lock() appears to set a record lock in a spatial index, while my understanding is that spatial indexes should always use page locks or predicate locks.

            (Comment added by marko, Marko Mäkelä.)

            People

              thiru Thirunarayanan Balathandayuthapani
              elenst Elena Stepanova