[MDEV-15284] innodb_gis.rtree_concurrent_srch fails with COUNT(*) mismatch
Created: 2018-02-11
Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.2, 10.3
Fix Version/s: 10.4

Type: Bug
Priority: Major
Reporter: Elena Stepanova
Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved
Votes: 1
Labels: race, upstream

Attachments: 0001-Clean-up-SPATIAL-INDEX-lock-creation.patch
Issue Links:
Relates
relates to MDEV-14057 InnoDB GIS tests fail Closed
relates to MDEV-14059 InnoDB assertion failure offset >= ((... Closed
relates to MDEV-16269 innodb_gis.rtree_compress2 failed in ... Open
relates to MDEV-15275 innodb_gis.rtree_purge failed in buil... Open
relates to MDEV-24257 innodb_gis.rtree_purge failed in bb,... Open

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2-big/builds/1736/steps/mtr_emb/logs/stdio

innodb_gis.rtree_concurrent_srch 'innodb' w2 [ fail ]
        Test ended at 2018-02-11 16:32:51
 
CURRENT_TEST: innodb_gis.rtree_concurrent_srch
Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/cacert.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-cert.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-key.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'
--- /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-02-11 15:34:47.000000000 +0200
+++ /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-11 16:32:49.000000000 +0200
@@ -28,7 +28,7 @@
 SET DEBUG_SYNC = 'now SIGNAL go_ahead';
 connection a;
 count(*)
-576
+442
 select count(*) from t1 where MBRWithin(t1.c2, @g1);
 count(*)
 1152
 
mysqltest: Result content mismatch



 Comments   
Comment by Marko Mäkelä [ 2018-02-19 ]

The test executes the SELECT concurrently with ROLLBACK. In MDEV-14059 I put some dubious code back into the debug build, similar to MySQL 5.7, which apparently does this in an attempt to hide a genuine problem: ROLLBACK can corrupt the R-tree cursors of ongoing operations. There is a similar race between purge and DML, and it is possible that this race has been ‘solved’ by skipping the purge operation (leaving delete-marked garbage in the spatial index).
I can occasionally reproduce this problem locally. Here is a failure from the non-embedded 10.3 as of commit 5521994ce2fb119946b08a0071e8271181a29b1f:

CURRENT_TEST: innodb_gis.rtree_concurrent_srch
--- /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-01-05 22:50:48.860723015 +0200
+++ /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-19 16:10:08.359421260 +0200
@@ -28,7 +28,7 @@
 SET DEBUG_SYNC = 'now SIGNAL go_ahead';
 connection a;
 count(*)
-576
+438
 select count(*) from t1 where MBRWithin(t1.c2, @g1);
 count(*)
 1152
 
mysqltest: Result content mismatch

Comment by Marko Mäkelä [ 2018-02-21 ]

The same problem also affects CHECK TABLE, which counts the records in each index using a read view.

CURRENT_TEST: innodb_gis.rtree_recovery
--- /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.result    2018-01-11 22:44:40.580284621 +0200
+++ /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.reject    2018-02-21 00:31:21.414434778 +0200
@@ -19,7 +19,8 @@
 COMMIT;
 check table t1;
 Table  Op      Msg_type        Msg_text
-test.t1        check   status  OK
+test.t1        check   Warning InnoDB: Index 'c2' contains 551 entries, should be 367.
+test.t1        check   error   Corrupt

This test would start a large transaction, then kill and restart the server, and execute CHECK TABLE.
The test is nondeterministic:

# Test level 1 rtree.
CALL insert_t1(367);
COMMIT;
 
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
 
# Check table.
check table t1;

As far as I understand, the COMMIT might not be durable at the time the server is killed (a shutdown timeout of 0 causes the server to be killed), because the test does not execute SET GLOBAL innodb_flush_log_at_trx_commit=1. So, on the subsequent startup, we might actually initiate a rollback of the incomplete transaction, which would run in parallel with the CHECK TABLE. Then again, all 367 records were counted in the clustered index, which suggests that the transaction had been committed.
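
One way to rule out the durability question would be to request a redo log flush at every COMMIT before the server is killed. The following is only a sketch of that idea, not a tested fix: it is the test fragment quoted above with a single added SET GLOBAL statement, and it assumes that the test otherwise runs with a setting other than 1. Whether it changes the CHECK TABLE outcome has not been verified.

```sql
# Assumption: flush the redo log to disk at every COMMIT, so that the
# committed transaction survives the kill below.
SET GLOBAL innodb_flush_log_at_trx_commit=1;

# Test level 1 rtree.
CALL insert_t1(367);
COMMIT;

# A shutdown timeout of 0 kills the server instead of shutting it down.
--let $shutdown_timeout=0
--source include/restart_mysqld.inc

# Check table.
check table t1;
```

If CHECK TABLE still reports a mismatch for index 'c2' with this setting, a rollback of a non-durable transaction after restart can be excluded as the cause.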
This test looks a bit strange, because it is dropping and re-creating the table in the middle. To better catch bugs, the same table should be used as long as possible.

Comment by Marko Mäkelä [ 2018-02-21 ]

See also MDEV-14059.

Comment by Marko Mäkelä [ 2018-03-14 ]

I tried to refactor how transactional locks are created on spatial indexes. The attached 0001-Clean-up-SPATIAL-INDEX-lock-creation.patch is a work in progress (it does not work correctly yet), but I believe that it revealed a possible bug: sel_set_rtr_rec_lock() appears to set a record lock in a spatial index, while my understanding is that spatial indexes should always use page locks or predicate locks.
