[MDEV-15284] innodb_gis.rtree_concurrent_srch fails with COUNT(*) mismatch - Jira

Details

Type: Bug
Status: Confirmed
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.5, 10.6, 10.11, 11.4
Component/s: Storage Engine - InnoDB, Tests
Labels:
- race
- upstream

Description

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2-big/builds/1736/steps/mtr_emb/logs/stdio

innodb_gis.rtree_concurrent_srch 'innodb' w2 [ fail ]

        Test ended at 2018-02-11 16:32:51

CURRENT_TEST: innodb_gis.rtree_concurrent_srch

Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/cacert.pem'

Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-cert.pem'

Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.13/mysql-test/std_data/client-key.pem'

Warning: /mnt/buildbot/build/mariadb-10.2.13/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'

--- /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-02-11 15:34:47.000000000 +0200

+++ /mnt/buildbot/build/mariadb-10.2.13/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-11 16:32:49.000000000 +0200

@@ -28,7 +28,7 @@

 SET DEBUG_SYNC = 'now SIGNAL go_ahead';

 connection a;

 count(*)

-576

+442

 select count(*) from t1 where MBRWithin(t1.c2, @g1);

 count(*)

mysqltest: Result content mismatch

Attachments

0001-Clean-up-SPATIAL-INDEX-lock-creation.patch
17 kB
2018-03-14 09:33

Issue Links

relates to

MDEV-14057 InnoDB GIS tests fail

Closed

MDEV-14059 InnoDB assertion failure offset >= ((38U + 36 + 2 * 10) + 5) at page0page.h line 318

Closed

MDEV-16269 innodb_gis.rtree_compress2 failed in buildbot with wrong result

Open

MDEV-35420 Server aborts while deleting the record in spatial index

Closed

MDEV-36612 Implement CHECK TABLE…EXTENDED for SPATIAL INDEX

Open

MDEV-15275 innodb_gis.rtree_purge failed in buildbot with timeout

Open

MDEV-24257 innodb_gis.rtree_purge failed in bb, crash after "delete from t"

Open

MDEV-26123 Using Spatial Indexes results in Update locks cannot be acquired during a READ UNCOMMITTED transaction

Stalled

Activity

Marko Mäkelä added a comment - 2018-02-19 14:36

The test executes the SELECT concurrently with ROLLBACK. In MDEV-14059 I put back some dubious code into the debug build, similar to MySQL 5.7, which apparently does this in an attempt to hide a genuine problem: ROLLBACK can corrupt the R-tree cursors of ongoing operations. There is a similar race between purge and DML, and it is possible that this race has been ‘solved’ by skipping the purge operation (leaving delete-marked garbage in the spatial index).
I can occasionally reproduce this problem locally. Here is a failure from the non-embedded 10.3 as of commit 5521994ce2fb119946b08a0071e8271181a29b1f:

CURRENT_TEST: innodb_gis.rtree_concurrent_srch

--- /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.result	2018-01-05 22:50:48.860723015 +0200

+++ /mariadb/10.3/mysql-test/suite/innodb_gis/r/rtree_concurrent_srch.reject	2018-02-19 16:10:08.359421260 +0200

@@ -28,7 +28,7 @@

 SET DEBUG_SYNC = 'now SIGNAL go_ahead';

 connection a;

 count(*)

-576

+438

 select count(*) from t1 where MBRWithin(t1.c2, @g1);

 count(*)

 1152

mysqltest: Result content mismatch
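
The undercount (438 instead of 576) is the kind of symptom a stale cursor position can produce. The following is a toy sketch only, not InnoDB code: the "page" is a plain Python list kept in key order, the cursor is a bare integer offset, and `scan`, `rollback`, and `make_page` are hypothetical names introduced for illustration. It shows how a scan that keeps its old offset after a concurrent rollback removes entries can skip committed records, and how re-seeking by the last key seen avoids that in this simplified model.

```python
# Toy model: NOT how InnoDB represents R-tree pages or cursors.
# Each entry is (key, committed); the page is ordered by key (an assumption
# of this sketch; a real R-tree is not ordered by a single key).

def make_page():
    # Two uncommitted entries followed by four committed ones.
    return [(1, False), (2, False), (3, True), (4, True), (5, True), (6, True)]

def rollback(page):
    # Concurrent ROLLBACK: remove the uncommitted entries in place.
    page[:] = [entry for entry in page if entry[1]]

def scan(page, pause_at=None, on_pause=None, revalidate=False):
    """Count committed entries; optionally let on_pause mutate the page once."""
    visible = 0
    pos = 0
    last_key = None
    while pos < len(page):
        if pos == pause_at and on_pause is not None:
            on_pause(page)      # the page changes under the scan
            on_pause = None     # only once
            if revalidate:
                # Re-seek to the first entry after the last key already seen.
                pos = next(
                    (i for i, (k, _) in enumerate(page)
                     if last_key is None or k > last_key),
                    len(page),
                )
            continue            # without revalidate, pos is now stale
        key, committed = page[pos]
        if committed:
            visible += 1
        last_key = key
        pos += 1
    return visible

print(scan(make_page()))                                  # undisturbed scan
print(scan(make_page(), 2, rollback))                     # stale offset
print(scan(make_page(), 2, rollback, revalidate=True))    # re-seek by key
```

In this model the undisturbed scan and the revalidating scan both see all four committed entries, while the stale-offset scan sees only two. Whether the actual failure follows this exact pattern is not established by the report.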

Marko Mäkelä added a comment - 2018-02-21 08:46

The same problem also affects CHECK TABLE, which counts the records in each index using a read view.

CURRENT_TEST: innodb_gis.rtree_recovery

--- /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.result    2018-01-11 22:44:40.580284621 +0200

+++ /home/my/maria-compatibility/mysql-test/suite/innodb_gis/r/rtree_recovery.reject    2018-02-21 00:31:21.414434778 +0200

@@ -19,7 +19,8 @@

 COMMIT;

 check table t1;

 Table  Op      Msg_type        Msg_text

-test.t1        check   status  OK

+test.t1        check   Warning InnoDB: Index 'c2' contains 551 entries, should be 367.

+test.t1        check   error   Corrupt
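
The warning above comes from comparing per-index entry counts. As a rough sketch of that comparison, assuming a simplified model in which each index is just a list of entries, the check could look like the following. The function name `check_table` and the data layout are hypothetical; the real CHECK TABLE also applies read-view visibility rules, which this sketch ignores.

```python
# Simplified model of the count comparison; not InnoDB's implementation.

def check_table(table, clustered_rows, secondary_indexes):
    """Return (table, op, msg_type, msg_text) rows like the test output."""
    expected = len(clustered_rows)
    rows = []
    for name, entries in secondary_indexes.items():
        if len(entries) != expected:
            rows.append((
                table, "check", "Warning",
                f"InnoDB: Index '{name}' contains {len(entries)} entries, "
                f"should be {expected}.",
            ))
    if rows:
        rows.append((table, "check", "error", "Corrupt"))
    else:
        rows.append((table, "check", "status", "OK"))
    return rows

# Counts taken from the failure above: 367 clustered index records,
# 551 entries in the spatial index c2.
for row in check_table("test.t1", range(367), {"c2": range(551)}):
    print("\t".join(row))
```

With matching counts the function returns a single `status OK` row, which corresponds to the expected result in the `.result` file.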

This test would start a large transaction, then kill and restart the server, and execute CHECK TABLE.
The test is nondeterministic:

# Test level 1 rtree.

CALL insert_t1(367);

COMMIT;

--let $shutdown_timeout=0

--source include/restart_mysqld.inc

# Check table.

check table t1;

As far as I understand, the COMMIT might not be durable at the time the server is killed (a shutdown timeout of 0 causes a kill), because there is no SET GLOBAL innodb_flush_log_at_trx_commit=1 in the test. So, on the subsequent startup, we might actually initiate a rollback of the incomplete transaction, which would run in parallel with the CHECK TABLE. But then again, all 367 records were counted in the clustered index, which would hint that the transaction had been committed.
This test looks a bit strange, because it drops and re-creates the table in the middle. To better catch bugs, the same table should be used for as long as possible.
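
The durability concern raised in this comment can be illustrated with a toy redo log. This is a minimal sketch under stated assumptions: `ToyRedoLog`, `recover`, and `simulate` are hypothetical names, and the boolean `flush_at_commit` stands in for innodb_flush_log_at_trx_commit=1 versus a weaker setting (the real variable has more than two values, and real recovery is far more involved).

```python
# Toy model of commit durability; not InnoDB's redo log.

class ToyRedoLog:
    def __init__(self, flush_at_commit):
        self.flush_at_commit = flush_at_commit
        self.buffer = []    # records still only in memory
        self.durable = []   # records that would survive a kill

    def write(self, record):
        self.buffer.append(record)

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def commit(self, trx_id):
        self.write(("COMMIT", trx_id))
        if self.flush_at_commit:
            self.flush()

    def kill(self):
        # Server killed: anything not flushed is lost.
        self.buffer.clear()

def recover(log):
    """Return the transactions that startup would have to roll back."""
    seen = {trx for _, trx in log.durable}
    committed = {trx for kind, trx in log.durable if kind == "COMMIT"}
    return sorted(seen - committed)

def simulate(flush_at_commit):
    log = ToyRedoLog(flush_at_commit)
    for _ in range(367):
        log.write(("INSERT", 1))
    log.flush()      # a background flush makes the inserts durable
    log.commit(1)    # the COMMIT record may or may not be flushed
    log.kill()
    return recover(log)

print(simulate(flush_at_commit=False))  # transaction 1 needs rollback
print(simulate(flush_at_commit=True))   # nothing to roll back
```

In the first case the inserts are durable but the commit record is not, so startup would roll back the transaction, possibly while CHECK TABLE is running. As the comment notes, the clustered index count of 367 suggests this is not what happened in the observed failure.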

Marko Mäkelä added a comment - 2018-02-21 08:47
