[MDEV-32873] Test innodb.innodb-index-online occasionally fails Created: 2023-11-24  Updated: 2023-11-24  Resolved: 2023-11-24

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: purge

Issue Links:
Relates
relates to MDEV-32874 Test innodb.innodb-table-online,crypt... Closed
relates to MDEV-32050 UNDO logs still growing for write-int... Closed

 Description   

The test innodb.innodb-index-online often fails like this on 10.6 or later branches. I suspect that it could be due to some changed timing characteristics related to MDEV-32050.

11.0 5b6134b040b59d6c896ec0a2c86029a0bcca788c

innodb.innodb-index-online '4k,clear,innodb' w19 [ fail ]
        Test ended at 2023-11-24 10:49:23
 
CURRENT_TEST: innodb.innodb-index-online
--- /mariadb/11/mysql-test/suite/innodb/r/innodb-index-online.result	2023-11-24 10:15:47.229132350 +0200
+++ /mariadb/11/mysql-test/suite/innodb/r/innodb-index-online.reject	2023-11-24 10:49:23.174794627 +0200
@@ -170,8 +170,8 @@
 ERROR 70100: Query execution was interrupted
 SELECT name, count FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE subsystem = 'ddl';
 name	count
-ddl_background_drop_indexes	0
-ddl_online_create_index	0
+ddl_background_drop_indexes	1
+ddl_online_create_index	1
 ddl_pending_alter_table	0
 ddl_sort_file_alter_table	0
 ddl_log_file_alter_table	0
 
mysqltest: Result content mismatch

The test scenario is the KILL of the following statement:

SET DEBUG_SYNC = 'row_log_apply_before SIGNAL c2d_created WAIT_FOR kill_done';
CREATE INDEX c2d ON t1(c2);

When this operation is rolled back, in row_merge_drop_indexes() we would normally have table->get_ref_count() == 1. In a failure case, a purge operation would intervene and open a table handle while row_merge_build_indexes() is blocked by the DEBUG_SYNC. This extra table handle would remain open across the rollback operation.

The following patch (which depends on MDEV-32050, where we started to ignore innodb_purge_rseg_truncate_frequency) will make the test stable for me. The first hunk makes the test ./mtr --repeat proof, for easier testing.

diff --git a/mysql-test/suite/innodb/t/innodb-index-online.test b/mysql-test/suite/innodb/t/innodb-index-online.test
index 3a38afa84b7..059c9e01a0c 100644
--- a/mysql-test/suite/innodb/t/innodb-index-online.test
+++ b/mysql-test/suite/innodb/t/innodb-index-online.test
@@ -4,6 +4,10 @@
 --source include/have_debug_sync.inc
 --source include/no_valgrind_without_big.inc
 
+SET GLOBAL innodb_monitor_reset_all=all;
+--disable_warnings
+SET GLOBAL innodb_monitor_reset_all=default;
+--enable_warnings
 let $innodb_metrics_select=
 SELECT name, count FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE subsystem = 'ddl';
 
@@ -154,6 +158,7 @@ let $ID= `SELECT @id := CONNECTION_ID()`;
 --error ER_QUERY_INTERRUPTED
 KILL QUERY @id;
 
+SET GLOBAL innodb_max_purge_lag_wait=0;
 SET DEBUG_SYNC = 'row_log_apply_before SIGNAL c2d_created WAIT_FOR kill_done';
 --send
 CREATE INDEX c2d ON t1(c2);



 Comments   
Comment by Marko Mäkelä [ 2023-11-24 ]

With the above fix, the following invocation completed successfully:

./mtr --repeat=100 --parallel=48 innodb.innodb-index-online{,,,}{,}

11.0 5b6134b040b59d6c896ec0a2c86029a0bcca788c

innodb.innodb-index-online '16k,crypt,innodb' w8 [ 100 pass ]   5817
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 27490.284 of 680 seconds executing testcases
 
Completed: All 4800 tests were successful.

I checked also the ‘sibling’ test that covers table-rebuilding online ALTER TABLE. It does not seem to require any action, because it last failed on 2021-10-06, and mostly on Microsoft Windows. Here is the latest non-Windows failure (one day before the last failure on Windows):

10.6 4ca56e8348c7bd20cb7544a607b730d6

innodb.innodb-table-online '4k,clear,innodb' w4 [ fail ]
        Test ended at 2021-10-05 08:57:46
 
CURRENT_TEST: innodb.innodb-table-online
--- /usr/share/mysql-test/suite/innodb/r/innodb-table-online.result	2021-10-05 07:22:49.000000000 +0000
+++ /dev/shm/var/4/log/innodb-table-online.reject	2021-10-05 08:57:46.150463032 +0000
@@ -212,7 +212,7 @@
 ddl_online_create_index	1
 ddl_pending_alter_table	1
 ddl_sort_file_alter_table	0
-ddl_log_file_alter_table	1
+ddl_log_file_alter_table	0
 BEGIN;
 DELETE FROM t1;
 ROLLBACK;
 
mysqltest: Result content mismatch

Comment by Marko Mäkelä [ 2023-11-24 ]

After making the test innodb.innodb-table-online work with ./mtr --repeat, I was able to reproduce a result difference for it as well:

11.0 5b6134b040b59d6c896ec0a2c86029a0bcca788c

innodb.innodb-table-online '16k,crypt,innodb' w6 [ 2 fail ]
        Test ended at 2023-11-24 12:54:43
 
CURRENT_TEST: innodb.innodb-table-online
--- /mariadb/11/mysql-test/suite/innodb/r/innodb-table-online,crypt.result~	2023-11-24 12:54:34.235128293 +0200
+++ /mariadb/11/mysql-test/suite/innodb/r/innodb-table-online,crypt.reject	2023-11-24 12:54:43.067237664 +0200
@@ -251,7 +251,7 @@
 @merge_encrypt_1>@merge_encrypt_0, @merge_decrypt_1>@merge_decrypt_0,
 @rowlog_encrypt_1>@rowlog_encrypt_0;
 sort_balance	@merge_encrypt_1>@merge_encrypt_0	@merge_decrypt_1>@merge_decrypt_0	@rowlog_encrypt_1>@rowlog_encrypt_0
-0	0	0	1
+0	0	0	0
 SET DEBUG_SYNC = 'now SIGNAL dml2_done';
 connection con1;
 ERROR HY000: Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again
 
mysqltest: Result content mismatch

This has something to do with the encryption of the log of concurrent DML operations. I will try for some time if I can reproduce this with rr, but it does not look promising.

Comment by Marko Mäkelä [ 2023-11-24 ]

I filed MDEV-32874 for the innodb.innodb-table-online,crypt result difference.

Generated at Thu Feb 08 10:34:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.