[MDEV-24661] The test innodb.innodb_wl6326_big often times out Created: 2021-01-23  Updated: 2021-01-29  Resolved: 2021-01-29

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.5.9

Type: Bug Priority: Minor
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-16678 Use MDL for innodb background threads... Closed
relates to MDEV-23855 InnoDB log checkpointing causes regre... Closed

 Description   

The test innodb.innodb_wl6326_big often fails on 10.5 with a timeout like this:

10.5 bf1f9b59c7d0b619d8bf350b96436970c6edc118

CURRENT_TEST: innodb.innodb_wl6326_big
--- /mariadb/10.5m/mysql-test/suite/innodb/r/innodb_wl6326_big.result	2020-11-18 09:12:22.901175040 +0200
+++ /mariadb/10.5m/mysql-test/suite/innodb/r/innodb_wl6326_big.reject	2021-01-23 17:50:07.760902099 +0200
@@ -365,6 +365,22 @@
 COMMIT;
 SHOW ENGINE INNODB STATUS;
 #    pass
+Timeout in wait_condition.inc for SELECT COUNT(*) = @sx_count
+FROM information_schema.innodb_metrics t_after
+JOIN my_metrics t_before
+ON t_after.COUNT_RESET > t_before.COUNT_RESET AND t_after.NAME = t_before.NAME
+Id	User	Host	db	Command	Time	State	Info	Progress
+7	root	localhost	my_schema	Query	0	starting	show full processlist	0.000
+8	root	localhost	my_schema	Sleep	142		NULL	0.000
+9	root	localhost	my_schema	Sleep	142		NULL	0.000
+10	root	localhost	my_schema	Sleep	142		NULL	0.000
+11	root	localhost	my_schema	Sleep	142		NULL	0.000
+12	root	localhost	my_schema	Sleep	142		NULL	0.000
+13	root	localhost	my_schema	Sleep	142		NULL	0.000
+14	root	localhost	my_schema	Sleep	142		NULL	0.000
+15	root	localhost	my_schema	Sleep	142		NULL	0.000
+16	root	localhost	my_schema	Sleep	142		NULL	0.000
+17	root	localhost	my_schema	Sleep	142		NULL	0.000
 INSERT INTO my_metrics
 SELECT NAME, COUNT, MAX_COUNT, MIN_COUNT, AVG_COUNT,
 COUNT_RESET, MAX_COUNT_RESET, MIN_COUNT_RESET, AVG_COUNT_RESET,
@@ -379,71 +395,18 @@
 #       - One UPDATE statement modifying a huge slice of t1.
 #       Any of them causes heavy use of SX lock and therefore COUNT_RESET
 #       must have grown for ALL = @sx_count entries.
-#    pass
-# TC-09 Heavy activity after reset.
-#       COUNT_RESET = MAX_COUNT_RESET for ALL = @sx_count entries
-#       needs to stay valid though he counters will have grown.
-#    pass
-DELETE FROM my_metrics;
-INSERT INTO my_metrics
-SELECT NAME, COUNT, MAX_COUNT, MIN_COUNT, AVG_COUNT,
-COUNT_RESET, MAX_COUNT_RESET, MIN_COUNT_RESET, AVG_COUNT_RESET,
-TIME_ENABLED, TIME_DISABLED, TIME_ELAPSED, TIME_RESET,
-ENABLED, 'before'
-FROM information_schema.innodb_metrics
-WHERE NAME LIKE 'innodb_rwlock_sx_%';
-SET GLOBAL innodb_monitor_reset = "innodb_rwlock_sx_%";
-INSERT INTO my_metrics
-SELECT NAME, COUNT, MAX_COUNT, MIN_COUNT, AVG_COUNT,
-COUNT_RESET, MAX_COUNT_RESET, MIN_COUNT_RESET, AVG_COUNT_RESET,
-TIME_ENABLED, TIME_DISABLED, TIME_ELAPSED, TIME_RESET,
-ENABLED, 'after'
-FROM information_schema.innodb_metrics
-WHERE NAME LIKE 'innodb_rwlock_sx_%';
-# TC-08 There was a reset. COUNT_RESET = MAX_COUNT_RESET for ALL
-#       = @sx_count entries.
-#    pass
-# TC-17 We had heavy activity causing big counters and after that a reset.
-#       Reset causes COUNT > COUNT_RESET AND MAX_COUNT > MAX_COUNT_RESET
-#       for ALL @sx_count entries.
-#    pass
-# TC-18 We had some reset but this must not decrease COUNT or MAX_COUNT
-#       after.COUNT >= before.COUNT AND
-#       after.MAX_COUNT >= before.MAX_COUNT for ALL @sx_count entries.
-#    pass
-# TC-19 We had some reset after heavy activity and this must cause
-#       after.COUNT_RESET < before.COUNT_RESET
-#       AND after.MAX_COUNT_RESET < before.MAX_COUNT_RESET AND
-#       for ALL @sx_count entries.
-#    pass
-connection con10;
-disconnect con10;
-connection con9;
-disconnect con9;
-connection con8;
-disconnect con8;
-connection con7;
-disconnect con7;
-connection con6;
-disconnect con6;
-connection con5;
-disconnect con5;
-connection con4;
-disconnect con4;
-connection con3;
-disconnect con3;
-connection con2;
-disconnect con2;
-connection con1;
-disconnect con1;
-connection default;
-USE test;
-DROP SCHEMA my_schema;
-SET GLOBAL innodb_monitor_disable   = all;
-SET GLOBAL innodb_monitor_reset_all = all;
-SET GLOBAL innodb_monitor_enable    = default;
-SET GLOBAL innodb_monitor_disable   = default;
-SET GLOBAL innodb_monitor_reset     = default;
-SET GLOBAL innodb_monitor_reset_all = default;
-SET GLOBAL innodb_monitor_disable   = "innodb_rwlock_sx_%";
-SET GLOBAL innodb_monitor_reset     = "innodb_rwlock_sx_%";
+#    fail
+SELECT COUNT(*) <> @sx_count FROM my_metrics t_after JOIN my_metrics t_before
+ON t_after.COUNT_RESET > t_before.COUNT_RESET AND t_after.NAME = t_before.NAME
+WHERE t_after.phase = 'after' AND t_before.phase = 'before';
+COUNT(*) <> @sx_count
+1
+SELECT * FROM my_metrics
+ORDER BY NAME, phase;
+NAME	COUNT	MAX_COUNT	MIN_COUNT	AVG_COUNT	COUNT_RESET	MAX_COUNT_RESET	MIN_COUNT_RESET	AVG_COUNT_RESET	TIME_ENABLED	TIME_DISABLED	TIME_ELAPSED	TIME_RESET	ENABLED	phase
+innodb_rwlock_sx_os_waits	0	0	NULL	0	0	0	NULL	0	2021-01-23 17:46:33	NULL	214	2021-01-23 17:46:34	1	after
+innodb_rwlock_sx_os_waits	0	0	NULL	0	0	0	NULL	NULL	2021-01-23 17:46:33	NULL	1	2021-01-23 17:46:34	1	before
+innodb_rwlock_sx_spin_rounds	0	0	NULL	0	0	0	NULL	0	2021-01-23 17:46:33	NULL	214	2021-01-23 17:46:34	1	after
+innodb_rwlock_sx_spin_rounds	0	0	NULL	0	0	0	NULL	NULL	2021-01-23 17:46:33	NULL	1	2021-01-23 17:46:34	1	before
+innodb_rwlock_sx_spin_waits	0	0	NULL	0	0	0	NULL	0	2021-01-23 17:46:33	NULL	214	2021-01-23 17:46:34	1	after
+innodb_rwlock_sx_spin_waits	0	0	NULL	0	0	0	NULL	NULL	2021-01-23 17:46:33	NULL	1	2021-01-23 17:46:34	1	before
 
mysqltest: Result length mismatch

In 10.6, the test depends on instrumentation that was refactored or removed in MDEV-24142.
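For context, the failing step uses include/wait_condition.inc, which repeatedly evaluates the given SELECT until it returns true or a timeout expires. A minimal sketch of that polling pattern (in Python, with hypothetical function names and intervals that are not taken from the test framework) illustrates why the wait can never succeed once the instrumentation stops incrementing the counters: every innodb_rwlock_sx_% row stays at COUNT_RESET = 0, so the condition "COUNT_RESET grew for all entries" never becomes true and the test runs into the timeout.

```python
import time

def wait_condition(check, timeout=30.0, poll_interval=0.1):
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Hedged sketch of the polling loop in mysql-test's
    include/wait_condition.inc; the names and intervals here are
    illustrative assumptions, not the framework's actual code.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_interval)
    return False

# The test expects COUNT_RESET of every innodb_rwlock_sx_% metric to
# have grown after heavy load.  If the underlying counters are never
# incremented (as after the MDEV-24142 refactoring), the condition can
# never hold and the wait always expires.
counters = {"innodb_rwlock_sx_os_waits": 0}  # never incremented
timed_out = not wait_condition(
    lambda: counters["innodb_rwlock_sx_os_waits"] > 0,
    timeout=0.3)
```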



 Comments   
Comment by Matthias Leich [ 2021-01-28 ]

I vote for deleting the test, because:

  • its basis was lost through changes to "innodb_metrics" in later MariaDB versions
  • its stability eroded more and more as InnoDB was improved
Comment by Marko Mäkelä [ 2021-01-29 ]

Thank you, mleich. I have removed the test. I think that problems with page latching are best detected with stress testing and performance testing, not in a regression test that is too big to run in a normal CI system.

Generated at Thu Feb 08 09:31:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.