[MDEV-36684] main.mdl_sync fails under valgrind (test for Bug#42643)

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.11, 11.4, 11.8, 12.0(EOL)
Fix Version/s: 10.6.22, 10.11.12, 11.4.6, 11.8.2, 12.0.1
Component/s: Storage Engine - InnoDB, Tests, MTR
Labels: None

Description

Test failure:

                      main.mdl_sync                            w25 [ fail ]
        Test ended at 2025-04-21 01:50:53

CURRENT_TEST: main.mdl_sync
--- /home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/main/mdl_sync.result	2025-04-19 16:41:39.000000000 +0000
+++ /home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/main/mdl_sync.reject	2025-04-21 01:50:52.064219508 +0000
@@ -2655,6 +2655,13 @@
 SET debug_sync='now WAIT_FOR parked_flush';
 SET debug_sync='now SIGNAL go_truncate';
 # Ensure that truncate waits for a exclusive lock
+Timeout in wait_condition.inc for SELECT COUNT(*)=1 FROM information_schema.processlist
+WHERE state='Waiting for table metadata lock' AND info='TRUNCATE TABLE t1'
+Id	User	Host	db	Command	Time	State	Info	Progress
+4	root	localhost	test	Query	0	starting	show full processlist	0.000
+34	root	localhost	test	Sleep	33		NULL	0.000
+35	root	localhost	test	Sleep	33		NULL	0.000
+36	root	localhost	test	Query	450	Waiting for table metadata lock	FLUSH TABLES t1	0.000
 SET debug_sync= 'now SIGNAL go_show';
 connection con1;
 # Reaping...
@@ -2663,10 +2670,14 @@
 # Reaping...
 Field	Type	Null	Key	Default	Extra
 a	int(11)	YES		NULL
+Warnings:
+Warning	1639	debug sync point wait timed out
 connection default;
 SET debug_sync= 'now SIGNAL go_flush';
 connection con3;
 # Reaping...
+Warnings:
+Warning	1639	debug sync point wait timed out
 disconnect con1;
 disconnect con2;
 disconnect con3;

Result length mismatch
 - saving '/home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/var/25/log/main.mdl_sync/' to '/home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/var/log/main.mdl_sync/'

The problem here appears to be:
1. TRUNCATE (after "now SIGNAL go_truncate") starts waiting for an MDL_EXCLUSIVE lock on t1.
2. The InnoDB purge thread chimes in, attempts to take an MDL_SHARED lock on t1, fails, and retries in a loop.

Given valgrind scheduling specifics, the InnoDB purge thread occupies the whole CPU and never yields it to user connections, causing the debug sync point wait to time out.
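
To illustrate the starvation pattern described above, here is a toy model in Python. It is only a sketch and not MariaDB code, and it does not reproduce the linked fix. All names (`purge`, `user_connection`, `run`, `simulate`) are hypothetical. It assumes a simplified scheduler that runs one task at a time and switches only when the running task yields voluntarily, which is a rough stand-in for the "valgrind scheduling specifics" mentioned above. The step budget plays the role of the debug sync timeout.

```python
from collections import deque

BUSY, YIELD = "busy", "yield"

def purge(state, polite):
    """Toy stand-in for a background thread retrying a shared lock."""
    while state["exclusive_pending"]:
        state["retries"] += 1
        # A polite retry loop lets other tasks run before retrying;
        # an impolite one keeps the CPU (assumed scheduler behaviour).
        yield YIELD if polite else BUSY

def user_connection(state):
    """Toy stand-in for the connection whose progress unblocks purge."""
    yield BUSY                          # one unit of work
    state["exclusive_pending"] = False  # conflicting lock request goes away

def run(tasks, budget):
    """Run tasks one at a time; switch only when the running task yields.

    Returns True if every task finished within `budget` steps.
    """
    queue = deque(tasks)
    steps = 0
    while queue:
        task = queue.popleft()
        while True:
            if steps >= budget:
                return False            # analogous to a timeout
            steps += 1
            try:
                action = next(task)
            except StopIteration:
                break                   # task finished; pick the next one
            if action == YIELD:
                queue.append(task)      # go to the back of the queue
                break
    return True

def simulate(polite, budget=1000):
    state = {"exclusive_pending": True, "retries": 0}
    finished = run([purge(state, polite), user_connection(state)], budget)
    return finished, state["retries"]

print(simulate(polite=False))  # retry loop never yields: budget exhausted
print(simulate(polite=True))   # retry loop yields: both tasks finish
```

In this model the non-yielding retry loop burns the entire step budget while the other task never gets to run, whereas a single yield between retries is enough for both tasks to complete.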

A fix for this issue was proposed a while ago: https://github.com/MariaDB/server/commit/0c6c580137146492e234570df30d302cafd94131

Without the fix, the test for Bug#42643 fails consistently (mtr --repeat=100 --valgrind --parallel=20). No failures were observed with the fix.

Cross-reference.

Issue Links

is part of

MDEV-36647 No red leaves in the forest

Open

People

Assignee: Sergey Vojtovich

Reporter: Sergey Vojtovich

Votes: 0

Watchers: 1

Dates

Created: 2025-04-24 10:32

Updated: 2025-04-29 15:05

Resolved: 2025-04-29 15:05
