MariaDB Server / MDEV-36684

main.mdl_sync fails under valgrind (test for Bug#42643)

    Description

      Test failure:

                            main.mdl_sync                            w25 [ fail ]
              Test ended at 2025-04-21 01:50:53
       
      CURRENT_TEST: main.mdl_sync
      --- /home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/main/mdl_sync.result	2025-04-19 16:41:39.000000000 +0000
      +++ /home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/main/mdl_sync.reject	2025-04-21 01:50:52.064219508 +0000
      @@ -2655,6 +2655,13 @@
       SET debug_sync='now WAIT_FOR parked_flush';
       SET debug_sync='now SIGNAL go_truncate';
       # Ensure that truncate waits for a exclusive lock
      +Timeout in wait_condition.inc for SELECT COUNT(*)=1 FROM information_schema.processlist
      +WHERE state='Waiting for table metadata lock' AND info='TRUNCATE TABLE t1'
      +Id	User	Host	db	Command	Time	State	Info	Progress
      +4	root	localhost	test	Query	0	starting	show full processlist	0.000
      +34	root	localhost	test	Sleep	33		NULL	0.000
      +35	root	localhost	test	Sleep	33		NULL	0.000
      +36	root	localhost	test	Query	450	Waiting for table metadata lock	FLUSH TABLES t1	0.000
       SET debug_sync= 'now SIGNAL go_show';
       connection con1;
       # Reaping...
      @@ -2663,10 +2670,14 @@
       # Reaping...
       Field	Type	Null	Key	Default	Extra
       a	int(11)	YES		NULL	
      +Warnings:
      +Warning	1639	debug sync point wait timed out
       connection default;
       SET debug_sync= 'now SIGNAL go_flush';
       connection con3;
       # Reaping...
      +Warnings:
      +Warning	1639	debug sync point wait timed out
       disconnect con1;
       disconnect con2;
       disconnect con3;
       
      Result length mismatch
       
       - saving '/home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/var/25/log/main.mdl_sync/' to '/home/buildbot/amd64-fedora-40-valgrind/build/mysql-test/var/log/main.mdl_sync/'
      

      The problem here appears to be:
      1. TRUNCATE (after "now SIGNAL go_truncate") starts waiting for an MDL_EXCLUSIVE lock on t1.
      2. The InnoDB purge thread chimes in, attempts to take an MDL_SHARED lock on t1, fails, and retries in a loop.

      Valgrind runs only one thread at a time. Given its scheduling specifics, the InnoDB purge thread appears to monopolize the CPU in its retry loop and never yields to the user connections, so the debug sync point wait times out.
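
The starvation described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not InnoDB or MDL code, it is not the proposed fix, and every name in it (`purge_thread`, `user_connection`, `simulate`, `yields_on_retry`) is hypothetical. It also simplifies the scheduler to "the running thread keeps the CPU until it gives it up voluntarily", which is cruder than what Valgrind really does.

```python
from collections import deque
from typing import Deque, Dict, Iterator

# Toy model only: each "thread" is a generator and one step is one next() call.
# A thread yields True when it is willing to give up the CPU, and False when
# it wants to keep running (a retry loop with no wait or yield in it).


def purge_thread(state: Dict[str, bool], yields_on_retry: bool) -> Iterator[bool]:
    # Retry the shared lock for as long as an exclusive request is pending.
    while state["exclusive_pending"]:
        yield yields_on_retry
    state["purge_done"] = True


def user_connection(state: Dict[str, bool]) -> Iterator[bool]:
    # Stands in for the connections that must run for the test to progress.
    state["exclusive_pending"] = False
    state["user_ran"] = True
    yield True


def simulate(yields_on_retry: bool, max_steps: int = 1000) -> Dict[str, bool]:
    state = {"exclusive_pending": True, "user_ran": False, "purge_done": False}
    run_queue: Deque[Iterator[bool]] = deque(
        [purge_thread(state, yields_on_retry), user_connection(state)]
    )
    for _ in range(max_steps):  # the step budget plays the role of the timeout
        if not run_queue:
            break
        thread = run_queue[0]
        try:
            gives_up_cpu = next(thread)
        except StopIteration:
            run_queue.popleft()  # thread finished
            continue
        if gives_up_cpu:
            run_queue.rotate(-1)  # move the running thread to the back
    return state


if __name__ == "__main__":
    print(simulate(yields_on_retry=False)["user_ran"])  # False: budget exhausted
    print(simulate(yields_on_retry=True)["user_ran"])   # True: user connection ran
```

In this model the retry loop that never gives up the CPU exhausts the whole step budget without the user connection running once, which mirrors the sync point timeout in the log. As soon as the loop gives up the CPU between retries, both threads finish within a few steps.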

      A fix for this issue was proposed a while ago: https://github.com/MariaDB/server/commit/0c6c580137146492e234570df30d302cafd94131

      The test for Bug#42643 fails consistently without the fix (mtr --repeat=100 --valgrind --parallel=20). No failures were observed with the fix.

      Cross-reference.

            People

              Assignee: svoj Sergey Vojtovich
              Reporter: svoj Sergey Vojtovich