[MDEV-6582] DEBUG_SYNC does not reset mysys_var->current_mutex, causes assertion "Trying to unlock mutex that wasn't locked" Created: 2014-08-14  Updated: 2014-12-01  Resolved: 2014-11-26

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0.13
Fix Version/s: 10.0.16

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled

 Description   

https://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/5937

rpl_parallel.test asserts with the following stack trace:

safe_mutex: Trying to unlock mutex LOCK_wait_commit that wasn't locked at /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/sql/sql_class.cc, line 1817
Last used at /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/sql/log.cc, line: 6791
 
mysys/thr_mutex.c:424(safe_mutex_unlock)[0xe1c6f6]
sql/sql_class.cc:1817(THD::awake(killed_state))[0x5b88af]
psi/mysql_thread.h:756(inline_mysql_mutex_unlock)[0x5e59e9]
sql/sql_parse.cc:7453(sql_kill)[0x5e5a52]
sql/sql_parse.cc:4313(mysql_execute_command(THD*))[0x5e9203]
sql/sql_parse.cc:6411(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5ebf5a]
sql/sql_parse.cc:1309(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ed96c]
sql/sql_parse.cc:1005(do_command(THD*))[0x5ef152]
sql/sql_connect.cc:1379(do_handle_one_connection(THD*))[0x6dd5ff]
sql/sql_connect.cc:1295(handle_one_connection)[0x6dd725]

I think I have seen similar failures before. I guess it happens because
mysys_var->current_mutex changes while THD::awake() is trying to kill a
thread. IIRC, this is related to incorrect use of ENTER_COND and EXIT_COND
(eg. missing exit or something like that).

I also think that in THD::awake, we should take a local copy of
mysys_var->current_mutex, we really don't want to reload the value when we
unlock the mutex.

But we could add a debug assertion that the value does not change, to try and
catch more of these problems (incorrect usage of ENTER_COND/EXIT_COND).



 Comments   
Comment by Kristian Nielsen [ 2014-11-26 ]

I found another instance of this:

http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/4408

Comment by Kristian Nielsen [ 2014-11-26 ]

Another way this bug can occur is if DEBUG_SYNC is used inside ENTER_COND / EXIT_COND. Because DEBUG_SYNC changes current_mutex, which can race to give the problem seen here.

Comment by Kristian Nielsen [ 2014-11-26 ]

I think I found it.

I managed to reproduce the bug reliably by inserting these sleeps in the code:

=== modified file 'sql/log.cc'
--- sql/log.cc	2014-11-17 07:53:42 +0000
+++ sql/log.cc	2014-11-26 08:59:16 +0000
@@ -6833,6 +6833,7 @@ MYSQL_BIN_LOG::queue_for_group_commit(gr
       */
       wfc->opaque_pointer= orig_entry;
       DEBUG_SYNC(orig_entry->thd, "group_commit_waiting_for_prior");
+my_sleep(100000);
       orig_entry->thd->ENTER_COND(&wfc->COND_wait_commit,
                                   &wfc->LOCK_wait_commit,
                                   &stage_waiting_for_prior_transaction_to_commit,
 
=== modified file 'sql/sql_class.cc'
--- sql/sql_class.cc	2014-11-13 10:01:31 +0000
+++ sql/sql_class.cc	2014-11-26 08:59:05 +0000
@@ -1810,6 +1810,7 @@ void THD::awake(killed_state state_to_se
       for (i= 0; i < WAIT_FOR_KILL_TRY_TIMES * SECONDS_TO_WAIT_FOR_KILL; i++)
       {
         int ret= mysql_mutex_trylock(mysys_var->current_mutex);
+my_sleep(200000);
         mysql_cond_broadcast(mysys_var->current_cond);
         if (!ret)
         {
 

I think the problem is in debug_sync_execute():

      if (thd->mysys_var)
      {
        old_mutex= thd->mysys_var->current_mutex;
        thd->mysys_var->current_mutex= &debug_sync_global.ds_mutex;
...
      if (old_mutex)
        thd->mysys_var->current_mutex= old_mutex;

Note, that if old_mutex is NULL, then debug_sync_execute() will leave a
dangling non-NULL value in thd->mysys_var->current_mutex. This allows a race,
where first THD::awake() fetches and locks the dangling mutex, then ENTER_COND
changes current_mutex to a new value, then THD::awake() reloads the new value
and unlocks a different mutex from the one it locked (the dangling one).

Seems clear enough now.

Comment by Kristian Nielsen [ 2014-11-26 ]

http://lists.askmonty.org/pipermail/commits/2014-November/007071.html

Generated at Thu Feb 08 07:13:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.