[MDEV-360] safe_mutex: Trying to destroy a mutex keycache->cache_lock that was locked Created: 2012-06-20  Updated: 2012-06-24  Resolved: 2012-06-24

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.27

Type: Bug Priority: Major
Reporter: Vladislav Vaintroub Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None


 Description   

RQG test crashes with an assertion in the safe_mutex code

http://buildbot.askmonty.org/buildbot/builders/rqg-perpush-bugfix-tests/builds/25/steps/rqg_bugfix_tests/logs/stdio

The crash call stack points to the "repartition_key_cache" function.

mysys/thr_mutex.c:608(safe_mutex_destroy)[0xc39f1d]
psi/mysql_thread.h:597(inline_mysql_mutex_destroy)[0xc1071e]
mysys/mf_keycache.c:1002(end_simple_key_cache)[0xc1066d]
mysys/mf_keycache.c:5342(end_partitioned_key_cache)[0xc17814]
mysys/mf_keycache.c:6109(end_key_cache_internal)[0xc184ae]
mysys/mf_keycache.c:6476(repartition_key_cache_internal)[0xc188ea]
mysys/mf_keycache.c:6527(repartition_key_cache)[0xc1898b]
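
For readers unfamiliar with this class of failure: the top frame is a debug mutex wrapper refusing to destroy a mutex that is still held. The following is a toy Python model of that kind of check, not the mysys code. The names `SafeMutex` and `SafeMutexError` are hypothetical, and the message text is modelled on the one in the issue title.

```python
import threading


class SafeMutexError(RuntimeError):
    """Raised when the toy wrapper detects misuse (hypothetical name)."""


class SafeMutex:
    """Toy model of a debug mutex wrapper; NOT the mysys safe_mutex code."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._destroyed = False

    def lock(self):
        if self._destroyed:
            raise SafeMutexError(f"Trying to lock destroyed mutex {self.name}")
        self._lock.acquire()

    def unlock(self):
        self._lock.release()

    def destroy(self):
        # The kind of check that fires in this report: destroying while held.
        if self._lock.locked():
            raise SafeMutexError(
                f"Trying to destroy a mutex {self.name} that was locked"
            )
        self._destroyed = True


if __name__ == "__main__":
    m = SafeMutex("keycache->cache_lock")
    m.lock()
    try:
        m.destroy()  # rejected: the mutex is still held
    except SafeMutexError as exc:
        print(exc)
    m.unlock()
    m.destroy()  # accepted once the mutex is free
```

In this model, destruction only succeeds after every holder has released the lock, which is the ordering the assertion is there to enforce.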



 Comments   
Comment by Elena Stepanova [ 2012-06-20 ]

FYI, the RQG test in question was added as a regression test for LP:1008293. It runs the same 2 grammars that were provided in the bug report.

Comment by Igor Babaev [ 2012-06-21 ]

Elena,
It's not clear from the above where the test was added.
The original fix for LP:1008293 was pushed into 5.2. I don't see any failures in 5.2.

Comment by Elena Stepanova [ 2012-06-21 ]

Igor,
The test was added to 5.2, 5.3 and 5.5. It passed on 5.2 and 5.3 after the fix was pushed/merged in the corresponding tree, but failed on 5.5 with the failure Wlad mentioned above (safe_mutex: Trying to destroy a mutex keycache->cache_lock) – it's different from the initial crash.

Please note, however, that the new failure is sporadic, so unless you can guess its source just by looking at the stack trace, you'll probably want to assign it to me and wait until I come up with a test case for it (which might take time because, in my experience, these mutex-destruction race conditions are not easy to catch).

Comment by Igor Babaev [ 2012-06-21 ]

I would prefer to have a test case to start working on this bug.

Comment by Elena Stepanova [ 2012-06-22 ]

Igor,

Please try the MTR test case below. It crashes on two of the three machines that I tried (the third is a slow 32-bit box; I'm not sure whether it's the slowness or the bits that stop it from crashing).
Please run the test with --repeat=100. (It usually fails for me within the first 10 repetitions.)

# MTR test case

CREATE TABLE t1 (a INT, b DATE, KEY(a), KEY(b)) ENGINE=MyISAM;
INSERT INTO t1 VALUES (8, '2008-10-02');
--send SET GLOBAL key_cache_segments = 1
--connect (con8,127.0.0.1,root,,test)
SET GLOBAL keycache1.key_buffer_size = 1024*1024;
--send CACHE INDEX t1 IN keycache1
--connection default
--reap
SET GLOBAL key_cache_segments = 7;
--connection con8
--reap

# End of MTR test case
# If it does not work, please try to use the following RQG grammar
# (it's one of the grammars from lp:1008293).
# cat 3.yy

query_init:
SET GLOBAL keycache1.key_buffer_size = 1024*1024;

thread1:
SET GLOBAL key_cache_segments = _digit;

query:
CACHE INDEX _table IN keycache1;

# End of RQG grammar 3.yy
# Run it as

perl runall.pl \
--no-mask \
--queries=100M \
--duration=300 \
--threads=2 \
--engine=MyISAM \
--grammar=3.yy \
--basedir=<your basedir> --vardir=<your vardir>

# Or, on an already started server, as

perl gentest.pl \
--gendata= \
--engine=MyISAM \
--threads=2 \
--queries=100M \
--duration=300 \
--grammar=3.yy \
--dsn=dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test

(replace 19300 with your port).

Again, normally it fails within seconds after start, but sometimes it does not.

If neither of these works for you, please let me know.
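
The MTR test case and the grammar above both try to overlap a key cache repartition with a CACHE INDEX statement on another connection. The sketch below models that overlap in Python under a loud assumption: that the failure occurs when one connection reaches cache teardown while the other still holds the cache lock. This thread does not confirm that root cause, and the function and variable names are hypothetical.

```python
import threading


def run_interleaving():
    """Return True if teardown finds the lock still held by the other thread."""
    cache_lock = threading.Lock()      # stands in for keycache->cache_lock
    holder_has_lock = threading.Event()
    teardown_checked = threading.Event()

    def cache_index():
        # Models con8: CACHE INDEX t1 IN keycache1 working under the lock
        # (assumed, not confirmed by the report).
        with cache_lock:
            holder_has_lock.set()
            teardown_checked.wait(timeout=5)

    worker = threading.Thread(target=cache_index)
    worker.start()
    holder_has_lock.wait(timeout=5)

    # Models the default connection: SET GLOBAL key_cache_segments = 7
    # reaching the point where the old cache would be destroyed.
    destroyed_while_locked = cache_lock.locked()

    teardown_checked.set()
    worker.join()
    return destroyed_while_locked


if __name__ == "__main__":
    print(run_interleaving())
```

The events force the bad ordering deterministically here. In the real server the ordering depends on thread timing, which would be consistent with the failure being sporadic and needing --repeat=100 to show up.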

Comment by Elena Stepanova [ 2012-06-23 ]

Steps to run the MTR test above:

  • copy the test case into t/t1.test
  • run
    perl ./mtr --repeat=100 t1

Comment by Igor Babaev [ 2012-06-24 ]

The fix was applied to 5.2, then merged into 5.3 and 5.5.
The problem was no longer observed.
