Details
Description
SET max_session_mem_used=50000; |
SET SESSION max_statement_time=1; |
--error ER_STATEMENT_TIMEOUT
|
SELECT SLEEP (3); |
INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so'; |
CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000; |
--error ER_BAD_FIELD_ERROR
|
DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59'; |
UNINSTALL PLUGIN RocksDB;
|
Leads to:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug) |
safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
|
Mutex currently locked (in reverse order):
|
LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
|
LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477
|
Present in 10.4+, debug builds
Attachments
Issue Links
- is duplicated by
-
MDEV-32477 safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_plugin'
-
- Closed
-
- relates to
-
MDEV-32477 safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_plugin'
-
- Closed
-
Interestingly, this doesn't look specific to RocksDB, or even plugin loading. It's a general issue with the --max-session-mem-used implementation. But it is tricky to trigger the problem, requires fine-tuning the limit to hit just the right memory allocation, so we just got lucky with this RocksDB-using reproduction.
The problem is this code in my_malloc_size_cb_func():
if (size > 0 &&
thd->status_var.local_memory_used > (int64)thd->variables.max_mem_used &&
likely(!thd->killed) && !thd->get_stmt_da()->is_set())
{
/* Ensure we don't get called here again */
char buf[50], *buf2;
thd->set_killed(KILL_QUERY);
thd->set_killed() takes LOCK_thd_kill. That's quite dangerous, as my_malloc_size_cb_func() can be called from almost any context, including where LOCK_thd_kill might already be held or is otherwise not safe to lock.
But to trigger requires finding a memory allocation that's marked thread specific in a place where LOCK_thd_kill is not safe to lock, and then arranging for --max-session-mem-used to be set just so that this specific memory allocation triggers the limit enforcement. Tricky.
One idea for a fix is to use _trylock on LOCK_thd_kill, and skip enforcing the limit in the unlikely case that the lock cannot be obtained, which I think is acceptable.
But the problem may be even more general, as apparently something is elsewhere waiting on LOCK_plugin while holding LOCK_thd_kill.UPDATE: Well, this is just THD::awake() calling into each storage engine's kill_query(), that cannot really be avoided and should be ok. So I think the problem is waiting for LOCK_thd_kill in my_malloc_size_cb_func().Good catch on the test failure!