[MDEV-33443] Unsafe use of LOCK_thd_kill in my_malloc_size_cb_func() - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4
Fix Version/s: 10.4.34, 10.5.25, 10.11.8, 10.6.18, 11.0.6, 11.1.5, 11.2.4, 11.4.2
Component/s: Plugins, Storage Engine - RocksDB
Labels:
- mutex

Description

SET max_session_mem_used=50000;

SET SESSION max_statement_time=1;

--error ER_STATEMENT_TIMEOUT

SELECT SLEEP (3);

INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';

CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;

--error ER_BAD_FIELD_ERROR

DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';

UNINSTALL PLUGIN RocksDB;

Leads to:

11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)
safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
Mutex currently locked (in reverse order):
LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477

Present in 10.4+, debug builds

Attachments

Issue Links

is duplicated by

MDEV-32477 safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_plugin'

Closed

relates to

MDEV-32477 safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_plugin'

Closed

Activity

Ascending order - Click to sort in descending order

Kristian Nielsen added a comment - 2024-02-12 10:06 - edited

Interestingly, this doesn't look specific to RocksDB, or even plugin loading. It's a general issue with the --max-session-mem-used implementation. But it is tricky to trigger the problem, requires fine-tuning the limit to hit just the right memory allocation, so we just got lucky with this RocksDB-using reproduction.

The problem is this code in my_malloc_size_cb_func():

    if (size > 0 &&

        thd->status_var.local_memory_used > (int64)thd->variables.max_mem_used &&

        likely(!thd->killed) && !thd->get_stmt_da()->is_set())

      /* Ensure we don't get called here again */

      char buf[50], *buf2;

      thd->set_killed(KILL_QUERY);

thd->set_killed() takes LOCK_thd_kill. That's quite dangerous, as my_malloc_size_cb_func() can be called from almost any context, including where LOCK_thd_kill might already be held or is otherwise not safe to lock.

But to trigger requires finding a memory allocation that's marked thread specific in a place where LOCK_thd_kill is not safe to lock, and then arranging for --max-session-mem-used to be set just so that this specific memory allocation triggers the limit enforcement. Tricky.

One idea for a fix is to use _trylock on LOCK_thd_kill, and skip enforcing the limit in the unlikely case that the lock cannot be obtained, which I think is acceptable.

~~But the problem may be even more general, as apparently something is elsewhere waiting on LOCK_plugin while holding LOCK_thd_kill.~~ UPDATE: Well, this is just THD::awake() calling into each storage engine's kill_query(), that cannot really be avoided and should be ok. So I think the problem is waiting for LOCK_thd_kill in my_malloc_size_cb_func().

Good catch on the test failure!

Kristian Nielsen added a comment - 2024-02-12 10:06 - edited Interestingly, this doesn't look specific to RocksDB, or even plugin loading. It's a general issue with the --max-session-mem-used implementation. But it is tricky to trigger the problem, requires fine-tuning the limit to hit just the right memory allocation, so we just got lucky with this RocksDB-using reproduction. The problem is this code in my_malloc_size_cb_func(): if (size > 0 && thd->status_var.local_memory_used > (int64)thd->variables.max_mem_used && likely(!thd->killed) && !thd->get_stmt_da()->is_set()) { /* Ensure we don't get called here again */ char buf[50], *buf2; thd->set_killed(KILL_QUERY); thd->set_killed() takes LOCK_thd_kill. That's quite dangerous, as my_malloc_size_cb_func() can be called from almost any context, including where LOCK_thd_kill might already be held or is otherwise not safe to lock. But to trigger requires finding a memory allocation that's marked thread specific in a place where LOCK_thd_kill is not safe to lock, and then arranging for --max-session-mem-used to be set just so that this specific memory allocation triggers the limit enforcement. Tricky. One idea for a fix is to use _trylock on LOCK_thd_kill, and skip enforcing the limit in the unlikely case that the lock cannot be obtained, which I think is acceptable. But the problem may be even more general, as apparently something is elsewhere waiting on LOCK_plugin while holding LOCK_thd_kill. UPDATE: Well, this is just THD::awake() calling into each storage engine's kill_query(), that cannot really be avoided and should be ok. So I think the problem is waiting for LOCK_thd_kill in my_malloc_size_cb_func(). Good catch on the test failure!

Kristian Nielsen added a comment - 2024-02-12 11:03

https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/AH5FMP5TVKPYFRP6LRQUYZV6HN3C7E3B/

Kristian Nielsen added a comment - 2024-02-12 11:03 https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/AH5FMP5TVKPYFRP6LRQUYZV6HN3C7E3B/

Roel Van de Paar added a comment - 2024-02-12 20:13 - edited

knielsen Thank you, very helpful and kind of you to fix.

Is ~~MDEV-32477~~ a similar issue (though this one is on LOCK_thd_data/LOCK_plugin)?

Roel Van de Paar added a comment - 2024-02-12 20:13 - edited knielsen Thank you, very helpful and kind of you to fix. Is MDEV-32477 a similar issue (though this one is on LOCK_thd_data/LOCK_plugin)?

Kristian Nielsen added a comment - 2024-02-13 13:39

Yes, ~~MDEV-32477~~ is the same issue, is fixed by the patch as well.

Kristian Nielsen added a comment - 2024-02-13 13:39 Yes, MDEV-32477 is the same issue, is fixed by the patch as well.

Kristian Nielsen added a comment - 2024-02-16 13:55

Pushed to 10.4.

Kristian Nielsen added a comment - 2024-02-16 13:55 Pushed to 10.4.

People

Assignee:: Kristian Nielsen

Reporter:: Roel Van de Paar

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2024-02-12 02:23

Updated:: 2024-02-20 09:34

Resolved:: 2024-02-16 13:55

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server