Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33443

Unsafe use of LOCK_thd_kill in my_malloc_size_cb_func()

Details

    Description

      SET max_session_mem_used=50000;
      SET SESSION max_statement_time=1;
      --error ER_STATEMENT_TIMEOUT
      SELECT SLEEP (3);
      INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';
      CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;
      --error ER_BAD_FIELD_ERROR
      DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';
      UNINSTALL PLUGIN RocksDB;
      

      Leads to:

      11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)

      safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
      Mutex currently locked (in reverse order):
      LOCK_thd_kill                     /test/11.4_dbg/sql/sql_class.h  line 4703
      LOCK_plugin                       /test/11.4_dbg/sql/sql_plugin.cc  line 2477
      

      Present in 10.4+, debug builds

      Attachments

        Issue Links

          Activity

            Roel Roel Van de Paar created issue -
            Roel Roel Van de Paar made changes -
            Field Original Value New Value
            Roel Roel Van de Paar made changes -
            Labels mutex
            Roel Roel Van de Paar made changes -
            Description Use CLI to replay testcase
            {code:sql}
            SET max_session_mem_used=50000;
            SET SESSION max_statement_time=1;
            SELECT SLEEP (3);
            INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';
            CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;
            DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';
            UNINSTALL PLUGIN RocksDB;
            {code}
            Leads to:
            {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)}
            safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
            Mutex currently locked (in reverse order):
            LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
            LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477
            {noformat}
            {code:sql}
            SET max_session_mem_used=50000;
            SET SESSION max_statement_time=1;
            --error ER_STATEMENT_TIMEOUT
            SELECT SLEEP (3);
            INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';
            CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;
            --error ER_BAD_FIELD_ERROR
            DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';
            UNINSTALL PLUGIN RocksDB;
            {code}
            Leads to:
            {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)}
            safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
            Mutex currently locked (in reverse order):
            LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
            LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477
            {noformat}
            Roel Roel Van de Paar made changes -
            Description {code:sql}
            SET max_session_mem_used=50000;
            SET SESSION max_statement_time=1;
            --error ER_STATEMENT_TIMEOUT
            SELECT SLEEP (3);
            INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';
            CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;
            --error ER_BAD_FIELD_ERROR
            DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';
            UNINSTALL PLUGIN RocksDB;
            {code}
            Leads to:
            {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)}
            safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
            Mutex currently locked (in reverse order):
            LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
            LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477
            {noformat}
            {code:sql}
            SET max_session_mem_used=50000;
            SET SESSION max_statement_time=1;
            --error ER_STATEMENT_TIMEOUT
            SELECT SLEEP (3);
            INSTALL PLUGIN RocksDB SONAME 'ha_rocksdb.so';
            CREATE TABLE t1 (i INT) ENGINE=RocksDB max_rows=100000000000;
            --error ER_BAD_FIELD_ERROR
            DELETE FROM t1 WHERE c1='-838:59:59' AND c2='-838:59:59';
            UNINSTALL PLUGIN RocksDB;
            {code}
            Leads to:
            {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Debug)}
            safe_mutex: Found wrong usage of mutex 'LOCK_plugin' and 'LOCK_thd_kill'
            Mutex currently locked (in reverse order):
            LOCK_thd_kill /test/11.4_dbg/sql/sql_class.h line 4703
            LOCK_plugin /test/11.4_dbg/sql/sql_plugin.cc line 2477
            {noformat}
            Present in 10.4+, debug builds
            knielsen Kristian Nielsen added a comment - - edited

            Interestingly, this doesn't look specific to RocksDB, or even plugin loading. It's a general issue with the --max-session-mem-used implementation. But it is tricky to trigger the problem, requires fine-tuning the limit to hit just the right memory allocation, so we just got lucky with this RocksDB-using reproduction.

            The problem is this code in my_malloc_size_cb_func():

                if (size > 0 &&
                    thd->status_var.local_memory_used > (int64)thd->variables.max_mem_used &&
                    likely(!thd->killed) && !thd->get_stmt_da()->is_set())
                {
                  /* Ensure we don't get called here again */
                  char buf[50], *buf2;
                  thd->set_killed(KILL_QUERY);
            

            thd->set_killed() takes LOCK_thd_kill. That's quite dangerous, as my_malloc_size_cb_func() can be called from almost any context, including where LOCK_thd_kill might already be held or is otherwise not safe to lock.

            But to trigger requires finding a memory allocation that's marked thread specific in a place where LOCK_thd_kill is not safe to lock, and then arranging for --max-session-mem-used to be set just so that this specific memory allocation triggers the limit enforcement. Tricky.

            One idea for a fix is to use _trylock on LOCK_thd_kill, and skip enforcing the limit in the unlikely case that the lock cannot be obtained, which I think is acceptable.

            But the problem may be even more general, as apparently something is elsewhere waiting on LOCK_plugin while holding LOCK_thd_kill. UPDATE: Well, this is just THD::awake() calling into each storage engine's kill_query(), that cannot really be avoided and should be ok. So I think the problem is waiting for LOCK_thd_kill in my_malloc_size_cb_func().

            Good catch on the test failure!

            knielsen Kristian Nielsen added a comment - - edited Interestingly, this doesn't look specific to RocksDB, or even plugin loading. It's a general issue with the --max-session-mem-used implementation. But it is tricky to trigger the problem, requires fine-tuning the limit to hit just the right memory allocation, so we just got lucky with this RocksDB-using reproduction. The problem is this code in my_malloc_size_cb_func(): if (size > 0 && thd->status_var.local_memory_used > (int64)thd->variables.max_mem_used && likely(!thd->killed) && !thd->get_stmt_da()->is_set()) { /* Ensure we don't get called here again */ char buf[50], *buf2; thd->set_killed(KILL_QUERY); thd->set_killed() takes LOCK_thd_kill. That's quite dangerous, as my_malloc_size_cb_func() can be called from almost any context, including where LOCK_thd_kill might already be held or is otherwise not safe to lock. But to trigger requires finding a memory allocation that's marked thread specific in a place where LOCK_thd_kill is not safe to lock, and then arranging for --max-session-mem-used to be set just so that this specific memory allocation triggers the limit enforcement. Tricky. One idea for a fix is to use _trylock on LOCK_thd_kill, and skip enforcing the limit in the unlikely case that the lock cannot be obtained, which I think is acceptable. But the problem may be even more general, as apparently something is elsewhere waiting on LOCK_plugin while holding LOCK_thd_kill. UPDATE: Well, this is just THD::awake() calling into each storage engine's kill_query(), that cannot really be avoided and should be ok. So I think the problem is waiting for LOCK_thd_kill in my_malloc_size_cb_func(). Good catch on the test failure!
            knielsen Kristian Nielsen made changes -
            Summary safe_mutex: Found wrong usage of mutex LOCK_plugin and LOCK_thd_kill Unsafe use of LOCK_thd_kill in my_malloc_size_cb_func()
            knielsen Kristian Nielsen made changes -
            Assignee Sergei Petrunia [ psergey ] Kristian Nielsen [ knielsen ]
            knielsen Kristian Nielsen made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            knielsen Kristian Nielsen added a comment - https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/AH5FMP5TVKPYFRP6LRQUYZV6HN3C7E3B/
            Roel Roel Van de Paar added a comment - - edited

            knielsen Thank you, very helpful and kind of you to fix.

            Is MDEV-32477 a similar issue (though this one is on LOCK_thd_data/LOCK_plugin)?

            Roel Roel Van de Paar added a comment - - edited knielsen Thank you, very helpful and kind of you to fix. Is MDEV-32477 a similar issue (though this one is on LOCK_thd_data/LOCK_plugin)?

            Yes, MDEV-32477 is the same issue, is fixed by the patch as well.

            knielsen Kristian Nielsen added a comment - Yes, MDEV-32477 is the same issue, is fixed by the patch as well.

            Pushed to 10.4.

            knielsen Kristian Nielsen added a comment - Pushed to 10.4.
            knielsen Kristian Nielsen made changes -
            Fix Version/s 10.4.34 [ 29625 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.5.25 [ 29626 ]
            Fix Version/s 10.6.18 [ 29627 ]
            Fix Version/s 10.11.8 [ 29630 ]
            Fix Version/s 11.0.6 [ 29628 ]
            Fix Version/s 11.1.5 [ 29629 ]
            Fix Version/s 11.2.4 [ 29631 ]
            Fix Version/s 11.3.3 [ 29632 ]
            knielsen Kristian Nielsen made changes -
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.4.2 [ 29633 ]
            Fix Version/s 11.3.3 [ 29632 ]

            People

              knielsen Kristian Nielsen
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.