Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12070

Introduce thd_query_safe() from MySQL 5.7

Details

    Description

      MariaDB Server 10.2 lacks this race condition bug fix from Oracle:

      commit 565d20b44f24fcc855dc616164d87b03cfad10bc
      Author: Jon Olav Hauglid <jon.hauglid@oracle.com>
      Date:   Thu Mar 6 12:15:07 2014 +0100
       
          Bug#17606098: DEADLOCK INVOLVING SRV_MONITOR_THREAD AND LOCK_THD_DATA
          
          This deadlock involved LOCK_thd_data and InnoDB's trx_sys mutex.
          It could occur if InnoDB code (e.g. the srv_monitor_thread) called
          thd_security_context() while having trx_sys locked and at the same time
          the server tried to notify InnoDB about a deadlock (which involves calling
          mysql_lock_abort_for_thread() while holding LOCK_thd_data).
          This could lead to deadlock if mysql_lock_abort_for_thread() lead to
          trx_allocate_for_mysql() being called inside InnoDB.
          
          This very rarely happens - the problem was found with RQG and is very
          difficult to reproduce.
          
          This patch solves the problem by splitting LOCK_thd_data so that a
          separate new mutex LOCK_thd_query protects the query string
          accessed by thd_security_context().
          
          The patch also strengthens the protection of the query string by
          enforcing that it can only be set by the owner thread and that this
          requires locking of LOCK_thd_query. Reading the query string can
          be done by the owner thread without holding LOCK_thd_query but
          other threads reading the query string have to have LOCK_thd_query
          locked.
          
          This also solves a separate problem where other threads could
          read the query string while it was being deleted by the owner thread.
          
          Finally, the patch updates the performance schema mutex
          heuristics - including correcting an issue introduced by WL#6369.
      

      Only the InnoDB part of this patch was merged, with a mock-up version of innobase_get_stmt_safe().

      Attachments

        Issue Links

          Activity

            I feel like this patch does more than what is described in the comment. But I'd argue correctness of what was described in the comment. Number of after-push fixes scares me a bit.

            They also added another mutex lock on a hot path. Even though it is uncontested, I would do my best to avoid this for "micro" performance reasons.

            FWICS they implemented this in 5.7, but they don't use THR_LOCK's with InnoDB. We don't use them in 10.2 either.

            The question is: why does InnoDB have to allocate transaction for a thread that just wants to abort a lock in another thread?

            Or even more precise 5.7/10.2 variant of this question: why does InnoDB have to allocate transaction for a thread that just wants to perform noop in another thread?

            svoj Sergey Vojtovich added a comment - I feel like this patch does more than what is described in the comment. But I'd argue correctness of what was described in the comment. Number of after-push fixes scares me a bit. They also added another mutex lock on a hot path. Even though it is uncontested, I would do my best to avoid this for "micro" performance reasons. FWICS they implemented this in 5.7, but they don't use THR_LOCK's with InnoDB. We don't use them in 10.2 either. The question is: why does InnoDB have to allocate transaction for a thread that just wants to abort a lock in another thread? Or even more precise 5.7/10.2 variant of this question: why does InnoDB have to allocate transaction for a thread that just wants to perform noop in another thread?

            Also thd_query_safe() doesn't seem to relate to this described in a comment problem. So what are we actually looking for in this patch?

            svoj Sergey Vojtovich added a comment - Also thd_query_safe() doesn't seem to relate to this described in a comment problem. So what are we actually looking for in this patch?

            svoj, as far as I can tell, there are two separate issues:

            First, in mysql_lock_abort_for_thread(), ha_innobase::store_lock() is apparently unnecessarily starting a transaction in an error handling code path. The scenario should involve LOCK TABLES and INSERT. svoj, please file a separate bug for that and assign to me.

            My main concern is that SHOW ENGINE INNODB STATUS and certain InnoDB background threads can report the THD::query_string of any InnoDB thread, without any concurrency protection. This race condition was originally fixed in MySQL 4.0.20, which introduced the function innobase_mysql_prepare_print_arbitrary_thd(). This function was removed in MySQL 5.0.90, 5.1.42, 5.5.1-m2 (see also this follow-up in 5.1.43) without any replacement, reintroducing the race condition. This could lead to data races where freed memory is being accessed, or a string is being read and written concurrently, possibly missing the terminating NUL character. To my knowledge, this race condition was fixed later in MySQL, but I do not remember where exactly: it might be any of 5.5, 5.6, or 5.7. You should find more by reviewing the history of the code that is modified by the 5.7 patch that I mentioned.

            marko Marko Mäkelä added a comment - svoj , as far as I can tell, there are two separate issues: First, in mysql_lock_abort_for_thread(), ha_innobase::store_lock() is apparently unnecessarily starting a transaction in an error handling code path. The scenario should involve LOCK TABLES and INSERT. svoj , please file a separate bug for that and assign to me. My main concern is that SHOW ENGINE INNODB STATUS and certain InnoDB background threads can report the THD::query_string of any InnoDB thread, without any concurrency protection. This race condition was originally fixed in MySQL 4.0.20 , which introduced the function innobase_mysql_prepare_print_arbitrary_thd(). This function was removed in MySQL 5.0.90, 5.1.42, 5.5.1-m2 (see also this follow-up in 5.1.43 ) without any replacement, reintroducing the race condition. This could lead to data races where freed memory is being accessed, or a string is being read and written concurrently, possibly missing the terminating NUL character. To my knowledge, this race condition was fixed later in MySQL, but I do not remember where exactly: it might be any of 5.5, 5.6, or 5.7. You should find more by reviewing the history of the code that is modified by the 5.7 patch that I mentioned.

            serg, please review fix for this bug.

            svoj Sergey Vojtovich added a comment - serg , please review fix for this bug.

            ok to push

            serg Sergei Golubchik added a comment - ok to push

            People

              svoj Sergey Vojtovich
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.