MDEV-18464 (MariaDB Server)

Port kill_one_trx fixes from 10.4 to 10.1

Details

    Description

      There are the following issues here:

      • Whenever a Galera BF (brute force) transaction decides to abort a conflicting transaction, it kills that thread using thd::awake().
      • Whenever replication selects a thread as a victim, it calls thd::awake().
      • When a user issues KILL [QUERY|CONNECTION] ... for a thread, it also calls thd::awake().

      Whenever one of these actions is executed, we hold a number of InnoDB internal mutexes and thd mutexes.
      Sometimes these mutexes are taken in a different order, causing a mutex deadlock (see one detailed case below).
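The lock-order problem described above can be illustrated with a minimal, standalone C++17 sketch. It is not MariaDB code and not the fix applied in this ticket: `innodb_mutex`, `thd_mutex`, `bf_abort_path`, `user_kill_path` and `run_both_paths` are hypothetical names chosen for illustration. Two code paths need the same two mutexes but name them in opposite order. Taking them one at a time with plain `lock()` calls could deadlock; `std::scoped_lock` acquires both with a deadlock-avoidance algorithm, so the sketch always terminates.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for an InnoDB internal mutex and a per-THD mutex.
std::mutex innodb_mutex;
std::mutex thd_mutex;
int aborts = 0;  // only modified while both mutexes are held

// Path A (think: BF abort) names the InnoDB mutex first.
void bf_abort_path() {
    // scoped_lock locks both mutexes without risking a lock-order deadlock.
    std::scoped_lock guard(innodb_mutex, thd_mutex);
    ++aborts;
}

// Path B (think: user KILL) names the THD mutex first.
void user_kill_path() {
    std::scoped_lock guard(thd_mutex, innodb_mutex);
    ++aborts;
}

// Runs both paths concurrently and returns the number of completed calls.
int run_both_paths(int iterations) {
    aborts = 0;
    std::thread a([iterations] {
        for (int i = 0; i < iterations; ++i) bf_abort_path();
    });
    std::thread b([iterations] {
        for (int i = 0; i < iterations; ++i) user_kill_path();
    });
    a.join();
    b.join();
    return aborts;
}
```

Calling `run_both_paths(1000)` completes with both threads having done all their work. In a large code base, the more common remedy is to document one global acquisition order and follow it on every path, which is closer in spirit to what a server fix would need.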

          Activity

            jplindst Jan Lindström (Inactive) added a comment - edited

            10.4 contains a dirty read of the trx->lock.was_chosen_as_wsrep_victim variable, which is susceptible to a race:

            dberr_t
            lock_trx_handle_wait(
            /*=================*/
            	trx_t*	trx)	/*!< in/out: trx lock state */
            {
            #ifdef WITH_WSREP
            	/* We already own mutexes. Note that the flag is read here
            	without holding any mutex (the dirty read in question). */
            	if (trx->lock.was_chosen_as_wsrep_victim) {
            		return lock_trx_handle_wait_low(trx);
            	}
            #endif /* WITH_WSREP */
            	lock_mutex_enter();
            	trx_mutex_enter(trx);
            	dberr_t err = lock_trx_handle_wait_low(trx);
            	lock_mutex_exit();
            	trx_mutex_exit(trx);
            	return err;
            }
            

            Note that LOCK_thd_data protects against two concurrent KILLs.
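As a rough illustration of the dirty-read concern, the following standalone C++ sketch shows one common way to make an unprotected flag read well defined: declaring the flag as `std::atomic<bool>`. This is an assumption-laden sketch, not the InnoDB code and not necessarily what the eventual fix does; `demo_trx`, `handle_wait` and `handle_wait_low` are hypothetical, simplified stand-ins. An atomic flag only removes the data race on the flag itself. It does not by itself guarantee that the caller really holds the mutexes the flag is taken to imply.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical, simplified stand-in for trx_t; not the InnoDB definition.
struct demo_trx {
    std::mutex mutex;                          // stands in for the trx mutex
    std::atomic<bool> chosen_as_victim{false}; // flag read outside the mutex
    int wait_state = 0;                        // protected by mutex
};

// Caller must hold trx.mutex.
int handle_wait_low(demo_trx& trx) {
    return trx.wait_state;
}

int handle_wait(demo_trx& trx) {
    // Atomic load: the read is well defined even if another thread
    // stores to the flag at the same time.
    if (trx.chosen_as_victim.load(std::memory_order_acquire)) {
        // Assumption of this sketch: whoever set the flag is the caller
        // and already holds trx.mutex, so it must not be locked again.
        return handle_wait_low(trx);
    }
    std::lock_guard<std::mutex> guard(trx.mutex);
    return handle_wait_low(trx);
}
```

A caller on the victim path would lock `trx.mutex`, store `true` to `chosen_as_victim`, and then call `handle_wait(trx)` while still holding the mutex. Every other caller goes through the locking branch.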


            jplindst Jan Lindström (Inactive) added a comment

            https://github.com/MariaDB/server/commit/82c44f8298831ad167130741b6e48e7316ef8e47

            This should fix the issue on 10.2/10.3. The 10.4 fix will follow the same idea.

            marko Marko Mäkelä added a comment

            Given that there already was a commit in 10.1.39, 10.2.24, 10.3.14, 10.4.4, 10.5.0 that references this ticket, I think that to avoid further confusion, a separate ticket should be filed for the remaining fix and the original title of this ticket should be restored.

            Furthermore, given that 10.4 introduced Galera 4 and that there are significant differences around trx_sys between 10.2 and 10.3, I think that for a proper review, we would need thoroughly tested fixes for all of 10.2, 10.3, 10.4.

            julien.fritsch Julien Fritsch added a comment

            Found in disabled.def that it was disabled; I'm also adding the missing label.

            julien.fritsch Julien Fritsch added a comment

            Found in disabled.def for ES that it was disabled; I'm also adding the missing label.

            People

              jplindst Jan Lindström (Inactive)
              jplindst Jan Lindström (Inactive)
