Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17541

KILL QUERY during lock wait in FOREIGN KEY check causes hang

    XMLWordPrintable

Details

    Description

      Bug #28068293 KILLING QUERY THAT IS WAITING FOR LOCK VIA FOREIGN KEY CAUSES MYSQL CRASH because a hard error code is being overwritten by the soft error DB_LOCK_WAIT.

      In MariaDB, the code is different because of MDEV-15219. We do not seem to be crashing, but is seems that trx->error_state = DB_INTERRUPTED is not being returned by row_ins_check_foreign_constraint(), like it should.

      Here is the signalling path from KILL QUERY that would cause lock_wait_suspend_thread() to abort the wait:

      #0  lock_wait_release_thread_if_suspended()
      #1  0x0000555556005f0d in lock_cancel_waiting_and_release()
      #2  0x0000555556006688 in lock_trx_handle_wait_low()
      #3  0x0000555556006709 in lock_trx_handle_wait()
      #4  0x0000555555f7c576 in innobase_kill_query()
      #5  0x0000555555d7a8ed in kill_handlerton()
      

      Here is a partial stack trace of the blocked thread:

      #6  0x000055555600d7a6 in lock_wait_suspend_thread()
      #7  0x000055555609b440 in row_ins_check_foreign_constraint()
      #8  0x000055555609b719 in row_ins_check_foreign_constraints()
      #9  0x000055555609e6f3 in row_ins_clust_index_entry()
      #10 0x000055555609ebbf in row_ins_index_entry()
      #11 0x000055555609f0c7 in row_ins_index_entry_step()
      #12 0x000055555609f485 in row_ins()
      #13 0x000055555609faa6 in row_ins_step()
      #14 0x00005555560bc8fc in row_insert_for_mysql()
      #15 0x0000555555f831c4 in ha_innobase::write_row()
      

      The return value from row_ins_check_foreign_constraint() would propagate up the call stack all the way to row_ins(), which does some special handling for some error codes related to foreign keys and duplicate keys, and ultimately to this code in row_ins_step():

      	err = row_ins(node, thr);
       
      error_handling:
      	trx->error_state = err;
       
      	if (err != DB_SUCCESS) {
      		/* err == DB_LOCK_WAIT or SQL error detected */
      		return(NULL);
      	}
      

      So, here we could be overwriting trx->error_state with the soft error DB_LOCK_WAIT. The caller row_insert_for_mysql() would then decide what to do:

      	row_ins_step(thr);
       
      	DEBUG_SYNC_C("ib_after_row_insert_step");
       
      	err = trx->error_state;
       
      	if (err != DB_SUCCESS) {
      error_exit:
      		que_thr_stop_for_mysql(thr);
       
      		/* FIXME: What's this ? */
      		thr->lock_state = QUE_THR_LOCK_ROW;
       
      		was_lock_wait = row_mysql_handle_errors(
      			&err, trx, thr, &savept);
      

      In case of DB_LOCK_WAIT, row_mysql_handle_errors() would invoke lock_wait_suspend_thread() again. It would return immediately due to thr->state == QUE_THR_RUNNING. Finally, we would invoke row_ins_step() again, and we would get stuck in lock_wait_suspend_thread() until the wait is terminated by the savior thread:

      #0  lock_wait_release_thread_if_suspended (thr=0x7fff84023c58)
          at /mariadb/10.2m/storage/innobase/lock/lock0wait.cc:444
      #1  0x0000555556005f0d in lock_cancel_waiting_and_release (
          lock=0x7ffff1eab388)
          at /mariadb/10.2m/storage/innobase/lock/lock0lock.cc:6753
      #2  0x000055555600ddbd in lock_wait_check_and_cancel (slot=0x555557960fc8)
          at /mariadb/10.2m/storage/innobase/lock/lock0wait.cc:512
      #3  0x000055555600df33 in lock_wait_timeout_thread ()
          at /mariadb/10.2m/storage/innobase/lock/lock0wait.cc:569
      

      Alas, we would again return DB_LOCK_WAIT, and again row_insert_for_mysql() would call row_ins_step() to retry the insert.

      The server would still be able to shut down. It looks like only way to end the infinite loop in the INSERT thread is to shut down the server.

      Here is the test case:

      --source include/have_innodb.inc
      --source include/count_sessions.inc
       
      CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
      CREATE TABLE t2 (a INT PRIMARY KEY, FOREIGN KEY (a) REFERENCES t1(a))
      ENGINE=InnoDB;
       
      connect (con1,localhost,root,,);
      INSERT INTO t1 SET a=1;
      BEGIN;
      DELETE FROM t1;
       
      connection default;
      let $ID= `SELECT @id := CONNECTION_ID()`;
      send INSERT INTO t2 SET a=1;
       
      connection con1;
      let $wait_condition=
        select count(*) = 1 from information_schema.processlist
        where state = 'update' and info = 'INSERT INTO t2 SET a=1';
      --source include/wait_condition.inc
      let $ignore= `SELECT @id := $ID`;
      kill query @id;
       
      connection default;
      --error ER_QUERY_INTERRUPTED
      reap;
      disconnect con1;
       
      DROP TABLE t2,t1;
       
      --source include/wait_until_count_sessions.inc
      

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.