Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22529

thd_query_safe() isn’t, causing InnoDB to hang

    XMLWordPrintable

    Details

      Description

      The function thd_query_safe() is used in the implementation of the following INFORMATION_SCHEMA views:

      • information_schema.innodb_trx
      • information_schema.innodb_locks
      • information_schema.innodb_lock_waits
      • information_schema.rocksdb_trx

      The implementation of the InnoDB views is in trx_i_s_common_fill_table(). This function invokes trx_i_s_possibly_fetch_data_into_cache(), which will acquire lock_sys->mutex and trx_sys->mutex in order to protect the set of active transactions and explicit locks. While holding those mutexes, it will traverse the collection of InnoDB transactions. For each transaction, thd_query_safe() will be invoked. That function is acquiring THD::LOCK_thd_data while holding the InnoDB locks. This is violates the latching order and causes a deadlock with THD::awake() (such as executing KILL QUERY):

      mariadb-10.2.27

      Thread 34 (Thread 0x7fba01f27700 (LWP 15746)):
      #0  __lll_lock_wait (futex=futex@entry=0x7fb9b00024e8, private=0) at lowlevellock.c:52
      #1  0x00007fbae78f3a25 in __GI___pthread_mutex_lock (mutex=0x7fb9b00024e8) at ../nptl/pthread_mutex_lock.c:135
      #2  0x000055efc101b5c8 in inline_mysql_mutex_lock (that=0x7fb9b00024e8, src_file=<optimized out>, src_line=4536) at /mariadb/10.3o/include/mysql/psi/mysql_thread.h:683
      #3  thd_query_safe (thd=0x7fb9b0000c48, buf=0x7fba01f24480 "\350\026\001\240\271\177", buflen=1025) at /mariadb/10.3o/sql/sql_class.cc:4536
      #4  0x000055efc14625db in fill_trx_row (row=0x7fb9a00248e0, trx=0x7fbae44c4090, requested_lock_row=0x0, cache=0x55efc1d89988 <trx_i_s_cache_static>) at /mariadb/10.3o/storage/innobase/trx/trx0i_s.cc:488
      #5  fetch_data_into_cache_low (cache=0x55efc1d89988 <trx_i_s_cache_static>, read_write=<optimized out>, trx_list=0x55efc712e460) at /mariadb/10.3o/storage/innobase/trx/trx0i_s.cc:1287
      #6  0x000055efc1461903 in fetch_data_into_cache (cache=0x55efc1d89988 <trx_i_s_cache_static>) at /mariadb/10.3o/storage/innobase/trx/trx0i_s.cc:1313
      #7  trx_i_s_possibly_fetch_data_into_cache (cache=0x55efc1d89988 <trx_i_s_cache_static>) at /mariadb/10.3o/storage/innobase/trx/trx0i_s.cc:1341
      #8  0x000055efc1389a41 in trx_i_s_common_fill_table (thd=0x7fb9a0000c48, tables=0x7fb9a000f500) at /mariadb/10.3o/storage/innobase/handler/i_s.cc:1225
      #9  0x000055efc10b050e in get_schema_tables_result (join=0x7fb9a00116e8, executed_place=PROCESSED_BY_JOIN_EXEC) at /mariadb/10.3o/sql/sql_show.cc:8431
      #10 0x000055efc107f835 in JOIN::exec_inner (this=0x7fb9a00116e8) at /mariadb/10.3o/sql/sql_select.cc:3591
      #11 0x000055efc106ad98 in JOIN::exec (this=0x7fb9a00116e8) at /mariadb/10.3o/sql/sql_select.cc:3422
      #12 mysql_select (thd=0x7fb9a0000c48, tables=<optimized out>, wild_num=<optimized out>, fields=..., conds=<optimized out>, og_num=<optimized out>, order=<optimized out>, group=<optimized out>, having=<optimized out>, proc_param=<optimized out>, select_options=<optimized out>, result=<optimized out>, unit=<optimized out>, select_lex=<optimized out>) at /mariadb/10.3o/sql/sql_select.cc:3822
      #13 0x000055efc106aa2f in handle_select (thd=0x7fb9a0000c48, lex=0x7fb9a00045c0, result=0x7fb9a00116c8, setup_tables_done_option=<optimized out>) at /mariadb/10.3o/sql/sql_select.cc:365
      #14 0x000055efc104c2b6 in execute_sqlcom_select (thd=0x7fb9a0000c48, all_tables=0x7fb9a000f500) at /mariadb/10.3o/sql/sql_parse.cc:6226
      #15 0x000055efc10454ae in mysql_execute_command (thd=<optimized out>) at /mariadb/10.3o/sql/sql_parse.cc:3533
      #16 0x000055efc1042ee0 in mysql_parse (thd=0x7fb9a0000c48, rawbuf=0x7fb9a000f2c0 "SELECT * FROM INFORMATION_SCHEMA.innodb_trx", length=<optimized out>, parser_state=0x7fba01f26340, is_com_multi=<optimized out>, is_next_command=false) at /mariadb/10.3o/sql/sql_parse.cc:7760
      #17 0x000055efc1040911 in dispatch_command (command=<optimized out>, thd=0x7fb9a0000c48, packet=0x7fb9a0006fa9 "SELECT * FROM INFORMATION_SCHEMA.innodb_trx", packet_length=2684416704, is_com_multi=false, is_next_command=false) at /mariadb/10.3o/sql/sql_parse.cc:1831
       
      Thread 33 (Thread 0x7fba01f72700 (LWP 15570)):
      #8  0x000055efc13af778 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=<optimized out>, n_spins=30, n_delay=6, name=<optimized out>, line=6580) at /mariadb/10.3o/storage/innobase/include/ib0mutex.h:637
      #9  lock_trx_handle_wait (trx=0x7fbae44c4090) at /mariadb/10.3o/storage/innobase/lock/lock0lock.cc:6580
      #10 0x000055efc11d7dd4 in kill_handlerton (thd=0x7fb9b0000c48, plugin=<optimized out>, level=0x7fba01f702b4) at /mariadb/10.3o/sql/handler.cc:843
      #11 0x000055efc10555af in plugin_foreach_with_mask (thd=0x7fb9b0000c48, func=0x55efc11d7da0 <kill_handlerton(THD*, st_plugin_int*, void*)>, type=<optimized out>, state_mask=8, arg=0x7fba01f702b4) at /mariadb/10.3o/sql/sql_plugin.cc:2402
      #12 0x000055efc11d7d90 in ha_kill_query (thd=0x55efc712bee4, level=THD_ABORT_ASAP) at /mariadb/10.3o/sql/handler.cc:850
      #13 0x000055efc1011b4d in THD::awake (this=0x7fb9b0000c48, state_to_set=KILL_QUERY_HARD) at /mariadb/10.3o/sql/sql_class.cc:1723
      #14 0x000055efc104fbf2 in kill_one_thread (thd=0x7fb99c000c48, id=<optimized out>, kill_signal=KILL_QUERY_HARD, type=KILL_TYPE_ID) at /mariadb/10.3o/sql/sql_parse.cc:8916
      #15 0x000055efc1049a98 in sql_kill (thd=0x7fb99c000c48, id=8, state=NOT_KILLED, type=(KILL_TYPE_USER | unknown: 3884938724)) at /mariadb/10.3o/sql/sql_parse.cc:9022
      #16 mysql_execute_command (thd=<optimized out>) at /mariadb/10.3o/sql/sql_parse.cc:5220
      #17 0x000055efc1042ee0 in mysql_parse (thd=0x7fb99c000c48, rawbuf=0x7fb99c00f2c0 "kill query 8", length=<optimized out>, parser_state=0x7fba01f71340, is_com_multi=<optimized out>, is_next_command=false) at /mariadb/10.3o/sql/sql_parse.cc:7760

      Thread 34 (fetch_data_into_cache_low() for information_schema.innodb_trx) is holding lock_sys->mutex, trx_sys->mutex and trying to acquire THD::LOCK_thd_data in thd_query_safe().

      Thread 33 (THD::awake() for KILL QUERY) is holding THD::LOCK_thd_data on the killee connection (which Thread 34 is waiting for) while invoking lock_trx_handle_wait(), which is stuck, trying to acquire lock_sys->mutex (which Thread 34 is holding).

      I can reproduce the hang with 4 client connections:

      1. A session that will be repeatedly killed
      2. A loop executing SHOW PROCESSLIST;
      3. A loop of KILL QUERY of Connection 1
      4. SELECT * FROM information_schema.…

      I do not know why, but I was unable to reproduce the hang without executing SHOW PROCESSLIST.

      It is unclear to me how information_schema.rocksdb_trx is protected against concurrent creation or destruction of transactions. I did not notice any mutex acquisition there. Thus, the server hang may be limited to the 3 information_schema.innodb_ views only.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: