Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22890

DEADLOCK of threads detected: row0sel.cc S-LOCK / btr0cur.cc S-LOCK / row0quiesce.cc X-LOCK

Details

    Description

      Note: I can't tell whether it's the same deadlock as MDEV-21596 or not. They look very similar, other than MDEV-21596 is in row0row.cc and this one is in row0sel.cc. Please feel free to mark as a duplicate.

      rr profile is available.

      10.5 9ed08f3576

      DEADLOCK of threads detected!
      rw-lock 0x7f6a5405f800 --Thread 140088265795328 has waited at row0sel.cc line 3509 for 0.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x7f6a5405f800 created in file buf0buf.cc line 1286
      a writer (thread id 140088265795328) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffff
      Last time write locked in file row0sel.cc line 3509
      Locked: thread 140088269248256 file row0sel.cc line 3509  S-LOCK
      Locked: thread 140088269248256 file row0sel.cc line 3509  S-LOCK
      rw-lock 0x7f68dc1a1b58 --Thread 140090090170112 has waited at row0quiesce.cc line 666 for 0.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x7f68dc1a1b58 created in file dict0dict.cc line 2155
      a writer (thread id 140090090170112) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffff
      Last time write locked in file btr0cur.cc line 1468
      Locked: thread 140088265795328 file btr0cur.cc line 1499  S-LOCK
      Locked: thread 140088265795328 file btr0cur.cc line 1499  S-LOCK
      rw-lock 0x5595755bb218 --Thread 140088269248256 has waited at btr0cur.cc line 1499 for 0.00 seconds the semaphore:
      S-lock on RW-latch at 0x5595755bb218 created in file dict0dict.cc line 2155
      a writer (thread id 140090090170112) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file row0quiesce.cc line 666
      Locked: thread 140090090170112 file row0quiesce.cc line 666  X-LOCK
      2020-06-14  0:07:40 22 [ERROR] [FATAL] InnoDB: ######################################## Deadlock Detected!
      

      #1  0x00007f6a79220801 in __GI_abort () at abort.c:79
      #2  0x000055956fd693bd in ib::fatal::~fatal (this=0x7f68d7865520, __in_chrg=<optimized out>) at /home/mariadb/deadlock_detected/10.5/storage/innobase/ut/ut0ut.cc:586
      #3  0x000055956fd020c2 in sync_array_wait_event (arr=0x559572023268, cell=@0x7f68d78656e0: 0x559572023778) at /home/mariadb/deadlock_detected/10.5/storage/innobase/sync/sync0arr.cc:447
      #4  0x000055956fd069af in rw_lock_s_lock_spin (lock=0x5595755bb218, pass=0, file_name=0x5595704c1210 "/home/mariadb/deadlock_detected/10.5/storage/innobase/btr/btr0cur.cc", line=1499)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/sync/sync0rw.cc:367
      #5  0x000055956fc4e71e in rw_lock_s_lock_func (lock=0x5595755bb218, pass=0, file_name=0x5595704c1210 "/home/mariadb/deadlock_detected/10.5/storage/innobase/btr/btr0cur.cc", line=1499)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/sync0rw.ic:308
      #6  0x000055956fc4e7d0 in pfs_rw_lock_s_lock_func (lock=0x5595755bb218, pass=0, file_name=0x5595704c1210 "/home/mariadb/deadlock_detected/10.5/storage/innobase/btr/btr0cur.cc", 
          line=1499) at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/sync0rw.ic:658
      #7  0x000055956fc5f19b in mtr_t::s_lock (this=0x7f68d7866df0, lock=0x5595755bb218, file=0x5595704c1210 "/home/mariadb/deadlock_detected/10.5/storage/innobase/btr/btr0cur.cc", 
          line=1499) at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/mtr0mtr.h:222
      #8  0x000055956fd9f425 in btr_cur_search_to_nth_level_func (index=0x5595755bb018, level=0, tuple=0x7f68ec3354a0, mode=PAGE_CUR_LE, latch_mode=1, cursor=0x7f68ec334e90, ahi_latch=0x0, 
          file=0x559570471258 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0sel.cc", line=3256, mtr=0x7f68d7866df0, autoinc=0)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/btr/btr0cur.cc:1499
      #9  0x000055956fcbfb02 in btr_pcur_open_with_no_init_func (index=0x5595755bb018, tuple=0x7f68ec3354a0, mode=PAGE_CUR_LE, latch_mode=1, cursor=0x7f68ec334e90, ahi_latch=0x0, 
          file=0x559570471258 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0sel.cc", line=3256, mtr=0x7f68d7866df0)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/btr0pcur.ic:504
      #10 0x000055956fcc8f0e in Row_sel_get_clust_rec_for_mysql::operator() (this=0x7f68d78669e0, prebuilt=0x7f68ec334bc8, sec_index=0x7f68dc1a1958, 
          rec=0x7f6a581c1055 "a", ' ' <repeats 199 times>..., thr=0x7f68ec3356b8, out_rec=0x7f68d7866910, offsets=0x7f68d7866930, offset_heap=0x7f68d7866928, vrow=0x0, mtr=0x7f68d7866df0)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0sel.cc:3254
      #11 0x000055956fccef25 in row_search_mvcc (buf=0x7f68ec200550 "\217\207\217", mode=PAGE_CUR_G, prebuilt=0x7f68ec334bc8, match_mode=0, direction=1)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0sel.cc:5244
      #12 0x000055956fad21e8 in ha_innobase::general_fetch (this=0x7f68ec14da60, buf=0x7f68ec200550 "\217\207\217", direction=1, match_mode=0)
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/handler/ha_innodb.cc:9227
      #13 0x000055956fad2466 in ha_innobase::index_next (this=0x7f68ec14da60, buf=0x7f68ec200550 "\217\207\217")
          at /home/mariadb/deadlock_detected/10.5/storage/innobase/handler/ha_innodb.cc:9294
      #14 0x000055956f6725f3 in handler::ha_index_next (this=0x7f68ec14da60, buf=0x7f68ec200550 "\217\207\217") at /home/mariadb/deadlock_detected/10.5/sql/handler.cc:3096
      #15 0x000055956f397f5d in join_read_next (info=0x7f68ec0795e0) at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:21618
      #16 0x000055956f25fc6b in READ_RECORD::read_record (this=0x7f68ec0795e0) at /home/mariadb/deadlock_detected/10.5/sql/records.h:79
      #17 0x000055956f3956ee in sub_select (join=0x7f68ec1fad10, join_tab=0x7f68ec079518, end_of_records=false) at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:20613
      #18 0x000055956f394a91 in do_select (join=0x7f68ec1fad10, procedure=0x0) at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:20128
      #19 0x000055956f368a79 in JOIN::exec_inner (this=0x7f68ec1fad10) at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:4444
      #20 0x000055956f367ba5 in JOIN::exec (this=0x7f68ec1fad10) at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:4225
      #21 0x000055956f4326f9 in st_select_lex_unit::exec (this=0x7f68ec02a280) at /home/mariadb/deadlock_detected/10.5/sql/sql_union.cc:2196
      #22 0x000055956f42ae79 in mysql_union (thd=0x7f68ec026268, lex=0x7f68ec02a1b8, result=0x7f68ec1fabd0, unit=0x7f68ec02a280, setup_tables_done_option=0)
          at /home/mariadb/deadlock_detected/10.5/sql/sql_union.cc:41
      #23 0x000055956f358edf in handle_select (thd=0x7f68ec026268, lex=0x7f68ec02a1b8, result=0x7f68ec1fabd0, setup_tables_done_option=0)
          at /home/mariadb/deadlock_detected/10.5/sql/sql_select.cc:407
      #24 0x000055956f31e5a5 in execute_sqlcom_select (thd=0x7f68ec026268, all_tables=0x7f68ec031428) at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:6208
      #25 0x000055956f3158d6 in mysql_execute_command (thd=0x7f68ec026268) at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:3939
      #26 0x000055956f3233f1 in mysql_parse (thd=0x7f68ec026268, 
          rawbuf=0x7f68ec030a90 "/* QNO 1386 CON_ID 22  GenTest::Transform::ExecuteAsUnion  */ ( SELECT /* QNO 1375 CON_ID 22 */ `col_char_255_utf8_key` FROM `view_table100_int_autoinc` ) UNION ALL ( SELECT /* QNO 1375 CON_ID 22 */ `"..., length=305, parser_state=0x7f68d7868510, is_com_multi=false, is_next_command=false)
          at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:7992
      #27 0x000055956f30f77e in dispatch_command (command=COM_QUERY, thd=0x7f68ec026268, packet=0x7f68ec01c499 "", packet_length=305, is_com_multi=false, is_next_command=false)
          at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:1874
      #28 0x000055956f30deb5 in do_command (thd=0x7f68ec026268) at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:1355
      #29 0x000055956f4b2751 in do_handle_one_connection (connect=0x5595755b19e8, put_in_cache=true) at /home/mariadb/deadlock_detected/10.5/sql/sql_connect.cc:1411
      #30 0x000055956f4b24b5 in handle_one_connection (arg=0x5595755b19e8) at /home/mariadb/deadlock_detected/10.5/sql/sql_connect.cc:1313
      ---Type <return> to continue, or q <return> to quit---
      #31 0x000055956f9ed777 in pfs_spawn_thread (arg=0x5595755b1ac8) at /home/mariadb/deadlock_detected/10.5/storage/perfschema/pfs.cc:2201
      #32 0x00007f6a7a11b6db in start_thread (arg=0x7f68d7869700) at pthread_create.c:463
      #33 0x00007f6a7930188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Observed on 10.5, never seen on 10.4 so far.

      Attachments

        Issue Links

          Activity

            t35 holds s-lock on col_char_255_utf8_key (index) and tries to acquire x-lock on block 0x510000001b (81, 27)
            t1 holds s-lock on block (81, 27) and waiting to acquire s-lock on primary key
            t24 holds x-lock on primary key and waiting to acquire x-lock on col_char_255_utf8_key (index)

            t35 and t1 does select query.

            t24 does FLUSH TABLES `table100_int_autoinc` FOR EXPORT
            The stack trace is

            #18 0x000055956fd071ec in rw_lock_x_lock_func (lock=0x7f68dc1a1b58, pass=0, 
                file_name=0x55957047d3a8 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc", line=666)
                at /home/mariadb/deadlock_detected/10.5/storage/innobase/sync/sync0rw.cc:657
            #19 0x000055956fce5139 in pfs_rw_lock_x_lock_func (lock=0x7f68dc1a1b58, 
                pass=0, 
                file_name=0x55957047d3a8 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc", line=666)
                at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/sync0rw.ic:567
            #20 0x000055956fce744e in row_quiesce_set_state (table=0x7f68dc1da468, 
                state=QUIESCE_START, trx=0x7f6a63a1c710)
                at /home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc:666
            #21 0x000055956fae184f in ha_innobase::store_lock (this=0x7f68e408f890, 
                thd=0x7f68e8000d78, to=0x7f68e80cf208, lock_type=TL_READ_NO_INSERT)
                at /home/mariadb/deadlock_detected/10.5/storage/innobase/handler/ha_innodb.cc:16618
            #22 0x000055956f7b86f3 in get_lock_data (thd=0x7f68e8000d78, 
                table_ptr=0x7f68e8013100, count=1, flags=1)
                at /home/mariadb/deadlock_detected/10.5/sql/lock.cc:806
            #23 0x000055956f7b6f57 in mysql_lock_tables (thd=0x7f68e8000d78, 
                tables=0x7f68e8013100, count=1, flags=4096)
                at /home/mariadb/deadlock_detected/10.5/sql/lock.cc:298
            #24 0x000055956f273a56 in lock_tables (thd=0x7f68e8000d78, 
            ---Type <return> to continue, or q <return> to quit---
                tables=0x7f68e8012a28, count=1, flags=4096)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_base.cc:5445
            #25 0x000055956f272f7e in open_and_lock_tables (thd=0x7f68e8000d78, 
                options=..., tables=0x7f68e8012a28, derived=false, flags=4096, 
                prelocking_strategy=0x7f69440f8cf0)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_base.cc:5165
            #26 0x000055956f4f50a4 in open_and_lock_tables (thd=0x7f68e8000d78, 
                tables=0x7f68e8012a28, derived=false, flags=4096, 
                prelocking_strategy=0x7f69440f8cf0)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_base.h:277
            #27 0x000055956f4f61af in flush_tables_with_read_lock (thd=0x7f68e8000d78, 
                all_tables=0x7f68e8012a28)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_reload.cc:568
            #28 0x000055956f31af85 in mysql_execute_command (thd=0x7f68e8000d78)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:5370
            #29 0x000055956f3233f1 in mysql_parse (thd=0x7f68e8000d78, 
                rawbuf=0x7f68e80128a0 "FLUSH /* QNO 1471 CON_ID 11 */ TABLES `table100_int_autoinc` FOR EXPORT", length=71, parser_state=0x7f69440f9510, 
                is_com_multi=false, is_next_command=false)
                at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:7992
            #30 0x000055956f30f77e in dispatch_command (command=COM_QUERY, 
                thd=0x7f68e8000d78, 
                packet=0x7f68e8008dc9 "FLUSH /* QNO 1471 CON_ID 11 */ TABLES `table100_int_autoinc` FOR EXPORT", packet_length=71, is_com_multi=false, 
                is_next_command=false)
                at /home/mariadb/deadlock_
            

            row_quiesce_set_state() tries to acquire x-lock on all indexes:

                      row_mysql_lock_data_dictionary(trx);
                      for (dict_index_t* index = dict_table_get_first_index(table);
                           index != NULL;
                           index = dict_table_get_next_index(index)) {
                              rw_lock_x_lock(&index->lock);
                      }
            

            thiru Thirunarayanan Balathandayuthapani added a comment - - edited t35 holds s-lock on col_char_255_utf8_key (index) and tries to acquire x-lock on block 0x510000001b (81, 27) t1 holds s-lock on block (81, 27) and waiting to acquire s-lock on primary key t24 holds x-lock on primary key and waiting to acquire x-lock on col_char_255_utf8_key (index) t35 and t1 does select query. t24 does FLUSH TABLES `table100_int_autoinc` FOR EXPORT The stack trace is #18 0x000055956fd071ec in rw_lock_x_lock_func (lock=0x7f68dc1a1b58, pass=0, file_name=0x55957047d3a8 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc", line=666) at /home/mariadb/deadlock_detected/10.5/storage/innobase/sync/sync0rw.cc:657 #19 0x000055956fce5139 in pfs_rw_lock_x_lock_func (lock=0x7f68dc1a1b58, pass=0, file_name=0x55957047d3a8 "/home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc", line=666) at /home/mariadb/deadlock_detected/10.5/storage/innobase/include/sync0rw.ic:567 #20 0x000055956fce744e in row_quiesce_set_state (table=0x7f68dc1da468, state=QUIESCE_START, trx=0x7f6a63a1c710) at /home/mariadb/deadlock_detected/10.5/storage/innobase/row/row0quiesce.cc:666 #21 0x000055956fae184f in ha_innobase::store_lock (this=0x7f68e408f890, thd=0x7f68e8000d78, to=0x7f68e80cf208, lock_type=TL_READ_NO_INSERT) at /home/mariadb/deadlock_detected/10.5/storage/innobase/handler/ha_innodb.cc:16618 #22 0x000055956f7b86f3 in get_lock_data (thd=0x7f68e8000d78, table_ptr=0x7f68e8013100, count=1, flags=1) at /home/mariadb/deadlock_detected/10.5/sql/lock.cc:806 #23 0x000055956f7b6f57 in mysql_lock_tables (thd=0x7f68e8000d78, tables=0x7f68e8013100, count=1, flags=4096) at /home/mariadb/deadlock_detected/10.5/sql/lock.cc:298 #24 0x000055956f273a56 in lock_tables (thd=0x7f68e8000d78, ---Type <return> to continue, or q <return> to quit--- tables=0x7f68e8012a28, count=1, flags=4096) at /home/mariadb/deadlock_detected/10.5/sql/sql_base.cc:5445 #25 0x000055956f272f7e in open_and_lock_tables (thd=0x7f68e8000d78, options=..., tables=0x7f68e8012a28, derived=false, flags=4096, prelocking_strategy=0x7f69440f8cf0) at /home/mariadb/deadlock_detected/10.5/sql/sql_base.cc:5165 #26 0x000055956f4f50a4 in open_and_lock_tables (thd=0x7f68e8000d78, tables=0x7f68e8012a28, derived=false, flags=4096, prelocking_strategy=0x7f69440f8cf0) at /home/mariadb/deadlock_detected/10.5/sql/sql_base.h:277 #27 0x000055956f4f61af in flush_tables_with_read_lock (thd=0x7f68e8000d78, all_tables=0x7f68e8012a28) at /home/mariadb/deadlock_detected/10.5/sql/sql_reload.cc:568 #28 0x000055956f31af85 in mysql_execute_command (thd=0x7f68e8000d78) at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:5370 #29 0x000055956f3233f1 in mysql_parse (thd=0x7f68e8000d78, rawbuf=0x7f68e80128a0 "FLUSH /* QNO 1471 CON_ID 11 */ TABLES `table100_int_autoinc` FOR EXPORT", length=71, parser_state=0x7f69440f9510, is_com_multi=false, is_next_command=false) at /home/mariadb/deadlock_detected/10.5/sql/sql_parse.cc:7992 #30 0x000055956f30f77e in dispatch_command (command=COM_QUERY, thd=0x7f68e8000d78, packet=0x7f68e8008dc9 "FLUSH /* QNO 1471 CON_ID 11 */ TABLES `table100_int_autoinc` FOR EXPORT", packet_length=71, is_com_multi=false, is_next_command=false) at /home/mariadb/deadlock_ row_quiesce_set_state() tries to acquire x-lock on all indexes: row_mysql_lock_data_dictionary(trx); for (dict_index_t* index = dict_table_get_first_index(table); index != NULL; index = dict_table_get_next_index(index)) { rw_lock_x_lock(&index->lock); }

            This type of violation is not being caught by the latching order checks in InnoDB.

            Another example of latching order violation that is not caught is that when a lower-level B-tree page latch is already being held, and a latch for an higher-level page (towards the root) is being acquired. Such violation caused the hang that was reported in MDEV-14637.

            Here, the violation is obvious if you know the InnoDB design constraints. By design, read operations are always allowed to hold a secondary index leaf page latch and then look up the corresponding clustered index record (visiting pages from the root to the leaf), and finally fetch undo log records and BLOB pages while holding both the secondary and clustered index leaf page latch.

            The code in row_quiesce_set_state() is wrong. Acquiring a secondary index latch while holding a clustered index latch is an obvious violation of the latching order.

            Such a wrong pattern appeared already in MySQL 5.1, and it was extended to the FLUSH TABLES…FOR EXPORT code later.

            It might be safe to first acquire all secondary index latches and then the clustered index latch, if we really have to acquire all those latches. Can we not protect the operation by any other means?

            marko Marko Mäkelä added a comment - This type of violation is not being caught by the latching order checks in InnoDB. Another example of latching order violation that is not caught is that when a lower-level B-tree page latch is already being held, and a latch for an higher-level page (towards the root) is being acquired. Such violation caused the hang that was reported in MDEV-14637 . Here, the violation is obvious if you know the InnoDB design constraints. By design, read operations are always allowed to hold a secondary index leaf page latch and then look up the corresponding clustered index record (visiting pages from the root to the leaf), and finally fetch undo log records and BLOB pages while holding both the secondary and clustered index leaf page latch. The code in row_quiesce_set_state() is wrong. Acquiring a secondary index latch while holding a clustered index latch is an obvious violation of the latching order. Such a wrong pattern appeared already in MySQL 5.1 , and it was extended to the FLUSH TABLES…FOR EXPORT code later. It might be safe to first acquire all secondary index latches and then the clustered index latch, if we really have to acquire all those latches. Can we not protect the operation by any other means?

            I think that it suffices to change dict_table_x_lock_indexes(). The order of unlocking cannot affect the occurrence of deadlocks. In fact, I would advocate unlocking the most likely contentious resource first (which often translates to FIFO lock-unlock order). Hence, I would retain dict_table_x_unlock_indexes() in its current form, unlocking the clustered index first.

            marko Marko Mäkelä added a comment - I think that it suffices to change dict_table_x_lock_indexes() . The order of unlocking cannot affect the occurrence of deadlocks. In fact, I would advocate unlocking the most likely contentious resource first (which often translates to FIFO lock-unlock order). Hence, I would retain dict_table_x_unlock_indexes() in its current form, unlocking the clustered index first.

            People

              thiru Thirunarayanan Balathandayuthapani
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.