MariaDB Server / MDEV-29883

Deadlock between InnoDB statistics update and BLOB insert

Details

    Description

      Today, I was lucky and got the test innodb.innodb-wl5522-debug to hang once. The deadlock appears to be between these two threads:

      Thread 10 (Thread 0x7ff9beffd640 (LWP 1487571) "mariadbd"):
      ...
      #5  0x000055770b6b3c90 in fil_space_t::x_lock (this=0x7ff9940c6fa0) at /mariadb/10.6/storage/innobase/include/fil0fil.h:1058
      #6  0x000055770b7d1996 in mtr_t::x_lock_space (this=this@entry=0x7ff9beffc3a0, space=0x7ff9940c6fa0) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:816
      #7  0x000055770b73d3bf in dict_stats_analyze_index (index=index@entry=0x7ff9940c4740) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2573
      #8  0x000055770b73dde7 in dict_stats_update_persistent (table=table@entry=0x7ff99405ecf0) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2879
      #9  0x000055770b73ea92 in dict_stats_update (table=table@entry=0x7ff99405ecf0, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:3929
      #10 0x000055770b740c84 in dict_stats_process_entry_from_recalc_pool (thd=thd@entry=0x7ff98c001168) at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:343
      #11 0x000055770b740d83 in dict_stats_func () at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:382
      ...
      Thread 8 (Thread 0x7ff9de2b8640 (LWP 1485618) "mariadbd"):
      ...
      #5  0x000055770b6a11c8 in sux_lock<ssux_lock_impl<true> >::u_lock (this=0x7ff9de640d58) at /mariadb/10.6/storage/innobase/include/sux_lock.h:378
      #6  0x000055770b724a14 in btr_page_alloc_low (index=index@entry=0x7ff9940c4740, hint_page_no=hint_page_no@entry=29, file_direction=file_direction@entry=113 'q', level=level@entry=0, mtr=mtr@entry=0x7ff9de2b4fd0, init_mtr=init_mtr@entry=0x7ff9de2b4fd0, err=0x7ff9de2b4f18) at /mariadb/10.6/storage/innobase/btr/btr0btr.cc:531
      #7  0x000055770b724a80 in btr_page_alloc (index=index@entry=0x7ff9940c4740, hint_page_no=hint_page_no@entry=29, file_direction=file_direction@entry=113 'q', level=level@entry=0, mtr=mtr@entry=0x7ff9de2b4fd0, init_mtr=init_mtr@entry=0x7ff9de2b4fd0, err=0x7ff9de2b4f18) at /mariadb/10.6/storage/innobase/btr/btr0btr.cc:566
      #8  0x000055770b6fe462 in btr_store_big_rec_extern_fields (pcur=pcur@entry=0x7ff9de2b54e0, offsets=offsets@entry=0x7ff9de2b5c10, big_rec_vec=big_rec_vec@entry=0x7ff9942132b0, btr_mtr=btr_mtr@entry=0x7ff9de2b55d0, op=op@entry=BTR_STORE_INSERT) at /mariadb/10.6/storage/innobase/btr/btr0cur.cc:6936
      #9  0x000055770b83063e in row_ins_index_entry_big_rec (entry=entry@entry=0x7ff994064540, big_rec=0x7ff9942132b0, offsets=0x7ff9de2b5c10, heap=heap@entry=0x7ff9de2b5b08, index=index@entry=0x7ff9940c4740, thd=0x7ff994000d58) at /mariadb/10.6/storage/innobase/row/row0ins.cc:2486
      ...
      #32 0x000055770b3b18f8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7ff994000d58, packet=packet@entry=0x7ff99400ad29 "INSERT INTO t1\nSELECT 100, REPEAT('Karanbir', 128), REPEAT('Ajeeth', 1200)\nFROM seq_1_to_256", packet_length=packet_length@entry=92, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1896
      

      This was with some patches applied that remove the flags BTR_LATCH_FOR_INSERT and BTR_MODIFY_EXTERNAL. I attempted to run the test 300 more times, but the hang was not reproduced.

      Thread 8 (writing a BLOB) is holding the clustered index U-latch and the tablespace U-latch or X-latch. Thread 10 (updating persistent statistics) in dict_stats_analyze_index() had executed the following code:

      	mtr.start();
      	mtr_s_lock_index(index, &mtr);
      	dberr_t err;
      	buf_block_t* root = btr_root_block_get(index, RW_SX_LATCH, &mtr, &err);
      	if (!root) {
      empty_index:
      		mtr.commit();
      		dict_stats_assert_initialized_index(index);
      		DBUG_RETURN(result);
      	}
       
      	uint16_t root_level = btr_page_get_level(root->page.frame);
      	mtr.x_lock_space(index->table->space);
      

      That is, Thread 10 is holding an index S-latch (which does not conflict with the U-latch that the BLOB insert is holding) and the clustered index root page latch, and it is waiting for an exclusive latch on the tablespace, which Thread 8 is holding.
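
The latch modes mentioned here can be summarized in a small compatibility check. This is a minimal sketch of the usual S/U/X semantics, not the actual sux_lock interface; the names LatchMode and compatible() are made up for illustration.

```cpp
// Simplified model of the three latch modes (shared, update, exclusive).
// LatchMode and compatible() are hypothetical names, not InnoDB identifiers.
enum class LatchMode { S, U, X };

// Returns true if a request for `wanted` can be granted while another
// thread already holds the same latch in mode `held`.
// X conflicts with everything; U conflicts with U and X; S only with X.
constexpr bool compatible(LatchMode held, LatchMode wanted)
{
  return held != LatchMode::X && wanted != LatchMode::X &&
         !(held == LatchMode::U && wanted == LatchMode::U);
}

// The index S-latch of the statistics thread can coexist with the
// index U-latch of the BLOB insert.
static_assert(compatible(LatchMode::U, LatchMode::S), "S does not conflict with U");

// The tablespace X-latch request must wait, whether the BLOB insert
// holds the tablespace latch in U or in X mode.
static_assert(!compatible(LatchMode::U, LatchMode::X), "X must wait for U");
static_assert(!compatible(LatchMode::X, LatchMode::X), "X must wait for X");
```

So the index latches are not part of the hang; only the tablespace latch and the page latch are.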

      I then had another stroke of luck: the following hung on the first try:

      ./mtr --rr innodb.innodb-wl5522-debug
      

      After killall -ABRT mariadbd I got a nice trace of the hang between an INSERT of a BLOB and an update of statistics. The BLOB write had acquired the tablespace latch before trying to acquire any further page latches:

      #3  0x000055692e64c996 in mtr_t::x_lock_space (this=this@entry=0x7fd01cefbfd0, space=space@entry=0x7fd00c107c70) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:816
      #4  0x000055692e537fe1 in fsp_reserve_free_extents (n_reserved=n_reserved@entry=0x7fd01cefbf1c, space=0x7fd00c107c70, n_ext=n_ext@entry=1, alloc_type=alloc_type@entry=FSP_BLOB, mtr=mtr@entry=0x7fd01cefbfd0, 
          n_pages=n_pages@entry=1) at /mariadb/10.6/storage/innobase/fsp/fsp0fsp.cc:2391
      #5  0x000055692e5793d2 in btr_store_big_rec_extern_fields (pcur=pcur@entry=0x7fd01cefc4e0, offsets=offsets@entry=0x7fd00c0f22c8, big_rec_vec=big_rec_vec@entry=0x7fd00c0e1ef0, 
          btr_mtr=btr_mtr@entry=0x7fd01cefc5d0, op=op@entry=BTR_STORE_INSERT) at /mariadb/10.6/storage/innobase/btr/btr0cur.cc:6927
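
The state of the two threads can be modelled as a wait-for graph. The sketch below is a toy model, not InnoDB code: Waiter, has_wait_cycle() and reported_state() are hypothetical names, and it assumes that the page latch that the BLOB insert waits for is the root page latch held by the statistics thread. The index latches are left out because S and U do not conflict.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One thread's view: the latches it holds exclusively and the one it waits for.
struct Waiter {
  std::string name;
  std::set<std::string> holds;
  std::string waits_for;  // empty if the thread is not blocked
};

// Returns true if following "waits for a latch held by" edges from some
// thread leads back to a thread that was already visited.
bool has_wait_cycle(const std::vector<Waiter> &threads)
{
  // Map every held latch to the index of its (single) holder.
  std::map<std::string, std::size_t> owner;
  for (std::size_t i = 0; i < threads.size(); i++)
    for (const std::string &latch : threads[i].holds) {
      const bool inserted = owner.emplace(latch, i).second;
      assert(inserted && "model assumes each latch has one exclusive holder");
      (void) inserted;
    }

  for (std::size_t start = 0; start < threads.size(); start++) {
    std::set<std::size_t> seen;
    std::size_t cur = start;
    for (;;) {
      if (!seen.insert(cur).second)
        return true;  // came back to a visited thread: cycle
      const auto it = owner.find(threads[cur].waits_for);
      if (it == owner.end())
        break;        // waits for nothing that anyone holds
      cur = it->second;
    }
  }
  return false;
}

// The state described in this report (the "root page" target of the
// BLOB insert is an assumption of this sketch).
std::vector<Waiter> reported_state()
{
  return {
    {"statistics update", {"root page"}, "tablespace"},
    {"BLOB insert", {"tablespace"}, "root page"},
  };
}
```

Under that assumption, has_wait_cycle(reported_state()) reports a cycle: each thread waits for a latch that the other one holds.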
      

      Hence, the culprit for the invalid latching order must be dict_stats_analyze_index(), which requests the tablespace latch while already holding the root page latch.
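
A latching order can be expressed as ranks that a thread must acquire in non-decreasing order. The following is only an illustration of that idea, not the InnoDB implementation or the eventual fix: the Latch enum, its ranks and first_order_violation() are hypothetical, and the order index, tablespace, page is inferred from the BLOB write path in the traces above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical ranks, inferred from the BLOB write path:
// index latch first, then tablespace latch, then page latches.
enum class Latch { Index = 1, Tablespace = 2, Page = 3 };

// Returns the position of the first acquisition whose rank is lower than
// that of the previous acquisition, or -1 if the sequence is in order.
template <std::size_t N>
constexpr int first_order_violation(const std::array<Latch, N> &acquired)
{
  for (std::size_t i = 1; i < N; i++)
    if (static_cast<int>(acquired[i]) < static_cast<int>(acquired[i - 1]))
      return static_cast<int>(i);
  return -1;
}

// Acquisition order in the dict_stats_analyze_index() excerpt:
// index S-latch, root page latch, tablespace X-latch.
constexpr std::array<Latch, 3> stats_order{
    {Latch::Index, Latch::Page, Latch::Tablespace}};

// Acquisition order of the BLOB write according to the traces.
constexpr std::array<Latch, 3> blob_order{
    {Latch::Index, Latch::Tablespace, Latch::Page}};

static_assert(first_order_violation(stats_order) == 2,
              "tablespace latch requested after a page latch");
static_assert(first_order_violation(blob_order) == -1,
              "BLOB write follows the assumed order");
```

With these assumed ranks, only the statistics path goes backwards, at the mtr.x_lock_space() call.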

      I think that I saw some waits for fil_space_t::latch in some core dumps for MDEV-29835. Therefore, this bug might explain at least some of the hangs for which MDEV-29835 was filed.

            People

              marko Marko Mäkelä