Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28454

Latching order violation (and hang) in ibuf_insert_low

Details

    Description

      A hang was found during the stress test of a 10.6-based branch:

      10.6 03e703fd74b43ffd27e858e03e9d147029de086e

      (gdb) thread apply 10 bt
       
      Thread 10 (Thread 0x7f4307d58700 (LWP 3829108)):
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x0000564bfa889c5f in srw_mutex_impl<true>::wait (this=0x564bfd654080 <lock_sys+64>, lk=2147483650) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:239
      #2  0x0000564bfa8894a9 in srw_mutex_impl<true>::wait_and_lock (this=0x564bfd654080 <lock_sys+64>) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:326
      #3  0x0000564bfa47a617 in srw_mutex_impl<true>::wr_lock (this=0x564bfd654080 <lock_sys+64>) at /data/Server/bb-10.6-thiru/storage/innobase/include/srw_lock.h:134
      #4  0x0000564bfa889b62 in ssux_lock_impl<true>::rd_wait (this=0x564bfd654080 <lock_sys+64>) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:392
      #5  0x0000564bfa5872d7 in ssux_lock_impl<true>::rd_lock (this=0x564bfd654080 <lock_sys+64>) at /data/Server/bb-10.6-thiru/storage/innobase/include/srw_lock.h:247
      #6  0x0000564bfa5be834 in lock_sys_t::rd_lock (this=0x564bfd654040 <lock_sys>) at /data/Server/bb-10.6-thiru/storage/innobase/include/lock0lock.h:774
      #7  0x0000564bfa58f843 in LockGuard::LockGuard (this=0x7f4307d51df0, hash=@0x564bfd654098: {static LATCH = 1, static ELEMENTS_PER_LATCH = 7, static EMPTY_SLOTS_PER_LATCH = 0, n_cells = 6599, array = 0x630000020400}, id=<error reading variable: Cannot access memory at address 0xfffffffffffffdc0>) at /data/Server/bb-10.6-thiru/storage/innobase/lock/lock0lock.cc:197
      #8  0x0000564bfa5783eb in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61600b1ca6d8, entry_size=13, index=0x616007a1bb08, page_id={m_id = 98784247875}, zip_size=0, thr=0x621000b114d8) at /data/Server/bb-10.6-thiru/storage/innobase/ibuf/ibuf0ibuf.cc:3303
      #9  0x0000564bfa579b8a in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61600b1ca6d8, index=0x616007a1bb08, page_id={m_id = 98784247875}, zip_size=0, thr=0x621000b114d8) at /data/Server/bb-10.6-thiru/storage/innobase/ibuf/ibuf0ibuf.cc:3566
      (gdb) thread apply 8 bt
      Thread 8 (Thread 0x7f43060f7700 (LWP 3829133)):
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x0000564bfa889c5f in srw_mutex_impl<true>::wait (this=0x7f433bcb1398, lk=2147483650) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:239
      #2  0x0000564bfa8894a9 in srw_mutex_impl<true>::wait_and_lock (this=0x7f433bcb1398) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:326
      #3  0x0000564bfa47a617 in srw_mutex_impl<true>::wr_lock (this=0x7f433bcb1398) at /data/Server/bb-10.6-thiru/storage/innobase/include/srw_lock.h:134
      #4  0x0000564bfa889b62 in ssux_lock_impl<true>::rd_wait (this=0x7f433bcb1398) at /data/Server/bb-10.6-thiru/storage/innobase/sync/srw_lock.cc:392
      #5  0x0000564bfa5872d7 in ssux_lock_impl<true>::rd_lock (this=0x7f433bcb1398) at /data/Server/bb-10.6-thiru/storage/innobase/include/srw_lock.h:247
      #6  0x0000564bfa65bb9a in sux_lock<ssux_lock_impl<true> >::s_lock (this=0x7f433bcb1398) at /data/Server/bb-10.6-thiru/storage/innobase/include/sux_lock.h:362
      #7  0x0000564bfa651bcd in mtr_t::page_lock (this=0x7f43060ef570, block=0x7f433bcb1380, rw_latch=1) at /data/Server/bb-10.6-thiru/storage/innobase/mtr/mtr0mtr.cc:1240
      #8  0x0000564bfa9dcadf in buf_page_get_low (page_id={m_id = 4}, zip_size=0, rw_latch=1, guess=0x7f433bcb1380, mode=10, mtr=0x7f43060ef570, err=0x7f43060ee150, allow_ibuf_merge=false) at /data/Server/bb-10.6-thiru/storage/innobase/buf/buf0buf.cc:2974
      #9  0x0000564bfa9dd366 in buf_page_get_gen (page_id={m_id = 4}, zip_size=0, rw_latch=1, guess=0x7f433bcb1380, mode=10, mtr=0x7f43060ef570, err=0x7f43060ee150, allow_ibuf_merge=false) at /data/Server/bb-10.6-thiru/storage/innobase/buf/buf0buf.cc:3045
      #10 0x0000564bfa9779f4 in btr_cur_search_to_nth_level_func (index=0x61600003f408, level=0, tuple=0x61600aa1af08, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f43060ef400, ahi_latch=0x0, mtr=0x7f43060ef570, autoinc=0) at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0cur.cc:1602
      #11 0x0000564bfa9a91d8 in btr_pcur_open_low (index=0x61600003f408, level=0, tuple=0x61600aa1af08, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f43060ef400, autoinc=0, mtr=0x7f43060ef570) at /data/Server/bb-10.6-thiru/storage/innobase/include/btr0pcur.inl:369
      #12 0x0000564bfa9ad9e4 in btr_pcur_open_on_user_rec (index=0x61600003f408, tuple=0x61600aa1af08, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f43060ef400, mtr=0x7f43060ef570) at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0pcur.cc:660
      #13 0x0000564bfa580d78 in ibuf_delete_for_discarded_space (space=42) at /data/Server/bb-10.6-thiru/storage/innobase/ibuf/ibuf0ibuf.cc:4477
      #14 0x0000564bfaaf3fcd in fil_delete_tablespace (id=42) at /data/Server/bb-10.6-thiru/storage/innobase/fil/fil0fil.cc:1733
      #15 0x0000564bfaadff45 in trx_t::commit (this=0x7f433ca85740, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-thiru/storage/innobase/dict/drop.cc:271
      #16 0x0000564bfa441bba in ha_innobase::delete_table (this=0x62b000168c38, name=0x7f43060f21f0 "./test/B") at /data/Server/bb-10.6-thiru/storage/innobase/handler/ha_innodb.cc:13717
      

      Thread 8 is holding exclusive lock_sys.latch while waiting for a page latch on the change buffer root page (page 4 in the system tablespace). Thread 10 is waiting for a shared lock_sys.latch while holding a write latch on page 4.

      The acquisition of lock_sys.latch in ibuf_insert_low() is part of a check to see if change buffering is safe to be attempted. We should use a trylock operation there and refuse change buffering if the latches are not available without waiting.

      Attachments

        Issue Links

          Activity

            RQG
            ====
            # git clone https://github.com/mleich1/rqg --branch experimental RQG
            #
            # GIT_SHOW: HEAD -> experimental, origin/experimental 358366308288eaadbcb62822ef4faadf6e7aebc8 2022-03-28T18:04:44+02:00
            # rqg.pl  : Version 4.0.4 (2021-12)
            #
            # $RQG_HOME/rqg.pl \
            # --grammar=conf/mariadb/innodb_compression_encryption.yy \
            # --gendata=conf/mariadb/innodb_compression_encryption.zz \
            # --max_gd_duration=1800 \
            # --mysqld=--loose-innodb-encryption-threads=1 \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --no-mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=Backtrace \
            # --reporters=ErrorLog \
            # --reporters=Deadlock1 \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--innodb_stats_persistent=off \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--log-bin \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
            # --mysqld=--loose-max-statement-time=30 \
            # --threads=9 \
            # --mysqld=--innodb_use_native_aio=1 \
            # --mysqld=--innodb_undo_tablespaces=3 \
            # --mysqld=--innodb_undo_log_truncate=ON \
            # --mysqld=--loose_innodb_change_buffering=all \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --vardir_type=fast \
            # --mysqld=--innodb_page_size=4K \
            # --mysqld=--innodb-buffer-pool-size=5M \
            # --no_mask \
            # --mtr-build-thread=877 \
            # --batch \
            # <local settings>
             
            The likelihood to replay is not bad.
            

            mleich Matthias Leich added a comment - RQG ==== # git clone https://github.com/mleich1/rqg --branch experimental RQG # # GIT_SHOW: HEAD -> experimental, origin/experimental 358366308288eaadbcb62822ef4faadf6e7aebc8 2022-03-28T18:04:44+02:00 # rqg.pl : Version 4.0.4 (2021-12) # # $RQG_HOME/rqg.pl \ # --grammar=conf/mariadb/innodb_compression_encryption.yy \ # --gendata=conf/mariadb/innodb_compression_encryption.zz \ # --max_gd_duration=1800 \ # --mysqld=--loose-innodb-encryption-threads=1 \ # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \ # --mysqld=--loose-idle_write_transaction_timeout=0 \ # --mysqld=--loose-idle_transaction_timeout=0 \ # --mysqld=--loose-idle_readonly_transaction_timeout=0 \ # --mysqld=--connect_timeout=60 \ # --mysqld=--interactive_timeout=28800 \ # --mysqld=--slave_net_timeout=60 \ # --mysqld=--net_read_timeout=30 \ # --mysqld=--net_write_timeout=60 \ # --mysqld=--loose-table_lock_wait_timeout=50 \ # --mysqld=--wait_timeout=28800 \ # --mysqld=--lock-wait-timeout=86400 \ # --mysqld=--innodb-lock-wait-timeout=50 \ # --no-mask \ # --queries=10000000 \ # --seed=random \ # --reporters=Backtrace \ # --reporters=ErrorLog \ # --reporters=Deadlock1 \ # --validators=None \ # --mysqld=--log_output=none \ # --mysqld=--log_bin_trust_function_creators=1 \ # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \ # --engine=InnoDB \ # --restart_timeout=240 \ # --mysqld=--plugin-load-add=file_key_management.so \ # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \ # --mysqld=--plugin-load-add=provider_lzo.so \ # --mysqld=--plugin-load-add=provider_bzip2.so \ # --mysqld=--plugin-load-add=provider_lzma.so \ # --mysqld=--plugin-load-add=provider_snappy.so \ # --mysqld=--plugin-load-add=provider_lz4.so \ # --duration=300 \ # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \ # --mysqld=--loose-innodb_read_only_compressed=OFF \ # --mysqld=--innodb_stats_persistent=off \ # --mysqld=--innodb_adaptive_hash_index=on \ # --mysqld=--log-bin \ # --mysqld=--sync-binlog=1 \ # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \ # --mysqld=--loose-max-statement-time=30 \ # --threads=9 \ # --mysqld=--innodb_use_native_aio=1 \ # --mysqld=--innodb_undo_tablespaces=3 \ # --mysqld=--innodb_undo_log_truncate=ON \ # --mysqld=--loose_innodb_change_buffering=all \ # --mysqld=--innodb_rollback_on_timeout=OFF \ # --vardir_type=fast \ # --mysqld=--innodb_page_size=4K \ # --mysqld=--innodb-buffer-pool-size=5M \ # --no_mask \ # --mtr-build-thread=877 \ # --batch \ # <local settings>   The likelihood to replay is not bad.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.