Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31088

Server freeze due to innodb_change_buffering and innodb_file_per_table=0

Details

    Description

      origin/10.6 78368e5866383333407a4e2752c8b62df8fb8c29 2023-04-19T15:52:11+03:00
      After some time SHOW PROCESSLIST provides
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- begin
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': ID -- COMMAND -- TIME -- INFO -- state
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 18 -- Sleep -- 201 -- <undef> -- ok
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 19 -- Killed -- 181 -- INSERT IGNORE INTO `t7` ( `pad` ) VALUES ( 'p' ) /* E_R Thread1 QNO 546 CON_ID 19 */ -- stalled?
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 20 -- Killed -- 181 -- ALTER TABLE `t7` FORCE /* E_R Thread2 QNO 617 CON_ID 20 */ -- stalled?
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 41 -- Query -- 0 -- SHOW FULL PROCESSLIST -- ok
      # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- end
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
      #
      # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD c8b39da52f33e094b049a842ebc60267b43ba8ed 2023-04-18T13:05:08+02:00
      # rqg.pl  : Version 4.2.1 (2022-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/innodb_compression_encryption.yy \
      # --gendata=conf/mariadb/innodb_compression_encryption.zz \
      # --max_gd_duration=1800 \
      # --mysqld=--innodb-encrypt-log \
      # --mysqld=--innodb-encrypt-tables \
      # --reporters=CrashRecovery \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no_mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --mysqld=--loose-innodb_compression_level=1 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--innodb_file_per_table=0 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--loose-innodb-sync-debug \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=on \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=2 \
      # --mysqld=--innodb_use_native_aio=1 \
      # --mysqld=--innodb_undo_log_truncate=ON \
      # --mysqld=--loose_innodb_change_buffering=changes \
      # --mysqld=--innodb_undo_tablespaces=16 \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --mysqld=--innodb_page_size=32K \
      # --mysqld=--innodb-buffer-pool-size=24M \
      # <local settings>
       
      sdp:/data1/results/1681917174/Freeze$ gdb -c ./1/data/core.1431428 /data/Server_bin/10.6B_RelWithDebInfo/bin/mysqld
      

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich created issue -

            The hang occurs due to the non-default settings of innodb_change_buffering (MDEV-27734, MDEV-27735, MDEV-29694) and innodb_file_per_table (MDEV-29983).
            Thread 8 is waiting for the system tablespace latch in a purge operation, processing a DROP INDEX or DROP TABLE for a table in the system tablespace:

            10.6 78368e5866383333407a4e2752c8b62df8fb8c29

            #8  0x00005603734695ba in mtr_t::x_lock_space (this=this@entry=0x7fe2b3ffd850, space_id=space_id@entry=0) at /data/Server/10.6B/storage/innobase/mtr/mtr0mtr.cc:512
            #9  0x0000560373553fca in fseg_free_step (header=0x7fe3416c004a "", mtr=mtr@entry=0x7fe2b3ffd850, ahi=ahi@entry=false) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:2807
            #10 0x00005603734edcf6 in btr_free_but_not_root (block=0x7fe340ca7800, log_mode=<optimized out>, ahi=false) at /data/Server/10.6B/storage/innobase/btr/btr0btr.cc:1155
            #11 0x00005603734ef5a7 in btr_free_if_exists (space=0x560375e9ae20, page=<optimized out>, index_id=338, mtr=0x7fe2b3ffdf30) at /data/Server/10.6B/storage/innobase/btr/btr0btr.cc:1242
            #12 0x0000560373520032 in dict_drop_index_tree (pcur=pcur@entry=0x560375f8d538, trx=trx@entry=0x0, mtr=mtr@entry=0x7fe2b3ffdf30) at /data/Server/10.6B/storage/innobase/include/mach0data.inl:363
            

            Also Thread 18 is waiting on the system tablespace latch due to a change buffer merge:

            10.6 78368e5866383333407a4e2752c8b62df8fb8c29

            #7  mtr_t::x_lock_space (this=this@entry=0x7fe33c35d160, space=space@entry=0x560375e9ae20) at /data/Server/10.6B/storage/innobase/mtr/mtr0mtr.cc:518
            No locals.
            #8  0x000056037354f41b in fseg_page_is_allocated …
            #9  0x000056037344def8 in ibuf_merge_or_delete_for_page (block=block@entry=0x7fe340cb82e0, page_id=<optimized out>, page_id@entry={m_id = 314}, zip_size=0) at /data/Server/10.6B/storage/innobase/ibuf/ibuf0ibuf.cc:4194
            

            The system tablespace latch is being held by Thread 15, which is waiting for the page latch that Thread 18 is holding (0x7fe340cb82e0+0x18 = 0x7fe340cb82f8):

            10.6 78368e5866383333407a4e2752c8b62df8fb8c29

            #5  sux_lock<ssux_lock_impl<true> >::x_lock (for_io=false, this=0x7fe340cb82f8) at /data/Server/10.6B/storage/innobase/include/sux_lock.h:395
            #6  buf_page_create_low (page_id={m_id = 314}, zip_size=<optimized out>, mtr=<optimized out>, free_block=0x7fe340c9bb40) at /data/Server/10.6B/storage/innobase/buf/buf0buf.cc:3163
            #7  0x000056037354f117 in fsp_page_create (space=0x560375e9ae20, offset=314, mtr=0x7fdf64082e30) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:1062
            #8  0x00005603735517f3 in fsp_alloc_free_page (space=0x560375e9ae20, hint=<optimized out>, mtr=0x7fe33c2b5720, init_mtr=0x7fdf64082e30, err=0x7fe33c2b5714) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:1175
            

            marko Marko Mäkelä added a comment - The hang occurs due to the non-default settings of innodb_change_buffering ( MDEV-27734 , MDEV-27735 , MDEV-29694 ) and innodb_file_per_table ( MDEV-29983 ). Thread 8 is waiting for the system tablespace latch in a purge operation, processing a DROP INDEX or DROP TABLE for a table in the system tablespace: 10.6 78368e5866383333407a4e2752c8b62df8fb8c29 #8 0x00005603734695ba in mtr_t::x_lock_space (this=this@entry=0x7fe2b3ffd850, space_id=space_id@entry=0) at /data/Server/10.6B/storage/innobase/mtr/mtr0mtr.cc:512 #9 0x0000560373553fca in fseg_free_step (header=0x7fe3416c004a "", mtr=mtr@entry=0x7fe2b3ffd850, ahi=ahi@entry=false) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:2807 #10 0x00005603734edcf6 in btr_free_but_not_root (block=0x7fe340ca7800, log_mode=<optimized out>, ahi=false) at /data/Server/10.6B/storage/innobase/btr/btr0btr.cc:1155 #11 0x00005603734ef5a7 in btr_free_if_exists (space=0x560375e9ae20, page=<optimized out>, index_id=338, mtr=0x7fe2b3ffdf30) at /data/Server/10.6B/storage/innobase/btr/btr0btr.cc:1242 #12 0x0000560373520032 in dict_drop_index_tree (pcur=pcur@entry=0x560375f8d538, trx=trx@entry=0x0, mtr=mtr@entry=0x7fe2b3ffdf30) at /data/Server/10.6B/storage/innobase/include/mach0data.inl:363 Also Thread 18 is waiting on the system tablespace latch due to a change buffer merge: 10.6 78368e5866383333407a4e2752c8b62df8fb8c29 #7 mtr_t::x_lock_space (this=this@entry=0x7fe33c35d160, space=space@entry=0x560375e9ae20) at /data/Server/10.6B/storage/innobase/mtr/mtr0mtr.cc:518 No locals. #8 0x000056037354f41b in fseg_page_is_allocated … #9 0x000056037344def8 in ibuf_merge_or_delete_for_page (block=block@entry=0x7fe340cb82e0, page_id=<optimized out>, page_id@entry={m_id = 314}, zip_size=0) at /data/Server/10.6B/storage/innobase/ibuf/ibuf0ibuf.cc:4194 The system tablespace latch is being held by Thread 15, which is waiting for the page latch that Thread 18 is holding (0x7fe340cb82e0+0x18 = 0x7fe340cb82f8): 10.6 78368e5866383333407a4e2752c8b62df8fb8c29 #5 sux_lock<ssux_lock_impl<true> >::x_lock (for_io=false, this=0x7fe340cb82f8) at /data/Server/10.6B/storage/innobase/include/sux_lock.h:395 #6 buf_page_create_low (page_id={m_id = 314}, zip_size=<optimized out>, mtr=<optimized out>, free_block=0x7fe340c9bb40) at /data/Server/10.6B/storage/innobase/buf/buf0buf.cc:3163 #7 0x000056037354f117 in fsp_page_create (space=0x560375e9ae20, offset=314, mtr=0x7fdf64082e30) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:1062 #8 0x00005603735517f3 in fsp_alloc_free_page (space=0x560375e9ae20, hint=<optimized out>, mtr=0x7fe33c2b5720, init_mtr=0x7fdf64082e30, err=0x7fe33c2b5714) at /data/Server/10.6B/storage/innobase/fsp/fsp0fsp.cc:1175
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Description
            {noformat}
            origin/10.6 78368e5866383333407a4e2752c8b62df8fb8c29 2023-04-19T15:52:11+03:00
            After some time SHOW PROCESSLIST provides
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- begin
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': ID -- COMMAND -- TIME -- INFO -- state
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 18 -- Sleep -- 201 -- <undef> -- ok
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 19 -- Killed -- 181 -- INSERT IGNORE INTO `t7` ( `pad` ) VALUES ( 'p' ) /* E_R Thread1 QNO 546 CON_ID 19 */ -- stalled?
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 20 -- Killed -- 181 -- ALTER TABLE `t7` FORCE /* E_R Thread2 QNO 617 CON_ID 20 */ -- stalled?
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 41 -- Query -- 0 -- SHOW FULL PROCESSLIST -- ok
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- end

            RQG
            ===
            # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
            #
            # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD c8b39da52f33e094b049a842ebc60267b43ba8ed 2023-04-18T13:05:08+02:00
            # rqg.pl : Version 4.2.1 (2022-12)
            #
            # $RQG_HOME/rqg.pl \
            # --grammar=conf/mariadb/innodb_compression_encryption.yy \
            # --gendata=conf/mariadb/innodb_compression_encryption.zz \
            # --max_gd_duration=1800 \
            # --mysqld=--innodb-encrypt-log \
            # --mysqld=--innodb-encrypt-tables \
            # --reporters=CrashRecovery \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --no_mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=Backtrace \
            # --reporters=ErrorLog \
            # --reporters=Deadlock \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --mysqld=--loose-innodb_compression_level=1 \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb_file_per_table=0 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-innodb-sync-debug \
            # --mysqld=--innodb_stats_persistent=on \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--log-bin \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
            # --mysqld=--loose-max-statement-time=30 \
            # --threads=2 \
            # --mysqld=--innodb_use_native_aio=1 \
            # --mysqld=--innodb_undo_log_truncate=ON \
            # --mysqld=--loose_innodb_change_buffering=changes \
            # --mysqld=--innodb_undo_tablespaces=16 \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --mysqld=--innodb_page_size=32K \
            # --mysqld=--innodb-buffer-pool-size=24M \
            # <local settings>

            sdp:/data1/results/1681917174/Freeze$ gdb -c ./1/data/core.1431428 /data/Server_bin/10.6B_RelWithDebInfo/bin/mysqld
            {noformat}
            {noformat}
            origin/10.6 78368e5866383333407a4e2752c8b62df8fb8c29 2023-04-19T15:52:11+03:00
            After some time SHOW PROCESSLIST provides
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- begin
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': ID -- COMMAND -- TIME -- INFO -- state
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 18 -- Sleep -- 201 -- <undef> -- ok
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 19 -- Killed -- 181 -- INSERT IGNORE INTO `t7` ( `pad` ) VALUES ( 'p' ) /* E_R Thread1 QNO 546 CON_ID 19 */ -- stalled?
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 20 -- Killed -- 181 -- ALTER TABLE `t7` FORCE /* E_R Thread2 QNO 617 CON_ID 20 */ -- stalled?
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': 41 -- Query -- 0 -- SHOW FULL PROCESSLIST -- ok
            # 2023-04-19T08:21:45 [1441179] Reporter 'Deadlock': Content of processlist ---------- end

            RQG
            ===
            # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
            #
            # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD c8b39da52f33e094b049a842ebc60267b43ba8ed 2023-04-18T13:05:08+02:00
            # rqg.pl : Version 4.2.1 (2022-12)
            #
            # $RQG_HOME/rqg.pl \
            # --grammar=conf/mariadb/innodb_compression_encryption.yy \
            # --gendata=conf/mariadb/innodb_compression_encryption.zz \
            # --max_gd_duration=1800 \
            # --mysqld=--innodb-encrypt-log \
            # --mysqld=--innodb-encrypt-tables \
            # --reporters=CrashRecovery \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --no_mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=Backtrace \
            # --reporters=ErrorLog \
            # --reporters=Deadlock \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --mysqld=--loose-innodb_compression_level=1 \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb_file_per_table=0 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-innodb-sync-debug \
            # --mysqld=--innodb_stats_persistent=on \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--log-bin \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
            # --mysqld=--loose-max-statement-time=30 \
            # --threads=2 \
            # --mysqld=--innodb_use_native_aio=1 \
            # --mysqld=--innodb_undo_log_truncate=ON \
            # --mysqld=--loose_innodb_change_buffering=changes \
            # --mysqld=--innodb_undo_tablespaces=16 \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --mysqld=--innodb_page_size=32K \
            # --mysqld=--innodb-buffer-pool-size=24M \
            # <local settings>

            sdp:/data1/results/1681917174/Freeze$ gdb -c ./1/data/core.1431428 /data/Server_bin/10.6B_RelWithDebInfo/bin/mysqld
            {noformat}
            Labels hang
            Summary Server freeze Server freeze due to innodb_change_buffering and innodb_file_per_table=0
            marko Marko Mäkelä made changes -

            The call to fseg_page_is_allocated() (before MDEV-13542, it was called fseg_page_is_free()) was added to ibuf_merge_or_delete_for_page() in MDEV-24569.

            marko Marko Mäkelä added a comment - The call to fseg_page_is_allocated() (before MDEV-13542 , it was called fseg_page_is_free() ) was added to ibuf_merge_or_delete_for_page() in MDEV-24569 .

            Potentially this could also cause a hang between DROP INDEX and a concurrent page allocation in an .ibd file, but the probability of that should be much lower.

            A possible fix seems to be to acquire an exclusive tablespace latch earlier, among other places, in dict_drop_index_tree(). It is somewhat difficult to arrange that, because btr_free_if_exists() involves executing nested mini-transactions, and the fil_space_t::latch is not recursive starting with MariaDB Server 10.6.

            marko Marko Mäkelä added a comment - Potentially this could also cause a hang between DROP INDEX and a concurrent page allocation in an .ibd file, but the probability of that should be much lower. A possible fix seems to be to acquire an exclusive tablespace latch earlier, among other places, in dict_drop_index_tree() . It is somewhat difficult to arrange that, because btr_free_if_exists() involves executing nested mini-transactions, and the fil_space_t::latch is not recursive starting with MariaDB Server 10.6.
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -

            I don’t think that we can easily fix this completely. What we can do is the following:

            • In ibuf_merge_or_delete_for_page(nullptr, …), avoid acquiring the tablespace latch, because in that case we will discard the changes.
            • In ibuf_read_merge_pages() acquire the tablespace latch upfront, to avoid the latching order violation in ibuf_merge_or_delete_for_page().

            I believe that a server deadlock is possible in any operation that involves the change buffer merge (such as SELECT or CHECK TABLE when buffered changes existed), and a page is being allocated in the same tablespace. The probability of that occurring for tables stored in .ibd files (the default is innodb_file_per_table=1) should be much lower than for tables stored in the system tablespace; I imagine that it might happen if the server had been killed right after a DROP INDEX operation was committed, and right after recovery there were writes to the table that caused page allocation.

            Note: the change buffer was disabled by default in MDEV-27734.

            marko Marko Mäkelä added a comment - I don’t think that we can easily fix this completely. What we can do is the following: In ibuf_merge_or_delete_for_page(nullptr, …) , avoid acquiring the tablespace latch, because in that case we will discard the changes. In ibuf_read_merge_pages() acquire the tablespace latch upfront, to avoid the latching order violation in ibuf_merge_or_delete_for_page() . I believe that a server deadlock is possible in any operation that involves the change buffer merge (such as SELECT or CHECK TABLE when buffered changes existed), and a page is being allocated in the same tablespace. The probability of that occurring for tables stored in .ibd files (the default is innodb_file_per_table=1 ) should be much lower than for tables stored in the system tablespace; I imagine that it might happen if the server had been killed right after a DROP INDEX operation was committed, and right after recovery there were writes to the table that caused page allocation. Note: the change buffer was disabled by default in MDEV-27734 .
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-06-02 13:57:50.0 2023-06-02 13:57:50.819
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.15 [ 29013 ]
            Fix Version/s 10.9.8 [ 29015 ]
            Fix Version/s 10.10.6 [ 29017 ]
            Fix Version/s 10.11.5 [ 29019 ]
            Fix Version/s 11.0.3 [ 28920 ]
            Fix Version/s 11.1.2 [ 28921 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            origin/bb-10.6-MDEV-30986 d9d63d9ec180f7783d58fdc82607533493b8e306 2023-06-01T12:39:04+03:00
            which also contains the fix for the current MDEV performed well in RQG testing.
            

            mleich Matthias Leich added a comment - origin/bb-10.6-MDEV-30986 d9d63d9ec180f7783d58fdc82607533493b8e306 2023-06-01T12:39:04+03:00 which also contains the fix for the current MDEV performed well in RQG testing.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.