MariaDB Server / MDEV-27700

SUMMARY: AddressSanitizer: heap-use-after-free storage/innobase/btr/btr0sea.cc:1305 in btr_search_drop_page_hash_index(buf_block_t*)

Details

    Description

      origin/bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f 2022-01-21T21:00:54+03:00
      Per Vladislav, his modifications for MDEV-20605 cannot be the cause of this failure.
       
      [rr 2508867 1214435][rr 2508867 1214439]==2508867==ERROR: AddressSanitizer: heap-use-after-free on address 0x616004721520 at pc 0x55ac1b2df542 bp 0x2b8f042ec780 sp 0x2b8f042ec770
      [rr 2508867 1214442][rr 2508867 1214444]READ of size 8 at 0x616004721520 thread T2
      [rr 2508867 1214630]2022-01-31 17:21:11 40 [Warning] Aborted connection 40 to db: 'test' user: 'root' host: 'localhost' (This connection closed normally)
      [rr 2508867 1220937]    #0 0x55ac1b2df541 in btr_search_drop_page_hash_index(buf_block_t*) /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1305
      [rr 2508867 1220954]    #1 0x55ac1b3505ec in buf_LRU_free_page(buf_page_t*, bool) /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0lru.cc:959
      [rr 2508867 1220974]    #2 0x55ac1b33a539 in buf_flush_discard_page /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1222
      [rr 2508867 1220986]    #3 0x55ac1b33bf07 in buf_do_flush_list_batch /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1427
      [rr 2508867 1220988]    #4 0x55ac1b33c964 in buf_flush_list /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1511
      [rr 2508867 1220990]    #5 0x55ac1b342475 in buf_flush_page_cleaner /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:2320
      [rr 2508867 1220992]    #6 0x55ac1b348f83 in void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79f83)
      [rr 2508867 1220994]    #7 0x55ac1b348ebf in std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79ebf)
      [rr 2508867 1221004]    #8 0x55ac1b348de5 in void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79de5)
      [rr 2508867 1221006]    #9 0x55ac1b348d66 in std::thread::_Invoker<std::tuple<void (*)()> >::operator()() (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79d66)
      [rr 2508867 1221008]    #10 0x55ac1b348ccb in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79ccb)
      [rr 2508867 1221010]    #11 0x7f797408dde3  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6de3)
      [rr 2508867 1221016]    #12 0x17447d5c9608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
      [rr 2508867 1221018]    #13 0x7f7973ee5292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
      ...
      [rr 2508867 1221774]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1305 in btr_search_drop_page_hash_index(buf_block_t*)
      [rr 2508867 1222208]Shadow bytes around the buggy address:
        0x0c2c808dc250: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        0x0c2c808dc260: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        0x0c2c808dc270: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
        0x0c2c808dc280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c2c808dc290: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
      =>0x0c2c808dc2a0: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
        0x0c2c808dc2b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        0x0c2c808dc2c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        0x0c2c808dc2d0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
        0x0c2c808dc2e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c2c808dc2f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
       
      pluto:/data/results/1643649542/TBR-1001/dev/shm/rqg/1643649542/48/1/rr
       
      RQG
      ====
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental 067340e8ec78c25880ccd2e67dd62ad134edc531 2022-01-31T17:10:39+01:00
      # rqg.pl  : Version 4.0.4 (2021-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/table_stress_innodb_dml.yy \
      # --gendata=conf/mariadb/table_stress.zz \
      # --gendata_sql=conf/mariadb/table_stress.sql \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma \
      # --mysqld=--plugin-load-add=provider_snappy \
      # --mysqld=--plugin-load-add=provider_lz4 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=on \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=9 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=fast \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=5M \
      # --no_mask \
      # <local settings>
      

          Activity

            This appears to be a race condition in the MDEV-22456 "lazy freeing" code when another thread is concurrently disabling the adaptive hash index. It occurred during buffer pool resizing:

            ssh pluto
            rr replay /data/results/1643649542/TBR-1001/dev/shm/rqg/1643649542/48/1/rr/latest-trace
            

            # Watch the ASan shadow byte that covers the reported address
            # (-l: watch the memory location, not the expression).
            watch -l *(char*)0x0c2c808dc2a4
            # Run backwards to the writes that poisoned the shadow byte,
            # i.e. to the point where the memory was freed.
            reverse-continue
            reverse-continue
            backtrace
            # Show what thread 3 (the eventual use-after-free thread) was
            # doing at the moment of the free.
            thread apply 3 backtrace
            

            At the time the memory was freed, Thread 3, which would later trigger the heap-use-after-free, was already trying to drop the adaptive hash index:

            (rr) backtrace
            #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:190
            #1  0x000028850b6d49d9 in ?? () from /lib/x86_64-linux-gnu/libasan.so.5
            #2  0x000028850b7b5799 in free () from /lib/x86_64-linux-gnu/libasan.so.5
            #3  0x000055ac1af6db6e in mem_heap_block_free (heap=0x611000193b00, block=0x616004721480) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/mem/mem0mem.cc:416
            #4  0x000055ac1b3c150e in mem_heap_free (heap=0x611000193b00) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/include/mem0mem.ic:419
            #5  0x000055ac1b3c9e41 in dict_mem_index_free (index=0x616004721508) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/dict/dict0mem.cc:1096
            #6  0x000055ac1b2d92fa in btr_search_lazy_free (index=0x616004721508) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:213
            #7  0x000055ac1b2e8874 in buf_pool_t::clear_hash_index (this=0x55ac1d344d00 <buf_pool>) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:953
            #8  0x000055ac1b2d95ea in btr_search_disable () at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:262
            #9  0x000055ac1b319553 in buf_pool_t::resize (this=0x55ac1d344d00 <buf_pool>) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0buf.cc:1664
            #10 0x000055ac1b2fdc22 in buf_resize_callback () at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0buf.cc:2002
            …
            Thread 3 (Thread 2508867.2509177):
            …
            #13 0x000055ac1ad9e9f0 in ssux_lock_impl<true>::wr_lock (this=0x615000003300) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/include/srw_lock.h:226
            #14 0x000055ac1b2df424 in btr_search_drop_page_hash_index (block=0x4bc064c83310) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1289
            #15 0x000055ac1b3505ed in buf_LRU_free_page (bpage=0x4bc064c83310, zip=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0lru.cc:959
            

            marko Marko Mäkelä added a comment
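            To make the interleaving above concrete, here is a deliberately simplified, compilable model of the race. Every name in it (Index, Block, ahi_latch, resize_thread, eviction_thread) is an illustrative stand-in rather than MariaDB's real declarations, and the actual free() is replaced by a flag so the model itself stays well-defined. Thread A plays btr_search_disable() during the buffer pool resize; thread B plays buf_LRU_free_page() calling btr_search_drop_page_hash_index():

            // Hypothetical model of the race window; not InnoDB code.
            #include <atomic>
            #include <mutex>
            #include <shared_mutex>
            #include <thread>

            struct Index { bool freed_flag = false; };

            struct Block {
              // atomic only so the model has no undefined behaviour of its
              // own; the logical bug is *when* the pointer is read
              std::atomic<Index*> index;
            };

            std::shared_mutex ahi_latch;  // one AHI partition latch
            Index the_index;
            Block block{&the_index};

            // Thread A: under the exclusive latch, detach the index from the
            // block and "free" it (btr_search_lazy_free() really frees the
            // memory; a flag stands in for that here).
            void resize_thread()
            {
              std::unique_lock<std::shared_mutex> x(ahi_latch);
              Index* victim = block.index.exchange(nullptr);
              victim->freed_flag = true;  // stands in for dict_mem_index_free()
            }

            // Thread B: as of MDEV-24863, block->index is read *before* the
            // latch is acquired, so by the time the latch is granted the
            // object may already be gone; the later dereference is the
            // reported use-after-free.
            void eviction_thread()
            {
              Index* index = block.index.load();  // early read: the race window
              std::shared_lock<std::shared_mutex> s(ahi_latch);
              if (index)
                (void) index->freed_flag;  // in InnoDB: btr0sea.cc:1305
            }

            int main()
            {
              std::thread a(resize_thread), b(eviction_thread);
              a.join();
              b.join();
            }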

            Patch is in bb-10.6-MDEV-27700

            thiru Thirunarayanan Balathandayuthapani added a comment

            MDEV-24863 introduced this race condition. Before that, btr_search_drop_page_hash_index() did not dereference block->index before acquiring a latch on the AHI partition.

            marko Marko Mäkelä added a comment

            The fix is on the right track. However, it releases the shared AHI latch, then accesses the index object, and only then acquires an exclusive latch. During that window, another thread could execute btr_search_lazy_free(index) while btr_search_drop_page_hash_index() is accessing the object.

            I think that btr_search_drop_page_hash_index() must continuously hold an exclusive adaptive hash index latch if index->freed() holds. We should probably first acquire a shared latch, then check index->freed(), and ‘upgrade’ to an exclusive latch if it holds.

            marko Marko Mäkelä added a comment
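            A minimal sketch of the shared-to-exclusive "upgrade" suggested above, with std::shared_mutex standing in for the AHI partition latch (the real ssux_lock API differs). Since std::shared_mutex has no true upgrade operation, the latch must be released and reacquired, which is exactly why block->index has to be revalidated afterwards. Apart from the function names quoted from the comments, everything here is hypothetical:

            #include <shared_mutex>

            struct Index {
              bool freed_flag = false;
              bool freed() const { return freed_flag; }
            };

            struct Block { Index* index; };  // cleared under the exclusive latch

            std::shared_mutex part_latch;    // stand-in for the partition latch

            // Hypothetical outline of btr_search_drop_page_hash_index() with
            // the suggested protocol: read block->index under a shared latch,
            // and if index->freed() holds, keep an exclusive latch held
            // continuously while touching the object.
            void drop_page_hash_index(Block* block)
            {
              part_latch.lock_shared();
              Index* index = block->index;   // safe: read under the latch
              if (!index)                    // nothing is hashed for this page
              {
                part_latch.unlock_shared();
                return;
              }
              if (index->freed())
              {
                // "Upgrade": release the shared latch, take the exclusive one...
                part_latch.unlock_shared();
                part_latch.lock();
                // ...and revalidate: btr_search_lazy_free() may have run in the
                // window between the two acquisitions.
                index = block->index;
                if (index)
                {
                  // drop the hash entries while continuously holding the
                  // exclusive latch
                }
                part_latch.unlock();
                return;
              }
              // index is live; proceed as before (the actual removal of hash
              // entries happens under the exclusive latch in the real code)
              part_latch.unlock_shared();
            }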

            The revised fix looks correct to me. I only suggested some minor optimization.

            Initially, I forgot that during ALTER TABLE, the entire old dict_table_t and all dict_index_t may be freed and reallocated without any index pages being freed or allocated. This is because currently, virtual columns can only be added when the table definition is being loaded from the table. It could be fixed in MDEV-14341.

            marko Marko Mäkelä added a comment

            This looks OK, but you probably will have to adjust mtr_t::defer_drop_ahi() to address a prominent test failure.

            marko Marko Mäkelä added a comment

            origin/bb-10.6-MDEV-29336 6251ae4ea219a5b14f978b79efd8c8584a655fc5 2022-08-22T16:24:10+03:00,
            which contains the fixes for MDEV-27700, MDEV-29336 and MDEV-29250, performed well in RQG testing.
            

            mleich Matthias Leich added a comment

            People

              thiru Thirunarayanan Balathandayuthapani
              mleich Matthias Leich
