[MDEV-27700] SUMMARY: AddressSanitizer: heap-use-after-free storage/innobase/btr/btr0sea.cc:1305 in btr_search_drop_page_hash_index(buf_block_t*) Created: 2022-02-01  Updated: 2023-11-16  Resolved: 2022-08-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.0, 10.6.6
Fix Version/s: 10.9.2, 10.10.1, 10.3.37, 10.4.27, 10.5.18, 10.6.10, 10.7.6, 10.8.5

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: race, rr-profile

Attachments: File rqg.log    
Issue Links:
Problem/Incident
is caused by MDEV-22456 Dropping the adaptive hash index may ... Closed
is caused by MDEV-24863 InnoDB: Failing assertion: index == c... Closed
Relates
relates to MDEV-20203 InnoDB: Failing assertion: (block)->i... Confirmed
relates to MDEV-28462 AddressSanitizer: use-after-poison d... Closed

 Description   

origin/bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f 2022-01-21T21:00:54+03:00
Per Vladislav his modifications for MDEV-20605 cannot be the reason for the bad effect.
 
[rr 2508867 1214435][rr 2508867 1214439]==2508867==ERROR: AddressSanitizer: heap-use-after-free on address 0x616004721520 at pc 0x55ac1b2df542 bp 0x2b8f042ec780 sp 0x2b8f042ec770
[rr 2508867 1214442][rr 2508867 1214444]READ of size 8 at 0x616004721520 thread T2
[rr 2508867 1214630]2022-01-31 17:21:11 40 [Warning] Aborted connection 40 to db: 'test' user: 'root' host: 'localhost' (This connection closed normally)
[rr 2508867 1220937]    #0 0x55ac1b2df541 in btr_search_drop_page_hash_index(buf_block_t*) /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1305
[rr 2508867 1220954]    #1 0x55ac1b3505ec in buf_LRU_free_page(buf_page_t*, bool) /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0lru.cc:959
[rr 2508867 1220974]    #2 0x55ac1b33a539 in buf_flush_discard_page /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1222
[rr 2508867 1220986]    #3 0x55ac1b33bf07 in buf_do_flush_list_batch /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1427
[rr 2508867 1220988]    #4 0x55ac1b33c964 in buf_flush_list /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:1511
[rr 2508867 1220990]    #5 0x55ac1b342475 in buf_flush_page_cleaner /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0flu.cc:2320
[rr 2508867 1220992]    #6 0x55ac1b348f83 in void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79f83)
[rr 2508867 1220994]    #7 0x55ac1b348ebf in std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79ebf)
[rr 2508867 1221004]    #8 0x55ac1b348de5 in void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79de5)
[rr 2508867 1221006]    #9 0x55ac1b348d66 in std::thread::_Invoker<std::tuple<void (*)()> >::operator()() (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79d66)
[rr 2508867 1221008]    #10 0x55ac1b348ccb in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() (/data/Server_bin/bb-10.6-MDEV-20605-cur-pos-fix_asan/bin/mariadbd+0x2d79ccb)
[rr 2508867 1221010]    #11 0x7f797408dde3  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6de3)
[rr 2508867 1221016]    #12 0x17447d5c9608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
[rr 2508867 1221018]    #13 0x7f7973ee5292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
...
rr 2508867 1221774]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1305 in btr_search_drop_page_hash_index(buf_block_t*)
[rr 2508867 1222208]Shadow bytes around the buggy address:
  0x0c2c808dc250: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c808dc260: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c808dc270: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c2c808dc280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2c808dc290: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c2c808dc2a0: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c808dc2b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c808dc2c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c808dc2d0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c2c808dc2e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2c808dc2f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
 
pluto:/data/results/1643649542/TBR-1001/dev/shm/rqg/1643649542/48/1/rr
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental 067340e8ec78c25880ccd2e67dd62ad134edc531 2022-01-31T17:10:39+01:00
# rqg.pl  : Version 4.0.4 (2021-12)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/table_stress_innodb_dml.yy \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma \
# --mysqld=--plugin-load-add=provider_snappy \
# --mysqld=--plugin-load-add=provider_lz4 \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=9 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--chaos --wait \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --vardir_type=fast \
# --mysqld=--innodb_page_size=4K \
# --mysqld=--innodb-buffer-pool-size=5M \
# --no_mask \
# <local settings>



 Comments   
Comment by Marko Mäkelä [ 2022-08-02 ]

This appears to be a race condition in the MDEV-22456 "lazy freeing" code when another thread is concurrently disabling the adaptive hash index. It occurred during buffer pool resizing:

ssh pluto
rr replay /data/results/1643649542/TBR-1001/dev/shm/rqg/1643649542/48/1/rr/latest-trace

watch -l *(char*)0x0c2c808dc2a4
reverse-continue
reverse-continue
backtrace
thread apply 3 backtrace

At the time the memory was freed, Thread 3, which would later trip heap-use-after-free, was already trying to drop the adaptive hash index:

(rr) backtrace
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:190
#1  0x000028850b6d49d9 in ?? () from /lib/x86_64-linux-gnu/libasan.so.5
#2  0x000028850b7b5799 in free () from /lib/x86_64-linux-gnu/libasan.so.5
#3  0x000055ac1af6db6e in mem_heap_block_free (heap=0x611000193b00, block=0x616004721480) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/mem/mem0mem.cc:416
#4  0x000055ac1b3c150e in mem_heap_free (heap=0x611000193b00) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/include/mem0mem.ic:419
#5  0x000055ac1b3c9e41 in dict_mem_index_free (index=0x616004721508) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/dict/dict0mem.cc:1096
#6  0x000055ac1b2d92fa in btr_search_lazy_free (index=0x616004721508) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:213
#7  0x000055ac1b2e8874 in buf_pool_t::clear_hash_index (this=0x55ac1d344d00 <buf_pool>) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:953
#8  0x000055ac1b2d95ea in btr_search_disable () at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:262
#9  0x000055ac1b319553 in buf_pool_t::resize (this=0x55ac1d344d00 <buf_pool>) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0buf.cc:1664
#10 0x000055ac1b2fdc22 in buf_resize_callback () at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0buf.cc:2002
Thread 3 (Thread 2508867.2509177):
#13 0x000055ac1ad9e9f0 in ssux_lock_impl<true>::wr_lock (this=0x615000003300) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/include/srw_lock.h:226
#14 0x000055ac1b2df424 in btr_search_drop_page_hash_index (block=0x4bc064c83310) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0sea.cc:1289
#15 0x000055ac1b3505ed in buf_LRU_free_page (bpage=0x4bc064c83310, zip=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/buf/buf0lru.cc:959

Comment by Thirunarayanan Balathandayuthapani [ 2022-08-03 ]

Patch is in bb-10.6-MDEV-27700

Comment by Marko Mäkelä [ 2022-08-03 ]

MDEV-24863 introduced this race condition. Before that, btr_search_drop_page_hash_index() did not dereference block->index before acquiring a latch on the AHI partition.

Comment by Marko Mäkelä [ 2022-08-03 ]

The fix is on the right track. However, it is releasing a shared AHI latch, then doing something with the index, and finally acquiring an exclusive latch. During that time, another thread could execute btr_search_lazy_free(index) while btr_search_drop_page_hash_index() is accessing the object.

I think that btr_search_drop_page_hash_index() must continuously hold an exclusive adaptive hash index latch if index->freed() holds. We should probably first acquire a shared latch, then check index->freed(), and ‘upgrade’ to an exclusive latch if it holds.

Comment by Marko Mäkelä [ 2022-08-05 ]

The revised fix looks correct to me. I only suggested some minor optimization.

Initially, I forgot that during ALTER TABLE, the entire old dict_table_t and all dict_index_t may be freed and reallocated without any index pages being freed or allocated. This is because currently, virtual columns can only be added when the table definition is being loaded from the table. It could be fixed in MDEV-14341.

Comment by Marko Mäkelä [ 2022-08-16 ]

This looks OK, but you probably will have to adjust mtr_t::defer_drop_ahi() to address a prominent test failure.

Comment by Matthias Leich [ 2022-08-22 ]

origin/bb-10.6-MDEV-29336 6251ae4ea219a5b14f978b79efd8c8584a655fc5 2022-08-22T16:24:10+03:00
which contains fixes for MDEV-27700, MDEV-29336 and MDEV-29250 performed well in RQG testing.

Generated at Thu Feb 08 09:54:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.