Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30648

btr_estimate_n_rows_in_range() accesses unfixed, unlatched page

Details

    Description

      Some RQG test ends with a SEGV
      (backtrace taken from server error log)
      # 2023-02-13T05:10:43 [3309421] | ==3313011==ERROR: AddressSanitizer: use-after-poison on address 0x7ff13d319028 at pc 0x5586b23a4291 bp 0x7ff118f792b0 sp 0x7ff118f792a0
      # 2023-02-13T05:10:43 [3309421] | READ of size 1 at 0x7ff13d319028 thread T63
      # 2023-02-13T05:10:43 [3309421] |     #0 0x5586b23a4290 in mach_read_from_2 /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/include/mach0data.inl:85
      # 2023-02-13T05:10:43 [3309421] |     #1 0x5586b23a4290 in page_header_get_field(unsigned char const*, unsigned long) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/include/page0page.h:504
      # 2023-02-13T05:10:43 [3309421] |     #2 0x5586b23a4290 in page_rec_check /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/include/page0page.inl:310
      # 2023-02-13T05:10:43 [3309421] |     #3 0x5586b24080f5 in page_rec_is_infimum(unsigned char const*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/include/page0page.inl:176
      # 2023-02-13T05:10:43 [3309421] |     #4 0x5586b2998c0a in btr_est_cur_t::should_count_the_left_border() const /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/btr/btr0cur.cc:5014
      # 2023-02-13T05:10:43 [3309421] |     #5 0x5586b2998c0a in btr_estimate_n_rows_in_range(dict_index_t*, btr_pos_t*, btr_pos_t*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/btr/btr0cur.cc:5394
      # 2023-02-13T05:10:43 [3309421] |     #6 0x5586b23c4fe1 in ha_innobase::records_in_range(unsigned int, st_key_range const*, st_key_range const*, st_page_range*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/handler/ha_innodb.cc:14341
      # 2023-02-13T05:10:43 [3309421] |     #7 0x5586b15a5d2e in handler::multi_range_read_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/multi_range_read.cc:177
      # 2023-02-13T05:10:43 [3309421] |     #8 0x5586b15aebc3 in DsMrr_impl::dsmrr_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/multi_range_read.cc:1710
      # 2023-02-13T05:10:43 [3309421] |     #9 0x5586b239d028 in ha_innobase::multi_range_read_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/storage/innobase/handler/ha_innodb.cc:20018
      # 2023-02-13T05:10:43 [3309421] |     #10 0x5586b1f094f8 in check_quick_select /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/opt_range.cc:11573
      # 2023-02-13T05:10:43 [3309421] |     #11 0x5586b1f1ef55 in get_key_scans_params /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/opt_range.cc:7484
      # 2023-02-13T05:10:43 [3309421] |     #12 0x5586b1f66f36 in SQL_SELECT::test_quick_select(THD*, Bitmap<64u>, unsigned long long, unsigned long long, bool, bool, bool, bool) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/opt_range.cc:2935
      # 2023-02-13T05:10:43 [3309421] |     #13 0x5586b1115c0e in get_quick_record_count /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:5106
      # 2023-02-13T05:10:43 [3309421] |     #14 0x5586b11fc111 in make_join_statistics /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:5833
      # 2023-02-13T05:10:43 [3309421] |     #15 0x5586b120e98b in JOIN::optimize_inner() /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:2504
      # 2023-02-13T05:10:43 [3309421] |     #16 0x5586b120f293 in JOIN::optimize() /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:1843
      # 2023-02-13T05:10:43 [3309421] |     #17 0x5586b120f48f in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:5048
      # 2023-02-13T05:10:43 [3309421] |     #18 0x5586b1210f08 in handle_select(THD*, LEX*, select_result*, unsigned long) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_select.cc:554
      # 2023-02-13T05:10:43 [3309421] |     #19 0x5586b1066791 in execute_sqlcom_select /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_parse.cc:6261
      # 2023-02-13T05:10:43 [3309421] |     #20 0x5586b108e2a5 in mysql_execute_command(THD*, bool) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_parse.cc:3949
      # 2023-02-13T05:10:43 [3309421] |     #21 0x5586b109e230 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_parse.cc:8021
      # 2023-02-13T05:10:43 [3309421] |     #22 0x5586b10a4a6b in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_parse.cc:1896
      # 2023-02-13T05:10:43 [3309421] |     #23 0x5586b10a9aca in do_command(THD*, bool) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_parse.cc:1409
      # 2023-02-13T05:10:43 [3309421] |     #24 0x5586b14c8f33 in do_handle_one_connection(CONNECT*, bool) /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_connect.cc:1416
      # 2023-02-13T05:10:43 [3309421] |     #25 0x5586b14c9599 in handle_one_connection /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638/sql/sql_connect.cc:1318
      # 2023-02-13T05:10:43 [3309421] |     #26 0x7ff1495a7b42 in start_thread nptl/pthread_create.c:442
      # 2023-02-13T05:10:43 [3309421] |     #27 0x7ff1496399ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff)
      ...
      # 2023-02-13T05:10:43 [3309421] | Query (0x62b0001e3238): SELECT SUM(`pad`) FROM `oltp3` WHERE `id` BETWEEN 9294 AND 59338 /* E_R Thread19 QNO 11060 CON_ID 32 */
      ...
      # 2023-02-13T05:10:43 [3309421] | Status: NOT_KILLED
      origin/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638 1758349676aebb176e4c6d50b2eaa78b7298fe39 2023-02-13T10:37:23+02:00
       
      sdp:/data1/results/1676285404/SEGV/
      gdb -c ./1/data/core.3313011 /data/Server_bin/bb-10.6-MDEV-26055-2-MDEV-26827-MDEV-30134-MDEV-30638_asan_Og/bin/mysqld
       
      Per initial analysis of Marko the problem should be in the main trees too.
       
      RQG
      ------
      # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
      #
      # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 59501a648871529e16fe0ae18cb24183685590e1 2023-02-13T11:22:10+01:00
      # rqg.pl  : Version 4.2.1 (2022-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/oltp-transactional.yy \
      # --gendata=conf/mariadb/oltp.zz \
      # --max_gd_duration=900 \
      # --mysqld=--transaction-isolation=REPEATABLE-READ \
      # --validator=SelectStability \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--innodb_file_per_table=0 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--innodb_sort_buffer_size=65536 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb_use_native_aio=1 \
      # --mysqld=--innodb_undo_log_truncate=OFF \
      # --mysqld=--loose_innodb_change_buffering=purges \
      # --mysqld=--innodb_undo_tablespaces=3 \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=5M \
      # <local settings>
      

      Attachments

        Issue Links

          Activity

            mleich produced an rr replay trace where the interesting part of the error log output starts like this.

            ==3866041==ERROR: AddressSanitizer: use-after-poison on address 0x7fb1fa1cc028 at pc 0x559eb6b2fc6b bp 0x7fb1dff49210 sp 0x7fb1dff49200
            READ of size 1 at 0x7fb1fa1cc028 thread T39
                #0 0x559eb6b2fc6a in mach_read_from_2 /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/mach0data.inl:85
                #1 0x559eb6b2fc6a in page_header_get_field(unsigned char const*, unsigned long) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.h:504
                #2 0x559eb6b2fc6a in page_rec_check /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:310
                #3 0x559eb6b94c41 in page_rec_is_infimum(unsigned char const*) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:176
                #4 0x559eb71354ab in btr_est_cur_t::should_count_the_left_border() const /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5214
                #5 0x559eb71354ab in btr_estimate_n_rows_in_range(dict_index_t*, btr_pos_t*, btr_pos_t*) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5594
            

            The block was indeed poisoned (evicted from the buffer pool) by the page cleaner thread:

            #4  buf_LRU_block_free_non_file_page (block=block@entry=0x7fb1fa0a2980) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:1012
            #5  0x0000559eb71d7a41 in buf_LRU_block_free_hashed_page (block=block@entry=0x7fb1fa0a2980) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:132
            #6  0x0000559eb71dd867 in buf_LRU_free_page (bpage=bpage@entry=0x7fb1fa0a2980, zip=zip@entry=true) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:991
            #7  0x0000559eb71c9195 in buf_flush_LRU_list_batch (max=max@entry=2000, evict=evict@entry=false, n=n@entry=0x7fb1f7e94a50) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1267
            #8  0x0000559eb71c9e6a in buf_do_LRU_batch (max=max@entry=2000, evict=evict@entry=false, n=n@entry=0x7fb1f7e94a50) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1382
            #9  0x0000559eb71ca195 in buf_flush_LRU (max_n=max_n@entry=2000, evict=evict@entry=false) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1728
            #10 0x0000559eb71cecc1 in buf_flush_page_cleaner () at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:2338
            

            At this time, the problematic thread was executing a search on p2:

            Thread 3 (Thread 3866041.3879434 (mariadbd)):
            #0  0x0000559eb6e1e235 in cmp_dtuple_rec_with_match_low (dtuple=dtuple@entry=0x6170001c4008, rec=rec@entry=0x7fb1fa3b46ed "", offsets=offsets@entry=0x7fb1dff48af0, n_cmp=1, matched_fields=matched_fields@entry=0x7fb1dff48ab0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/rem/rem0cmp.cc:421
            #1  0x0000559eb6da503d in page_cur_search_with_match (tuple=0x6170001c4008, mode=PAGE_CUR_G, iup_matched_fields=iup_matched_fields@entry=0x7fb1dff495d0, ilow_matched_fields=<optimized out>, cursor=cursor@entry=0x7fb1dff49590, rtr_info=rtr_info@entry=0x0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/page/page0cur.cc:472
            #2  0x0000559eb714194a in btr_est_cur_t::search_on_page (this=this@entry=0x7fb1dff49570, level=level@entry=0, root_height=root_height@entry=2, left=left@entry=false) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5158
            #3  0x0000559eb7133f3d in btr_estimate_n_rows_in_range (index=index@entry=0x616000851b08, range_start=range_start@entry=0x7fb1dff49e00, range_end=range_end@entry=0x7fb1dff49e40) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5521
            #4  0x0000559eb6b51542 in ha_innobase::records_in_range (this=0x61d000aa46b8, keynr=<optimized out>, min_key=<optimized out>, max_key=<optimized out>, pages=<optimized out>) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/handler/ha_innodb.cc:14381
            

            The use-after-poison will be reported a little later on p1:

            #0  __asan::__asan_report_load1 (addr=140402382127144) at ../../../../src/libsanitizer/asan/asan_rtl.cpp:118
            #1  0x0000559eb6b2fc6b in mach_read_from_2 (b=0x7fb1fa1cc028 "\016&\200\023") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/mach0data.inl:85
            #2  page_header_get_field (field=2, page=0x7fb1fa1cc000 "") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.h:504
            #3  page_rec_check (rec=0x7fb1fa1cc070 "supremum") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:310
            #4  0x0000559eb6b94c42 in page_rec_is_infimum (rec=0x7fb1fa1cc070 "supremum") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:176
            #5  0x0000559eb71354ac in btr_est_cur_t::should_count_the_left_border (this=0x7fb1dff494d0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5214
            #6  btr_estimate_n_rows_in_range (index=index@entry=0x616000851b08, range_start=range_start@entry=0x7fb1dff49e00, range_end=range_end@entry=0x7fb1dff49e40)
                at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5594
            

            A buffer-fix on the block descriptor 0x7fb1fa0a2980 would prevent the block from being evicted. The buffer-fix as well as a shared latch on the page had been released here, prematurely, it seems:

            0x0000559eb7133f55 in btr_est_cur_t::fetch_child (right_parent=0x0, mtr=@0x7fb1dff498f0: {m_start = true, m_commit = false, m_freeing_tree = false, m_last = 0x0, m_last_offset = 0, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_user_space_id = 0, m_memo = {<small_vector_base> = {BeginX = 0x7fb1dff49918, Size = 4, Capacity = 16}, small = {{object = 0x616000851c80, type = MTR_MEMO_S_LOCK}, {object = 0x7fb1fa0a4d40, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0a2980, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0bb600, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0aa500, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x4, type = 7}, {object = 0x7fb2072972e2 <MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)+178>, type = 58897496}, {object = 0x90e07dbe11f5c400, type = 7}, {object = 0x6170001c4040, type = 451848}, {object = 0x6210007dddc0, type = 6}, {object = 0x7fb1dff49d20, type = 8723208}, {object = 0x559eb6f71501 <row_sel_convert_mysql_key_to_innobase(dtuple_t*, unsigned char*, unsigned long, dict_index_t*, unsigned char const*, unsigned long)+3890>, type = 3757349376}, {object = 0x0, type = 8248772}, {object = 0x621000d36b64, type = 0}, {object = 0x621000d36b60, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x8, type = 1851400}}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7fb1dff49a40, prev = 0x7fb1dff49a40}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7fb1dff49a20, prev = 0x7fb1dff49a20}, m_buf_end = 0, m_magic_n = 375767, m_data = "\004\000\000\000\000\000\000\000.\272+\a\262\177\000\000X\223\376;\366\017\000\000\300t\032\000 a\000\000X\223\376;\366\017\000\000\000\000\000\000\000\000\000\000 \235\364߱\177\000\000\354~ֶ\236U\000\000\300\a\001\000``\000\000\300\232\364߱\177\000\000 \001\000\000\000\000\000\000pv(\000\375\067\000\000\263\212\265A\000\000\000\000`\347߷\236U\000\000\235{ֶ\236U\000\000@\366\364߱\177\000\000 \001\000\000\000\000\000\000\360\232\364߱\177\000\000\263\212\265A\000\000\000\000@tෞU\000\000vhض\236U\000\000`\365\216\267\236U\000\000\374\023ʶ\236U\000\000\000\304\365\021\276}\340\220P\233\364߱\177\000\000\000\304\365\021\276}\340\220\200\301\002\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}, level=<optimized out>, this=0x7fb1dff49570) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5130
            5130	      mtr.rollback_to_savepoint(parent_savepoint, parent_savepoint + 1);
            (rr) info locals
            parent_block = 0x7fb1fa09eaf0
            parent_savepoint = 2
            

            marko Marko Mäkelä added a comment - mleich produced an rr replay trace where the interesting part of the error log output starts like this. ==3866041==ERROR: AddressSanitizer: use-after-poison on address 0x7fb1fa1cc028 at pc 0x559eb6b2fc6b bp 0x7fb1dff49210 sp 0x7fb1dff49200 READ of size 1 at 0x7fb1fa1cc028 thread T39 #0 0x559eb6b2fc6a in mach_read_from_2 /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/mach0data.inl:85 #1 0x559eb6b2fc6a in page_header_get_field(unsigned char const*, unsigned long) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.h:504 #2 0x559eb6b2fc6a in page_rec_check /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:310 #3 0x559eb6b94c41 in page_rec_is_infimum(unsigned char const*) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:176 #4 0x559eb71354ab in btr_est_cur_t::should_count_the_left_border() const /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5214 #5 0x559eb71354ab in btr_estimate_n_rows_in_range(dict_index_t*, btr_pos_t*, btr_pos_t*) /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5594 The block was indeed poisoned (evicted from the buffer pool) by the page cleaner thread: #4 buf_LRU_block_free_non_file_page (block=block@entry=0x7fb1fa0a2980) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:1012 #5 0x0000559eb71d7a41 in buf_LRU_block_free_hashed_page (block=block@entry=0x7fb1fa0a2980) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:132 #6 0x0000559eb71dd867 in buf_LRU_free_page (bpage=bpage@entry=0x7fb1fa0a2980, zip=zip@entry=true) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0lru.cc:991 #7 0x0000559eb71c9195 in buf_flush_LRU_list_batch (max=max@entry=2000, evict=evict@entry=false, n=n@entry=0x7fb1f7e94a50) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1267 #8 0x0000559eb71c9e6a in buf_do_LRU_batch (max=max@entry=2000, evict=evict@entry=false, n=n@entry=0x7fb1f7e94a50) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1382 #9 0x0000559eb71ca195 in buf_flush_LRU (max_n=max_n@entry=2000, evict=evict@entry=false) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:1728 #10 0x0000559eb71cecc1 in buf_flush_page_cleaner () at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/buf/buf0flu.cc:2338 At this time, the problematic thread was executing a search on p2 : Thread 3 (Thread 3866041.3879434 (mariadbd)): #0 0x0000559eb6e1e235 in cmp_dtuple_rec_with_match_low (dtuple=dtuple@entry=0x6170001c4008, rec=rec@entry=0x7fb1fa3b46ed "", offsets=offsets@entry=0x7fb1dff48af0, n_cmp=1, matched_fields=matched_fields@entry=0x7fb1dff48ab0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/rem/rem0cmp.cc:421 #1 0x0000559eb6da503d in page_cur_search_with_match (tuple=0x6170001c4008, mode=PAGE_CUR_G, iup_matched_fields=iup_matched_fields@entry=0x7fb1dff495d0, ilow_matched_fields=<optimized out>, cursor=cursor@entry=0x7fb1dff49590, rtr_info=rtr_info@entry=0x0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/page/page0cur.cc:472 #2 0x0000559eb714194a in btr_est_cur_t::search_on_page (this=this@entry=0x7fb1dff49570, level=level@entry=0, root_height=root_height@entry=2, left=left@entry=false) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5158 #3 0x0000559eb7133f3d in btr_estimate_n_rows_in_range (index=index@entry=0x616000851b08, range_start=range_start@entry=0x7fb1dff49e00, range_end=range_end@entry=0x7fb1dff49e40) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5521 #4 0x0000559eb6b51542 in ha_innobase::records_in_range (this=0x61d000aa46b8, keynr=<optimized out>, min_key=<optimized out>, max_key=<optimized out>, pages=<optimized out>) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/handler/ha_innodb.cc:14381 The use-after-poison will be reported a little later on p1 : #0 __asan::__asan_report_load1 (addr=140402382127144) at ../../../../src/libsanitizer/asan/asan_rtl.cpp:118 #1 0x0000559eb6b2fc6b in mach_read_from_2 (b=0x7fb1fa1cc028 "\016&\200\023") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/mach0data.inl:85 #2 page_header_get_field (field=2, page=0x7fb1fa1cc000 "") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.h:504 #3 page_rec_check (rec=0x7fb1fa1cc070 "supremum") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:310 #4 0x0000559eb6b94c42 in page_rec_is_infimum (rec=0x7fb1fa1cc070 "supremum") at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/include/page0page.inl:176 #5 0x0000559eb71354ac in btr_est_cur_t::should_count_the_left_border (this=0x7fb1dff494d0) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5214 #6 btr_estimate_n_rows_in_range (index=index@entry=0x616000851b08, range_start=range_start@entry=0x7fb1dff49e00, range_end=range_end@entry=0x7fb1dff49e40) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5594 A buffer-fix on the block descriptor 0x7fb1fa0a2980 would prevent the block from being evicted. The buffer-fix as well as a shared latch on the page had been released here, prematurely, it seems: 0x0000559eb7133f55 in btr_est_cur_t::fetch_child (right_parent=0x0, mtr=@0x7fb1dff498f0: {m_start = true, m_commit = false, m_freeing_tree = false, m_last = 0x0, m_last_offset = 0, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_user_space_id = 0, m_memo = {<small_vector_base> = {BeginX = 0x7fb1dff49918, Size = 4, Capacity = 16}, small = {{object = 0x616000851c80, type = MTR_MEMO_S_LOCK}, {object = 0x7fb1fa0a4d40, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0a2980, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0bb600, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x7fb1fa0aa500, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x4, type = 7}, {object = 0x7fb2072972e2 <MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)+178>, type = 58897496}, {object = 0x90e07dbe11f5c400, type = 7}, {object = 0x6170001c4040, type = 451848}, {object = 0x6210007dddc0, type = 6}, {object = 0x7fb1dff49d20, type = 8723208}, {object = 0x559eb6f71501 <row_sel_convert_mysql_key_to_innobase(dtuple_t*, unsigned char*, unsigned long, dict_index_t*, unsigned char const*, unsigned long)+3890>, type = 3757349376}, {object = 0x0, type = 8248772}, {object = 0x621000d36b64, type = 0}, {object = 0x621000d36b60, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x8, type = 1851400}}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7fb1dff49a40, prev = 0x7fb1dff49a40}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7fb1dff49a20, prev = 0x7fb1dff49a20}, m_buf_end = 0, m_magic_n = 375767, m_data = "\004\000\000\000\000\000\000\000.\272+\a\262\177\000\000X\223\376;\366\017\000\000\300t\032\000 a\000\000X\223\376;\366\017\000\000\000\000\000\000\000\000\000\000 \235\364߱\177\000\000\354~ֶ\236U\000\000\300\a\001\000``\000\000\300\232\364߱\177\000\000 \001\000\000\000\000\000\000pv(\000\375\067\000\000\263\212\265A\000\000\000\000`\347߷\236U\000\000\235{ֶ\236U\000\000@\366\364߱\177\000\000 \001\000\000\000\000\000\000\360\232\364߱\177\000\000\263\212\265A\000\000\000\000@tෞU\000\000vhض\236U\000\000`\365\216\267\236U\000\000\374\023ʶ\236U\000\000\000\304\365\021\276}\340\220P\233\364߱\177\000\000\000\304\365\021\276}\340\220\200\301\002\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}, level=<optimized out>, this=0x7fb1dff49570) at /data/Server/bb-10.6-MDEV-30986A/storage/innobase/btr/btr0cur.cc:5130 5130 mtr.rollback_to_savepoint(parent_savepoint, parent_savepoint + 1); (rr) info locals parent_block = 0x7fb1fa09eaf0 parent_savepoint = 2
            vlad.lesin Vladislav Lesin added a comment - - edited

            The issue is caused by MDEV-30400 fix.

            There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2, but both share the same mtr. Each cursor contains mtr savepoint for the previously fetched block to release it then the current block is fetched.

            Before MDEV-30400 the block was released with mtr_t::release_block_at_savepoint(), it just unfixed a block and released its page patch. In MDEV-30400 it was replaced with mtr_t::rollback_to_savepoint(), which does the same as the former mtr_t::release_block_at_savepoint(ulint begin, ulint end) but also erases the corresponding slots from mtr memo, what invalidates any stored mtr's memo savepoints, greater or equal to "begin".

            In this bug we have p1 and p2 with p1.m_savepoint = p2.m_savepoint = s. This case is possible when p1 and p2 have the same parent page, p1 does not unfix and not release its parent during fetching to let p2 to do this. After p1 is fetched, its p1.m_savepoint = s + 1, as it latches the fetched page and does not release the parent page. When p2 is fetching, it releases its parent with mtr_t::release_block_at_savepoint(s, s + 1) call. After s slot of mtr's memo is erased, p1.m_savepoint becomes invalidated, and the same value as p2.m_savepoint.

            On the next p1 fetching p1.fetch() call releases the block of p2 instead of its parent block. And p1.m_savepoint = p2.m_savepoint = s again, as btr_block_get(...) call from p1.fetch() adds one element in mtr's memo, but mtr_t::release_block_at_savepoint(s, s + 1) erases one element from it.

            So the next p2.fetch() releases p1.m_block. And then, when p1.should_count_the_left_border() is called, we have access to unfixed and unlatched page.

            vlad.lesin Vladislav Lesin added a comment - - edited The issue is caused by MDEV-30400 fix. There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2, but both share the same mtr. Each cursor contains mtr savepoint for the previously fetched block to release it then the current block is fetched. Before MDEV-30400 the block was released with mtr_t::release_block_at_savepoint(), it just unfixed a block and released its page patch. In MDEV-30400 it was replaced with mtr_t::rollback_to_savepoint(), which does the same as the former mtr_t::release_block_at_savepoint(ulint begin, ulint end) but also erases the corresponding slots from mtr memo, what invalidates any stored mtr's memo savepoints, greater or equal to "begin". In this bug we have p1 and p2 with p1.m_savepoint = p2.m_savepoint = s. This case is possible when p1 and p2 have the same parent page, p1 does not unfix and not release its parent during fetching to let p2 to do this. After p1 is fetched, its p1.m_savepoint = s + 1, as it latches the fetched page and does not release the parent page. When p2 is fetching, it releases its parent with mtr_t::release_block_at_savepoint(s, s + 1) call. After s slot of mtr's memo is erased, p1.m_savepoint becomes invalidated, and the same value as p2.m_savepoint. On the next p1 fetching p1.fetch() call releases the block of p2 instead of its parent block. And p1.m_savepoint = p2.m_savepoint = s again, as btr_block_get(...) call from p1.fetch() adds one element in mtr's memo, but mtr_t::release_block_at_savepoint(s, s + 1) erases one element from it. So the next p2.fetch() releases p1.m_block. And then, when p1.should_count_the_left_border() is called, we have access to unfixed and unlatched page.

            The idea of the fix is to get rid of savepoints at all in btr_estimate_n_rows_in_range(). See the slides for details.

            vlad.lesin Vladislav Lesin added a comment - The idea of the fix is to get rid of savepoints at all in btr_estimate_n_rows_in_range(). See the slides for details.
            mleich Matthias Leich added a comment - - edited

            origin/bb-10.6-MDEV-30648-btr_estimate_n_rows_in_range 9671724cdec6111ddb85071ab382bd1062af6ffa 2023-06-26T11:59:22+03:00 performed well in RQG testing.
            Regarding the current MDEV
            - the source tree with the fix:  no replay of the bad effect within 3200 tests
            - the source tree with the fix removed: 7 replays of the bad effect within 1798 runs
            

            mleich Matthias Leich added a comment - - edited origin/bb-10.6-MDEV-30648-btr_estimate_n_rows_in_range 9671724cdec6111ddb85071ab382bd1062af6ffa 2023-06-26T11:59:22+03:00 performed well in RQG testing. Regarding the current MDEV - the source tree with the fix: no replay of the bad effect within 3200 tests - the source tree with the fix removed: 7 replays of the bad effect within 1798 runs

            Thank you, very good work. Before MDEV-30400, there was some "double bookkeeping" in the B-tree cursor routines, which duplicated the contents of mtr_t::m_memo.

            marko Marko Mäkelä added a comment - Thank you, very good work. Before MDEV-30400 , there was some "double bookkeeping" in the B-tree cursor routines, which duplicated the contents of mtr_t::m_memo .

            People

              vlad.lesin Vladislav Lesin
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.