Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25525

AddressSanitizer: use-after-poison in row_sel_convert_mysql_key_to_innobase

Details

    Description

      Work flow:
      1. Start the server and generate some initial data
      2. One sessions runs a DDL/DML mix
      3. During 2. is ongoing send SIGKILL to the DB server
      4. Restart attempt with success
      5. Checks cause that the server crashes
      The server error log does not contain any entry with
          [ERROR] InnoDB:   <whatever> 
      Around the end follows
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 65567][rr 4067420 65571]==4067420==ERROR: AddressSanitizer: use-after-poison on address 0x6120000992a8 at pc 0x55c7e10a0e1c bp 0x690d1ba63f60 sp 0x690d1ba63f50
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 65574][rr 4067420 65576]READ of size 8 at 0x6120000992a8 thread T24
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71545]    #0 0x55c7e10a0e1b in row_sel_convert_mysql_key_to_innobase(dtuple_t*, unsigned char*, unsigned long, dict_index_t*, unsigned char const*, unsigned long) /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2529
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71582]    #1 0x55c7e0cd5099 in ha_innobase::records_in_range(unsigned int, st_key_range const*, st_key_range const*, st_page_range*) /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13601
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71586]    #2 0x55c7e00adcf8 in handler::multi_range_read_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/10.6alpha/sql/multi_range_read.cc:177
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71588]    #3 0x55c7e00b8e11 in DsMrr_impl::dsmrr_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/10.6alpha/sql/multi_range_read.cc:1708
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71598]    #4 0x55c7e0cea3b3 in ha_innobase::multi_range_read_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:19522
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71614]    #5 0x55c7e077783a in check_quick_select /data/Server/10.6alpha/sql/opt_range.cc:11558
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71616]    #6 0x55c7e0760f99 in get_key_scans_params /data/Server/10.6alpha/sql/opt_range.cc:7462
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71626]    #7 0x55c7e0746215 in SQL_SELECT::test_quick_select(THD*, Bitmap<64u>, unsigned long long, unsigned long long, bool, bool, bool, bool) /data/Server/10.6alpha/sql/opt_range.cc:2931
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71654]    #8 0x55c7dfcd4e73 in get_quick_record_count /data/Server/10.6alpha/sql/sql_select.cc:4898
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71656]    #9 0x55c7dfcdbc23 in make_join_statistics /data/Server/10.6alpha/sql/sql_select.cc:5630
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71658]    #10 0x55c7dfcbd097 in JOIN::optimize_inner() /data/Server/10.6alpha/sql/sql_select.cc:2325
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71660]    #11 0x55c7dfcb6f63 in JOIN::optimize() /data/Server/10.6alpha/sql/sql_select.cc:1694
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71670]    #12 0x55c7dfcd478d in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/Server/10.6alpha/sql/sql_select.cc:4840
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71672]    #13 0x55c7dfcaa0ae in handle_select(THD*, LEX*, select_result*, unsigned long) /data/Server/10.6alpha/sql/sql_select.cc:446
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71690]    #14 0x55c7dfc2870f in execute_sqlcom_select /data/Server/10.6alpha/sql/sql_parse.cc:6244
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71692]    #15 0x55c7dfc1797e in mysql_execute_command(THD*) /data/Server/10.6alpha/sql/sql_parse.cc:3940
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71694]    #16 0x55c7dfc32737 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6alpha/sql/sql_parse.cc:8018
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71696]    #17 0x55c7dfc0b021 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.6alpha/sql/sql_parse.cc:1897
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71698]    #18 0x55c7dfc083f6 in do_command(THD*, bool) /data/Server/10.6alpha/sql/sql_parse.cc:1406
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71704]    #19 0x55c7dffecb6f in do_handle_one_connection(CONNECT*, bool) /data/Server/10.6alpha/sql/sql_connect.cc:1410
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71706]    #20 0x55c7dffec4d1 in handle_one_connection /data/Server/10.6alpha/sql/sql_connect.cc:1312
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71714]    #21 0x694c18963608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
      # 2021-04-26T05:56:58 [3968169] | [rr 4067420 71716]    #22 0x24d603f12292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
      ...
      SUMMARY: AddressSanitizer: use-after-poison /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2529 in row_sel_convert_mysql_key_to_innobase(dtuple_t*, unsigned char*, unsigned long, dict_index_t*, unsigned char const*, unsigned long)
      Query (0x62b0000a8238): SELECT * FROM `test`.`t1` FORCE INDEX (`idx1`) WHERE `col_int` >= -9223372036854775808 OR `col_int` IS NULL OR `col_int` IS NOT NULL
       
      sdp:/data/Results/1619433372/TBR-1036/dev/shm/vardir/1619433372/37/1/rr
      _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0 # Fate of DB server till SIGKILL
      _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1 # Fate of DB server after restart
       
      /data/Results/1619433372/TBR-1036/dev/shm/vardir/1619433372/37/1/data_copy
          Copy of the datadir of the server after SIGKILL before restart
       
       
      10.6 1a647b700f6b72dc97211510a5d0c647d5d3d911 2021-04-23T10:07:08+03:00
       
      RQG
      -------
      git clone https://github.com/mleich1/rqg --branch experimental RQG
       
      perl rqg.pl \
      --grammar=conf/mariadb/table_stress_innodb.yy \
      --gendata=conf/mariadb/table_stress.zz \
      --gendata_sql=conf/mariadb/table_stress.sql \
      --reporters=CrashRecovery1 \
      --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=240 \
      --mysqld=--plugin-load-add=file_key_management.so \
      --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--loose-innodb-sync-debug \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--innodb_adaptive_hash_index=off \
      --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      --mysqld=--loose-max-statement-time=30 \
      --threads=1 \
      --mysqld=--innodb-use-native-aio=0 \
      --rr=Extended \
      --rr_options="--chaos --wait" \
      --mysqld=--innodb_page_size=8K \
      --mysqld=--innodb-buffer-pool-size=256M \
      --duration=300 \
      --no_mask \
      --workdir=<local settings >\
      --vardir=<local settings >\
      --mtr-build-thread=<local settings >\
      --basedir1=<local settings >\
      --script_debug=_nix_
      
      

      Attachments

        Issue Links

          Activity

            Error pattern for RQG
            [ 'TBR-1036', 'INFO: Reporter \'CrashRecovery1\': Attempting database recovery using the server.+ERROR:  SELECT .{1,40} FROM .{1,120} FORCE INDEX .{1,300} harvested 2013.+\[Note\] InnoDB: Starting crash recovery from checkpoint LSN.+SUMMARY: AddressSanitizer: use-after-poison .{1,200}row0sel\.cc:.{1,20}in row_sel_convert_mysql_key_to_innobase\(dtuple_t\*, unsigned char\*, unsigned long, dict_index_t\*, unsigned char const\*, unsigned long\).+RESULT: The RQG run ended with status STATUS_RECOVERY_FAILURE' ],
             
            Archive: sdp:/data/Results/1619433372/004418.tar.xz
            

            mleich Matthias Leich added a comment - Error pattern for RQG [ 'TBR-1036', 'INFO: Reporter \'CrashRecovery1\': Attempting database recovery using the server.+ERROR: SELECT .{1,40} FROM .{1,120} FORCE INDEX .{1,300} harvested 2013.+\[Note\] InnoDB: Starting crash recovery from checkpoint LSN.+SUMMARY: AddressSanitizer: use-after-poison .{1,200}row0sel\.cc:.{1,20}in row_sel_convert_mysql_key_to_innobase\(dtuple_t\*, unsigned char\*, unsigned long, dict_index_t\*, unsigned char const\*, unsigned long\).+RESULT: The RQG run ended with status STATUS_RECOVERY_FAILURE' ],   Archive: sdp:/data/Results/1619433372/004418.tar.xz

            The memory was allocated by the same thread shortly before it was accessed:

            mariadb-10.6.0

            #1  0x00007f12fac73022 in __asan_poison_memory_region () from /lib/x86_64-linux-gnu/libasan.so.5
            #2  0x000055c7e0ebcada in mem_heap_create_block_func (heap=0x0, n=160, file_name=0x55c7e22b7740 "/data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc", line=13592, type=0)
                at /data/Server/10.6alpha/storage/innobase/mem/mem0mem.cc:333
            #3  0x000055c7e0c95182 in mem_heap_create_func (size=160, file_name=0x55c7e22b7740 "/data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc", line=13592, type=0)
                at /data/Server/10.6alpha/storage/innobase/include/mem0mem.ic:377
            #4  0x000055c7e0cd4dc2 in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560)
                at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13592
            #5  0x000055c7e00adcf9 in handler::multi_range_read_info_const (this=0x61d0003d18b8, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges_arg=0, bufsz=0x690d1ba646e0, flags=0x690d1ba646d0, 
                cost=0x690d1ba65070) at /data/Server/10.6alpha/sql/multi_range_read.cc:177
            #6  0x000055c7e00b8e12 in DsMrr_impl::dsmrr_info_const (this=0x61d0003d1d90, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges=0, bufsz=0x690d1ba64fa0, flags=0x690d1ba64f90, 
                cost=0x690d1ba65070) at /data/Server/10.6alpha/sql/multi_range_read.cc:1708
            #7  0x000055c7e0cea3b4 in ha_innobase::multi_range_read_info_const (this=0x61d0003d18b8, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges=0, bufsz=0x690d1ba64fa0, flags=0x690d1ba64f90, 
                cost=0x690d1ba65070) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:19522
            

            The poisoned bytes are part of a safety margin inside mem_heap_t. The problematic access is being reported a little later:

            mariadb-10.6.0

            #0  0x00007f12fac78950 in __asan_report_load_n () from /lib/x86_64-linux-gnu/libasan.so.5
            #1  0x000055c7e10a0e1c in row_sel_convert_mysql_key_to_innobase (tuple=0x612000099248, buf=0x62100003735c '\276' <repeats 20 times>, buf_len=8, index=0x6160012a8408, key_ptr=0x621000038594 "", key_len=5)
                at /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2529
            #2  0x000055c7e0cd509a in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560)
                at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13601
            

            Let us check the lines 13592 through 13601:

            	heap = mem_heap_create(2 * (key->ext_key_parts * sizeof(dfield_t)
            				    + sizeof(dtuple_t)));
             
            	range_start = dtuple_create(heap, key->ext_key_parts);
            	dict_index_copy_types(range_start, index, key->ext_key_parts);
             
            	range_end = dtuple_create(heap, key->ext_key_parts);
            	dict_index_copy_types(range_end, index, key->ext_key_parts);
             
            	row_sel_convert_mysql_key_to_innobase(
            		range_start,
            		m_prebuilt->srch_key_val1,
            		m_prebuilt->srch_key_val_len,
            		index,
            		(byte*) (min_key ? min_key->key : (const uchar*) 0),
            		(ulint) (min_key ? min_key->length : 0));
            

            The invalid access is being reported for range_start->fields+1, which is garbage:

            (rr) p *range_start
            $11 = {info_bits = 0, n_fields = 18446744073709551614, n_fields_cmp = 18446744073709551614, fields = 0x612000099280, n_v_fields = 0, v_fields = 0x0, magic_n = 65478679}
            (rr) p *range_start.fields
            $12 = {data = 0x621000037358, ext = 0, spatial_status = 3, len = 4, type = {prtype = 1283, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}}
            (rr) p range_start.fields[1]
            $13 = {data = 0xbebebebebebebebe, ext = 0, spatial_status = 3, len = 3200171710, type = {prtype = 3200171710, mtype = 190, len = 48830, mbminlen = 6, mbmaxlen = 7}}
            

            A little earlier, range_start was changed from 1 to ULINT_MAX or -2 fields. But this should not be the root cause of the failure. Maybe removing that assignment could have allowed us to catch the problem a little earlier.

            #0  0x000055c7e12c6324 in dtuple_set_n_fields (tuple=0x612000099248, n_fields=18446744073709551614) at /data/Server/10.6alpha/storage/innobase/data/data0data.cc:125
            #1  0x000055c7e10a0c55 in row_sel_convert_mysql_key_to_innobase (tuple=0x612000099248, buf=0x621000037358 '\276' <repeats 24 times>, buf_len=8, index=0x6160012a8408, key_ptr=0x621000038590 "\001", key_len=5)
                at /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2506
            #2  0x000055c7e0cd509a in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560)
                at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13601
            

            Let us check at what point the server was executing before it was killed by the test harness and before the failure of the subsequent server run that started with a recovery:

            (rr) when
            Current event: 567808
            (rr) watch -l log_sys.flushed_to_disk_lsn
            Hardware watchpoint 1: -location log_sys.flushed_to_disk_lsn
            (rr) rc
            Continuing.
            …
            Thread 4 hit Hardware watchpoint 1: -location log_sys.flushed_to_disk_lsn
             
            Old value = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 8101754}, <No data fields>}
            New value = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 8101711}, <No data fields>}
            0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397
            397		__atomic_store_n(&_M_i, __i, int(__m));
            (rr) bt
            #0  0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397
            #1  log_t::set_flushed_lsn (this=0x55e5427e4400 <log_sys>, lsn=8101754) at /data/Server/10.6alpha/storage/innobase/include/log0log.h:656
            #2  0x000055e53f5e026f in log_write_flush_to_disk_low (lsn=8101754) at /data/Server/10.6alpha/storage/innobase/log/log0log.cc:629
            #3  0x000055e53f5e13a7 in log_write_up_to (lsn=8101754, flush_to_disk=true, rotate_key=false, callback=0x7f3ffb184dc0) at /data/Server/10.6alpha/storage/innobase/log/log0log.cc:830
            #4  0x000055e53f8fe839 in trx_flush_log_if_needed_low (lsn=8101754, trx_state=TRX_STATE_COMMITTED_IN_MEMORY) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1205
            #5  0x000055e53f8fe920 in trx_flush_log_if_needed (lsn=8101754, trx=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1226
            #6  0x000055e53f908562 in trx_t::commit_in_memory (this=0x51c22490bb80, mtr=0x7f3ffb1853c0) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1414
            #7  0x000055e53f8ffa5e in trx_t::commit_low (this=0x51c22490bb80, mtr=0x7f3ffb1853c0) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1505
            #8  0x000055e53f8ffb93 in trx_t::commit (this=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1519
            #9  0x000055e53f4ec26c in ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true)
                at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11173
            #10 0x000055e53eb302ca in handler::ha_commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/sql/handler.cc:4863
            #11 0x000055e53e601739 in mysql_inplace_alter_table (thd=0x62b0001f8218, table_list=0x62b0002303f0, table=0x619000a52898, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, 
                target_mdl_request=0x7f3ffb187240, alter_ctx=0x7f3ffb1882b0) at /data/Server/10.6alpha/sql/sql_table.cc:8177
            #12 0x000055e53e61376b in mysql_alter_table (thd=0x62b0001f8218, new_db=0x62b0001fcc38, new_name=0x62b0001fd060, create_info=0x7f3ffb189640, table_list=0x62b0002303f0, alter_info=0x7f3ffb189510, order_num=0, 
                order=0x0, ignore=false, if_exists=false) at /data/Server/10.6alpha/sql/sql_table.cc:10788
            #13 0x000055e53e7871ef in Sql_cmd_alter_table::execute (this=0x62b000230c18, thd=0x62b0001f8218) at /data/Server/10.6alpha/sql/sql_alter.cc:540
            #14 0x000055e53e3a82ad in mysql_execute_command (thd=0x62b0001f8218) at /data/Server/10.6alpha/sql/sql_parse.cc:5986
            #15 0x000055e53e3b4738 in mysql_parse (thd=0x62b0001f8218, rawbuf=0x62b000230238 "ALTER TABLE t1 MODIFY COLUMN col_int INTEGER NOT NULL, ALGORITHM = INPLACE  /* E_R Thread1 QNO 7321 CON_ID 54 */", length=112, 
                parser_state=0x7f3ffb18ab20) at /data/Server/10.6alpha/sql/sql_parse.cc:8018
            #16 0x000055e53e38d022 in dispatch_command (command=COM_QUERY, thd=0x62b0001f8218, 
                packet=0x62900000f219 " ALTER TABLE t1 MODIFY COLUMN col_int INTEGER NOT NULL, ALGORITHM = INPLACE  /* E_R Thread1 QNO 7321 CON_ID 54 */ ", packet_length=114, blocking=true)
                at /data/Server/10.6alpha/sql/sql_parse.cc:1897
            …
            (rr) finish
            Run till exit from #0  0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397
            …
            (rr) finish
            Run till exit from #0  trx_t::commit (this=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1512
            ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11180
            11180			DBUG_EXECUTE_IF("innodb_alter_commit_crash_after_commit",
            (rr) when
            Current event: 567694
            (rr) finish
            Run till exit from #0  ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true)
                at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11180
             
            Thread 3 received signal SIGKILL, Killed.
            [Switching to Thread 3972606.3973052]
            0x0000000070000002 in ?? ()
            (rr) when
            Current event: 567808
            

            Based on this excerpt, it is obvious that the ALTER TABLE operation was committed inside InnoDB but the .frm file was not updated before the server was killed. Thus, we would have type mismatch for the column col_int.

            This kind of crashes can occur until MDEV-25180 and MDEV-17567 have been fixed. It should not occur with those fixes. Unfortunately, Atomic DDL did not make it to the 10.6.0 release.

            marko Marko Mäkelä added a comment - The memory was allocated by the same thread shortly before it was accessed: mariadb-10.6.0 #1 0x00007f12fac73022 in __asan_poison_memory_region () from /lib/x86_64-linux-gnu/libasan.so.5 #2 0x000055c7e0ebcada in mem_heap_create_block_func (heap=0x0, n=160, file_name=0x55c7e22b7740 "/data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc", line=13592, type=0) at /data/Server/10.6alpha/storage/innobase/mem/mem0mem.cc:333 #3 0x000055c7e0c95182 in mem_heap_create_func (size=160, file_name=0x55c7e22b7740 "/data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc", line=13592, type=0) at /data/Server/10.6alpha/storage/innobase/include/mem0mem.ic:377 #4 0x000055c7e0cd4dc2 in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13592 #5 0x000055c7e00adcf9 in handler::multi_range_read_info_const (this=0x61d0003d18b8, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges_arg=0, bufsz=0x690d1ba646e0, flags=0x690d1ba646d0, cost=0x690d1ba65070) at /data/Server/10.6alpha/sql/multi_range_read.cc:177 #6 0x000055c7e00b8e12 in DsMrr_impl::dsmrr_info_const (this=0x61d0003d1d90, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges=0, bufsz=0x690d1ba64fa0, flags=0x690d1ba64f90, cost=0x690d1ba65070) at /data/Server/10.6alpha/sql/multi_range_read.cc:1708 #7 0x000055c7e0cea3b4 in ha_innobase::multi_range_read_info_const (this=0x61d0003d18b8, keyno=0, seq=0x690d1ba64850, seq_init_param=0x690d1ba648a0, n_ranges=0, bufsz=0x690d1ba64fa0, flags=0x690d1ba64f90, cost=0x690d1ba65070) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:19522 The poisoned bytes are part of a safety margin inside mem_heap_t . The problematic access is being reported a little later: mariadb-10.6.0 #0 0x00007f12fac78950 in __asan_report_load_n () from /lib/x86_64-linux-gnu/libasan.so.5 #1 0x000055c7e10a0e1c in row_sel_convert_mysql_key_to_innobase (tuple=0x612000099248, buf=0x62100003735c '\276' <repeats 20 times>, buf_len=8, index=0x6160012a8408, key_ptr=0x621000038594 "", key_len=5) at /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2529 #2 0x000055c7e0cd509a in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13601 Let us check the lines 13592 through 13601: heap = mem_heap_create(2 * (key->ext_key_parts * sizeof (dfield_t) + sizeof (dtuple_t)));   range_start = dtuple_create(heap, key->ext_key_parts); dict_index_copy_types(range_start, index, key->ext_key_parts);   range_end = dtuple_create(heap, key->ext_key_parts); dict_index_copy_types(range_end, index, key->ext_key_parts);   row_sel_convert_mysql_key_to_innobase( range_start, m_prebuilt->srch_key_val1, m_prebuilt->srch_key_val_len, index, (byte*) (min_key ? min_key->key : ( const uchar*) 0), (ulint) (min_key ? min_key->length : 0)); The invalid access is being reported for range_start->fields+1 , which is garbage: (rr) p *range_start $11 = {info_bits = 0, n_fields = 18446744073709551614, n_fields_cmp = 18446744073709551614, fields = 0x612000099280, n_v_fields = 0, v_fields = 0x0, magic_n = 65478679} (rr) p *range_start.fields $12 = {data = 0x621000037358, ext = 0, spatial_status = 3, len = 4, type = {prtype = 1283, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}} (rr) p range_start.fields[1] $13 = {data = 0xbebebebebebebebe, ext = 0, spatial_status = 3, len = 3200171710, type = {prtype = 3200171710, mtype = 190, len = 48830, mbminlen = 6, mbmaxlen = 7}} A little earlier, range_start was changed from 1 to ULINT_MAX or -2 fields. But this should not be the root cause of the failure. Maybe removing that assignment could have allowed us to catch the problem a little earlier. #0 0x000055c7e12c6324 in dtuple_set_n_fields (tuple=0x612000099248, n_fields=18446744073709551614) at /data/Server/10.6alpha/storage/innobase/data/data0data.cc:125 #1 0x000055c7e10a0c55 in row_sel_convert_mysql_key_to_innobase (tuple=0x612000099248, buf=0x621000037358 '\276' <repeats 24 times>, buf_len=8, index=0x6160012a8408, key_ptr=0x621000038590 "\001", key_len=5) at /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:2506 #2 0x000055c7e0cd509a in ha_innobase::records_in_range (this=0x61d0003d18b8, keynr=0, min_key=0x690d1ba64580, max_key=0x0, pages=0x690d1ba64560) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:13601 Let us check at what point the server was executing before it was killed by the test harness and before the failure of the subsequent server run that started with a recovery: (rr) when Current event: 567808 (rr) watch -l log_sys.flushed_to_disk_lsn Hardware watchpoint 1: -location log_sys.flushed_to_disk_lsn (rr) rc Continuing. … Thread 4 hit Hardware watchpoint 1: -location log_sys.flushed_to_disk_lsn   Old value = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 8101754}, <No data fields>} New value = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 8101711}, <No data fields>} 0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397 397 __atomic_store_n(&_M_i, __i, int(__m)); (rr) bt #0 0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397 #1 log_t::set_flushed_lsn (this=0x55e5427e4400 <log_sys>, lsn=8101754) at /data/Server/10.6alpha/storage/innobase/include/log0log.h:656 #2 0x000055e53f5e026f in log_write_flush_to_disk_low (lsn=8101754) at /data/Server/10.6alpha/storage/innobase/log/log0log.cc:629 #3 0x000055e53f5e13a7 in log_write_up_to (lsn=8101754, flush_to_disk=true, rotate_key=false, callback=0x7f3ffb184dc0) at /data/Server/10.6alpha/storage/innobase/log/log0log.cc:830 #4 0x000055e53f8fe839 in trx_flush_log_if_needed_low (lsn=8101754, trx_state=TRX_STATE_COMMITTED_IN_MEMORY) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1205 #5 0x000055e53f8fe920 in trx_flush_log_if_needed (lsn=8101754, trx=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1226 #6 0x000055e53f908562 in trx_t::commit_in_memory (this=0x51c22490bb80, mtr=0x7f3ffb1853c0) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1414 #7 0x000055e53f8ffa5e in trx_t::commit_low (this=0x51c22490bb80, mtr=0x7f3ffb1853c0) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1505 #8 0x000055e53f8ffb93 in trx_t::commit (this=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1519 #9 0x000055e53f4ec26c in ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11173 #10 0x000055e53eb302ca in handler::ha_commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/sql/handler.cc:4863 #11 0x000055e53e601739 in mysql_inplace_alter_table (thd=0x62b0001f8218, table_list=0x62b0002303f0, table=0x619000a52898, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, target_mdl_request=0x7f3ffb187240, alter_ctx=0x7f3ffb1882b0) at /data/Server/10.6alpha/sql/sql_table.cc:8177 #12 0x000055e53e61376b in mysql_alter_table (thd=0x62b0001f8218, new_db=0x62b0001fcc38, new_name=0x62b0001fd060, create_info=0x7f3ffb189640, table_list=0x62b0002303f0, alter_info=0x7f3ffb189510, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6alpha/sql/sql_table.cc:10788 #13 0x000055e53e7871ef in Sql_cmd_alter_table::execute (this=0x62b000230c18, thd=0x62b0001f8218) at /data/Server/10.6alpha/sql/sql_alter.cc:540 #14 0x000055e53e3a82ad in mysql_execute_command (thd=0x62b0001f8218) at /data/Server/10.6alpha/sql/sql_parse.cc:5986 #15 0x000055e53e3b4738 in mysql_parse (thd=0x62b0001f8218, rawbuf=0x62b000230238 "ALTER TABLE t1 MODIFY COLUMN col_int INTEGER NOT NULL, ALGORITHM = INPLACE /* E_R Thread1 QNO 7321 CON_ID 54 */", length=112, parser_state=0x7f3ffb18ab20) at /data/Server/10.6alpha/sql/sql_parse.cc:8018 #16 0x000055e53e38d022 in dispatch_command (command=COM_QUERY, thd=0x62b0001f8218, packet=0x62900000f219 " ALTER TABLE t1 MODIFY COLUMN col_int INTEGER NOT NULL, ALGORITHM = INPLACE /* E_R Thread1 QNO 7321 CON_ID 54 */ ", packet_length=114, blocking=true) at /data/Server/10.6alpha/sql/sql_parse.cc:1897 … (rr) finish Run till exit from #0 0x000055e53f5e63a1 in std::__atomic_base<unsigned long>::store (__m=std::memory_order_release, __i=8101754, this=0x55e5427e4408 <log_sys+8>) at /usr/include/c++/9/bits/atomic_base.h:397 … (rr) finish Run till exit from #0 trx_t::commit (this=0x51c22490bb80) at /data/Server/10.6alpha/storage/innobase/trx/trx0trx.cc:1512 ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11180 11180 DBUG_EXECUTE_IF("innodb_alter_commit_crash_after_commit", (rr) when Current event: 567694 (rr) finish Run till exit from #0 ha_innobase::commit_inplace_alter_table (this=0x61d0007116b8, altered_table=0x7f3ffb187840, ha_alter_info=0x7f3ffb187150, commit=true) at /data/Server/10.6alpha/storage/innobase/handler/handler0alter.cc:11180   Thread 3 received signal SIGKILL, Killed. [Switching to Thread 3972606.3973052] 0x0000000070000002 in ?? () (rr) when Current event: 567808 Based on this excerpt, it is obvious that the ALTER TABLE operation was committed inside InnoDB but the .frm file was not updated before the server was killed. Thus, we would have type mismatch for the column col_int . This kind of crashes can occur until MDEV-25180 and MDEV-17567 have been fixed. It should not occur with those fixes. Unfortunately, Atomic DDL did not make it to the 10.6.0 release.

            mleich, is this bug still repeatable?

            marko Marko Mäkelä added a comment - mleich , is this bug still repeatable?

            This problem was no more observed on actual 10.6 over several months.

            mleich Matthias Leich added a comment - This problem was no more observed on actual 10.6 over several months.

            People

              mleich Matthias Leich
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.