[MDEV-29201] Crash in row_search_mvcc() Created: 2022-07-29  Updated: 2022-09-05  Resolved: 2022-09-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Valerii Kravchuk Assignee: Valerii Kravchuk
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-28349 Provide "crash safe" options for CHEC... Open

 Description   

The following crash happens (from gdb full backtrace):

Thread 1 (Thread 0x7fb40cc76700 (LWP 56287)):
#0 0x00007fb8ff34eaa1 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x000055beab4a87ee in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-/src_0/sql/signal_handler.cc:330
curr_time = 1658744252
tm = {tm_sec = 32, tm_min = 17, tm_hour = 18, tm_mday = 25, tm_mon = 6, tm_year = 122, tm_wday = 1, tm_yday = 205, tm_isdst = 0, tm_gmtoff = 28800, tm_zone = 0x55beae08b770 "CST"}
thd = 0x7fb31c0717f8
print_invalid_query_pointer = false
#2 ...signal handler called...
No symbol table info available.
#3 row_search_mvcc (buf=..., mode=PAGE_CUR_GE, prebuilt=0x7fb31c2a76c0, match_mode=..., direction=0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0sel.cc:4797
...
#4 0x000055beab7ca1cf in ha_innobase::index_read (this=0x7fb31c01a6b0, buf=0x7fb31c0b5ce0 "\277\367\377\377\377\377\373", '\377' ..., key_ptr=0x7fb31c169508 "", key_len=0, find_flag=HA_READ_KEY_EXACT) at /usr/src/debug/MariaDB-/src_0/storage/innobase/handler/ha_innodb.cc:8797
No locals.
#5 0x000055beab4adfc8 in handler::ha_index_read_map (this=0x7fb31c01a6b0, buf=0x7fb31c0b5ce0 "\277\367\377\377\377\377\373", '\377' ..., key=0x7fb31c169508 "", keypart_map=0, find_flag=...) at /usr/src/debug/MariaDB-/src_0/sql/handler.cc:3148
No locals.
#6 0x000055beab2e26d7 in join_read_always_key (tab=0x7fb31c1681d8) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:21380
No locals.
#7 0x000055beab2d619a in sub_select (join=0x7fb31c2bd318, join_tab=0x7fb31c1681d8, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20616
No locals.
#8 0x000055beab304dff in JOIN::exec_inner (this=this@entry=0x7fb31c2bd318) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20153
No locals.
#9 0x000055beab305063 in JOIN::exec (this=0x7fb31c2bd318) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:4240
No locals.
#10 0x000055beab30323e in mysql_select (thd=thd@entry=0x7fb31c0717f8, tables=0x7fb31c099080, fields=..., conds=0x7fb31c1672f0, og_num=2, order=0x0, group=0x7fb31c0998c0, having=0x0, proc_param=proc_param@entry=0x0, select_options=2209760938752, result=result@entry=0x7fb31c2bd238, unit=unit@entry=0x7fb31c099ad8, select_lex=select_lex@entry=0x7fb31c0988f0) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:4655
No locals.
#11 0x000055beab26337e in mysql_derived_fill (thd=...>, lex=0x7fb31c075710, derived=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_derived.cc:1265
...
#13 0x000055beab2d5e99 in st_join_table::preread_init (this=0x7fb31c167700) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:13585
No locals.
#14 0x000055beab2d6168 in sub_select (join=0x7fb31c2bccc0, join_tab=0x7fb31c167700, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20581
No locals.
#15 0x000055beab304dff in JOIN::exec_inner (this=this@entry=0x7fb31c2bccc0) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20153
No locals.
#16 0x000055beab305063 in JOIN::exec (this=0x7fb31c2bccc0) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:4240
No locals.
#17 0x000055beab5619b5 in subselect_single_select_engine::exec (this=0x7fb31c09a860) at /usr/src/debug/MariaDB-/src_0/sql/item_subselect.cc:3989
...
#18 0x000055beab5600ba in Item_subselect::exec (this=0x7fb31c2c9eb0) at /usr/src/debug/MariaDB-/src_0/sql/item_subselect.cc:782
...
#19 0x000055beab56090e in Item_singlerow_subselect::val_int (this=0x7fb31c2c9eb0) at /usr/src/debug/MariaDB-/src_0/sql/item_subselect.cc:1375
No locals.
#20 0x000055beab4bddda in Item_cache_int::cache_value (this=0x7fb31c16add0) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:9834
No locals.
#21 0x000055beab4d2da7 in cache (this=0x7fb31c16ad08) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:8672
No locals.
#22 Item_cache_wrapper::val_int (this=0x7fb31c16ad08) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:8726
cached_value = 0x0
#23 0x000055beab4df469 in Arg_comparator::compare_int_signed (this=0x7fb31c2ca0f8) at /usr/src/debug/MariaDB-/src_0/sql/item_cmpfunc.cc:939
...
#24 0x000055beab4e371a in compare (this=...) at /usr/src/debug/MariaDB-/src_0/sql/item_cmpfunc.h:102
No locals.
#25 Item_func_gt::val_int (this=...) at /usr/src/debug/MariaDB-/src_0/sql/item_cmpfunc.cc:1814
No locals.
#26 0x000055beab3ef0c0 in Type_handler_int_result::Item_val_bool (this=..., item=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_type.cc:4997
No locals.
#27 0x000055beab4e2a3d in Item_func_case_searched::find_item (this=0x7fb31c2cb088) at /usr/src/debug/MariaDB-/src_0/sql/item_cmpfunc.cc:3009
i = 0
count = 2
#28 0x000055beab4dfe16 in Item_func_case::str_op (this=0x7fb31c2cb088, str=0x7fb31c2cb0c0) at /usr/src/debug/MariaDB-/src_0/sql/item_cmpfunc.cc:3042
...
#29 0x000055beab4cf861 in Item::save_str_in_field (this=0x7fb31c2cb088, field=0x7fb31c156740, no_conversions=...) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:6612
...
#30 0x000055beab4c0d86 in Item::save_in_field (this=0x7fb31c2cb088, field=0x7fb31c156740, no_conversions=...) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:6660
error = ...
#31 0x000055beab2e46a1 in copy_funcs (func_ptr=0x7fb31c155198, thd=0x7fb31c0717f8) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:25709
No locals.
#32 0x000055beab2e4740 in end_write (join=0x7fb31c0f5e08, join_tab=0x7fb31c14fe28, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:22047
No locals.
#33 0x000055beab2d073c in evaluate_join_record (join=0x7fb31c0f5e08, join_tab=0x7fb31c14fa60, error=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20842
No locals.
#34 0x000055beab2d61af in sub_select (join=0x7fb31c0f5e08, join_tab=0x7fb31c14fa60, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20619
No locals.
#35 0x000055beab2d649c in sub_select (join=0x7fb31c0f5e08, join_tab=0x7fb31c14f698, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20965
No locals.
#36 0x000055beab2d649c in sub_select (join=0x7fb31c0f5e08, join_tab=0x7fb31c14f2d0, end_of_records=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20965
No locals.
#37 0x000055beab2d073c in evaluate_join_record (join=0x7fb31c0f5e08, join_tab=0x7fb31c14ef08, error=...) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20842
No locals.
...
#48 0x000055beab304dff in JOIN::exec_inner (this=this@entry=0x7fb31c0f5e08) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:20153
No locals.
#49 0x000055beab305063 in JOIN::exec (this=0x7fb31c0f5e08) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:4240
No locals.
#50 0x000055beab30323e in mysql_select (thd=0x7fb31c0717f8, tables=0x7fb31c0f4be0, fields=..., conds=0x0, og_num=1, order=0x0, group=0x7fb31c0f5410, having=0x0, proc_param=0x0, select_options=10737683200, result=0x7fb31c0f5de0, unit=0x7fb31c0757d8, select_lex=0x7fb31c002640) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:4655
No locals.
#51 0x000055beab303d84 in handle_select (thd=thd@entry=0x7fb31c0717f8, lex=lex@entry=0x7fb31c075710, result=result@entry=0x7fb31c0f5de0, setup_tables_done_option=setup_tables_done_option@entry=0) at /usr/src/debug/MariaDB-/src_0/sql/sql_select.cc:429
No locals.
#52 0x000055beab165f61 in execute_sqlcom_select (thd=thd@entry=0x7fb31c0717f8, all_tables=0x7fb31c0f4be0) at /usr/src/debug/MariaDB-/src_0/sql/sql_parse.cc:6293
...
#54 0x000055beab2aacc5 in mysql_parse (thd=thd@entry=0x7fb31c0717f8, rawbuf=..., length=..., parser_state=parser_state@entry=0x7fb40cc75300, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-/src_0/sql/sql_parse.cc:8071
...
#55 0x000055beab2ace53 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fb31c0717f8, packet=packet@entry=0x7fb31c077f19 "", packet_length=packet_length@entry=3703, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-/src_0/sql/sql_parse.cc:1873
packet_end = 0x7fb31c001767 ""
parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7fb31c0717f8, m_ptr = 0x7fb31c001768 "\005", m_tok_start = 0x7fb31c001768 "\005", m_tok_end = 0x7fb31c001768 "\005", m_end_of_query = 0x7fb31c001767 "", m_tok_start_prev = 0x7fb31c001767 "", m_buf = 0x7fb31c0008f0 "SELECT * FROM (SELECT \r\n UUID(), \t\t\t\t\t\t\t\t\t -- 主键\r\n\t REPLACE(CURDATE(),'-',''), -- 数据日期\r\n\t '20220630' , \t\t\t\t\t\t\t\t\t -- 报表日期\r\n '102' NBJGH, -- 内部机构号\r\n 'CF Ops"..., m_buf_length = 3703, m_echo = true, m_echo_saved = false, m_cpp_buf = 0x7fb31c0017c8 "SELECT * FROM (SELECT \r\n UUID(), \t\t\t\t\t\t\t\t\t -- 主键\r\n\t REPLACE(CURDATE(),'-',''), -- 数据日期\r\n\t '20220630' , \t\t\t\t\t\t\t\t\t -- 报表日期\r\n '102' NBJGH, -- 内部机构号\r\n 'CF Ops"..., m_cpp_ptr = 0x7fb31c00263f "", m_cpp_tok_start = 0x7fb31c00263f "", m_cpp_tok_start_prev = 0x7fb31c00263f "", m_cpp_tok_end = 0x7fb31c00263f "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x20e2bd8aa4bf3600 ..., m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 73, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = (unknown: 32692), m_cpp_text_start = 0x7fb31c00263a "XDHTH", m_cpp_text_end = 0x7fb31c00263f "", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 ...}}, m_lock_type = TL_READ_DEFAULT, m_mdl_type = MDL_SHARED_READ}, m_digest_psi = 0x0}
net = 0x7fb31c071a70
do_end_of_statement = true
__FUNCTION__ = "dispatch_command"
error = false
drop_more_results = false
#56 0x000055beab2ae41b in do_command (thd=0x7fb31c0717f8) at /usr/src/debug/MariaDB-/src_0/sql/sql_parse.cc:1354
...
#57 0x000055beab399129 in do_handle_one_connection (connect=..., connect@entry=0x55beae0e92b8, put_in_cache=put_in_cache@entry=true) at /usr/src/debug/MariaDB-/src_0/sql/sql_connect.cc:1410
...
#58 0x000055beab3993c4 in handle_one_connection (arg=arg@entry=0x55beae0e92b8) at /usr/src/debug/MariaDB-/src_0/sql/sql_connect.cc:1312
connect = 0x55beae0e92b8
#59 0x000055beab71dd1d in pfs_spawn_thread (arg=0x55bec3350b78) at /usr/src/debug/MariaDB-/src_0/storage/perfschema/pfs.cc:2201
...
#60 0x00007fb8ff349ea5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#61 0x00007fb8fd4e198d in clone () from /lib64/libc.so.6
No symbol table info available.

while executing complex SELECT with subqueries, accessing 9 different tables via secondary indexes. CHECK TABLES reports "OK" for all of them, to we do not have a reason to think the problem is related to index corruption (unlike in MDEV-23192).



 Comments   
Comment by Marko Mäkelä [ 2022-07-29 ]

The cause appears to be an invalid (null?) rec pointer being dereferenced in row_search_mvcc():

                if (rec_get_info_bits(rec, true) & REC_INFO_MIN_REC_FLAG) {

It is possible that MDEV-21098 would fix this. As part of that change, I have tried to make record iteration functions to return a null pointer when an out-of-bounds pointer is encountered, and added checks for invalid pointers everywhere. But, we only test with data that is easily generated by SQL, not by pages where arbitrary corruption has been injected. In the development branch, I did not add an explicit check for rec being a null pointer. It would be very helpful if that development branch could be tested with this dataset. It is possible that the error would be caught earlier, and an explicit null-pointer check in row_search_mvcc() would be redundant.

Side note: I think that things would be simpler if InnoDB did not store the infimum and supremum pseudo-records in each index page. We could simply terminate the singly linked list in a null pointer (the 16-bit next-record offset would be 0), and checks for the page supremum would be replaced with checks for a null pointer. But, we cannot change the format easily because we want to guarantee easy in-place upgrades of old installations.

Comment by Marko Mäkelä [ 2022-07-29 ]

Based on further discussion, we actually have rec=0x0 in the backtrace full of the crashing thread.

The core dump should identify the name of the index and the table:

p ((dict_index_t*)0x7f13540846b0)->name
p ((dict_index_t*)0x7f13540846b0)->table->name

(Note: those numeric addresses apply to this particular core dump only.)

Because the buffer pool contents was omitted from the core dump (MDEV-10814), we’d need another crash (within GDB, not via core dump) to identify the corrupted page and to extract a copy of it. Something like the following in the stack frame of row_search_mvcc should work for that:

print/x pcur.btr_cur.page_cur.block.page.id
dump binary memory /tmp/page.bin pcur.btr_cur.page_cur.block.frame pcur.btr_cur.page_cur.block.frame+srv_page_size

Comment by Marko Mäkelä [ 2022-08-01 ]

Can we get some more details?

Generated at Thu Feb 08 10:06:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.