[MDEV-29328] rocksdb.allow_to_start_after_corruption fails on ASAN build Created: 2022-08-18  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None


 Description   

On a debug ASAN build,

perl ./mtr rocksdb.allow_to_start_after_corruption

The visible effect in MTR stdout is

10.5 5fc172fd

rocksdb.allow_to_start_after_corruption 'write_committed' [ fail ]
        Test ended at 2022-08-18 18:45:57
 
CURRENT_TEST: rocksdb.allow_to_start_after_corruption
2022-08-18 18:45:42 0 [Note] /mnt8t/bld/10.5-asan/bin/mariadbd (mysqld 10.5.18-MariaDB-debug-log) starting as process 1206303 ...
2022-08-18 18:45:42 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32198)
Aborted
mysqltest: At line 62: command "$MYSQLD_CMD --plugin_load=$HA_ROCKSDB_SO $_mysqld_option" failed with wrong error: 134
 
The result from queries just before the failure was:
< snip >
1	1
set session debug_dbug= "+d,rocksdb_return_status_corrupted";
select * from t1 where pk=1;
ERROR HY000: Lost connection to MySQL server during query
FOUND 1 /data corruption detected/ in allow_to_start_after_corruption_debug.err
#
#  The same for scan queries
#
select * from t1;
pk	col1
1	1
2	2
3	3
set session debug_dbug= "+d,rocksdb_return_status_corrupted";
select * from t1;
ERROR HY000: Lost connection to MySQL server during query
FOUND 1 /data corruption detected/ in allow_to_start_after_corruption_debug.err
#
#  Test restart failure. The server is shutdown at this point.
#
 
More results from queries before failure can be found in /mnt8t/bld/10.5-asan/mysql-test/var/log/allow_to_start_after_corruption.log
 
 - found 'core' (0/5)
Core generated by '/mnt8t/bld/10.5-asan/bin/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 1206291]
[New LWP 1206250]
[New LWP 1206285]
[New LWP 1206251]
[New LWP 1206249]
[New LWP 1206252]
[New LWP 1206284]
[New LWP 1206283]
[New LWP 1206287]
[New LWP 1206248]
[New LWP 1206289]
[New LWP 1206288]
[New LWP 1206290]
[New LWP 1206286]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/mnt8t/bld/10.5-asan/bin/mariadbd --defaults-group-suffix=.1 --defaults-file=/m'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
56	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.

[Current thread is 1 (Thread 0x7f3c48646700 (LWP 1206291))]
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000000003458f0b in my_write_core (sig=6) at /data/src/10.5/mysys/stacktrace.c:424
#2  0x00000000017e8054 in handle_fatal_signal (sig=6) at /data/src/10.5/sql/signal_handler.cc:354
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007f3c5894c537 in __GI_abort () at abort.c:79
#6  0x00007f3c4fdcbe4c in myrocks::rdb_handle_io_error (status=..., err_type=myrocks::RDB_IO_ERROR_GENERAL) at /data/src/10.5/storage/rocksdb/ha_rocksdb.cc:14046
#7  0x00007f3c4fda0c23 in myrocks::is_valid (scan_it=0x608000017620) at /data/src/10.5/storage/rocksdb/ha_rocksdb.cc:5969
#8  0x00007f3c4fdb0c3b in myrocks::ha_rocksdb::rnd_next_with_direction (this=0x61d000195128, buf=0x619000082b38 "\376\003", move_forward=true) at /data/src/10.5/storage/rocksdb/ha_rocksdb.cc:10687
#9  0x00007f3c4fdbfa7e in myrocks::ha_rocksdb::rnd_next (this=0x61d000195128, buf=0x619000082b38 "\376\003") at /data/src/10.5/storage/rocksdb/ha_rocksdb.cc:10627
#10 0x000000000180824a in handler::ha_rnd_next (this=0x61d000195128, buf=0x619000082b38 "\376\003") at /data/src/10.5/sql/handler.cc:3096
#11 0x0000000001debdc3 in rr_sequential (info=0x62b00003b070) at /data/src/10.5/sql/records.cc:519
#12 0x0000000000ac10f9 in READ_RECORD::read_record (this=0x62b00003b070) at /data/src/10.5/sql/records.h:80
#13 0x0000000000efc629 in sub_select (join=0x62b0000399f0, join_tab=0x62b00003afa8, end_of_records=false) at /data/src/10.5/sql/sql_select.cc:20985
#14 0x0000000000e93106 in do_select (join=0x62b0000399f0, procedure=0x0) at /data/src/10.5/sql/sql_select.cc:20498
#15 0x0000000000e8f516 in JOIN::exec_inner (this=0x62b0000399f0) at /data/src/10.5/sql/sql_select.cc:4560
#16 0x0000000000e8ba87 in JOIN::exec (this=0x62b0000399f0) at /data/src/10.5/sql/sql_select.cc:4340
#17 0x0000000000e0714c in mysql_select (thd=0x62b000069288, tables=0x62b000038980, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2164525824, result=0x62b0000399c0, unit=0x62b00006d430, select_lex=0x62b000038338) at /data/src/10.5/sql/sql_select.cc:4817
#18 0x0000000000e05d73 in handle_select (thd=0x62b000069288, lex=0x62b00006d368, result=0x62b0000399c0, setup_tables_done_option=0) at /data/src/10.5/sql/sql_select.cc:444
#19 0x0000000000d36ee9 in execute_sqlcom_select (thd=0x62b000069288, all_tables=0x62b000038980) at /data/src/10.5/sql/sql_parse.cc:6315
#20 0x0000000000d1b621 in mysql_execute_command (thd=0x62b000069288) at /data/src/10.5/sql/sql_parse.cc:4006
#21 0x0000000000d02383 in mysql_parse (thd=0x62b000069288, rawbuf=0x62b0000382a8 "select * from t1", length=16, parser_state=0x7f3c48643a40, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:8101
#22 0x0000000000cfa5ee in dispatch_command (command=COM_QUERY, thd=0x62b000069288, packet=0x6290000eb289 "", packet_length=16, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:1891
#23 0x0000000000d04c15 in do_command (thd=0x62b000069288) at /data/src/10.5/sql/sql_parse.cc:1375
#24 0x00000000012af1d0 in do_handle_one_connection (connect=0x611000034648, put_in_cache=true) at /data/src/10.5/sql/sql_connect.cc:1418
#25 0x00000000012ae754 in handle_one_connection (arg=0x611000034648) at /data/src/10.5/sql/sql_connect.cc:1312
#26 0x00000000023bf7a8 in pfs_spawn_thread (arg=0x616000122b08) at /data/src/10.5/storage/perfschema/pfs.cc:2201
#27 0x00007f3c58cecea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00007f3c58a24def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The actual ASAN error is apparently this (got it by adding --log-error option to the failing intermediate server startup):

==1206303==ERROR: AddressSanitizer: heap-use-after-free on address 0x625000000109 at pc 0x00000242c834 bp 0x7ffd16978840 sp 0x7ffd16978838
WRITE of size 1 at 0x625000000109 thread T0
    #0 0x242c833 in PFS_buffer_scalable_container<PFS_mutex, 1024, 1024, PFS_buffer_default_array<PFS_mutex>, PFS_buffer_default_allocator<PFS_mutex> >::static_deallocate(PFS_m
utex*) /data/src/10.5/storage/perfschema/pfs_buffer_container.h:747:22
    #1 0x241f8f8 in PFS_partitioned_buffer_scalable_container<PFS_buffer_scalable_container<PFS_mutex, 1024, 1024, PFS_buffer_default_array<PFS_mutex>, PFS_buffer_default_alloc
ator<PFS_mutex> >, 2>::deallocate(PFS_mutex*) /data/src/10.5/storage/perfschema/pfs_buffer_container.h:1198:5
    #2 0x240f997 in destroy_mutex(PFS_mutex*) /data/src/10.5/storage/perfschema/pfs_instr.cc:339:26
    #3 0x23be190 in pfs_destroy_mutex_v1 /data/src/10.5/storage/perfschema/pfs.cc:1844:3
    #4 0x1446842 in inline_mysql_mutex_destroy(st_mysql_mutex*, char const*, unsigned int) /data/src/10.5/include/mysql/psi/mysql_thread.h:719:5
    #5 0x144d806 in rpl_binlog_state::free() /data/src/10.5/sql/rpl_gtid.cc:1516:5
    #6 0x144dfd4 in rpl_binlog_state::~rpl_binlog_state() /data/src/10.5/sql/rpl_gtid.cc:1565:3
    #7 0x7f0d23d744d6 in __run_exit_handlers stdlib/exit.c:108:8
    #8 0x7f0d23d74679 in exit stdlib/exit.c:139:3
    #9 0x7f0d1b1f837c in myrocks::rocksdb_init_func(void*) /data/src/10.5/storage/rocksdb/ha_rocksdb.cc:5255:7
    #10 0x17ee147 in ha_initialize_handlerton(st_plugin_int*) /data/src/10.5/sql/handler.cc:645:31
    #11 0xd64376 in plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /data/src/10.5/sql/sql_plugin.cc:1461:9
    #12 0xd62e5a in plugin_init(int*, char**, int) /data/src/10.5/sql/sql_plugin.cc:1753:15
    #13 0x8df5bf in init_server_components() /data/src/10.5/sql/mysqld.cc:4928:7
    #14 0x8d7fcd in mysqld_main(int, char**) /data/src/10.5/sql/mysqld.cc:5522:7
    #15 0x8cdc41 in main /data/src/10.5/sql/main.cc:25:10
    #16 0x7f0d23d5cd09 in __libc_start_main csu/../csu/libc-start.c:308:16
    #17 0x821629 in _start (/mnt8t/bld/10.5-asan/bin/mariadbd+0x821629)

but there are also lots of non-ASAN errors before. Since the test is meant to cause corruption, it's hard to say where the intended failures end and the unexpected one starts.

Reproducible on 10.5+. Not reproducible on 10.4.


Generated at Thu Feb 08 10:07:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.