[MDEV-22369] Assertion `err == DB_SUCCESS' failed. in btr_block_get_func during crash recovery Created: 2020-04-25  Updated: 2020-04-28  Resolved: 2020-04-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.3
Fix Version/s: 10.5.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-22369.log     File MDEV-22369.yy     File simp_MDEV-22369.cfg    
Issue Links:
Problem/Incident
is caused by MDEV-21572 buf_page_get_gen() should apply buffe... Closed
Relates
relates to MDEV-19738 Doublewrite buffer is unnecessarily u... Closed

 Description   

Workflow:
1. Start server and create a table
2. One session runs random
     ALTER TABLE t1 ADD KEY ( col2 ), ADD KEY ( col1 ) ;
     TRUNCATE TABLE t1;
3. Kill (SIGKILL) the process of the DB server 
4. The attempt to restart that server fails with
     2020-04-25 19:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2754798
    2020-04-25 19:18:53 0 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 83, since the redo log references ./test/t1.ibd with space ID 82.
    2020-04-25 19:18:53 0 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 82. Another data file called ./test/#sql-ib97.ibd exists with the same space ID.
    mysqld: storage/innobase/include/btr0btr.h:241: buf_block_t* btr_block_get_func(const dict_index_t&, ulint, ulint, bool, const char*, unsigned int, mtr_t*): Assertion `err == DB_SUCCESS' failed.
   200425 19:18:54 [rr 40350 5231][ERROR] mysqld got signal 6 ;
 
origin/10.5 2b2dcf34f7325872b03534ebca31c396f449330f with some local changes
which should not play a significant role because the bug was observed without
these changes too.
 
No replay (336 attempts) on origin/10.4 6dab094fbd31ef93a1d51b1b9c867188b2f958c3 2020-04-27T09:22:36+02:00
 
rqg.pl \
--duration=100 \
--queries=10000000 \
--threads=1 \
--no_mask \
--seed=random \
--engine=InnoDB \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--sqltrace=MarkErrors \
--rpl_mode=none \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--log-bin \
--mysqld=--loose-max-statement-time=30 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--net_write_timeout=60 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--connect_timeout=60 \
--mysqld=--log-output=none \
--mysqld=--innodb_page_size=4K \
--mysqld=--net_read_timeout=30 \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--wait_timeout=28800 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--reporters=Backtrace,CrashRecovery1,Deadlock1,ErrorLog \
--validators=None \
--grammar=MDEV-22369.yy \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Server



 Comments   
Comment by Elena Stepanova [ 2020-04-27 ]

Stack trace from the attached log:

mysqld: 10.5/storage/innobase/include/btr0btr.h:241: buf_block_t* btr_block_get_func(const dict_index_t&, ulint, ulint, bool, const char*, unsigned int, mtr_t*): Assertion `err == DB_SUCCESS' failed.
200425 19:18:54 [ERROR] mysqld got signal 6 ;
ERROR: Reporter 'CrashRecovery1': Recovery has apparently crashed.
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
WARN: Reporter 'CrashRecovery1': Log message 'or one of the libraries it was linked against is corrupt, improperly built,' might indicate database corruption
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
WARN: Reporter 'CrashRecovery1': Log message 'diagnose the problem, but since we have already crashed, ' might indicate database corruption
something is definitely wrong and this may fail.
 
Server version: 10.5.3-MariaDB-debug-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467925 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
10.5_debug/bin/mysqld(my_print_stacktrace+0x40)[0x556573ac7e76]
10.5_debug/bin/mysqld(handle_fatal_signal+0x3ae)[0x55657316e5b9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa8bcd88890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fa8bbe80e97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fa8bbe82801]
/lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7fa8bbe7239a]
/lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7fa8bbe72412]
bin/mysqld(+0x11f5bec)[0x556573749bec]
bin/mysqld(+0x136d118)[0x5565738c1118]
bin/mysqld(+0x13f52ea)[0x5565739492ea]
bin/mysqld(+0x13fcb31)[0x556573950b31]
mysqld(+0x13fe5af)[0x5565739525af]
include/btr0btr.h:242(btr_block_get_func(dict_index_t const&, unsigned long, unsigned long, bool, char const*, unsigned int, mtr_t*))[0x5565739515b7]
btr/btr0pcur.cc:474(btr_pcur_move_to_next_page(btr_pcur_t*, mtr_t*))[0x5565739532cb]
include/btr0pcur.ic:286(btr_pcur_move_to_next_user_rec(btr_pcur_t*, mtr_t*))[0x556573932ab2]
dict/dict0load.cc:2575(dict_load_indexes(dict_table_t*, mem_block_info_t*, dict_err_ignore_t))[0x5565739343b2]
dict/dict0load.cc:2961(dict_load_table_one(table_name_t const&, dict_err_ignore_t, std::deque<char const*, ut_allocator<char const*, true> >&))[0x556573852c57]
dict/dict0dict.cc:222(dict_table_open_on_id_low(unsigned long, dict_err_ignore_t, bool))[0x5565738533bd]
dict/dict0dict.cc:937(dict_table_open_on_id(unsigned long, bool, dict_table_op_t, THD*, MDL_ticket**))[0x5565738536c1]
trx/trx0trx.cc:615(trx_resurrect_table_locks(trx_t*, trx_undo_t const*))[0x5565737ff439]
trx/trx0trx.cc:712(trx_resurrect(trx_undo_t*, trx_rseg_t*, long, unsigned long long, unsigned long*, bool))[0x5565735c9234]
trx/trx0trx.cc:772(trx_lists_init_at_db_start())[0x556573172252]
srv/srv0start.cc:1559(srv_start(bool))[0x556572e38a96]
handler/ha_innodb.cc:4058(innodb_init(void*))[0x556572e39891]
sql/handler.cc:584(ha_initialize_handlerton(st_plugin_int*))[0x556572cd939a]
sql/sql_plugin.cc:1459(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x556572cda5c1]
sql/mysqld.cc:4997(init_server_components())[0x556572ccf10a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa8bbe63b97]
bin/mysqld(_start+0x2a)[0x556572ccf00a]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /dev/shm/vardir/1587842225/1/1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1541946              1541946              processes 
Max open files            10240                10240                files     
Max locked memory         16777216             16777216             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1541946              1541946              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

Another more resolved stack trace

10.5 fbe2712

#7  0x00007fc39eaa1ca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000056273a2fe398 in btr_block_get_func (index=..., page=1038, mode=1, merge=true, file=0x56273ab76290 "/home/travis/src/storage/innobase/btr/btr0pcur.cc", line=476, mtr=0x7ffe83257ef0) at /home/travis/src/storage/innobase/include/btr0btr.h:241
#9  0x000056273a47341b in btr_pcur_move_to_next_page (cursor=0x7ffe83257c90, mtr=0x7ffe83257ef0) at /home/travis/src/storage/innobase/btr/btr0pcur.cc:474
#10 0x000056273a4fa75b in btr_pcur_move_to_next_user_rec (cursor=0x7ffe83257c90, mtr=0x7ffe83257ef0) at /home/travis/src/storage/innobase/include/btr0pcur.ic:286
#11 0x000056273a4ff953 in dict_load_columns (table=0x56273ef49e58, heap=0x56273ef413c8) at /home/travis/src/storage/innobase/dict/dict0load.cc:1862
#12 0x000056273a5036df in dict_load_table_one (name=..., ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, fk_tables=std::deque with 0 elements) at /home/travis/src/storage/innobase/dict/dict0load.cc:2937
#13 0x000056273a502878 in dict_load_table (name=0x56273ef49ae8 "test/tc034", ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK) at /home/travis/src/storage/innobase/dict/dict0load.cc:2745
#14 0x000056273a504569 in dict_load_table_on_id (table_id=71, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK) at /home/travis/src/storage/innobase/dict/dict0load.cc:3181
#15 0x000056273a4e3ea9 in dict_table_open_on_id_low (table_id=71, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, cached_only=false) at /home/travis/src/storage/innobase/dict/dict0dict.cc:222
#16 0x000056273a4e57b2 in dict_table_open_on_id (table_id=71, dict_locked=false, table_op=DICT_TABLE_OP_LOAD_TABLESPACE, thd=0x0, mdl=0x0) at /home/travis/src/storage/innobase/dict/dict0dict.cc:937
#17 0x000056273a405ded in trx_resurrect_table_locks (trx=0x7fc389974290, undo=0x56273ef15258) at /home/travis/src/storage/innobase/trx/trx0trx.cc:615
#18 0x000056273a4064c7 in trx_resurrect (undo=0x56273ef15258, rseg=0x56273ef15058, start_time=1587868841, start_time_micro=1592631653, rows_to_undo=0x7ffe83259ea8, is_old_insert=false) at /home/travis/src/storage/innobase/trx/trx0trx.cc:711
#19 0x000056273a406795 in trx_lists_init_at_db_start () at /home/travis/src/storage/innobase/trx/trx0trx.cc:774
#20 0x000056273a3b2c1c in srv_start (create_new_db=false) at /home/travis/src/storage/innobase/srv/srv0start.cc:1558
#21 0x000056273a180060 in innodb_init (p=0x56273cc5e328) at /home/travis/src/storage/innobase/handler/ha_innodb.cc:4042
#22 0x0000562739d29b11 in ha_initialize_handlerton (plugin=0x56273cb916b0) at /home/travis/src/sql/handler.cc:584
#23 0x00005627399e6fe2 in plugin_initialize (tmp_root=0x7ffe8325c910, plugin=0x56273cb916b0, argc=0x56273b543c60 <remaining_argc>, argv=0x56273cb504c8, options_only=false) at /home/travis/src/sql/sql_plugin.cc:1459
#24 0x00005627399e7df1 in plugin_init (argc=0x56273b543c60 <remaining_argc>, argv=0x56273cb504c8, flags=0) at /home/travis/src/sql/sql_plugin.cc:1752
#25 0x00005627398877af in init_server_components () at /home/travis/src/sql/mysqld.cc:5004
#26 0x00005627398889ee in mysqld_main (argc=33, argv=0x56273cb504c8) at /home/travis/src/sql/mysqld.cc:5557
#27 0x000056273987d3b5 in main (argc=32, argv=0x7ffe8325d378) at /home/travis/src/sql/main.cc:25

Comment by Marko Mäkelä [ 2020-04-27 ]

I did not have time to analyze this yet, but I suspect that the corrupted page was left behind because the server was killed in the middle of a page write. Normally, the doublewrite buffer should protect against that, but MDEV-19738 optimized away the logic.

This could be a bug in our recovery (a follow-up to MDEV-12699 could be needed), or we could do something wrong related to INIT_PAGE or FREE_PAGE records, either before the server was killed or during recovery.

Comment by Thirunarayanan Balathandayuthapani [ 2020-04-28 ]

MDEV-21572 fails to set err code while returning from the page which was initialized
using redo logs.

Generated at Thu Feb 08 09:14:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.