[MDEV-25967] Server restarted on backupped data crashes during CHECK TABLE Created: 2021-06-18  Updated: 2021-06-23  Resolved: 2021-06-23

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Storage Engine - InnoDB
Affects Version/s: 10.6.2
Fix Version/s: 10.6.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Problem/Incident
is caused by MDEV-24626 Remove synchronous write of page0 and... Closed

 Description   

Scenario:
1. Start the "source" DB server and generate some initial data
2. 33 sessions run a DDL/DML mix concurrent
3. During 2. is ongoing
    3.1  mariabackup --backup  from source DB server
    3.2  mariabackup --prepare
    3.3  Start a DB server on the backupped data
    3.4 Check the data within that server
    And here leads a "CHECK TABLE `test`.`t7` EXTENDED" to
    [rr 2157024 6154]2021-06-18  7:03:52 0 [ERROR] [FATAL] InnoDB: Trying to read 4096 bytes at 135168 outside the bounds of the file: ./test/t7.ibd
[rr 2157024 6175]210618  7:03:52 [rr 2157024 6177][ERROR] mysqld got signal 6 ;
    [ERROR] [FATAL] InnoDB: Trying to read 4096 bytes at 135168 outside the bounds of the file: ./test/t7.ibd
 
origin/bb-10.6-MDEV-25062 7848c57f90335e162f8eaf4e49268d15101188a9 2021-06-18T11:26:57+03:00
 
Per Marko this problem is caused by MDEV-24626.
 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f1d7f6f3859 in __GI_abort () at abort.c:79
#2  0x000056379c575d63 in ib::fatal::~fatal (this=0x7f1d68213af0, __in_chrg=<optimized out>) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/ut/ut0ut.cc:519
#3  0x000056379c6aaabe in fil_report_invalid_page_access (name=0x6020000034d0 "./test/t7.ibd", offset=135168, len=4096, is_read=true) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/fil/fil0fil.cc:2781
#4  0x000056379c6ab062 in fil_space_t::io (this=0x6120000022c0, type=..., offset=135168, len=4096, buf=0x7f1d6c119000, bpage=0x7f1d6b9aa600) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/fil/fil0fil.cc:2871
#5  0x000056379c61dab9 in buf_read_page_low (err=0x7f1d68214180, space=0x6120000022c0, sync=true, mode=132, page_id=..., zip_size=0, unzip=false) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/buf/buf0rea.cc:338
#6  0x000056379c61e953 in buf_read_page (page_id=..., zip_size=0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/buf/buf0rea.cc:483
#7  0x000056379c5e90e5 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=1, guess=0x0, mode=10, mtr=0x7f1d682148c0, err=0x0, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/buf/buf0buf.cc:2672
#8  0x000056379c5ea28e in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=1, guess=0x0, mode=10, mtr=0x7f1d682148c0, err=0x0, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/buf/buf0buf.cc:3106
#9  0x000056379c5bfe82 in btr_copy_blob_prefix (buf=0x61f0000000f0 '\276' <repeats 200 times>..., len=3072, id=..., offset=38) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/btr/btr0cur.cc:7904
#10 0x000056379c5c1079 in btr_copy_externally_stored_field_prefix_low (buf=0x61f0000000f0 '\276' <repeats 200 times>..., len=3072, zip_size=0, id=<error reading variable: Cannot access memory at address 0xffffffffffffffc0>, offset=38)
    at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/btr/btr0cur.cc:8100
#11 0x000056379c5c1369 in btr_copy_externally_stored_field_prefix (buf=0x61f0000000f0 '\276' <repeats 200 times>..., len=3072, zip_size=0, data=0x6130000100c4 "", local_len=0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/btr/btr0cur.cc:8150
#12 0x000056379c76d852 in row_ext_cache_fill (ext=0x618000010248, i=0, space=0x6120000022c0, dfield=0x618000010180) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0ext.cc:78
#13 0x000056379c76db6a in row_ext_create (n_ext=1, ext=0x6130000100e8, table=..., tuple=0x6180000100f0, heap=0x61300000ffc0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0ext.cc:128
#14 0x000056379c4a819a in row_build_low (type=1, index=0x61600000baf0, rec=0x7f1d6c08370d "\200", offsets=0x7f1d682155d0, col_table=0x617000002af0, defaults=0x0, add_v=0x0, col_map=0x0, ext=0x61a00000fe78, heap=0x61300000ffc0)
    at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0row.cc:604
#15 0x000056379c4a8394 in row_build (type=1, index=0x61600000baf0, rec=0x7f1d6c08370d "\200", offsets=0x7f1d682155d0, col_table=0x0, defaults=0x0, col_map=0x0, ext=0x61a00000fe78, heap=0x61300000ffc0)
    at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0row.cc:663
#16 0x000056379c4cc39c in row_undo_search_clust_to_pcur (node=0x61a00000fcf0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0undo.cc:214
#17 0x000056379c770c31 in row_undo_ins_parse_undo_rec (node=0x61a00000fcf0, dict_locked=false) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0uins.cc:518
#18 0x000056379c771426 in row_undo_ins (node=0x61a00000fcf0, thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0uins.cc:616
#19 0x000056379c4cdb09 in row_undo (node=0x61a00000fcf0, thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0undo.cc:422
#20 0x000056379c4cdf40 in row_undo_step (thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/row/row0undo.cc:478
#21 0x000056379c3f3b0d in que_thr_step (thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/que/que0que.cc:651
#22 0x000056379c3f3d7b in que_run_threads_low (thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/que/que0que.cc:709
#23 0x000056379c3f3e98 in que_run_threads (thr=0x6160000106a0) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/que/que0que.cc:729
#24 0x000056379c53ebd8 in trx_rollback_active (trx=0x7f1d73a00988) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/trx/trx0roll.cc:602
#25 0x000056379c53f8c4 in trx_rollback_recovered (all=true) at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/trx/trx0roll.cc:739
#26 0x000056379c53fbbf in trx_rollback_all_recovered () at /data/Server/bb-10.6-MDEV-25062A/storage/innobase/trx/trx0roll.cc:785
#27 0x000056379c79d3cd in tpool::task_group::execute (this=0x56379e382280 <rollback_all_recovered_group>, t=0x56379e382340 <rollback_all_recovered_task>) at /data/Server/bb-10.6-MDEV-25062A/tpool/task_group.cc:55
#28 0x000056379c7988f9 in tpool::thread_pool_generic::worker_main (this=0x618000000080, thread_var=0x63000001fd80) at /data/Server/bb-10.6-MDEV-25062A/tpool/tpool_generic.cc:550
#29 0x000056379c799564 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:70
#30 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#31 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#32 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#33 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#34 0x00007f1d7fb00d84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#35 0x00007f1d7fc1c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#36 0x00007f1d7f7f0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
sdp:/data/Results/1624021740/TBR-1085/dev/shm/vardir/1624021740/62/1_clone/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-0
     rr trace of mariabackup --backup
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-1
    rr trace of mariabackup --prepare
_RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
     rr trace of the server started on the backupped data.
sdp:/data/Results/1624021740/TBR-1085/dev/shm/vardir/1624021740/62/1/data_backup
     Copy of the backupped data before server start
 
sdp:/data/Results/1624021740/TBR-1085/dev/shm/vardir/1624021740/62/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1
     rr trace of the "source server"



 Comments   
Comment by Matthias Leich [ 2021-06-18 ]

Archive
sdp:/data/Results/1624021740/001802.tar.xz
Error pattern for RQG:
[ 'TBR-1085', 'ERROR: Reporter Mariabackup : \'CHECK TABLE .{1,150} EXTENDED\' failed with : 2013.+\[ERROR\] \[FATAL\] InnoDB: Trying to read .{1,30} bytes at .{1,30} outside the bounds of the file' ],
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--reporters=Mariabackup_linux \
--mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=240 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=/home/mleich/RQG/conf/mariadb/encryption_keys.txt \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--loose-innodb-sync-debug \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--innodb_adaptive_hash_index=off \
--mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
--mysqld=--loose-max-statement-time=30 \
--threads=33 \
--mysqld=--innodb-use-native-aio=0 \
--rr=Extended \
--rr_options="--wait" \
--mysqld=--innodb_page_size=4K \
--mysqld=--innodb-buffer-pool-size=256M \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_

Comment by Marko Mäkelä [ 2021-06-18 ]

The function recv_sys_t::recover_deferred() that was added in MDEV-24626 fails to extend the file to the correct size. In the case that I analyzed, the FSP_SIZE was 36 pages, and fil_node_t::read_page0() would round down the file size of 33.125 pages to 33 pages (page numbers 0 to 32). An attempt to read page number 33 would then trigger the out-of-bounds assertion.

Generated at Thu Feb 08 09:41:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.