[MDEV-25797] Optimize table after CrashRecovery causes [ERROR] InnoDB: tried to purge non-delete-marked record in index Created: 2021-05-27  Updated: 2021-06-01

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.11
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File MDEV-25783.yy     File MDEV-25783.zz    
Issue Links:
Relates
relates to MDEV-25783 CHECK TABLE harvests InnoDB: Index 'a... Closed

 Description   

Scenario:
 
1. Start the server and generate some initial data (table tab1 with 10000 records, row_format REDUNDANT ), no generated columns
2. 33 sessions run concurrent UPDATEs (autocommit) on tab1
3. SIGKILL server during 2. is ongoing
4. Restart attempt with success
5. OPTIMIZE TABLE `test`.`tab1` harvests  
[ERROR] InnoDB: tried to purge non-delete-marked record in index `acbdef` of table `test`.`tab1`: tuple: TUPLE (info_bits=0, 7 fields): {[11]mzuibqwprga(0x6D7A756962717770726761),[1]i(0x69),[16]african-american(0x6166726963616E2D616D65726963616E),[5]birth(0x6269727468),[1]o(0x6F),[5]mouse(0x6D6F757365),[4]   w(0x00000277)}, record: RECORD(info_bits=0, 7 fields): {[11]mzuibqwprga(0x6D7A756962717770726761),[1]i(0x69),[16]african-american(0x6166726963616E2D616D65726963616E),[5]birth(0x6269727468),[1]o(0x6F),[5]mouse(0x6D6F757365),[4]   w(0x00000277)}
# 2021-05-26T13:03:34 [720993] | mysqld: /data/Server/10.5_old/storage/innobase/row/row0purge.cc:484: bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*): Assertion `0' failed.
 
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000564c3ac68d36 in my_write_core (sig=sig@entry=6) at /data/Server/10.5_old/mysys/stacktrace.c:424
#2  0x0000564c39643d56 in handle_fatal_signal (sig=<optimized out>) at /data/Server/10.5_old/sql/signal_handler.cc:343
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007fdfde4fc859 in __GI_abort () at abort.c:79
#6  0x00007fdfde4fc729 in __assert_fail_base (fmt=0x7fdfde692588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x564c3b5546e0 "0", file=0x564c3b5547c0 "/data/Server/10.5_old/storage/innobase/row/row0purge.cc", line=484, 
    function=<optimized out>) at assert.c:92
#7  0x00007fdfde50df36 in __GI___assert_fail (assertion=assertion@entry=0x564c3b5546e0 "0", file=file@entry=0x564c3b5547c0 "/data/Server/10.5_old/storage/innobase/row/row0purge.cc", line=line@entry=484, 
    function=function@entry=0x564c3b558720 "bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*)") at assert.c:101
#8  0x0000564c3a505b4d in row_purge_remove_sec_if_poss_leaf (node=node@entry=0x61a000002508, index=index@entry=0x617000005888, entry=entry@entry=0x619001fe2908) at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:484
#9  0x0000564c3a511adb in row_purge_remove_sec_if_poss (entry=0x619001fe2908, index=0x617000005888, node=0x61a000002508) at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:568
#10 row_purge_upd_exist_or_extern_func (thr=thr@entry=0x6160059a04d0, node=node@entry=0x61a000002508, undo_rec=undo_rec@entry=0x621003b34588 "") at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:762
#11 0x0000564c3a5144c2 in row_purge_record_func (node=node@entry=0x61a000002508, undo_rec=undo_rec@entry=0x621003b34588 "", thr=thr@entry=0x6160059a04d0, updated_extern=<optimized out>)
    at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:1071
#12 0x0000564c3a515e6c in row_purge (node=node@entry=0x61a000002508, undo_rec=undo_rec@entry=0x621003b34588 "", thr=thr@entry=0x6160059a04d0) at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:1110
#13 0x0000564c3a5162c5 in row_purge_step (thr=thr@entry=0x6160059a04d0) at /data/Server/10.5_old/storage/innobase/row/row0purge.cc:1159
#14 0x0000564c3a3b7bca in que_thr_step (thr=thr@entry=0x6160059a04d0) at /data/Server/10.5_old/storage/innobase/que/que0que.cc:946
#15 0x0000564c3a3b8b0d in que_run_threads_low (thr=thr@entry=0x6160059a04d0) at /data/Server/10.5_old/storage/innobase/que/que0que.cc:1008
#16 0x0000564c3a3b980e in que_run_threads (thr=0x6160059a04d0) at /data/Server/10.5_old/storage/innobase/que/que0que.cc:1048
#17 0x0000564c3a65094c in trx_purge (n_tasks=<optimized out>, truncate=<optimized out>) at /data/Server/10.5_old/storage/innobase/trx/trx0purge.cc:1300
#18 0x0000564c3a5b521b in srv_do_purge (n_total_purged=n_total_purged@entry=0x7fdf9c7a3b00) at /data/Server/10.5_old/storage/innobase/srv/srv0srv.cc:1944
#19 0x0000564c3a5b5764 in purge_coordinator_callback_low () at /data/Server/10.5_old/storage/innobase/srv/srv0srv.cc:2041
#20 0x0000564c3a5b88d7 in purge_coordinator_callback () at /data/Server/10.5_old/storage/innobase/srv/srv0srv.cc:2070
#21 0x0000564c3ab59a80 in tpool::task_group::execute (this=0x564c3cbf9440 <purge_coordinator_task_group>, t=t@entry=0x564c3cbf9380 <purge_coordinator_task>) at /data/Server/10.5_old/tpool/task_group.cc:55
#22 0x0000564c3ab5a19f in tpool::task::execute (this=0x564c3cbf9380 <purge_coordinator_task>) at /data/Server/10.5_old/tpool/task.cc:47
#23 0x0000564c3ab529c8 in tpool::thread_pool_generic::worker_main (this=0x618000000480, thread_var=0x63000001cb80) at /data/Server/10.5_old/tpool/tpool_generic.cc:542
#24 0x0000564c3ab536e2 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
#25 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#26 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#27 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#28 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#29 0x00007fdfde909d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#30 0x00007fdfdea25609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x00007fdfde5f9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
 
sdp:/data/Results/1622048657/TBR-1092
gdb -c dev/shm/vardir/1622048657/205/1/data/core /data/Server_bin/10.5_old_asan_Og/bin/mysql
 
GIT_SHOW: HEAD, origin/bb-10.5-monty, origin/10.5 c80cecb5e3e509d37929b4f446edf9b6c636b98f 2021-05-23T19:53:38+03:00
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--gendata=MDEV-25783.zz \
--max_gd_duration=600 \
--grammar=MDEV-25783.yy \
--mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=CrashRecovery1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=240 \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--innodb_adaptive_hash_index=off \
--threads=9 \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_page_size=4K \
--mysqld=--innodb-buffer-pool-size=5M \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_
 
The problem might be a sibling of https://jira.mariadb.org/browse/MDEV-25776 .
Replays of the effect described here are extreme rare, 2 times on 10000 RQG tests,
and none of them happened when rr was invoked.



 Comments   
Comment by Marko Mäkelä [ 2021-05-27 ]

The data directory dump that is attached to MDEV-25783 is reproducing something similar. The rr replay trace in that ticket was probably just lucky that it did not give purge a chance to run before the CHECK TABLE already reported an error.

Generated at Thu Feb 08 09:40:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.