[MDEV-28954] purge_coordinator_callback() is invoked multiple times concurrently Created: 2022-06-27  Updated: 2022-06-27  Resolved: 2022-06-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Vladislav Vaintroub
Resolution: Done Votes: 0
Labels: corruption, race, rr-profile

Issue Links:
Problem/Incident
is caused by MDEV-11026 Make InnoDB number of IO write/read t... Closed

 Description   

mleich produced an rr replay trace where multiple purge coordinator tasks are running simulataneously.

ssh pluto
rr replay /data/results/1654860797/001037/1/rr/latest-trace

bb-10.10-MDEV-11026 2acc2601910d5292f1ea46e06e5affeb56dc8010

(rr) c
Continuing.
mysqld: /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186: ulint trx_purge_attach_undo_recs(ulint): Assertion `node->undo_recs.empty()' failed.
 
Thread 3 received signal SIGABRT, Aborted.
[Switching to Thread 2420074.2420795]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000073e3778e7859 in __GI_abort () at abort.c:79
#2  0x000073e3778e7729 in __assert_fail_base (
    fmt=0x73e377a7d588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x55a3b83be3e0 "node->undo_recs.empty()", 
    file=0x55a3b83bc860 "/data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc", line=1186, function=<optimized out>) at assert.c:92
#3  0x000073e3778f8f36 in __GI___assert_fail (
    assertion=0x55a3b83be3e0 "node->undo_recs.empty()", 
    file=0x55a3b83bc860 "/data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc", line=1186, 
    function=0x55a3b83be380 "ulint trx_purge_attach_undo_recs(ulint)")
    at assert.c:101
#4  0x000055a3b71a4f75 in trx_purge_attach_undo_recs (n_purge_threads=1)
    at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186
#5  0x000055a3b71a5e04 in trx_purge (n_tasks=1, truncate=false)
    at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1346
#6  0x000055a3b717b874 in purge_coordinator_state::do_purge (
    this=0x55a3ba001500 <purge_state>)
    at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1611
#7  0x000055a3b71794e5 in purge_coordinator_callback ()
    at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1765
#8  0x000055a3b752e6b8 in tpool::task_group::execute (
(rr) break trx0purge.cc:1254
(rr) reverse-continue
Continuing.
 
Thread 3 received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	in ../sysdeps/unix/sysv/linux/raise.c
(rr) reverse-continue
Continuing.
[Switching to Thread 2420074.2420757]
 
Thread 9 hit Breakpoint 1, trx_purge_attach_undo_recs (n_purge_threads=2) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1254
1254			node->undo_recs.push(purge_rec);
(rr) continue
Continuing.
mysqld: /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186: ulint trx_purge_attach_undo_recs(ulint): Assertion `node->undo_recs.empty()' failed.
 
Thread 3 received signal SIGABRT, Aborted.
[Switching to Thread 2420074.2420795]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) thread apply 9 backtrace
 
Thread 9 (Thread 2420074.2420757):
#0  0x0000000067df63b7 in sched_yield () at /home/roc/rr/rr/src/preload/overrides.c:215
#1  0x000055a3b70352ee in __gthread_yield () at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:693
#2  0x000055a3b7035389 in std::this_thread::yield () at /usr/include/c++/9/thread:356
#3  0x000055a3b71a3f93 in trx_purge_choose_next_log () at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1018
#4  0x000055a3b71a466d in trx_purge_get_next_rec (n_pages_handled=0x29b97acab720, heap=0x621000050100) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1083
#5  0x000055a3b71a4ba8 in trx_purge_fetch_next_rec (roll_ptr=0x29b97acab768, n_pages_handled=0x29b97acab720, heap=0x621000050100) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1154
#6  0x000055a3b71a5511 in trx_purge_attach_undo_recs (n_purge_threads=2) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1225
#7  0x000055a3b71a5e04 in trx_purge (n_tasks=2, truncate=false) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1346
#8  0x000055a3b717b874 in purge_coordinator_state::do_purge (this=0x55a3ba001500 <purge_state>) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1611
#9  0x000055a3b71794e5 in purge_coordinator_callback () at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1765
#10 0x000055a3b752e6b8 in tpool::task_group::execute (this=0x55a3ba001980 <purge_coordinator_task_group>, t=0x55a3ba001a40 <purge_coordinator_task>) at /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:68

The design constraint is that there is only one purge coordinator task running at a time. Before MDEV-16264 this was trivially satisfied by the fact that there was a dedicated purge coordinator thread.



 Comments   
Comment by Vladislav Vaintroub [ 2022-06-27 ]

This is a local error in the bb-10.10-MDEV-11026. comes from changing tpool::thread_group constructor. Already fixed in the mentioned branch.

Generated at Thu Feb 08 10:04:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.