Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28954

purge_coordinator_callback() is invoked multiple times concurrently

    XMLWordPrintable

Details

    Description

      mleich produced an rr replay trace where multiple purge coordinator tasks are running simulataneously.

      ssh pluto
      rr replay /data/results/1654860797/001037/1/rr/latest-trace
      

      bb-10.10-MDEV-11026 2acc2601910d5292f1ea46e06e5affeb56dc8010

      (rr) c
      Continuing.
      mysqld: /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186: ulint trx_purge_attach_undo_recs(ulint): Assertion `node->undo_recs.empty()' failed.
       
      Thread 3 received signal SIGABRT, Aborted.
      [Switching to Thread 2420074.2420795]
      __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      (rr) backtrace
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x000073e3778e7859 in __GI_abort () at abort.c:79
      #2  0x000073e3778e7729 in __assert_fail_base (
          fmt=0x73e377a7d588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
          assertion=0x55a3b83be3e0 "node->undo_recs.empty()", 
          file=0x55a3b83bc860 "/data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc", line=1186, function=<optimized out>) at assert.c:92
      #3  0x000073e3778f8f36 in __GI___assert_fail (
          assertion=0x55a3b83be3e0 "node->undo_recs.empty()", 
          file=0x55a3b83bc860 "/data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc", line=1186, 
          function=0x55a3b83be380 "ulint trx_purge_attach_undo_recs(ulint)")
          at assert.c:101
      #4  0x000055a3b71a4f75 in trx_purge_attach_undo_recs (n_purge_threads=1)
          at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186
      #5  0x000055a3b71a5e04 in trx_purge (n_tasks=1, truncate=false)
          at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1346
      #6  0x000055a3b717b874 in purge_coordinator_state::do_purge (
          this=0x55a3ba001500 <purge_state>)
          at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1611
      #7  0x000055a3b71794e5 in purge_coordinator_callback ()
          at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1765
      #8  0x000055a3b752e6b8 in tpool::task_group::execute (
      (rr) break trx0purge.cc:1254
      (rr) reverse-continue
      Continuing.
       
      Thread 3 received signal SIGABRT, Aborted.
      __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      50	in ../sysdeps/unix/sysv/linux/raise.c
      (rr) reverse-continue
      Continuing.
      [Switching to Thread 2420074.2420757]
       
      Thread 9 hit Breakpoint 1, trx_purge_attach_undo_recs (n_purge_threads=2) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1254
      1254			node->undo_recs.push(purge_rec);
      (rr) continue
      Continuing.
      mysqld: /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1186: ulint trx_purge_attach_undo_recs(ulint): Assertion `node->undo_recs.empty()' failed.
       
      Thread 3 received signal SIGABRT, Aborted.
      [Switching to Thread 2420074.2420795]
      __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      (rr) thread apply 9 backtrace
       
      Thread 9 (Thread 2420074.2420757):
      #0  0x0000000067df63b7 in sched_yield () at /home/roc/rr/rr/src/preload/overrides.c:215
      #1  0x000055a3b70352ee in __gthread_yield () at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:693
      #2  0x000055a3b7035389 in std::this_thread::yield () at /usr/include/c++/9/thread:356
      #3  0x000055a3b71a3f93 in trx_purge_choose_next_log () at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1018
      #4  0x000055a3b71a466d in trx_purge_get_next_rec (n_pages_handled=0x29b97acab720, heap=0x621000050100) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1083
      #5  0x000055a3b71a4ba8 in trx_purge_fetch_next_rec (roll_ptr=0x29b97acab768, n_pages_handled=0x29b97acab720, heap=0x621000050100) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1154
      #6  0x000055a3b71a5511 in trx_purge_attach_undo_recs (n_purge_threads=2) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1225
      #7  0x000055a3b71a5e04 in trx_purge (n_tasks=2, truncate=false) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/trx/trx0purge.cc:1346
      #8  0x000055a3b717b874 in purge_coordinator_state::do_purge (this=0x55a3ba001500 <purge_state>) at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1611
      #9  0x000055a3b71794e5 in purge_coordinator_callback () at /data/Server/bb-10.10-MDEV-11026/storage/innobase/srv/srv0srv.cc:1765
      #10 0x000055a3b752e6b8 in tpool::task_group::execute (this=0x55a3ba001980 <purge_coordinator_task_group>, t=0x55a3ba001a40 <purge_coordinator_task>) at /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:68
      

      The design constraint is that there is only one purge coordinator task running at a time. Before MDEV-16264 this was trivially satisfied by the fact that there was a dedicated purge coordinator thread.

      Attachments

        Issue Links

          Activity

            People

              wlad Vladislav Vaintroub
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.