Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12091

Shutdown fails to wait for rollback of recovered transactions to finish

    Details

    • Sprint:
      10.2.5-1

      Description

      If InnoDB is still rolling back resurrected transactions on while it is being shut down, it is possible that buf_flush_page_cleaner_thread() has already invoked os_event_free(buf_flush_event) while that event is being signalled:

          #1 0x7fc605437ae5 in os_event_set(os_event*) /mariadb/server/storage/innobase/os/os0sync.cc:428:2
          #2 0x7fc60524767c in buf_LRU_get_free_block(buf_pool_t*) /mariadb/server/storage/innobase/buf/buf0lru.cc:1307:4
          #3 0x7fc605223575 in buf_page_init_for_read(dberr_t*, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long) /mariadb/server/storage/innobase/buf/buf0buf.cc:3940:11
          #4 0x7fc605251e1f in buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long, buf_page_t**) /mariadb/server/storage/innobase/buf/buf0rea.cc:164:10
          #5 0x7fc605252b97 in buf_read_ahead_linear(unsigned long, unsigned long, unsigned long, unsigned long) /mariadb/server/storage/innobase/buf/buf0rea.cc:728:13
          #6 0x7fc605220bc9 in buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*) /mariadb/server/storage/innobase/buf/buf0buf.cc:3430:3
          #7 0x7fc6051e430e in btr_block_get_func(unsigned long, unsigned long, unsigned long, unsigned long, char const*, unsigned long, dict_index_t*, mtr_t*) /mariadb/server/storage/innobase/include/btr0btr.ic:59:10
          #8 0x7fc6051e3811 in btr_cur_latch_leaves(unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long, btr_cur_t*, mtr_t*) /mariadb/server/storage/innobase/btr/btr0cur.cc:269:16
          #9 0x7fc6051e1ad8 in btr_cur_search_to_nth_level(dict_index_t*, unsigned long, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, char const*, unsigned long, mtr_t*) /mariadb/server/storage/innobase/btr/btr0cur.cc:737:4
          #10 0x7fc6051fa27a in btr_pcur_open_with_no_init_func(dict_index_t*, dtuple_t const*, unsigned long, unsigned long, btr_pcur_t*, unsigned long, char const*, unsigned long, mtr_t*) /mariadb/server/storage/innobase/include/btr0pcur.ic:484:8
          #11 0x7fc6051f9901 in btr_pcur_restore_position_func(unsigned long, btr_pcur_t*, char const*, unsigned long, mtr_t*) /mariadb/server/storage/innobase/btr/btr0pcur.cc:340:2
          #12 0x7fc60552502d in row_undo_ins_remove_clust_rec(undo_node_t*) /mariadb/server/storage/innobase/row/row0uins.cc:144:12
          #13 0x7fc605523e40 in row_undo_ins(undo_node_t*) /mariadb/server/storage/innobase/row/row0uins.cc:461:9
          #14 0x7fc60553004e in row_undo(undo_node_t*, que_thr_t*) /mariadb/server/storage/innobase/row/row0undo.cc:304:9
          #15 0x7fc60552fc12 in row_undo_step(que_thr_t*) /mariadb/server/storage/innobase/row/row0undo.cc:351:8
          #16 0x7fc60547e0de in que_thr_step(que_thr_t*) /mariadb/server/storage/innobase/que/que0que.cc:1077:9
          #17 0x7fc60547d0ec in que_run_threads_low(que_thr_t*) /mariadb/server/storage/innobase/que/que0que.cc:1141:14
          #18 0x7fc60547cded in que_run_threads(que_thr_t*) /mariadb/server/storage/innobase/que/que0que.cc:1182:2
          #19 0x7fc60558d89d in trx_rollback_active(trx_t*) /mariadb/server/storage/innobase/trx/trx0roll.cc:607:2
          #20 0x7fc60558b5d0 in trx_rollback_resurrected(trx_t*, unsigned long) /mariadb/server/storage/innobase/trx/trx0roll.cc:707:4
      

      MariaDB 10.2 should not be able to trigger this error, because it is srv_free() that is invoking os_event_destroy(buf_flush_event).

      Nevertheless, we should ensure that the page cleaner does not exit while there are dirty pages in the buffer pool.

      We should also ensure that each subsystem cleanup is done in the main shutdown thread after the threads of the subsystem have exited. Another bad example is btr_defragment_thread that is calling btr_defragment_shutdown(). Apparently the main thread is not at all waiting for that shutdown to finish.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                marko Marko Mäkelä
                Reporter:
                marko Marko Mäkelä
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: