Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21075

Corruption in the undo log that is triggered during a purge operation.

    XMLWordPrintable

    Details

      Description

      Hello,

      We've seen a few times in our systems an scenario that the database will not be able to start.

      If we want to recover from this we will have to end up starting with innodb_force_recovery =2 and then dump and load which is not the best of the experiences.

      Sadly I can't reproduce the issue.

      The signature of the the crash on the error log is:
      mysys/stacktrace.c:270(my_print_stacktrace)[0x5606f32e55f9]
      sql/signal_handler.cc:165(handle_fatal_signal)[0x5606f2d785ed]
      sigaction.c:0(__restore_rt)[0x2ad480e2e100]
      :0(__GI_raise)[0x2ad4815885f7]
      :0(__GI_abort)[0x2ad481589ce8]
      ut/ut0rbt.cc:472(rbt_eject_node)[0x5606f2adc687]
      page/page0page.cc:863(page_copy_rec_list_start(buf_block_t*, buf_block_t*, unsigned char*, dict_index_t*, mtr_t*))[0x5606f307dd28]
      btr/btr0btr.cc:3905(btr_compress(btr_cur_t*, unsigned long, mtr_t*))[0x5606f315bff7]
      btr/btr0cur.cc:5254(btr_cur_compress_if_useful(btr_cur_t*, unsigned long, mtr_t*))[0x5606f316907b]
      btr/btr0cur.cc:5674(btr_cur_pessimistic_delete(dberr_t*, unsigned long, btr_cur_t*, unsigned long, bool, mtr_t*))[0x5606f316bbbf]
      row/row0purge.cc:177(row_purge_remove_clust_if_poss_low)[0x5606f30d4f9d]
      row/row0purge.cc:222(row_purge_remove_clust_if_poss)[0x5606f30d66d3]
      row/row0purge.cc:1110(row_purge)[0x5606f30d7749]
      que/que0que.cc:1046(que_thr_step)[0x5606f30942cd]
      include/sync0types.h:1149(my_atomic_addlint)[0x5606f3121d81]
      srv/srv0srv.cc:2585(srv_do_purge)[0x5606f3105cae]
      pthread_create.c:0(start_thread)[0x2ad480e26dc5]
      /lib64/libc.so.6(clone+0x6d)[0x2ad481649c9d]

      The backtrace from gdb is:

      (gdb) bt
      #0 0x00002b42be49b5f7 in raise () from /lib64/libc.so.6
      #1 0x00002b42be49cce8 in abort () from /lib64/libc.so.6
      #2 0x00005581806a0687 in ut_dbg_assertion_failed (expr=expr@entry=0x558181285d90 "cur2",
      file=file@entry=0x558181285840 " /local/home/src/MariaDB/storage/innobase/page/page0page.cc", line=line@entry=863)
      at /local/home/src/MariaDB/storage/innobase/ut/ut0dbg.cc:61
      #3 0x0000558180c41d28 in page_copy_rec_list_start (new_block=0x2b44ee624780, block=block@entry=0x2b44ee624900, rec=0x2b44eef18070 "supremum", index=index@entry=0x2b45c84976f0,
      mtr=mtr@entry=0x2b45cfc04040) at /local/home/src/MariaDB/storage/innobase/page/page0page.cc:863
      #4 0x0000558180d1fff7 in btr_compress (cursor=cursor@entry=0x2b42bf2b5360, adjust=adjust@entry=0, mtr=mtr@entry=0x2b45cfc04040)
      at /local/home/src/MariaDB/storage/innobase/btr/btr0btr.cc:3903
      #5 0x0000558180d2d07b in btr_cur_compress_if_useful (cursor=cursor@entry=0x2b42bf2b5360, adjust=adjust@entry=0, mtr=mtr@entry=0x2b45cfc04040)
      at /local/home/src/MariaDB/storage/innobase/btr/btr0cur.cc:5254
      #6 0x0000558180d2fbbf in btr_cur_pessimistic_delete (err=err@entry=0x2b45cfc03d14, has_reserved_extents=has_reserved_extents@entry=0, cursor=cursor@entry=0x2b42bf2b5360, flags=flags@entry=0,
      rollback=rollback@entry=false, mtr=mtr@entry=0x2b45cfc04040) at /local/home/src/MariaDB/storage/innobase/btr/btr0cur.cc:5674
      #7 0x0000558180c98f9d in row_purge_remove_clust_if_poss_low (node=node@entry=0x2b42bf2b52c8, mode=mode@entry=65569)
      at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:173
      #8 0x0000558180c9a6d3 in row_purge_remove_clust_if_poss (node=0x2b42bf2b52c8) at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:222
      #9 row_purge_del_mark (node=0x2b42bf2b52c8) at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:663
      #10 row_purge_record_func (node=node@entry=0x2b42bf2b52c8, undo_rec=undo_rec@entry=0x2b45c727a9c8 "", updated_extern=<optimized out>)
      at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:1042
      #11 0x0000558180c9b749 in row_purge (thr=0x2b42bf2b5210, undo_rec=0x2b45c727a9c8 "", node=0x2b42bf2b52c8) at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:1107
      #12 row_purge_step (thr=thr@entry=0x2b42bf2b5210) at /local/home/src/MariaDB/storage/innobase/row/row0purge.cc:1186
      #13 0x0000558180c582cd in que_thr_step (thr=0x2b42bf2b5210) at /local/home/src/MariaDB/storage/innobase/que/que0que.cc:1046
      #14 que_run_threads_low (thr=0x2b42bf2b5210) at /local/home/src/MariaDB/storage/innobase/que/que0que.cc:1108
      #15 que_run_threads (thr=0x2b42bf2b5210) at /local/home/src/MariaDB/storage/innobase/que/que0que.cc:1148
      #16 0x0000558180ce5d81 in trx_purge (n_purge_threads=<optimized out>, truncate=<optimized out>) at /local/home/src/MariaDB/storage/innobase/trx/trx0purge.cc:1571
      #17 0x0000558180cc9cae in srv_do_purge (n_total_purged=<synthetic pointer>) at /local/home/src/MariaDB/storage/innobase/srv/srv0srv.cc:2581
      #18 srv_purge_coordinator_thread (arg=<optimized out>) at /local/home/src/MariaDB/storage/innobase/srv/srv0srv.cc:2714
      #19 0x00002b42bdd39dc5 in start_thread () from /lib64/libpthread.so.0
      #20 0x00002b42be55cc9d in clone () from /lib64/libc.so.6

      I will also attach some gdb debugging and a page dump. From my initial analysis of the page this is what I see:

      The page dump shows that the page is well built, this is, it has all the components and does not seem to have anything missing but has some things that are off for me. Lets look at it:

      FIL_PAGE_NEXT in 0x00d is 4294967295 ? the current page is 2438 the only explanation would be that it is used as end of the list. I am not that familiar with the code so not sure if this is the case.

      FIL_PAGE_FILE_FLUSH_LSN in 0x010b is empty?? however FIL_PAGE_ARCH_LOG_NO at 0x0203 has information

      PAGE_N_DIR_SLOTS there are 19

      PAGE_N_HEAP shows that are 32865 records in the page as it can be seen in 0x02b

      There are 6894 free (deleted) records on the page that use 5896 bytes

      PAGE_LAST_INSERT is empty in 0x0300

      PAGE_DIRECTION is RIGHT and 94 records where inserted in that direction

      PAGE_MAX_TRX_ID is empty??

      PAGE_N_RECS in 0x037 has 73 user records. Notice that this is really low compared with PAGE_N_HEAP

      PAGE_INDEX_ID is 9007 which matches with the info from gdb (as it should)

      PAGE_BTR_SEG_LEAF is empty

      PAGE_BTR_SEG_TOP Is empty

      My take after debugging and looking into the code is that something may go wrong when calling page_cur_insert_rec_low

      We've seen a few of these errors in our systems.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              Bernardo Perez Bernardo Perez
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: