MariaDB Server / MDEV-30882

Crash on ROLLBACK of DELETE or UPDATE in a ROW_FORMAT=COMPRESSED table

Details

    Description

      jeanfrancois.gagne provided a copy of a page of a ROW_FORMAT=COMPRESSED table on which an attempt to execute the following:

      BEGIN;
      DELETE FROM t WHERE pk=123;
      ROLLBACK;
      

      would lead to the following crash:

      10.6 32a53a66df0369a446db1e41f5123afe62e793fb

      2023-03-20 11:09:00 0x7f427c2026c0  InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/row/row0umod.cc line 130
      InnoDB: Failing assertion: !dummy_big_rec
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      #7  0x0000558b2197f2e3 in ut_dbg_assertion_failed (expr=expr@entry=0x558b20cfd8b8 "!dummy_big_rec", file=file@entry=0x558b20d6ea78 "/mariadb/10.6/storage/innobase/row/row0umod.cc", line=line@entry=130) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
      #8  0x0000558b2194a7fb in row_undo_mod_clust_low (node=node@entry=0x7f422002d698, offsets=offsets@entry=0x7f427c200178, offsets_heap=offsets_heap@entry=0x7f427c200170, heap=0x7f4220031b10, sys=sys@entry=0x7f427c2001d3 "|B\177", thr=thr@entry=0x7f422004faf0, mtr=<optimized out>, mode=<optimized out>) at /mariadb/10.6/storage/innobase/row/row0umod.cc:130
      

      The reason for this crash is a misguided check in btr_cur_update_in_place() that unnecessarily causes btr_cur_pessimistic_update() to be invoked during the ROLLBACK operation.

      Back in 2005, I specifically designed the ROW_FORMAT=COMPRESSED format in such a way that a delete or the rollback of a delete would always succeed, as would purging the history of a delete-marked record. The "deleted" as well as the "freed" flags are stored in a bit in the dense page directory at the end of the compressed page. This is the reason why the uncompressed page size is limited to 16384 bytes when using ROW_FORMAT=COMPRESSED. The clustered index fields DB_TRX_ID, DB_ROLL_PTR will be stored in uncompressed format right before the page directory. Thus, both a DELETE and ROLLBACK can be executed without touching any compressed data.
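      The page size limit described above can be illustrated with a minimal sketch. It assumes that each dense page directory entry is a 16-bit slot whose low 14 bits hold the record offset and whose top bit holds the "deleted" flag. The names SLOT_OFFSET_MASK, SLOT_DELETED_FLAG and the helper functions are hypothetical and are not the identifiers used in the InnoDB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 16-bit dense page directory slot.
// The names and exact bit positions are assumptions for illustration.
constexpr std::uint16_t SLOT_OFFSET_MASK  = 0x3fff; // low 14 bits: record offset
constexpr std::uint16_t SLOT_DELETED_FLAG = 0x8000; // top bit: delete-mark (assumed)
// The remaining bit (0x4000) is left free for another per-record flag.

// 14 offset bits can address 2^14 = 16384 byte positions, which is why
// the uncompressed page cannot be larger than 16384 bytes in this model.
static_assert(SLOT_OFFSET_MASK + 1 == 16384, "14 bits address 16384 bytes");

// Extract the record offset, ignoring the flag bits.
inline std::uint16_t slot_offset(std::uint16_t slot) {
  return static_cast<std::uint16_t>(slot & SLOT_OFFSET_MASK);
}

// Test the delete-mark flag.
inline bool slot_is_deleted(std::uint16_t slot) {
  return (slot & SLOT_DELETED_FLAG) != 0;
}

// Set or clear the delete-mark flag. Only the slot itself changes;
// no compressed record data would need to be rewritten.
inline std::uint16_t slot_set_deleted(std::uint16_t slot, bool deleted) {
  return deleted
    ? static_cast<std::uint16_t>(slot | SLOT_DELETED_FLAG)
    : static_cast<std::uint16_t>(slot & ~SLOT_DELETED_FLAG);
}
```

      In this model, a DELETE corresponds to slot_set_deleted(slot, true) and its ROLLBACK to slot_set_deleted(slot, false). Both leave the record offset unchanged, so neither can run out of space on the compressed page.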

      Attachments

        1. a-fake_delete_crash_10.6.12_code_path.txt
          14 kB
          Jean-François Gagné
        2. a-fake_delete_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        3. a-fake_update_crash_10.6.12_code_path.txt
          9 kB
          Jean-François Gagné
        4. a-fake_update_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        5. b-org_delete_crash_10.6.12_code_path.txt
          35 kB
          Jean-François Gagné
        6. b-org_delete_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        7. b-org_update_crash_10.6.12_code_path.txt
          10 kB
          Jean-François Gagné
        8. b-org_update_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        9. c-other_delete_crash_10.6.12_code_path.txt
          25 kB
          Jean-François Gagné
        10. c-other_delete_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        11. c-other_update_crash_10.6.12_code_path.txt
          17 kB
          Jean-François Gagné
        12. c-other_update_crash_10.6.12_stack_trace.txt
          6 kB
          Jean-François Gagné
        13. c-other_update_crash_10.6.14_code_path.txt
          39 kB
          Jean-François Gagné
        14. c-other_update_crash_10.6.14_stack_trace.txt
          6 kB
          Jean-François Gagné

          Activity

            The table that I created based on the page dump that was shared by jeanfrancois.gagne also caused a crash on the following type of transaction (using the same primary key value as the DELETE):

            BEGIN;
            UPDATE t SET previously_null_column=0 WHERE pk=123;
            ROLLBACK;
            

            The problem was that the ROW_FORMAT=COMPRESSED page would run out of space and we would try to move one long column to off-page storage. That is not foreseen during a ROLLBACK. I worked around it by ignoring the error and hoping that a subsequent recompression of the page will succeed. That cannot be guaranteed in a general case.

            Comment by Marko Mäkelä (marko)

            I tested this patch in a few places where I know that a ROLLBACK of a DELETE or UPDATE was crashing (in addition to the one I previously shared with Marko). All DELETE occurrences are now fixed and no longer crash. Some of the UPDATE occurrences are fixed, but others still crash.

            So I am afraid this bug is not fully fixed.

            I am in the process of collecting data on these crashes.

            Comment by Jean-François Gagné (jeanfrancois.gagne)

            I have just attached the following files to the issue:

            • a-fake_delete_crash_10.6.12_code_path.txt
            • a-fake_delete_crash_10.6.12_stack_trace.txt
            • a-fake_update_crash_10.6.12_code_path.txt
            • a-fake_update_crash_10.6.12_stack_trace.txt
            • b-org_delete_crash_10.6.12_code_path.txt
            • b-org_delete_crash_10.6.12_stack_trace.txt
            • b-org_update_crash_10.6.12_code_path.txt
            • b-org_update_crash_10.6.12_stack_trace.txt
            • c-other_delete_crash_10.6.12_code_path.txt
            • c-other_delete_crash_10.6.12_stack_trace.txt
            • c-other_update_crash_10.6.12_code_path.txt
            • c-other_update_crash_10.6.12_stack_trace.txt
            • c-other_update_crash_10.6.14_code_path.txt
            • c-other_update_crash_10.6.14_stack_trace.txt

            These are the stack traces and gdb code paths of the crashes, for the DELETE and the UPDATE, on the fake, org and other tables. org is the source of the fake file that Marko created, and other is another table on which 10.6.14 still crashes on the UPDATE. I also privately sent details of this other table to Marko for investigation.

            Comment by Jean-François Gagné (jeanfrancois.gagne)

            In MDEV-32174 you can find evidence that ROW_FORMAT=COMPRESSED tables get corrupted due to ROLLBACK. It could be related to what I wrote in the commit message:

            If the BTR_KEEP_POS_FLAG is not set (we are in a ROLLBACK and cannot write any BLOBs), ignore the potential overflow and let page_zip_reorganize() or page_zip_compress() handle it.
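            The rule quoted from the commit message can be sketched as a small decision function. This is only an illustration of the described logic, not the actual InnoDB code: UpdatePlan, plan_update and both parameter names are hypothetical, and keep_pos_flag_set merely stands in for whether BTR_KEEP_POS_FLAG is set.

```cpp
#include <cassert>

// Hypothetical outcomes for an update on a ROW_FORMAT=COMPRESSED page.
enum class UpdatePlan {
  UpdateNormally,       // no overflow is expected
  MoveColumnOffPage,    // store one long column as a BLOB to make room
  DeferToRecompression  // ignore the potential overflow; let the later
                        // reorganize or recompress step deal with it
};

// may_overflow:      the updated record might not fit in the compressed page
// keep_pos_flag_set: stand-in for BTR_KEEP_POS_FLAG; per the commit message,
//                    it is not set during ROLLBACK, when BLOBs cannot be written
inline UpdatePlan plan_update(bool may_overflow, bool keep_pos_flag_set) {
  if (!may_overflow) {
    return UpdatePlan::UpdateNormally;
  }
  return keep_pos_flag_set ? UpdatePlan::MoveColumnOffPage
                           : UpdatePlan::DeferToRecompression;
}
```

            The DeferToRecompression branch is the workaround described in this issue. As noted above, the later recompression is not guaranteed to succeed in the general case, which may be how the corruption reported in MDEV-32174 arises.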

            Comment by Marko Mäkelä (marko)

            People

              marko Marko Mäkelä
              marko Marko Mäkelä