[MDEV-30882] Crash on ROLLBACK of DELETE or UPDATE in a ROW_FORMAT=COMPRESSED table Created: 2023-03-20  Updated: 2023-11-30  Resolved: 2023-03-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.4.29, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 2
Labels: corruption, crash, upstream

Attachments: a-fake_delete_crash_10.6.12_code_path.txt, a-fake_delete_crash_10.6.12_stack_trace.txt, a-fake_update_crash_10.6.12_code_path.txt, a-fake_update_crash_10.6.12_stack_trace.txt, b-org_delete_crash_10.6.12_code_path.txt, b-org_delete_crash_10.6.12_stack_trace.txt, b-org_update_crash_10.6.12_code_path.txt, b-org_update_crash_10.6.12_stack_trace.txt, c-other_delete_crash_10.6.12_code_path.txt, c-other_delete_crash_10.6.12_stack_trace.txt, c-other_update_crash_10.6.12_code_path.txt, c-other_update_crash_10.6.12_stack_trace.txt, c-other_update_crash_10.6.14_code_path.txt, c-other_update_crash_10.6.14_stack_trace.txt
Issue Links:
Relates
relates to MDEV-32174 ROW_FORMAT=COMPRESSED table corruptio... Confirmed

 Description   

jeanfrancois.gagne provided a copy of a page of a ROW_FORMAT=COMPRESSED table on which an attempt to execute the following:

BEGIN;
DELETE FROM t WHERE pk=123;
ROLLBACK;

would lead to the following crash:

10.6 32a53a66df0369a446db1e41f5123afe62e793fb

2023-03-20 11:09:00 0x7f427c2026c0  InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/row/row0umod.cc line 130
InnoDB: Failing assertion: !dummy_big_rec
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
#7  0x0000558b2197f2e3 in ut_dbg_assertion_failed (expr=expr@entry=0x558b20cfd8b8 "!dummy_big_rec", file=file@entry=0x558b20d6ea78 "/mariadb/10.6/storage/innobase/row/row0umod.cc", line=line@entry=130) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
#8  0x0000558b2194a7fb in row_undo_mod_clust_low (node=node@entry=0x7f422002d698, offsets=offsets@entry=0x7f427c200178, offsets_heap=offsets_heap@entry=0x7f427c200170, heap=0x7f4220031b10, sys=sys@entry=0x7f427c2001d3 "|B\177", thr=thr@entry=0x7f422004faf0, mtr=<optimized out>, mode=<optimized out>) at /mariadb/10.6/storage/innobase/row/row0umod.cc:130

The reason for this crash is a misguided check in btr_cur_update_in_place() that unnecessarily causes btr_cur_pessimistic_update() to be invoked during the ROLLBACK operation.

Back in 2005, I specifically designed the ROW_FORMAT=COMPRESSED format in such a way that a delete or the rollback of a delete would always succeed, as would purging the history of a delete-marked record. The "deleted" and the "freed" flags are each stored in a bit in the dense page directory at the end of the compressed page. This is the reason why the uncompressed page size is limited to 16384 bytes when using ROW_FORMAT=COMPRESSED. The clustered index fields DB_TRX_ID and DB_ROLL_PTR are stored in uncompressed format right before the page directory. Thus, both a DELETE and its ROLLBACK can be executed without touching any compressed data.
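
To illustrate why per-record flag bits in the directory go hand in hand with the 16384-byte limit, here is a minimal sketch of a 16-bit directory slot. It assumes that a slot holds a record offset in its low 14 bits (enough for offsets 0 to 16383) and two flag bits above that. The constant names, the helper functions and the exact bit positions are hypothetical and chosen for illustration only; they are not copied from the InnoDB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of one 16-bit dense page directory slot.
// Names and bit positions are assumptions made for this sketch.
constexpr std::uint16_t SLOT_OFFSET_MASK = 0x3fff;  // low 14 bits: record offset
constexpr std::uint16_t SLOT_SECOND_FLAG = 0x4000;  // a second per-record flag bit
constexpr std::uint16_t SLOT_DELETED     = 0x8000;  // delete-mark flag bit

// 14 offset bits address exactly 16384 bytes of uncompressed page.
static_assert(SLOT_OFFSET_MASK + 1 == 16384, "14 bits cover a 16 KiB page");

constexpr std::uint16_t slot_offset(std::uint16_t slot) {
  return static_cast<std::uint16_t>(slot & SLOT_OFFSET_MASK);
}

constexpr bool slot_is_deleted(std::uint16_t slot) {
  return (slot & SLOT_DELETED) != 0;
}

// Set or clear the delete mark. Only the flag bit changes; the offset and
// the other flag are preserved, and nothing else on the page is touched.
constexpr std::uint16_t slot_set_deleted(std::uint16_t slot, bool deleted) {
  return deleted ? static_cast<std::uint16_t>(slot | SLOT_DELETED)
                 : static_cast<std::uint16_t>(slot & ~SLOT_DELETED);
}

// DELETE followed by ROLLBACK, expressed on a single slot.
inline void delete_then_rollback_demo() {
  std::uint16_t slot = 1234;             // record at offset 1234, no flags set
  slot = slot_set_deleted(slot, true);   // DELETE: set the delete mark
  assert(slot_is_deleted(slot) && slot_offset(slot) == 1234);
  slot = slot_set_deleted(slot, false);  // ROLLBACK: clear the delete mark
  assert(slot == 1234);                  // slot is back to its original value
}
```

In this model, both directions of the operation are a single-bit update in an uncompressed area, which is why neither should ever need additional space on the page.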



 Comments   
Comment by Marko Mäkelä [ 2023-03-22 ]

The table that I created based on the page dump that was shared by jeanfrancois.gagne also caused a crash on the following type of transaction (using the same primary key value as the DELETE):

BEGIN;
UPDATE t SET previously_null_column=0 WHERE pk=123;
ROLLBACK;

The problem was that the ROW_FORMAT=COMPRESSED page would run out of space and we would try to move one long column to off-page storage. That is not foreseen during a ROLLBACK. I worked around it by ignoring the error and hoping that a subsequent recompression of the page will succeed. That cannot be guaranteed in a general case.

Comment by Jean-François Gagné [ 2023-06-07 ]

I tested this patch in a few places where I know a ROLLBACK of a DELETE or UPDATE was crashing (in addition to the one I previously shared with Marko). All DELETE occurrences are now fixed / not crashing. Some of the UPDATE occurrences are fixed, but others are still crashing.

So I am afraid this bug is not fully fixed.

I am in the process of collecting data on these crashes.

Comment by Jean-François Gagné [ 2023-06-09 ]

I have just attached the following files to the issue:

  • a-fake_delete_crash_10.6.12_code_path.txt
  • a-fake_delete_crash_10.6.12_stack_trace.txt
  • a-fake_update_crash_10.6.12_code_path.txt
  • a-fake_update_crash_10.6.12_stack_trace.txt
  • b-org_delete_crash_10.6.12_code_path.txt
  • b-org_delete_crash_10.6.12_stack_trace.txt
  • b-org_update_crash_10.6.12_code_path.txt
  • b-org_update_crash_10.6.12_stack_trace.txt
  • c-other_delete_crash_10.6.12_code_path.txt
  • c-other_delete_crash_10.6.12_stack_trace.txt
  • c-other_update_crash_10.6.12_code_path.txt
  • c-other_update_crash_10.6.12_stack_trace.txt
  • c-other_update_crash_10.6.14_code_path.txt
  • c-other_update_crash_10.6.14_stack_trace.txt

These are the stack traces and gdb code paths of the crashes, for the delete and the update, on the fake, org and other tables. org is the source of the fake file that was created by Marko, and other is another table on which 10.6.14 still crashes on the update. I also privately sent details of this other table to Marko for investigation.

Comment by Marko Mäkelä [ 2023-11-30 ]

In MDEV-32174 you can find evidence that ROW_FORMAT=COMPRESSED tables get corrupted due to ROLLBACK. It could be related to what I wrote in the commit message:

If the BTR_KEEP_POS_FLAG is not set (we are in a ROLLBACK and cannot write any BLOBs), ignore the potential overflow and let page_zip_reorganize() or page_zip_compress() handle it.

Generated at Thu Feb 08 10:19:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.