[MDEV-19022] InnoDB fails to clean up useless B-tree pages Created: 2019-03-22  Updated: 2021-08-16  Resolved: 2019-03-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.24, 10.3.14, 10.4.4

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, performance

Issue Links:
Relates
relates to MDEV-14126 Assertion `page_get_page_no(page) == ... Closed

 Description   

The test case of MDEV-14126 demonstrates that InnoDB can end up with an index tree where a non-leaf page has only one child page. Such internal nodes are totally unused and should be removed when the B-tree is being shrunk in the first place. The functions btr_discard_only_page_on_level(), btr_lift_page_up(), or btr_compress() or their callers may have to be revised.



 Comments   
Comment by Marko Mäkelä [ 2019-03-25 ]

The following patch seems to demonstrate the problem:

diff --git a/storage/innobase/include/page0page.ic b/storage/innobase/include/page0page.ic
index 224e5b08985..8cf93c0e7be 100644
--- a/storage/innobase/include/page0page.ic
+++ b/storage/innobase/include/page0page.ic
@@ -174,6 +174,9 @@ page_header_set_field(
 	ut_ad(page);
 	ut_ad(field <= PAGE_N_RECS);
 	ut_ad(field != PAGE_N_RECS || val);
+	ut_ad(field != PAGE_N_RECS || val > 1
+	      || !mach_read_from_2(PAGE_N_RECS + PAGE_HEADER + page)
+	      || page_is_leaf(page));
 	ut_ad(field == PAGE_N_HEAP || val < srv_page_size);
 	ut_ad(field != PAGE_N_HEAP || (val & 0x7fff) < srv_page_size);
 

According to the stack trace, we are performing a pessimistic delete when rolling back an insert of a row that caused a duplicate key error, while executing a long statement

INSERT /*! IGNORE */ INTO t1 VALUES  \n('2016', 'butterfly', 'domestic', NULL, 'd', LOAD_FILE('/mariadb/earth15kb.jpg'), …

10.3

#4  0x00005555561fb123 in page_header_set_field (page=0x7ffff0838000 "", 
    page_zip=0x7ffff0324750, field=16, val=1)
    at /mariadb/10.3/storage/innobase/include/page0page.ic:177
#5  0x0000555556207234 in page_zip_dir_delete (page_zip=0x7ffff0324750, 
    rec=0x7ffff083809a "\200", index=0x7fff8c0729d8, offsets=0x7fff8d6323b8, 
    free=0x0) at /mariadb/10.3/storage/innobase/page/page0zip.cc:4531
#6  0x00005555561e51da in page_mem_free (page=0x7ffff0838000 "", 
    page_zip=0x7ffff0324750, rec=0x7ffff083809a "\200", index=0x7fff8c0729d8, 
    offsets=0x7fff8d6323b8)
    at /mariadb/10.3/storage/innobase/include/page0page.ic:1048
#7  0x00005555561ec4f0 in page_cur_delete_rec (cursor=0x7ffff00502f8, 
    index=0x7fff8c0729d8, offsets=0x7fff8d6323b8, mtr=0x7ffff0050920)
    at /mariadb/10.3/storage/innobase/page/page0cur.cc:2421
#8  0x0000555556375b86 in btr_cur_pessimistic_delete (err=0x7ffff00502e4, 
    has_reserved_extents=1, cursor=0x7ffff00502f0, flags=16, rollback=false, 
    mtr=0x7ffff0050920) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5833
#9  0x000055555634badc in btr_node_ptr_delete (index=0x7fff8c0729d8, 
    block=0x7ffff0331fe0, mtr=0x7ffff0050920)
    at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3490
#10 0x000055555634ef72 in btr_discard_page (cursor=0x7fff8cebc3d8, 
    mtr=0x7ffff0050920) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4429
#11 0x00005555563757fc in btr_cur_pessimistic_delete (err=0x7ffff00508b8, 
    has_reserved_extents=0, cursor=0x7fff8cebc3d8, flags=0, rollback=true, 
    mtr=0x7ffff0050920) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5749
#12 0x00005555564b05e4 in row_undo_ins_remove_clust_rec (node=0x7fff8cebc368)
    at /mariadb/10.3/storage/innobase/row/row0uins.cc:210
#13 0x00005555564b17a8 in row_undo_ins (node=0x7fff8cebc368, 
    thr=0x7fff8c07b498) at /mariadb/10.3/storage/innobase/row/row0uins.cc:612
#14 0x00005555562bd832 in row_undo (node=0x7fff8cebc368, thr=0x7fff8c07b498)
    at /mariadb/10.3/storage/innobase/row/row0undo.cc:301
#15 0x00005555562bda94 in row_undo_step (thr=0x7fff8c07b498)
    at /mariadb/10.3/storage/innobase/row/row0undo.cc:360
#16 0x0000555556214543 in que_thr_step (thr=0x7fff8c07b498)
    at /mariadb/10.3/storage/innobase/que/que0que.cc:1040
#17 0x0000555556214797 in que_run_threads_low (thr=0x7fff8c07b498) at /mariadb/10.3/storage/innobase/que/que0que.cc:1104
#18 0x0000555556214989 in que_run_threads (thr=0x7fff8c07b498) at /mariadb/10.3/storage/innobase/que/que0que.cc:1144
#19 0x000055555631b550 in trx_rollback_to_savepoint_low (trx=0x7ffff0b4b0f0, savept=0x7ffff0051530) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:141
#20 0x000055555631b905 in trx_rollback_to_savepoint (trx=0x7ffff0b4b0f0, savept=0x7ffff0051530) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:189
#21 0x000055555626b2e5 in row_mysql_handle_errors (new_err=0x7ffff0051528, trx=0x7ffff0b4b0f0, thr=0x7fff8c2efdb0, savept=0x7ffff0051530) at /mariadb/10.3/storage/innobase/row/row0mysql.cc:735
#22 0x000055555626d1a1 in row_insert_for_mysql (mysql_rec=0x7fff8c047828 "\b", prebuilt=0x7fff8c2edb18, ins_mode=ROW_INS_NORMAL) at /mariadb/10.3/storage/innobase/row/row0mysql.cc:1473
#23 0x0000555556116534 in ha_innobase::write_row (this=0x7fff8c04e6f0, record=0x7fff8c047828 "\b") at /mariadb/10.3/storage/innobase/handler/ha_innodb.cc:8086

An examination of the code suggests that btr_cur_pessimistic_delete() is not going to shrink the tree, although we should:

			btr_discard_page(cursor, mtr);
 
			ret = TRUE;
			goto return_after_reservations;

Ultimately, the mini-transaction will be committed in row_undo_ins_remove_clust_rec(), leaving the index tree in a corrupted state (with an internal page that does not have multiple child pages).

Comment by Marko Mäkelä [ 2019-03-25 ]

The scenario occurs on the 3rd invocation of row_undo_ins_remove_clust_rec() in the MDEV-14126 test case. At this point, the clustered index tree consists of 3 pages:

  • page 3 (the root page) pointing to pages 69 and 70
  • page 69 containing 1 record
  • page 70 likewise containing the 1 record that is being rolled back

After the rollback, we should end up with page 3 being the only index page, containing the entire contents of the table. But, btr_discard_page() fails to invoke btr_lift_page_up() even though after it has invoked btr_node_ptr_delete() on the parent page, there would be only 1 record left in the parent page.

Generated at Thu Feb 08 08:48:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.