[MDEV-32489] Change buffer index fails to delete the records
Created: 2023-10-17
Updated: 2024-01-29

Status: In Progress
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.10, 10.11, 11.1, 11.2, 11.3
Fix Version/s: 10.5, 10.6, 10.11, 11.1, 11.2, 11.3

Type: Bug
Priority: Major
Reporter: Thirunarayanan Balathandayuthapani
Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved
Votes: 0
Labels: None


 Description   

While testing bb-10.6-thiru, InnoDB hangs during shutdown and writes the
following messages to the error log file:

2023-10-10  6:11:53 0 [Note] Completing change buffer merge; 1 page reads initiated; 3 change buffer pages remain
2023-10-10  6:12:08 0 [Note] Completing change buffer merge; 1 page reads initiated; 3 change buffer pages remain
2023-10-10  6:12:23 0 [Note] Completing change buffer merge; 1 page reads initiated; 3 change buffer pages remain
2023-10-10  6:12:38 0 [Note] Completing change buffer merge; 1 page reads initiated; 3 change buffer pages remain

Analysis:
=========
During shutdown, InnoDB calls ibuf_merge_or_delete_for_page() for the problematic page (0, 739). But the desired bit for that
page in the bitmap page is already set to IBUF_BITMAP_FREE, so we fail to remove the entries for (0, 739) from the change buffer index.
I therefore went on to examine ibuf_delete_recs(), where the records for the problematic page (0, 739) were being deleted. At that
point, the InnoDB change buffer index has only a root page and leaf pages; there are no internal nodes.

  mtr_t mtr;
loop:
  btr_pcur_t pcur;
  pcur.btr_cur.page_cur.index= ibuf.index;
  ibuf_mtr_start(&mtr);
  if (btr_pcur_open(&tuple, PAGE_CUR_GE, BTR_MODIFY_LEAF, &pcur, &mtr))
    goto func_exit;
  if (!btr_pcur_is_on_user_rec(&pcur))
  {
    ut_ad(btr_pcur_is_after_last_on_page(&pcur));
    goto func_exit;
  }
 
  for (;;)
  {
    ut_ad(btr_pcur_is_on_user_rec(&pcur));
    const rec_t* ibuf_rec = btr_pcur_get_rec(&pcur);
    if (ibuf_rec_get_space(&mtr, ibuf_rec) != page_id.space()
        || ibuf_rec_get_page_no(&mtr, ibuf_rec) != page_id.page_no())
      break;
    /* Delete the record from ibuf */
    if (ibuf_delete_rec(page_id, &pcur, &tuple, &mtr))
    {
      /* Deletion was pessimistic and mtr was committed:
      we start from the beginning again */
      ut_ad(mtr.has_committed());
      goto loop;
    }
 
    if (btr_pcur_is_after_last_on_page(&pcur))
    {
      ibuf_mtr_commit(&mtr);
      btr_pcur_close(&pcur);
      goto loop;
    }

btr_cur_open() searches with a tuple that contains only the page_id (0, 739). The root page contains the following: (..(0, 563)(child 60), (0, 739)(child 63)..).
Since the search mode is PAGE_CUR_L for a non-leaf node, the search leads to child page 60, where the records of (0, 739) are deleted. Once we reach the end of
the page, we open the change buffer index again and end up in page 60 once more, where we fail to find any matching record. As a result, ibuf_delete_recs()
fails to delete the entries completely, even though page 62, which is next to child page 60, still has records for (0, 739).
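
The behaviour described above can be reproduced in a small toy model. This is only a sketch and not InnoDB code: the types Key, NodePtr and ToyIbuf, the helper functions, and the counter values are hypothetical, and leaf slots 0, 1, 2 merely stand in for consecutive leaf pages. It assumes that a search tuple without the counter compares as equal to every record of the same (space, page_no), so a strictly-less search on the root never selects a node pointer of that page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these types exist in InnoDB.
struct Key { uint32_t space, page_no, counter; };
struct NodePtr { Key low; std::size_t child; };  // lowest key of a child leaf
struct ToyIbuf {
  std::vector<NodePtr> root;             // sorted by key
  std::vector<std::vector<Key>> leaves;  // each leaf sorted by key
};

// Compare a record with a {space, page_no} tuple, ignoring the counter.
static int cmp_prefix(const Key &rec, uint32_t space, uint32_t page_no) {
  if (rec.space != space) return rec.space < space ? -1 : 1;
  if (rec.page_no != page_no) return rec.page_no < page_no ? -1 : 1;
  return 0;
}

// PAGE_CUR_L on the root: take the last node pointer strictly less than
// the tuple, or the first node pointer if there is none.
static std::size_t descend(const ToyIbuf &t, uint32_t space, uint32_t page_no) {
  assert(!t.root.empty());
  std::size_t child = t.root.front().child;
  for (const NodePtr &np : t.root)
    if (cmp_prefix(np.low, space, page_no) < 0) child = np.child;
  return child;
}

// Imitates the retry loop quoted above; returns the surviving matches.
static std::size_t delete_recs(ToyIbuf &t, uint32_t space, uint32_t page_no) {
  for (;;) {
    std::vector<Key> &leaf = t.leaves[descend(t, space, page_no)];
    std::size_t i = 0;  // PAGE_CUR_GE on the leaf
    while (i < leaf.size() && cmp_prefix(leaf[i], space, page_no) < 0) i++;
    if (i == leaf.size()) break;  // cursor is on the supremum: give up
    if (cmp_prefix(leaf[i], space, page_no) != 0) break;  // nothing to delete
    while (i < leaf.size() && cmp_prefix(leaf[i], space, page_no) == 0)
      leaf.erase(leaf.begin() + i);
    if (i < leaf.size()) break;  // stopped on a record of another page
    // Reached the end of the leaf: reopen the cursor from the root.
  }
  std::size_t left = 0;
  for (const auto &leaf : t.leaves)
    for (const Key &k : leaf)
      if (cmp_prefix(k, space, page_no) == 0) left++;
  return left;
}

// Records of (0, 739) are spread over three leaves.
static ToyIbuf example_tree() {
  ToyIbuf t;
  t.leaves.resize(3);
  t.leaves[0] = {Key{0, 563, 0}, Key{0, 739, 0}, Key{0, 739, 1}};
  t.leaves[1] = {Key{0, 739, 2}, Key{0, 739, 3}};
  t.leaves[2] = {Key{0, 739, 4}, Key{0, 800, 0}};
  t.root = {NodePtr{Key{0, 563, 0}, 0}, NodePtr{Key{0, 739, 2}, 1},
            NodePtr{Key{0, 739, 4}, 2}};
  return t;
}
```

Calling delete_recs(t, 0, 739) on example_tree() empties the matching records from the first leaf only. Every reopen descends to the same leaf, so the records on the following leaves are never visited.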

Since the change buffer index is in the 5.5+ format, the primary key for the index is {space, 0, page_no, counter}. But we fail to use the counter field when searching with the tuple.

Thanks to vlad.lesin for helping me in analysing this issue.



 Comments   
Comment by Vladislav Lesin [ 2023-10-17 ]

> Since change buffer index is in 5.5+ format, primary key for the index is {space, 0, page_no, counter}

I am not sure about this, because when we get the offsets of the record on the root page, there are 7 fields. The first 4 fields are {space, 0, page_no, counter}, and the last field is a child page id. So there are 2 more fields in the key. And, if so, the https://github.com/MariaDB/server/commit/015ab499696382b0e3b8d70118beefafd328a779 commit just shifts the problem from the "counter" field to the field next to "counter".

I.e. what if we have a {a, 0, b, count, c, ...} record on leaf page N and a {a, 0, b, count, c+1, ...} record on page N+1, and we use {a, 0, b, count} as the search tuple? We will delete all {a, 0, b, count, ...} records on page N, and then, when we open the cursor again, there will be the same problem as we have now, i.e. the cursor will point to the supremum of page N, and we will miss the records on page N+1.
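
The point about prefix tuples can be illustrated with a few lines of C++. This is a hedged sketch rather than InnoDB code: Fields and matches_prefix() are hypothetical names, and the concrete values chosen for a, b, count and c are made up. It assumes only that a search tuple is compared against the leading fields of a record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A record or search tuple as a plain list of field values (toy model).
using Fields = std::vector<uint32_t>;

// True if the leading fields of rec are equal to all fields of tuple.
static bool matches_prefix(const Fields &rec, const Fields &tuple) {
  if (rec.size() < tuple.size()) return false;
  for (std::size_t i = 0; i < tuple.size(); i++)
    if (rec[i] != tuple[i]) return false;
  return true;
}
```

With a = 0, b = 739, count = 5 and c = 1, the tuple {0, 0, 739, 5} matches both {0, 0, 739, 5, 1} and {0, 0, 739, 5, 2}. Two records that the tuple cannot tell apart may sit on different leaf pages, so adding one more field to the tuple does not by itself rule out the scenario.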

Comment by Marko Mäkelä [ 2023-10-17 ]

Change buffer records may be in the pre-MySQL 5.5 format in case innodb_change_buffering=inserts is being used. We must not assume anything about the format.

I think that we should perform a forward scan and first delete-mark all records in the same mini-transaction that modifies the change buffer bitmap. Then, start another mini-transaction that will lock the entire change buffer while doing a forward scan and "pessimistic" delete of all matching change buffer records.
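
A rough sketch of the two-pass idea, in a toy model that ignores mini-transactions, latching and the change buffer bitmap entirely. Rec, Leaf, mark_all(), purge_marked() and example_leaves() are hypothetical names, not InnoDB functions. The only point shown is that a forward scan which crosses leaf boundaries reaches every matching record, regardless of how the records are split across pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy change buffer record with a delete-mark flag.
struct Rec { uint32_t space, page_no, counter; bool delete_marked; };
using Leaf = std::vector<Rec>;  // leaves are kept in key order

static int cmp_prefix(const Rec &r, uint32_t space, uint32_t page_no) {
  if (r.space != space) return r.space < space ? -1 : 1;
  if (r.page_no != page_no) return r.page_no < page_no ? -1 : 1;
  return 0;
}

// Pass 1: forward scan over all leaves, delete-marking every match.
static std::size_t mark_all(std::vector<Leaf> &leaves, uint32_t space,
                            uint32_t page_no) {
  std::size_t marked = 0;
  for (Leaf &leaf : leaves)
    for (Rec &r : leaf) {
      int c = cmp_prefix(r, space, page_no);
      if (c > 0) return marked;  // past the last possible match
      if (c == 0 && !r.delete_marked) { r.delete_marked = true; marked++; }
    }
  return marked;
}

// Pass 2: second forward scan that physically removes the marked matches.
static std::size_t purge_marked(std::vector<Leaf> &leaves, uint32_t space,
                                uint32_t page_no) {
  std::size_t removed = 0;
  for (Leaf &leaf : leaves)
    for (std::size_t i = 0; i < leaf.size();) {
      int c = cmp_prefix(leaf[i], space, page_no);
      if (c > 0) return removed;
      if (c == 0 && leaf[i].delete_marked) {
        leaf.erase(leaf.begin() + i);
        removed++;
      } else {
        i++;
      }
    }
  return removed;
}

// Records of (0, 739) spread over two leaves (made-up counters).
static std::vector<Leaf> example_leaves() {
  std::vector<Leaf> leaves(3);
  leaves[0] = {Rec{0, 563, 0, false}, Rec{0, 739, 0, false},
               Rec{0, 739, 1, false}};
  leaves[1] = {Rec{0, 739, 2, false}, Rec{0, 739, 3, false}};
  leaves[2] = {Rec{0, 800, 0, false}};
  return leaves;
}
```

On example_leaves(), mark_all(leaves, 0, 739) marks four records and purge_marked(leaves, 0, 739) removes the same four, leaving only the records of other pages. In the real server each pass would need to run under appropriate mini-transaction and latching rules, which this sketch does not attempt to model.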
