MDEV-21899: INSERT into a secondary index with zero-data-length key is not crash-safe

    Description

      Problem found during RQG testing, invoking the RQG reporter "Mariabackup".
      Scenario:
      1. A more or less heavy DDL/DML load is running.
           While it is running, a call to   mariabackup ... --backup ...   is made.
      2. mariabackup ... --prepare ...  is called for the first time.
       
      2020-03-06 19:20:38 0 [Note] InnoDB: Starting final batch to recover 491 pages from redo log.
      2020-03-06 19:20:39 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=167, page number=36]
      2020-03-06 19:20:39 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2020-03-06 19:20:39 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1580] with error Data structure corruption
      [00] FATAL ERROR: 2020-03-06 19:20:40 mariabackup: innodb_init() returned 39 (Data structure corruption).
       
      origin/HEAD, origin/10.5 23685378bafa38a74957e8f07e2d56b57e90fa53 2020-03-06T11:06:59+02:00
       
       
      A second variant of the same failure:
      1. Only one session, which runs
          REPLACE INTO <random table which was created earlier> ( _field_no_pk ) VALUES ( <random _mediumint value> ) ;
      2. At some point in time the server is killed (SIGKILL).
      3. An attempt is made to restart the server.
       
      2020-03-23 14:12:49 0 [Note] InnoDB: Completed initialization of buffer pool
      2020-03-23 14:12:49 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2020-03-23 14:12:49 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=335923252
      2020-03-23 14:12:49 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 2 row operations to undo
      2020-03-23 14:12:49 0 [Note] InnoDB: Trx id counter is 21150
      2020-03-23 14:12:49 0 [Note] InnoDB: Starting final batch to recover 272 pages from redo log.
      2020-03-23 14:12:49 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=5, page number=22]
      2020-03-23 14:12:49 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2020-03-23 14:12:49 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1574] with error Data structure corruption
       
      origin/10.5 02fe99750536aad7c9e75513be47c91c8c568419 2020-03-23T10:45:08+01:00
       
      The bug affects both debug and non-debug builds.
      
      

      Attachments

        1. MDEV-21899.tgz (869 kB)
        2. rqg.log (381 kB)

          Activity

            Marko Mäkelä added a comment:

            In the case that I analyzed, I observed 3 or 4 mini-transactions for the page:

            1. Create the B-tree page (root page of a secondary index, page 36 in the tablespace).
            2. Insert one record to the B-tree page, with the key 0x80000000 (looks like the PRIMARY KEY was an empty string).
            3. Update the PAGE_MAX_TRX_ID and delete-mark the record.
            4. Delete a non-existing record with the non-existing predecessor at byte offset 0x8a.

            We could detect the inconsistency earlier by checking that the predecessor record does not point beyond PAGE_HEAP_TOP. We could also copy some further consistency checks from the INSERT function.
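
            A minimal sketch of such a bound check (hypothetical helper and names; the real check would belong in page_apply_delete_dynamic()):

            #include <cstdint>

            // Hypothetical helper: reject a logged DELETE whose predecessor record
            // offset lies in unallocated space. PAGE_HEAP_TOP is stored big-endian
            // at byte offset FIL_PAGE_DATA + PAGE_HEAP_TOP = 38 + 2 = 0x28.
            static bool delete_prev_offset_is_sane(const uint8_t *frame,
                                                   uint16_t prev_offset)
            {
              const uint16_t heap_top= uint16_t(frame[0x28] << 8 | frame[0x29]);
              return prev_offset < heap_top; // bytes at or above PAGE_HEAP_TOP are unused
            }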

            Until we have a reasonably repeatable test case, we do not know what is causing this. It would seem that some redo log records are missing. A bug in Mariabackup?


            Marko Mäkelä added a comment:

            I think that I finally found the cause. It was rather challenging, so I am providing details for future reference, to aid in similar debugging. The patch is simple, and the cause is MDEV-21724:

            diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc
            index bce7231059a..ec79450995a 100644
            --- a/storage/innobase/page/page0cur.cc
            +++ b/storage/innobase/page/page0cur.cc
            @@ -1551,7 +1551,6 @@ page_cur_insert_rec_low(
               /* Insert the record, possibly copying from the preceding record. */
               ut_ad(mtr->get_log_mode() == MTR_LOG_ALL);
             
            -  if (data_size)
               {
                 const byte *r= rec;
                 const byte *c= cur->rec;
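
            To see why the removed guard was harmful: the block it gated is the one that compares the new record against the preceding (cursor) record and ultimately emits the redo log bytes for the insert. A minimal sketch (simplified, hypothetical names; not the actual function):

            #include <cstddef>
            #include <cstdint>

            // Hypothetical stand-in for the mini-transaction interface.
            struct mtr_sketch
            {
              void log_insert(const uint8_t *tail, size_t tail_len, size_t common)
              { /* append the record bytes to the mtr log buffer */ }
            };

            // Sketch of the tail of page_cur_insert_rec_low(): log only the bytes
            // in which the new record differs from its predecessor.
            void log_insert_sketch(mtr_sketch *mtr, const uint8_t *rec,
                                   const uint8_t *cur, size_t rec_size,
                                   size_t data_size)
            {
              // BUG (before the fix): everything below was wrapped in
              // "if (data_size)", so a record whose data payload is 0 bytes was
              // applied to the page image but never written to the redo log.
              (void) data_size;
              size_t common= 0;
              while (common < rec_size && rec[common] == cur[common])
                common++; // length of the prefix shared with the preceding record
              mtr->log_insert(rec + common, rec_size - common, common);
            }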
            

            In a provided dataset, there were at least 2 pages for which the record failed to apply: pages 5:9 and 5:12. I concentrated on page 5:9. For that one, I instrumented the crash recovery with gdb as follows:

            break ib::error::~error
            break page_apply_delete_dynamic
            cond 2 block.page.id.m_id==0x500000009
            break 'recv_recover_page(buf_block_t*, mtr_t&, std::_Rb_tree_iterator<std::pair<page_id_t const, page_recv_t> > const&, fil_space_t*, mlog_init_t::init*)'
            cond 3 block.page.id.m_id==0x500000009
            command 3
            set scheduler-locking on
            enable 4
            continue
            break log_phys_t::apply
            disable 4
            continue
            …
            display/x *begin()@len
            display *this
            command 4
            continue
            end
            continue
            

            There were 2 page_apply_delete_dynamic() calls to this page. The second one failed because it pointed to an area of the page that was filled with zero bytes. That is, the next-record loop that locates the record owner via the sparse page directory would exceed n_recs, and the page would correctly be regarded as corrupted.
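
            A sketch of that owner lookup (hypothetical simplified form, assuming the compact record header layout and a 16 KiB page):

            #include <cstdint>

            // Follow next-record links from a record until one that owns a page
            // directory slot is found (its n_owned nibble, the low 4 bits of the
            // byte 5 bytes before the record origin, is nonzero). Walking more
            // than n_recs steps means the chain is broken, e.g. because it leads
            // into a zeroed-out area of the page.
            static bool find_owner_sane(const uint8_t *frame, uint16_t rec_offset,
                                        uint16_t n_recs)
            {
              for (uint16_t steps= 0; steps <= n_recs; steps++)
              {
                const uint8_t *rec= frame + rec_offset;
                if (rec[-5] & 0xf) // this record owns a directory slot
                  return true;
                // next-record pointer: big-endian 2 bytes just before the origin,
                // relative to the current record, modulo the page size
                rec_offset= uint16_t(rec_offset + (rec[-2] << 8 | rec[-1])) & 0x3fff;
              }
              return false; // exceeded n_recs: regard the page as corrupted
            }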

            Armed with the knowledge of the LSN and the redo log record contents, I used the provided rr replay environment. First, I let it run to the end (to the SIGKILL that was originally initiated by RQG) and issued when to see how many events there are in total; it was above 4 million. Then I did roughly the following (omitting lots of steps):

            run 4000000
            break row_upd_sec_index_entry
            cond 1 log_sys.lsn._M_i>176623800
            break mtr_buf_t::close
            cond 2 log_sys.lsn._M_i>176623800
            define find_page_105
              set $id=((unsigned long long)$arg0)<<32|$arg1
              set $chunk=buf_pool->chunks
              set $i=buf_pool->n_chunks
              set $found=0
              while ($i-- && !$found)
                set $j = $chunk->size
                set $b = $chunk->blocks
                while ($j--)
                  if ($b->page.id.m_id==$id)
                    print $b
                    set $found=1
                    loop_break
                  end
                  set $b++
                end
                set $chunk++
              end
            end
             
            document find_page_105
             Buffer pool look-up: find_page_105 space_id page_no
            end
            find_page_105 5 3
            watch -l *(ulonglong*)$1.frame[0x38]@8
            watch -l $1.frame[0x7a]
            display/x $1.frame[0x38]@8
            display log_sys.lsn
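
            The conditions above rely on the packing of page_id_t::m_id in the 10.5 sources of that time: tablespace id in the high 32 bits, page number in the low 32 bits. As a sketch:

            #include <cstdint>

            constexpr uint64_t page_id(uint32_t space, uint32_t page_no)
            {
              return uint64_t{space} << 32 | page_no;
            }
            static_assert(page_id(5, 9) == 0x500000009,
                          "page 5:9, as in the cond expressions above");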
            

            The first hardware watchpoint is for PAGE_MAX_TRX_ID on the page. The page turned out to be a secondary index page.
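
            The watched address follows from the fixed page layout: the index page header starts at FIL_PAGE_DATA = 38, and PAGE_MAX_TRX_ID is at offset 18 within it, hence byte 38 + 18 = 56 = 0x38 of the frame. A sketch of reading it (on a little-endian host; InnoDB stores integers big-endian):

            #include <cstdint>
            #include <cstring>

            static uint64_t page_max_trx_id(const uint8_t *frame)
            {
              uint64_t v;
              std::memcpy(&v, frame + 0x38, 8); // the 8 bytes watched above
              return __builtin_bswap64(v); // convert from big-endian storage
            }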

            The second watchpoint is for a byte that used to be a delete-mark flag that was set back and forth by the test. It was accidentally hit by page_cur_insert_rec_low(). Thanks to the breakpoint on mtr_buf_t::close, which was not called for that insert, I knew that something was wrong. So, I went back in time by issuing

            when
            run (the reported number - a little bit)
            

            and single-stepped through the function page_cur_insert_rec_low(). The condition that my patch above removed is redundant and outright harmful.

            It is a rare corner case that a secondary index record has no data payload at all. It is possible when the PRIMARY KEY consists of variable-length string columns that are all empty, and all secondary index columns are either NULL or the empty string.
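
            A toy model of the payload computation (not InnoDB code): in ROW_FORMAT=DYNAMIC, a NULL column is recorded only as a flag bit in the record header, and an empty string contributes only a length byte to the header, so a (NULL, '') secondary index record carries 0 bytes of payload:

            #include <cstddef>
            #include <optional>
            #include <string>
            #include <vector>

            // Only non-NULL values contribute payload bytes; NULL flags and
            // length bytes live in the record header, not the payload.
            static size_t payload_size(const std::vector<std::optional<std::string>> &cols)
            {
              size_t n= 0;
              for (const auto &c : cols)
                if (c)
                  n+= c->size(); // NULL adds nothing; "" adds 0 bytes
              return n;
            }

            // payload_size({std::nullopt, std::string()}) == 0:
            // the zero-data-length key of this bug.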

            We failed to write the INSERT_REUSE_DYNAMIC redo log record for the following statement:

            REPLACE INTO `table100_innodb_default` ( `col_text_latin1_unique_default_null` ) VALUES ( 5900288 );
            

            This was on the secondary index col_char_latin1_unique, and the value is (NULL,''), as expected.


            People

              Assignee: Marko Mäkelä (marko)
              Reporter: Matthias Leich (mleich)