[MDEV-23190] Assertion `id.page_no() < space.size' during recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Fix Version/s: 10.1.46, 10.2.33, 10.3.24, 10.4.14, 10.5.5
Component/s: Storage Engine - InnoDB
Labels:
- rr-profile-analyzed

Description

This issue was happened while testing bb-10.5-thiru for ~~MDEV-8139~~.
Back trace is

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51

#1  0x000055705e6778b1 in __GI_abort () at abort.c:79

#2  0x000055705e66742a in __assert_fail_base (fmt=0x55705e7eea38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55705cd0811a "id.page_no() < space.size",

    file=file@entry=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=line@entry=1254,

    function=function@entry=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:92

#3  0x000055705e6674a2 in __GI___assert_fail (assertion=0x55705cd0811a "id.page_no() < space.size", file=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=1254,

    function=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:101

#4  0x000055705c56af9c in buf_flush_check_neighbors (space=..., id=..., flush=IORequest::FLUSH_LIST) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1254

#5  0x000055705c56b93b in buf_flush_try_neighbors (page_id=..., flush=IORequest::FLUSH_LIST, n_flushed=164, n_to_flush=200) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1392

#6  0x000055705c56c7dd in buf_do_flush_list_batch (min_n=200, lsn_limit=18446744073709551615) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1640

#7  0x000055705c56cd03 in buf_flush_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1697

#8  0x000055705c56d0ec in buf_flush_do_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1800

#9  0x000055705c56d322 in buf_flush_lists (min_n=200, lsn_limit=18446744073709551615, n_processed=0x2fef7a41bc98) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1858

#10 0x000055705c570640 in buf_flush_page_cleaner () at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:2746

#11 0x000000006fde86db in start_thread (arg=0x2fef7a41c700) at pthread_create.c:463

#12 0x000055705e758a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Error log contains the following:

[rr 99536 5963]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED not found.

[rr 99536 5967]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED_CACHE not found.

[rr 99536 5971]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_CONFIG not found.

[rr 99536 5975]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED not found.

[rr 99536 5979]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED_CACHE not found.

[rr 99536 5983]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_1 not found.

[rr 99536 5987]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_2 not found.

Problem:
=======

During recovery, InnoDB encounters INIT_PAGE for page 1. So it does buf_page_create() and add the page 1 to flush list. It never encounters INIT_PAGE for page0 and FSP_SIZE redo log also. So InnoDB never sets the size for the tablespace. While flushing, InnoDB fails in assert `id.page_no() < space.size'

Root cause is that InnoDB does mtr_commit separately for page 1 before doing commit for page 0 and FSP_SIZE

In the reported case, fsp_header_init() is being called from dict_build_table_def_step()

                mtr.start();

                mtr.set_named_space(table->space);

                fsp_header_init(table->space, FIL_IBD_FILE_INITIAL_SIZE, &mtr);

                mtr.commit();

Inside fsp_header_init(), InnoDB does fsp_fill_free_list() to add the new extent to free list.

In fsp_fill_free_list(), the following code does have seperate mtr to commit ibuf bitmap page.

                        /* Initialize the ibuf bitmap page in a separate

                        mini-transaction because it is low in the latching

                        order, and we must be able to release its latch.

                        Note: Insert-Buffering is disabled for tables that

                        reside in the temp-tablespace. */

                        if (space->purpose != FIL_TYPE_TEMPORARY) {

                                mtr_t   ibuf_mtr;

                                mtr_start(&ibuf_mtr);

                                ibuf_mtr.set_named_space(space);

                                /* Avoid logging while truncate table

                                fix-up is active. */

                                if (srv_is_tablespace_truncated(space->id)) {

                                        mtr_set_log_mode(

                                                &ibuf_mtr, MTR_LOG_NO_REDO);

                                const page_id_t page_id(

                                        space->id,

                                        i + FSP_IBUF_BITMAP_OFFSET);

                                block = buf_page_create(

                                        page_id, page_size, &ibuf_mtr);

                                buf_page_get(

                                        page_id, page_size, RW_SX_LATCH,

                                        &ibuf_mtr);

                                buf_block_dbg_add_level(block, SYNC_FSP_PAGE);

                                fsp_init_file_page(space, block, &ibuf_mtr);

                                ibuf_bitmap_page_init(block, &ibuf_mtr);

                                mtr_commit(&ibuf_mtr);

So InnoDB should commit everything within same mtr or handle the scenario during recovery.

Attachments

Issue Links

relates to

MDEV-11556 InnoDB redo log apply fails to adjust data file sizes

Closed

MDEV-23439 Assertion `size == space->size' failed in buf_read_ahead_random

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2020-07-16 08:00

It indeed seems to be the case that the use of a ‘sub-minitransaction’ ibuf_mtr in fsp_fill_free_list() is ‘overtaking’ the write of FSP_SIZE.

I believe that the reason for using a separate ibuf_mtr in fsp_fill_free_list() is to avoid trouble with any caller that could access further pages while the change buffer bitmap page latch is being held.

I see two alternative fixes:

Before invoking fsp_fill_free_list() in fsp_header_init(), commit and restart the mini-transaction. To prevent page eviction, buf_block_buf_fix_inc() could be invoked before the mtr_t::commit().
Remove the ibuf_mtr and use the passed mtr instead. Ensure that no caller is accessing any further pages before invoking mtr_t::commit().

The transitive closure of all callers of fsp_fill_free_list() in that compilation unit is as follows:

fsp_alloc_free_extent()
fsp_alloc_free_page()
fseg_fill_free_list()
fseg_alloc_free_extent()
fseg_alloc_free_page_low()
fsp_alloc_seg_inode_page()
fsp_alloc_seg_inode()
fseg_create_general()
fseg_create()
fseg_alloc_free_page_general()
fsp_header_init()

None of the functions are invoking mtr_t::commit(). The last 4 functions are global, and hence we must conduct a transitive closure of them in all files until the mtr_t::commit() call. Here is the closure of fseg_create_general(), fseg_create(), fseg_alloc_free_page_general() but not fsp_header_init():

fseg_create_general(): trx_undo_create(), trx_undo_assign_undo() looks OK
fseg_create(): btr_create() is allocating a further page in the same mini-transaction. Not OK!
fseg_create(): buf_dblwr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
fseg_create(): dict_hdr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
fseg_create(): trx_rseg_header_create() is looking up the TRX_SYS page after allocating a page. Might be problematic, although it only occurs during bootstrap.
fseg_create(): trx_sysf_create() might be OK, because it only happens during bootstrap.
fseg_alloc_free_page_general(): btr_page_alloc_low(), btr_page_alloc() get the mtr from their callers
fseg_alloc_free_page_general(): trx_undo_add_page() gets the mtr from its callers, and its callers trx_undo_report_row_operation() and trx_undo_report_rename() do not seem to be accessing any further pages before invoking mtr_t::commit().
btr_page_alloc(): btr_root_raise_and_insert() can allocate further pages in btr_page_split_and_insert(): Not OK!
btr_page_alloc(): There are also other callers, which I did not investigate.

If we let the fsp_alloc_free_extent() call in fseg_alloc_free_extent() continue to use a sub-minitransaction, the transitive closure of affected callers would be limited to fsp_header_init(), which is listed below:

fsp_header_init(): Most callers are immediately invoking mtr_t::commit().
fsp_header_init(): srv_undo_tablespaces_init() is using a single mini-transaction for creating all undo tablespaces. Should be OK, because it happens at bootstrap.
fsp_header_init(): (10.2 and 10.3 only) srv_undo_tablespaces_init() is initializing the undo tablespaces in the recovery of undo tablespace truncation. Probably OK, because it happens at startup.
fsp_header_init(): innobase_start_or_create_for_mysql() is invoking btr_create() on the ibuf->index in the same mini-transaction. Probably OK, because it happens at bootstrap.

Marko Mäkelä added a comment - 2020-07-16 08:00 It indeed seems to be the case that the use of a ‘sub-minitransaction’ ibuf_mtr in fsp_fill_free_list() is ‘overtaking’ the write of FSP_SIZE . I believe that the reason for using a separate ibuf_mtr in fsp_fill_free_list() is to avoid trouble with any caller that could access further pages while the change buffer bitmap page latch is being held. I see two alternative fixes: Before invoking fsp_fill_free_list() in fsp_header_init() , commit and restart the mini-transaction. To prevent page eviction, buf_block_buf_fix_inc() could be invoked before the mtr_t::commit() . Remove the ibuf_mtr and use the passed mtr instead. Ensure that no caller is accessing any further pages before invoking mtr_t::commit() . The transitive closure of all callers of fsp_fill_free_list() in that compilation unit is as follows: fsp_alloc_free_extent() fsp_alloc_free_page() fseg_fill_free_list() fseg_alloc_free_extent() fseg_alloc_free_page_low() fsp_alloc_seg_inode_page() fsp_alloc_seg_inode() fseg_create_general() fseg_create() fseg_alloc_free_page_general() fsp_header_init() None of the functions are invoking mtr_t::commit() . The last 4 functions are global, and hence we must conduct a transitive closure of them in all files until the mtr_t::commit() call. Here is the closure of fseg_create_general() , fseg_create() , fseg_alloc_free_page_general() but not fsp_header_init() : fseg_create_general() : trx_undo_create() , trx_undo_assign_undo() looks OK fseg_create() : btr_create() is allocating a further page in the same mini-transaction. Not OK! fseg_create() : buf_dblwr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap. fseg_create() : dict_hdr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap. fseg_create() : trx_rseg_header_create() is looking up the TRX_SYS page after allocating a page. Might be problematic, although it only occurs during bootstrap. fseg_create() : trx_sysf_create() might be OK, because it only happens during bootstrap. fseg_alloc_free_page_general() : btr_page_alloc_low() , btr_page_alloc() get the mtr from their callers fseg_alloc_free_page_general() : trx_undo_add_page() gets the mtr from its callers, and its callers trx_undo_report_row_operation() and trx_undo_report_rename() do not seem to be accessing any further pages before invoking mtr_t::commit() . btr_page_alloc() : btr_root_raise_and_insert() can allocate further pages in btr_page_split_and_insert() : Not OK! btr_page_alloc() : There are also other callers, which I did not investigate. If we let the fsp_alloc_free_extent() call in fseg_alloc_free_extent() continue to use a sub-minitransaction, the transitive closure of affected callers would be limited to fsp_header_init() , which is listed below: fsp_header_init() : Most callers are immediately invoking mtr_t::commit() . fsp_header_init() : srv_undo_tablespaces_init() is using a single mini-transaction for creating all undo tablespaces. Should be OK, because it happens at bootstrap. fsp_header_init() : (10.2 and 10.3 only) srv_undo_tablespaces_init() is initializing the undo tablespaces in the recovery of undo tablespace truncation. Probably OK, because it happens at startup. fsp_header_init() : innobase_start_or_create_for_mysql() is invoking btr_create() on the ibuf->index in the same mini-transaction. Probably OK, because it happens at bootstrap.

Marko Mäkelä added a comment - 2020-07-16 09:02

We should also check the transitive closure of callers of the following functions, which are modifying FSP_SIZE. No redo log for referring to newly added pages must be written ahead of the FSP_SIZE modification:

fsp_header_inc_size(): OK; only called by innobase_start_or_create_for_mysql(), which immediately calls mtr_t::commit()
fsp_try_extend_data_file_with_pages()
fsp_try_extend_data_file(): fsp_fill_free_list() is using the problematic sub-minitransaction as noted above. And this can occur also as part of fseg_alloc_free_extent() when a new change buffer bitmap page is created.

I do not yet see how a deadlock involving the change buffer could be possible, as implied by the comment:

			/* Initialize the ibuf bitmap page in a separate

			mini-transaction because it is low in the latching

			order, and we must be able to release its latch

			before returning from the fsp routine */

Any page allocation operation should be protected by an exclusive fil_space_t::latch. Other threads should not be able to access any of the newly allocated pages (including the change buffer bitmap page) while our mini-transaction is holding the tablespace latch. The only exception that I can think of would be read-ahead. In both buf_read_ahead_random() and buf_read_ahead_linear() we are reading fil_space_t::size without holding fil_space_t::latch, which should be protecting it. If we introduce the proper protection to fil_space_t::size, we should be able to eliminate the sub-minitransaction. If the tablespace is being extended while a read-ahead operation is executing, it would either block until the extension has been completed, or it would limit the read-ahead to pages that were not extended yet.

Because introducing extra synchronization is expensive, we might rather replace the problematic use of fil_space_t::size with fil_space_t::free_limit. It should not make sense to read-ahead any unallocated pages in any case.

Marko Mäkelä added a comment - 2020-07-16 09:02 We should also check the transitive closure of callers of the following functions, which are modifying FSP_SIZE . No redo log for referring to newly added pages must be written ahead of the FSP_SIZE modification: fsp_header_inc_size() : OK; only called by innobase_start_or_create_for_mysql() , which immediately calls mtr_t::commit() fsp_try_extend_data_file_with_pages() fsp_try_extend_data_file() : fsp_fill_free_list() is using the problematic sub-minitransaction as noted above. And this can occur also as part of fseg_alloc_free_extent() when a new change buffer bitmap page is created. I do not yet see how a deadlock involving the change buffer could be possible, as implied by the comment: /* Initialize the ibuf bitmap page in a separate mini-transaction because it is low in the latching order, and we must be able to release its latch before returning from the fsp routine */ Any page allocation operation should be protected by an exclusive fil_space_t::latch . Other threads should not be able to access any of the newly allocated pages (including the change buffer bitmap page) while our mini-transaction is holding the tablespace latch. The only exception that I can think of would be read-ahead. In both buf_read_ahead_random() and buf_read_ahead_linear() we are reading fil_space_t::size without holding fil_space_t::latch , which should be protecting it. If we introduce the proper protection to fil_space_t::size , we should be able to eliminate the sub-minitransaction. If the tablespace is being extended while a read-ahead operation is executing, it would either block until the extension has been completed, or it would limit the read-ahead to pages that were not extended yet. Because introducing extra synchronization is expensive, we might rather replace the problematic use of fil_space_t::size with fil_space_t::free_limit . It should not make sense to read-ahead any unallocated pages in any case.

Marko Mäkelä added a comment - 2020-07-16 15:59

I prototyped 3 different solutions on 10.1 so far:

Cache FSP_FREE_LIMIT in fil_space_t::free_limit, and protect access to it with fil_space_t::latch when determining the bounds for read-ahead and flushing. Problem: FSP_FREE_LIMIT can actually be larger than FSP_SIZE!
Access fil_space_t::size protected by fil_space_t::latch when determining the bounds for read-ahead and neighbour flushing. Problem: Some operations will hang, because the latch acquisition would be violating the latching order.
Copy fil_space_t::size to a new field fil_space_t::committed_size when mtr_commit() is releasing an exclusive fil_space_t::latch. There are some crashes, which I have not debugged yet.

Once I have a workable solution for preventing read-ahead or flushing from accessing a newly extended change buffer bitmap page, I can remove the ibuf_mtr and use the single mini-transaction object in fsp_fill_free_list(), to fix the ordering problem that is breaking the recovery.

Marko Mäkelä added a comment - 2020-07-16 15:59 I prototyped 3 different solutions on 10.1 so far: Cache FSP_FREE_LIMIT in fil_space_t::free_limit , and protect access to it with fil_space_t::latch when determining the bounds for read-ahead and flushing. Problem: FSP_FREE_LIMIT can actually be larger than FSP_SIZE ! Access fil_space_t::size protected by fil_space_t::latch when determining the bounds for read-ahead and neighbour flushing. Problem: Some operations will hang, because the latch acquisition would be violating the latching order. Copy fil_space_t::size to a new field fil_space_t::committed_size when mtr_commit() is releasing an exclusive fil_space_t::latch . There are some crashes, which I have not debugged yet. Once I have a workable solution for preventing read-ahead or flushing from accessing a newly extended change buffer bitmap page, I can remove the ibuf_mtr and use the single mini-transaction object in fsp_fill_free_list() , to fix the ordering problem that is breaking the recovery.

Marko Mäkelä added a comment - 2020-07-17 18:15

I pushed a fix to bb-10.1-marko and merged it up to bb-10.5-MDEV-23190, with some conflict resolution in each of 10.2, 10.3, 10.4, 10.5. mleich, please test.

Marko Mäkelä added a comment - 2020-07-17 18:15 I pushed a fix to bb-10.1-marko and merged it up to bb-10.5-MDEV-23190 , with some conflict resolution in each of 10.2, 10.3, 10.4, 10.5. mleich , please test.

Marko Mäkelä added a comment - 2020-07-20 13:02

I pushed this to 10.1 and merged up to 10.5. Between each major version, some additional conflict resolution was needed.

Marko Mäkelä added a comment - 2020-07-20 13:02 I pushed this to 10.1 and merged up to 10.5. Between each major version, some additional conflict resolution was needed.

MariaDB Server

Assertion `id.page_no() < space.size' during recovery

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration