[MDEV-23190] Assertion `id.page_no() < space.size' during recovery Created: 2020-07-16  Updated: 2020-10-15  Resolved: 2020-07-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.1.46, 10.2.33, 10.3.24, 10.4.14, 10.5.5

Type: Bug Priority: Major
Reporter: Thirunarayanan Balathandayuthapani Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-11556 InnoDB redo log apply fails to adjust... Closed
relates to MDEV-23439 Assertion `size == space->size' faile... Closed

 Description   

This issue was happened while testing bb-10.5-thiru for MDEV-8139.
Back trace is

 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x000055705e6778b1 in __GI_abort () at abort.c:79
#2  0x000055705e66742a in __assert_fail_base (fmt=0x55705e7eea38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55705cd0811a "id.page_no() < space.size",
    file=file@entry=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=line@entry=1254,
    function=function@entry=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:92
#3  0x000055705e6674a2 in __GI___assert_fail (assertion=0x55705cd0811a "id.page_no() < space.size", file=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=1254,
    function=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:101
#4  0x000055705c56af9c in buf_flush_check_neighbors (space=..., id=..., flush=IORequest::FLUSH_LIST) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1254
#5  0x000055705c56b93b in buf_flush_try_neighbors (page_id=..., flush=IORequest::FLUSH_LIST, n_flushed=164, n_to_flush=200) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1392
#6  0x000055705c56c7dd in buf_do_flush_list_batch (min_n=200, lsn_limit=18446744073709551615) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1640
#7  0x000055705c56cd03 in buf_flush_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1697
#8  0x000055705c56d0ec in buf_flush_do_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1800
#9  0x000055705c56d322 in buf_flush_lists (min_n=200, lsn_limit=18446744073709551615, n_processed=0x2fef7a41bc98) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1858
#10 0x000055705c570640 in buf_flush_page_cleaner () at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:2746
#11 0x000000006fde86db in start_thread (arg=0x2fef7a41c700) at pthread_create.c:463
#12 0x000055705e758a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Error log contains the following:

[rr 99536 5963]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED not found.
[rr 99536 5967]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED_CACHE not found.
[rr 99536 5971]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_CONFIG not found.
[rr 99536 5975]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED not found.
[rr 99536 5979]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED_CACHE not found.
[rr 99536 5983]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_1 not found.
[rr 99536 5987]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_2 not found.

Problem:
=======

During recovery, InnoDB encounters INIT_PAGE for page 1. So it does buf_page_create() and add the page 1 to flush list. It never encounters INIT_PAGE for page0 and FSP_SIZE redo log also. So InnoDB never sets the size for the tablespace. While flushing, InnoDB fails in assert `id.page_no() < space.size'

Root cause is that InnoDB does mtr_commit separately for page 1 before doing commit for page 0 and FSP_SIZE

In the reported case, fsp_header_init() is being called from dict_build_table_def_step()

                mtr.start();
                mtr.set_named_space(table->space);
                fsp_header_init(table->space, FIL_IBD_FILE_INITIAL_SIZE, &mtr);
                mtr.commit();

Inside fsp_header_init(), InnoDB does fsp_fill_free_list() to add the new extent to free list.

In fsp_fill_free_list(), the following code does have seperate mtr to commit ibuf bitmap page.

                        /* Initialize the ibuf bitmap page in a separate
                        mini-transaction because it is low in the latching
                        order, and we must be able to release its latch.
                        Note: Insert-Buffering is disabled for tables that
                        reside in the temp-tablespace. */
                        if (space->purpose != FIL_TYPE_TEMPORARY) {
                                mtr_t   ibuf_mtr;
 
                                mtr_start(&ibuf_mtr);
                                ibuf_mtr.set_named_space(space);
 
                                /* Avoid logging while truncate table
                                fix-up is active. */
                                if (srv_is_tablespace_truncated(space->id)) {
                                        mtr_set_log_mode(
                                                &ibuf_mtr, MTR_LOG_NO_REDO);
                                }
 
                                const page_id_t page_id(
                                        space->id,
                                        i + FSP_IBUF_BITMAP_OFFSET);
 
                                block = buf_page_create(
                                        page_id, page_size, &ibuf_mtr);
 
                                buf_page_get(
                                        page_id, page_size, RW_SX_LATCH,
                                        &ibuf_mtr);
 
                                buf_block_dbg_add_level(block, SYNC_FSP_PAGE);
 
                                fsp_init_file_page(space, block, &ibuf_mtr);
 
                                ibuf_bitmap_page_init(block, &ibuf_mtr);
 
                                mtr_commit(&ibuf_mtr);

So InnoDB should commit everything within same mtr or handle the scenario during recovery.



 Comments   
Comment by Marko Mäkelä [ 2020-07-16 ]

It indeed seems to be the case that the use of a ‘sub-minitransaction’ ibuf_mtr in fsp_fill_free_list() is ‘overtaking’ the write of FSP_SIZE.

I believe that the reason for using a separate ibuf_mtr in fsp_fill_free_list() is to avoid trouble with any caller that could access further pages while the change buffer bitmap page latch is being held.

I see two alternative fixes:

  1. Before invoking fsp_fill_free_list() in fsp_header_init(), commit and restart the mini-transaction. To prevent page eviction, buf_block_buf_fix_inc() could be invoked before the mtr_t::commit().
  2. Remove the ibuf_mtr and use the passed mtr instead. Ensure that no caller is accessing any further pages before invoking mtr_t::commit().

The transitive closure of all callers of fsp_fill_free_list() in that compilation unit is as follows:

  • fsp_alloc_free_extent()
  • fsp_alloc_free_page()
  • fseg_fill_free_list()
  • fseg_alloc_free_extent()
  • fseg_alloc_free_page_low()
  • fsp_alloc_seg_inode_page()
  • fsp_alloc_seg_inode()
  • fseg_create_general()
  • fseg_create()
  • fseg_alloc_free_page_general()
  • fsp_header_init()

None of the functions are invoking mtr_t::commit(). The last 4 functions are global, and hence we must conduct a transitive closure of them in all files until the mtr_t::commit() call. Here is the closure of fseg_create_general(), fseg_create(), fseg_alloc_free_page_general() but not fsp_header_init():

  • fseg_create_general(): trx_undo_create(), trx_undo_assign_undo() looks OK
  • fseg_create(): btr_create() is allocating a further page in the same mini-transaction. Not OK!
  • fseg_create(): buf_dblwr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
  • fseg_create(): dict_hdr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
  • fseg_create(): trx_rseg_header_create() is looking up the TRX_SYS page after allocating a page. Might be problematic, although it only occurs during bootstrap.
  • fseg_create(): trx_sysf_create() might be OK, because it only happens during bootstrap.
  • fseg_alloc_free_page_general(): btr_page_alloc_low(), btr_page_alloc() get the mtr from their callers
  • fseg_alloc_free_page_general(): trx_undo_add_page() gets the mtr from its callers, and its callers trx_undo_report_row_operation() and trx_undo_report_rename() do not seem to be accessing any further pages before invoking mtr_t::commit().
  • btr_page_alloc(): btr_root_raise_and_insert() can allocate further pages in btr_page_split_and_insert(): Not OK!
  • btr_page_alloc(): There are also other callers, which I did not investigate.

If we let the fsp_alloc_free_extent() call in fseg_alloc_free_extent() continue to use a sub-minitransaction, the transitive closure of affected callers would be limited to fsp_header_init(), which is listed below:

  • fsp_header_init(): Most callers are immediately invoking mtr_t::commit().
  • fsp_header_init(): srv_undo_tablespaces_init() is using a single mini-transaction for creating all undo tablespaces. Should be OK, because it happens at bootstrap.
  • fsp_header_init(): (10.2 and 10.3 only) srv_undo_tablespaces_init() is initializing the undo tablespaces in the recovery of undo tablespace truncation. Probably OK, because it happens at startup.
  • fsp_header_init(): innobase_start_or_create_for_mysql() is invoking btr_create() on the ibuf->index in the same mini-transaction. Probably OK, because it happens at bootstrap.
Comment by Marko Mäkelä [ 2020-07-16 ]

We should also check the transitive closure of callers of the following functions, which are modifying FSP_SIZE. No redo log for referring to newly added pages must be written ahead of the FSP_SIZE modification:

  • fsp_header_inc_size(): OK; only called by innobase_start_or_create_for_mysql(), which immediately calls mtr_t::commit()
  • fsp_try_extend_data_file_with_pages()
  • fsp_try_extend_data_file(): fsp_fill_free_list() is using the problematic sub-minitransaction as noted above. And this can occur also as part of fseg_alloc_free_extent() when a new change buffer bitmap page is created.

I do not yet see how a deadlock involving the change buffer could be possible, as implied by the comment:

			/* Initialize the ibuf bitmap page in a separate
			mini-transaction because it is low in the latching
			order, and we must be able to release its latch
			before returning from the fsp routine */

Any page allocation operation should be protected by an exclusive fil_space_t::latch. Other threads should not be able to access any of the newly allocated pages (including the change buffer bitmap page) while our mini-transaction is holding the tablespace latch. The only exception that I can think of would be read-ahead. In both buf_read_ahead_random() and buf_read_ahead_linear() we are reading fil_space_t::size without holding fil_space_t::latch, which should be protecting it. If we introduce the proper protection to fil_space_t::size, we should be able to eliminate the sub-minitransaction. If the tablespace is being extended while a read-ahead operation is executing, it would either block until the extension has been completed, or it would limit the read-ahead to pages that were not extended yet.

Because introducing extra synchronization is expensive, we might rather replace the problematic use of fil_space_t::size with fil_space_t::free_limit. It should not make sense to read-ahead any unallocated pages in any case.

Comment by Marko Mäkelä [ 2020-07-16 ]

I prototyped 3 different solutions on 10.1 so far:

  1. Cache FSP_FREE_LIMIT in fil_space_t::free_limit, and protect access to it with fil_space_t::latch when determining the bounds for read-ahead and flushing. Problem: FSP_FREE_LIMIT can actually be larger than FSP_SIZE!
  2. Access fil_space_t::size protected by fil_space_t::latch when determining the bounds for read-ahead and neighbour flushing. Problem: Some operations will hang, because the latch acquisition would be violating the latching order.
  3. Copy fil_space_t::size to a new field fil_space_t::committed_size when mtr_commit() is releasing an exclusive fil_space_t::latch. There are some crashes, which I have not debugged yet.

Once I have a workable solution for preventing read-ahead or flushing from accessing a newly extended change buffer bitmap page, I can remove the ibuf_mtr and use the single mini-transaction object in fsp_fill_free_list(), to fix the ordering problem that is breaking the recovery.

Comment by Marko Mäkelä [ 2020-07-17 ]

I pushed a fix to bb-10.1-marko and merged it up to bb-10.5-MDEV-23190, with some conflict resolution in each of 10.2, 10.3, 10.4, 10.5. mleich, please test.

Comment by Marko Mäkelä [ 2020-07-20 ]

I pushed this to 10.1 and merged up to 10.5. Between each major version, some additional conflict resolution was needed.

Generated at Thu Feb 08 09:20:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.