Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23190

Assertion `id.page_no() < space.size' during recovery

Details

    Description

      This issue was happened while testing bb-10.5-thiru for MDEV-8139.
      Back trace is

       
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #1  0x000055705e6778b1 in __GI_abort () at abort.c:79
      #2  0x000055705e66742a in __assert_fail_base (fmt=0x55705e7eea38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55705cd0811a "id.page_no() < space.size",
          file=file@entry=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=line@entry=1254,
          function=function@entry=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:92
      #3  0x000055705e6674a2 in __GI___assert_fail (assertion=0x55705cd0811a "id.page_no() < space.size", file=0x55705cd07918 "/home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc", line=1254,
          function=0x55705cd099a0 <buf_flush_check_neighbors(fil_space_t const&, page_id_t&, IORequest::flush_t)::__PRETTY_FUNCTION__> "page_id_t buf_flush_check_neighbors(const fil_space_t&, page_id_t&, IORequest::flush_t)") at assert.c:101
      #4  0x000055705c56af9c in buf_flush_check_neighbors (space=..., id=..., flush=IORequest::FLUSH_LIST) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1254
      #5  0x000055705c56b93b in buf_flush_try_neighbors (page_id=..., flush=IORequest::FLUSH_LIST, n_flushed=164, n_to_flush=200) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1392
      #6  0x000055705c56c7dd in buf_do_flush_list_batch (min_n=200, lsn_limit=18446744073709551615) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1640
      #7  0x000055705c56cd03 in buf_flush_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1697
      #8  0x000055705c56d0ec in buf_flush_do_batch (lru=false, min_n=200, lsn_limit=18446744073709551615, n=0x2fef7a41bc40) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1800
      #9  0x000055705c56d322 in buf_flush_lists (min_n=200, lsn_limit=18446744073709551615, n_processed=0x2fef7a41bc98) at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:1858
      #10 0x000055705c570640 in buf_flush_page_cleaner () at /home/mleich/bb-10.5-thiru/storage/innobase/buf/buf0flu.cc:2746
      #11 0x000000006fde86db in start_thread (arg=0x2fef7a41c700) at pthread_create.c:463
      #12 0x000055705e758a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Error log contains the following:

      [rr 99536 5963]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED not found.
      [rr 99536 5967]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_BEING_DELETED_CACHE not found.
      [rr 99536 5971]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_CONFIG not found.
      [rr 99536 5975]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED not found.
      [rr 99536 5979]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_DELETED_CACHE not found.
      [rr 99536 5983]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_1 not found.
      [rr 99536 5987]2020-07-14 12:31:07 0 [Warning] InnoDB: Parent table of FTS auxiliary table test/FTS_0000000000000159_00000000000001df_INDEX_2 not found.
      

      Problem:
      =======

      During recovery, InnoDB encounters INIT_PAGE for page 1. So it does buf_page_create() and add the page 1 to flush list. It never encounters INIT_PAGE for page0 and FSP_SIZE redo log also. So InnoDB never sets the size for the tablespace. While flushing, InnoDB fails in assert `id.page_no() < space.size'

      Root cause is that InnoDB does mtr_commit separately for page 1 before doing commit for page 0 and FSP_SIZE

      In the reported case, fsp_header_init() is being called from dict_build_table_def_step()

                      mtr.start();
                      mtr.set_named_space(table->space);
                      fsp_header_init(table->space, FIL_IBD_FILE_INITIAL_SIZE, &mtr);
                      mtr.commit();
      

      Inside fsp_header_init(), InnoDB does fsp_fill_free_list() to add the new extent to free list.

      In fsp_fill_free_list(), the following code does have seperate mtr to commit ibuf bitmap page.

                              /* Initialize the ibuf bitmap page in a separate
                              mini-transaction because it is low in the latching
                              order, and we must be able to release its latch.
                              Note: Insert-Buffering is disabled for tables that
                              reside in the temp-tablespace. */
                              if (space->purpose != FIL_TYPE_TEMPORARY) {
                                      mtr_t   ibuf_mtr;
       
                                      mtr_start(&ibuf_mtr);
                                      ibuf_mtr.set_named_space(space);
       
                                      /* Avoid logging while truncate table
                                      fix-up is active. */
                                      if (srv_is_tablespace_truncated(space->id)) {
                                              mtr_set_log_mode(
                                                      &ibuf_mtr, MTR_LOG_NO_REDO);
                                      }
       
                                      const page_id_t page_id(
                                              space->id,
                                              i + FSP_IBUF_BITMAP_OFFSET);
       
                                      block = buf_page_create(
                                              page_id, page_size, &ibuf_mtr);
       
                                      buf_page_get(
                                              page_id, page_size, RW_SX_LATCH,
                                              &ibuf_mtr);
       
                                      buf_block_dbg_add_level(block, SYNC_FSP_PAGE);
       
                                      fsp_init_file_page(space, block, &ibuf_mtr);
       
                                      ibuf_bitmap_page_init(block, &ibuf_mtr);
       
                                      mtr_commit(&ibuf_mtr);
      
      

      So InnoDB should commit everything within same mtr or handle the scenario during recovery.

      Attachments

        Issue Links

          Activity

            It indeed seems to be the case that the use of a ‘sub-minitransaction’ ibuf_mtr in fsp_fill_free_list() is ‘overtaking’ the write of FSP_SIZE.

            I believe that the reason for using a separate ibuf_mtr in fsp_fill_free_list() is to avoid trouble with any caller that could access further pages while the change buffer bitmap page latch is being held.

            I see two alternative fixes:

            1. Before invoking fsp_fill_free_list() in fsp_header_init(), commit and restart the mini-transaction. To prevent page eviction, buf_block_buf_fix_inc() could be invoked before the mtr_t::commit().
            2. Remove the ibuf_mtr and use the passed mtr instead. Ensure that no caller is accessing any further pages before invoking mtr_t::commit().

            The transitive closure of all callers of fsp_fill_free_list() in that compilation unit is as follows:

            • fsp_alloc_free_extent()
            • fsp_alloc_free_page()
            • fseg_fill_free_list()
            • fseg_alloc_free_extent()
            • fseg_alloc_free_page_low()
            • fsp_alloc_seg_inode_page()
            • fsp_alloc_seg_inode()
            • fseg_create_general()
            • fseg_create()
            • fseg_alloc_free_page_general()
            • fsp_header_init()

            None of the functions are invoking mtr_t::commit(). The last 4 functions are global, and hence we must conduct a transitive closure of them in all files until the mtr_t::commit() call. Here is the closure of fseg_create_general(), fseg_create(), fseg_alloc_free_page_general() but not fsp_header_init():

            • fseg_create_general(): trx_undo_create(), trx_undo_assign_undo() looks OK
            • fseg_create(): btr_create() is allocating a further page in the same mini-transaction. Not OK!
            • fseg_create(): buf_dblwr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
            • fseg_create(): dict_hdr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap.
            • fseg_create(): trx_rseg_header_create() is looking up the TRX_SYS page after allocating a page. Might be problematic, although it only occurs during bootstrap.
            • fseg_create(): trx_sysf_create() might be OK, because it only happens during bootstrap.
            • fseg_alloc_free_page_general(): btr_page_alloc_low(), btr_page_alloc() get the mtr from their callers
            • fseg_alloc_free_page_general(): trx_undo_add_page() gets the mtr from its callers, and its callers trx_undo_report_row_operation() and trx_undo_report_rename() do not seem to be accessing any further pages before invoking mtr_t::commit().
            • btr_page_alloc(): btr_root_raise_and_insert() can allocate further pages in btr_page_split_and_insert(): Not OK!
            • btr_page_alloc(): There are also other callers, which I did not investigate.

            If we let the fsp_alloc_free_extent() call in fseg_alloc_free_extent() continue to use a sub-minitransaction, the transitive closure of affected callers would be limited to fsp_header_init(), which is listed below:

            • fsp_header_init(): Most callers are immediately invoking mtr_t::commit().
            • fsp_header_init(): srv_undo_tablespaces_init() is using a single mini-transaction for creating all undo tablespaces. Should be OK, because it happens at bootstrap.
            • fsp_header_init(): (10.2 and 10.3 only) srv_undo_tablespaces_init() is initializing the undo tablespaces in the recovery of undo tablespace truncation. Probably OK, because it happens at startup.
            • fsp_header_init(): innobase_start_or_create_for_mysql() is invoking btr_create() on the ibuf->index in the same mini-transaction. Probably OK, because it happens at bootstrap.
            marko Marko Mäkelä added a comment - It indeed seems to be the case that the use of a ‘sub-minitransaction’ ibuf_mtr in fsp_fill_free_list() is ‘overtaking’ the write of FSP_SIZE . I believe that the reason for using a separate ibuf_mtr in fsp_fill_free_list() is to avoid trouble with any caller that could access further pages while the change buffer bitmap page latch is being held. I see two alternative fixes: Before invoking fsp_fill_free_list() in fsp_header_init() , commit and restart the mini-transaction. To prevent page eviction, buf_block_buf_fix_inc() could be invoked before the mtr_t::commit() . Remove the ibuf_mtr and use the passed mtr instead. Ensure that no caller is accessing any further pages before invoking mtr_t::commit() . The transitive closure of all callers of fsp_fill_free_list() in that compilation unit is as follows: fsp_alloc_free_extent() fsp_alloc_free_page() fseg_fill_free_list() fseg_alloc_free_extent() fseg_alloc_free_page_low() fsp_alloc_seg_inode_page() fsp_alloc_seg_inode() fseg_create_general() fseg_create() fseg_alloc_free_page_general() fsp_header_init() None of the functions are invoking mtr_t::commit() . The last 4 functions are global, and hence we must conduct a transitive closure of them in all files until the mtr_t::commit() call. Here is the closure of fseg_create_general() , fseg_create() , fseg_alloc_free_page_general() but not fsp_header_init() : fseg_create_general() : trx_undo_create() , trx_undo_assign_undo() looks OK fseg_create() : btr_create() is allocating a further page in the same mini-transaction. Not OK! fseg_create() : buf_dblwr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap. fseg_create() : dict_hdr_create() is allocating multiple pages. Possibly OK, because this only happens during bootstrap. fseg_create() : trx_rseg_header_create() is looking up the TRX_SYS page after allocating a page. Might be problematic, although it only occurs during bootstrap. fseg_create() : trx_sysf_create() might be OK, because it only happens during bootstrap. fseg_alloc_free_page_general() : btr_page_alloc_low() , btr_page_alloc() get the mtr from their callers fseg_alloc_free_page_general() : trx_undo_add_page() gets the mtr from its callers, and its callers trx_undo_report_row_operation() and trx_undo_report_rename() do not seem to be accessing any further pages before invoking mtr_t::commit() . btr_page_alloc() : btr_root_raise_and_insert() can allocate further pages in btr_page_split_and_insert() : Not OK! btr_page_alloc() : There are also other callers, which I did not investigate. If we let the fsp_alloc_free_extent() call in fseg_alloc_free_extent() continue to use a sub-minitransaction, the transitive closure of affected callers would be limited to fsp_header_init() , which is listed below: fsp_header_init() : Most callers are immediately invoking mtr_t::commit() . fsp_header_init() : srv_undo_tablespaces_init() is using a single mini-transaction for creating all undo tablespaces. Should be OK, because it happens at bootstrap. fsp_header_init() : (10.2 and 10.3 only) srv_undo_tablespaces_init() is initializing the undo tablespaces in the recovery of undo tablespace truncation. Probably OK, because it happens at startup. fsp_header_init() : innobase_start_or_create_for_mysql() is invoking btr_create() on the ibuf->index in the same mini-transaction. Probably OK, because it happens at bootstrap.

            We should also check the transitive closure of callers of the following functions, which are modifying FSP_SIZE. No redo log for referring to newly added pages must be written ahead of the FSP_SIZE modification:

            • fsp_header_inc_size(): OK; only called by innobase_start_or_create_for_mysql(), which immediately calls mtr_t::commit()
            • fsp_try_extend_data_file_with_pages()
            • fsp_try_extend_data_file(): fsp_fill_free_list() is using the problematic sub-minitransaction as noted above. And this can occur also as part of fseg_alloc_free_extent() when a new change buffer bitmap page is created.

            I do not yet see how a deadlock involving the change buffer could be possible, as implied by the comment:

            			/* Initialize the ibuf bitmap page in a separate
            			mini-transaction because it is low in the latching
            			order, and we must be able to release its latch
            			before returning from the fsp routine */
            

            Any page allocation operation should be protected by an exclusive fil_space_t::latch. Other threads should not be able to access any of the newly allocated pages (including the change buffer bitmap page) while our mini-transaction is holding the tablespace latch. The only exception that I can think of would be read-ahead. In both buf_read_ahead_random() and buf_read_ahead_linear() we are reading fil_space_t::size without holding fil_space_t::latch, which should be protecting it. If we introduce the proper protection to fil_space_t::size, we should be able to eliminate the sub-minitransaction. If the tablespace is being extended while a read-ahead operation is executing, it would either block until the extension has been completed, or it would limit the read-ahead to pages that were not extended yet.

            Because introducing extra synchronization is expensive, we might rather replace the problematic use of fil_space_t::size with fil_space_t::free_limit. It should not make sense to read-ahead any unallocated pages in any case.

            marko Marko Mäkelä added a comment - We should also check the transitive closure of callers of the following functions, which are modifying FSP_SIZE . No redo log for referring to newly added pages must be written ahead of the FSP_SIZE modification: fsp_header_inc_size() : OK; only called by innobase_start_or_create_for_mysql() , which immediately calls mtr_t::commit() fsp_try_extend_data_file_with_pages() fsp_try_extend_data_file() : fsp_fill_free_list() is using the problematic sub-minitransaction as noted above. And this can occur also as part of fseg_alloc_free_extent() when a new change buffer bitmap page is created. I do not yet see how a deadlock involving the change buffer could be possible, as implied by the comment: /* Initialize the ibuf bitmap page in a separate mini-transaction because it is low in the latching order, and we must be able to release its latch before returning from the fsp routine */ Any page allocation operation should be protected by an exclusive fil_space_t::latch . Other threads should not be able to access any of the newly allocated pages (including the change buffer bitmap page) while our mini-transaction is holding the tablespace latch. The only exception that I can think of would be read-ahead. In both buf_read_ahead_random() and buf_read_ahead_linear() we are reading fil_space_t::size without holding fil_space_t::latch , which should be protecting it. If we introduce the proper protection to fil_space_t::size , we should be able to eliminate the sub-minitransaction. If the tablespace is being extended while a read-ahead operation is executing, it would either block until the extension has been completed, or it would limit the read-ahead to pages that were not extended yet. Because introducing extra synchronization is expensive, we might rather replace the problematic use of fil_space_t::size with fil_space_t::free_limit . It should not make sense to read-ahead any unallocated pages in any case.

            I prototyped 3 different solutions on 10.1 so far:

            1. Cache FSP_FREE_LIMIT in fil_space_t::free_limit, and protect access to it with fil_space_t::latch when determining the bounds for read-ahead and flushing. Problem: FSP_FREE_LIMIT can actually be larger than FSP_SIZE!
            2. Access fil_space_t::size protected by fil_space_t::latch when determining the bounds for read-ahead and neighbour flushing. Problem: Some operations will hang, because the latch acquisition would be violating the latching order.
            3. Copy fil_space_t::size to a new field fil_space_t::committed_size when mtr_commit() is releasing an exclusive fil_space_t::latch. There are some crashes, which I have not debugged yet.

            Once I have a workable solution for preventing read-ahead or flushing from accessing a newly extended change buffer bitmap page, I can remove the ibuf_mtr and use the single mini-transaction object in fsp_fill_free_list(), to fix the ordering problem that is breaking the recovery.

            marko Marko Mäkelä added a comment - I prototyped 3 different solutions on 10.1 so far: Cache FSP_FREE_LIMIT in fil_space_t::free_limit , and protect access to it with fil_space_t::latch when determining the bounds for read-ahead and flushing. Problem: FSP_FREE_LIMIT can actually be larger than FSP_SIZE ! Access fil_space_t::size protected by fil_space_t::latch when determining the bounds for read-ahead and neighbour flushing. Problem: Some operations will hang, because the latch acquisition would be violating the latching order. Copy fil_space_t::size to a new field fil_space_t::committed_size when mtr_commit() is releasing an exclusive fil_space_t::latch . There are some crashes, which I have not debugged yet. Once I have a workable solution for preventing read-ahead or flushing from accessing a newly extended change buffer bitmap page, I can remove the ibuf_mtr and use the single mini-transaction object in fsp_fill_free_list() , to fix the ordering problem that is breaking the recovery.

            I pushed a fix to bb-10.1-marko and merged it up to bb-10.5-MDEV-23190, with some conflict resolution in each of 10.2, 10.3, 10.4, 10.5. mleich, please test.

            marko Marko Mäkelä added a comment - I pushed a fix to bb-10.1-marko and merged it up to bb-10.5-MDEV-23190 , with some conflict resolution in each of 10.2, 10.3, 10.4, 10.5. mleich , please test.

            I pushed this to 10.1 and merged up to 10.5. Between each major version, some additional conflict resolution was needed.

            marko Marko Mäkelä added a comment - I pushed this to 10.1 and merged up to 10.5. Between each major version, some additional conflict resolution was needed.

            People

              marko Marko Mäkelä
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.