Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31042

Certain tables are missing in innodb storage engine of a TSE enabled database after an incremental backup restore

Details

    Description

      Consider the following steps:

      1. Suppose there is an encrypted table "t"
      2. Take a base backup of the database
      3. Create a table like t and name it tmp_t
      4. Rename t to z_t and tmp_t to t
      5. Start an increment backup, while the increment backup is ongoing, execute
      optimize table on t (make sure this is executed after copying t to the backup)
      6. Restore the incremental backup and start the server.

      When we try to access z_t table to t_prime, we get the following error:
      errno: 155 "The table does not exist in the storage engine"

      Further, if the z_t is not empty, we get the following error during incremental backup restore:

      {{2023-04-12 15:07:28 4631856640 [Note] InnoDB: Starting final batch to recover 56 pages from redo log.
      2023-04-12 15:07:28 4631856640 [ERROR] [FATAL] InnoDB: Trying to read page number 4 in space 4, space name test/z_tmp_u1332929658s, which is outside the tablespace bounds. Byte offset 0, len 16384
      230412 15:07:28 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.2.44-MariaDB-debug
      key_buffer_size=0
      read_buffer_size=131072
      max_used_connections=0
      max_threads=1
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5466 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      Assertion failed: (m_files.empty()), function ~CreateTracker, file /Users/thejaka.kanewala/git/mysql/vagrant-build/thejaka-server/storage/innobase/sync/sync0debug.cc, line 1549.
      0 mariabackup 0x0000000108a6864c my_print_stacktrace + 60
      0 mariabackup 0x0000000107cf5ec0 handle_fatal_signal + 896
      Assertion failed: (m_files.empty()), function ~CreateTracker, file /Users/thejaka.kanewala/git/mysql/vagrant-build/thejaka-server/storage/innobase/sync/sync0debug.cc, line 1549.
      0 libsystem_platform.dylib 0x00007ff81b4badfd _sigtramp + 29
      0 ??? 0xf6b5c00840001004 0x0 + 17777326245780918276
      Assertion failed: (m_files.empty()), function ~CreateTracker, file /Users/thejaka.kanewala/git/mysql/vagrant-build/thejaka-server/storage/innobase/sync/sync0debug.cc, line 1549.
      0 libsystem_c.dylib 0x00007ff81b3f0d24 abort + 123
      Assertion failed: (m_files.empty()), function ~CreateTracker, file /Users/thejaka.kanewala/git/mysql/vagrant-build/thejaka-server/storage/innobase/sync/sync0debug.cc, line 1549.
      0 mariabackup 0x0000000108876d14 _ZN2ib5fatalD2Ev + 84
      0 mariabackup 0x0000000108876d65 _ZN2ib5fatalD1Ev + 21
      0 mariabackup 0x00000001084a1a18 _Z6fil_ioRK9IORequestb9page_id_tRK11page_size_tmmPvS6_b + 3256
      0 mariabackup 0x00000001084018ba _ZL17buf_read_page_lowP7dberr_tbmm9page_id_tRK11page_size_tbb + 1450
      0 mariabackup 0x00000001084037da _Z19buf_read_recv_pagesbmPKmm + 938
      0 mariabackup 0x000000010862755c _ZL17recv_read_in_area9page_id_t + 444
      0 mariabackup 0x0000000108626723 _Z26recv_apply_hashed_log_recsb + 1779
      0 mariabackup 0x00000001087d155a _Z34innobase_start_or_create_for_mysqlv + 12186
      0 mariabackup 0x0000000107c6aa8d _ZL11innodb_initv + 13
      0 mariabackup 0x0000000107c630a7 _ZL23xtrabackup_prepare_funcPPc + 2279
      0 mariabackup 0x0000000107c59ef8 _ZL8main_lowPPc + 1784
      0 mariabackup 0x0000000107c5966b main + 603
      0 dyld 0x00000001140cd52e start + 462
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Core pattern: /cores/core.%P
      ......}}

      Attachments

        Issue Links

          Activity

            Hello –

            To fix this issue, I created a PR against the 10.2. branch. The PR is in [1]. The PR includes 3 MTRs and the first two MTRs shows the behavior of the issue when the table is empty and when the table is not empty (MDEV-31042_1.test and MDEV-31042_2.test).

            [1] https://github.com/MariaDB/server/pull/2594

            My analysis is as follows (also in the PR):

            The Problem
            ==========
            Consider the following steps:

            Suppose there is an encrypted table "t"
            Take a base backup of the database
            Create a table like t and name it tmp_t
            Rename t to z_t and tmp_t to t
            Start an increment backup, while the increment backup is ongoing, execute
            optimize table on t (make sure this is executed after copying t to the backup)
            Restore the incremental backup and start the server.
            When we try to rename z_t table to t_prime, we get the following error:
            errno: 155 "The table does not exist in the storage engine"

            The Cause
            ========
            An incremental backup collect pages modified since a given LSN number.
            The modified pages are written to the backup as .delta files.
            Further, tables created/optimized while the backup is on-going on are
            saved with a suffix such as ".new", ".ren".

            The step 5 above creates a t.new file. These files are processed
            at the begining of the apply log stage when preparing the incremental
            backup. When t.new file is processed, mariabackup deletes the
            existing t.ibd file from the base directory and copies t.new as t.ibd.
            However, the new t.ibd file carries a tablespace id different from
            the old t.ibd file.

            To apply the delta, mariabackup finds the proper tablespace file
            in "xb_delta_open_matching_space". Mariabackup fails to
            find a matching tablespace for z_t (created at step 4 above) inside
            "xb_delta_open_matching_space", because there isn't a matching tablespace
            in the base backup by the name z_t and also there isn't a matching space
            with z_t's tablespace id. At this point, mariabackup creates new tablespace
            file. However, this newly created tablespace file does not contain
            crypt data and it just have 4 allocated pages. If delta file for
            z_t table is empty, the restore operation succeeds but when
            we try to access z_t after restore it remains corrupted. If delta file
            contains pages (i.e., z_t is not empty), then we will see mariabackup
            throwing an error saying that delta file cannot be applied because it cannot
            read page 4 from the ibd file.

            The Fix
            =====
            When processing .new files, deleting original t.ibd file and replacing
            it with t.new file is the main cause for the issue.

            Therefore, in the fix, when processing t.new files we do not
            remove original t.ibd file, instead we rename the existing t.ibd
            file to xtrabackup_tmp#<space_id>.ibd.

            With this, when mariabackup try to find a matching space for z_t.delta.ibd, it will
            return the xtrabackup_tmp#<space_id>.ibd and will be renamed to z_t.ibd in the
            base backup. Then, afterwards, mariabackup should be able to apply the delta
            to the z_t.ibd.

            thejaka Thejaka Kanewala added a comment - Hello – To fix this issue, I created a PR against the 10.2. branch. The PR is in [1] . The PR includes 3 MTRs and the first two MTRs shows the behavior of the issue when the table is empty and when the table is not empty ( MDEV-31042 _1.test and MDEV-31042 _2.test). [1] https://github.com/MariaDB/server/pull/2594 My analysis is as follows (also in the PR): The Problem ========== Consider the following steps: Suppose there is an encrypted table "t" Take a base backup of the database Create a table like t and name it tmp_t Rename t to z_t and tmp_t to t Start an increment backup, while the increment backup is ongoing, execute optimize table on t (make sure this is executed after copying t to the backup) Restore the incremental backup and start the server. When we try to rename z_t table to t_prime, we get the following error: errno: 155 "The table does not exist in the storage engine" The Cause ======== An incremental backup collect pages modified since a given LSN number. The modified pages are written to the backup as .delta files. Further, tables created/optimized while the backup is on-going on are saved with a suffix such as ".new", ".ren". The step 5 above creates a t.new file. These files are processed at the begining of the apply log stage when preparing the incremental backup. When t.new file is processed, mariabackup deletes the existing t.ibd file from the base directory and copies t.new as t.ibd. However, the new t.ibd file carries a tablespace id different from the old t.ibd file. To apply the delta, mariabackup finds the proper tablespace file in "xb_delta_open_matching_space". Mariabackup fails to find a matching tablespace for z_t (created at step 4 above) inside "xb_delta_open_matching_space", because there isn't a matching tablespace in the base backup by the name z_t and also there isn't a matching space with z_t's tablespace id. At this point, mariabackup creates new tablespace file. However, this newly created tablespace file does not contain crypt data and it just have 4 allocated pages. If delta file for z_t table is empty, the restore operation succeeds but when we try to access z_t after restore it remains corrupted. If delta file contains pages (i.e., z_t is not empty), then we will see mariabackup throwing an error saying that delta file cannot be applied because it cannot read page 4 from the ibd file. The Fix ===== When processing .new files, deleting original t.ibd file and replacing it with t.new file is the main cause for the issue. Therefore, in the fix, when processing t.new files we do not remove original t.ibd file, instead we rename the existing t.ibd file to xtrabackup_tmp#<space_id>.ibd. With this, when mariabackup try to find a matching space for z_t.delta.ibd, it will return the xtrabackup_tmp#<space_id>.ibd and will be renamed to z_t.ibd in the base backup. Then, afterwards, mariabackup should be able to apply the delta to the z_t.ibd.

            thejaka, thank you for the report.

            The MariaDB Server 10.2 series has already reached its end of life (EOL). The last release in that series was 10.2.44 in May 2022.

            Also MariaDB Server 10.3 will reach EOL in the upcoming 10.3.39 release. Therefore, the fix should target 10.4.

            Because of MDEV-12353 and MDEV-24626, I think that it would be good to test the fix also on 10.5 and 10.6 as part of the review.

            marko Marko Mäkelä added a comment - thejaka , thank you for the report. The MariaDB Server 10.2 series has already reached its end of life (EOL). The last release in that series was 10.2.44 in May 2022. Also MariaDB Server 10.3 will reach EOL in the upcoming 10.3.39 release. Therefore, the fix should target 10.4. Because of MDEV-12353 and MDEV-24626 , I think that it would be good to test the fix also on 10.5 and 10.6 as part of the review.

            Reproduces in latest 10.4 (f21664414da):

            #12 0x00007f96eb433596 in __GI___assert_fail (assertion=0x559a573fcf75 "fil_page_get_type(page) || (!newest_lsn || (mach_read_from_4(page + 34U) == block->page.id.space() && mach_read_from_
            4(page + 8U) == 0xffffffff && mach_read_from_4(page + 12U) == 0xffffffff && !mach_read_from_4(page + 0)))", file=0x559a573fcb08 "../src/storage/innobase/buf/buf0flu.cc", line=790, function=
            0x559a573fce9e "void buf_flush_init_for_writing(const buf_block_t *, byte *, void *, lsn_t, bool)") at ./assert/assert.c:101
            #13 0x0000559a56dc7f7a in buf_flush_init_for_writing (block=0x7f96e4b58338, page=0x7f96e5030000 "\f\306U-", page_zip_=0x0, newest_lsn=85151, use_full_checksum=false) at ../src/storage/innob
            ase/buf/buf0flu.cc:784
            #14 0x0000559a56dc9e42 in buf_flush_write_block_low (bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, sync=false) at ../src/storage/innobase/buf/buf0flu.cc:1023
            #15 0x0000559a56dc9073 in buf_flush_page (buf_pool=0x559a5949a590, bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, sync=false) at ../src/storage/innobase/buf/buf0flu.cc:1180
            #16 0x0000559a56dd1ebf in buf_flush_try_neighbors (page_id=..., flush_type=BUF_FLUSH_LIST, n_flushed=131, n_to_flush=18446744073709551614) at ../src/storage/innobase/buf/buf0flu.cc:1404
            #17 0x0000559a56dcade9 in buf_flush_page_and_try_neighbors (bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x7f96bcff8360) at ../src/storage/innobase/buf/buf0flu.cc:1478
            #18 0x0000559a56dd4932 in buf_do_flush_list_batch (buf_pool=0x559a5949a590, min_n=18446744073709551614, lsn_limit=85151) at ../src/storage/innobase/buf/buf0flu.cc:1736
            #19 0x0000559a56dcb853 in buf_flush_batch (buf_pool=0x559a5949a590, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=85151, n=0x7f96bcff8620) at ../src/storage/innobase/buf/buf0flu.cc:1807
            #20 0x0000559a56dcb1cb in buf_flush_do_batch (buf_pool=0x559a5949a590, type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=85151, n=0x7f96bcff8620) at ../src/storage/innobase/buf/buf0flu.cc:1965
            #21 0x0000559a56dcbe96 in buf_flush_lists (min_n=18446744073709551614, lsn_limit=85151, n_processed=0x7f96bcff86c8) at ../src/storage/innobase/buf/buf0flu.cc:2067
            #22 0x0000559a56e72078 in fil_crypt_start_encrypting_space (space=0x559a59549ec0) at ../src/storage/innobase/fil/fil0crypt.cc:1193
            #23 0x0000559a56e718c0 in fil_crypt_space_needs_rotation (state=0x7f96bcff8cb8, key_state=0x7f96bcff8ca8, recheck=0x7f96bcff8cb7) at ../src/storage/innobase/fil/fil0crypt.cc:1326
            #24 0x0000559a56e6e429 in fil_crypt_find_space_to_rotate (key_state=0x7f96bcff8ca8, state=0x7f96bcff8cb8, recheck=0x7f96bcff8cb7) at ../src/storage/innobase/fil/fil0crypt.cc:1754
            #25 0x0000559a56e6e096 in fil_crypt_thread () at ../src/storage/innobase/fil/fil0crypt.cc:2456
            #26 0x00007f96eb490402 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
            #27 0x00007f96eb51f590 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
            

            midenok Aleksey Midenkov added a comment - Reproduces in latest 10.4 (f21664414da): #12 0x00007f96eb433596 in __GI___assert_fail (assertion=0x559a573fcf75 "fil_page_get_type(page) || (!newest_lsn || (mach_read_from_4(page + 34U) == block->page.id.space() && mach_read_from_ 4(page + 8U) == 0xffffffff && mach_read_from_4(page + 12U) == 0xffffffff && !mach_read_from_4(page + 0)))", file=0x559a573fcb08 "../src/storage/innobase/buf/buf0flu.cc", line=790, function= 0x559a573fce9e "void buf_flush_init_for_writing(const buf_block_t *, byte *, void *, lsn_t, bool)") at ./assert/assert.c:101 #13 0x0000559a56dc7f7a in buf_flush_init_for_writing (block=0x7f96e4b58338, page=0x7f96e5030000 "\f\306U-", page_zip_=0x0, newest_lsn=85151, use_full_checksum=false) at ../src/storage/innob ase/buf/buf0flu.cc:784 #14 0x0000559a56dc9e42 in buf_flush_write_block_low (bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, sync=false) at ../src/storage/innobase/buf/buf0flu.cc:1023 #15 0x0000559a56dc9073 in buf_flush_page (buf_pool=0x559a5949a590, bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, sync=false) at ../src/storage/innobase/buf/buf0flu.cc:1180 #16 0x0000559a56dd1ebf in buf_flush_try_neighbors (page_id=..., flush_type=BUF_FLUSH_LIST, n_flushed=131, n_to_flush=18446744073709551614) at ../src/storage/innobase/buf/buf0flu.cc:1404 #17 0x0000559a56dcade9 in buf_flush_page_and_try_neighbors (bpage=0x7f96e4b58338, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x7f96bcff8360) at ../src/storage/innobase/buf/buf0flu.cc:1478 #18 0x0000559a56dd4932 in buf_do_flush_list_batch (buf_pool=0x559a5949a590, min_n=18446744073709551614, lsn_limit=85151) at ../src/storage/innobase/buf/buf0flu.cc:1736 #19 0x0000559a56dcb853 in buf_flush_batch (buf_pool=0x559a5949a590, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=85151, n=0x7f96bcff8620) at ../src/storage/innobase/buf/buf0flu.cc:1807 #20 0x0000559a56dcb1cb in buf_flush_do_batch (buf_pool=0x559a5949a590, type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=85151, n=0x7f96bcff8620) at ../src/storage/innobase/buf/buf0flu.cc:1965 #21 0x0000559a56dcbe96 in buf_flush_lists (min_n=18446744073709551614, lsn_limit=85151, n_processed=0x7f96bcff86c8) at ../src/storage/innobase/buf/buf0flu.cc:2067 #22 0x0000559a56e72078 in fil_crypt_start_encrypting_space (space=0x559a59549ec0) at ../src/storage/innobase/fil/fil0crypt.cc:1193 #23 0x0000559a56e718c0 in fil_crypt_space_needs_rotation (state=0x7f96bcff8cb8, key_state=0x7f96bcff8ca8, recheck=0x7f96bcff8cb7) at ../src/storage/innobase/fil/fil0crypt.cc:1326 #24 0x0000559a56e6e429 in fil_crypt_find_space_to_rotate (key_state=0x7f96bcff8ca8, state=0x7f96bcff8cb8, recheck=0x7f96bcff8cb7) at ../src/storage/innobase/fil/fil0crypt.cc:1754 #25 0x0000559a56e6e096 in fil_crypt_thread () at ../src/storage/innobase/fil/fil0crypt.cc:2456 #26 0x00007f96eb490402 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #27 0x00007f96eb51f590 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

            People

              midenok Aleksey Midenkov
              thejaka Thejaka Kanewala
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.