Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31791

Crash recovery in the test innodb.recovery_memory occasionally fails

Details

    Description

      While trying to reproduce MDEV-31354, I observed that in the test innodb.recovery_memory, InnoDB occasionally fails to recover, because it is reading garbage, typically from an undo tablespace page. This occurs also with innodb_use_native_aio=0.

      We should try to reproduce this with rr record so that the root cause can be analyzed.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value

            pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio
            

            mleich Matthias Leich added a comment - pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio
            mleich Matthias Leich made changes -
            Labels corruption need_rr recovery corruption recovery

            Thank you, mleich. The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page:

            {0x34, 0x1, 0x34, 0x19, 0x6}

            I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage:

            #0  recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290
            #1  recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false)
                at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826
            #2  0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199
            #3  0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014
            

            It may be notable that the parsing buffer is log_sys.buf+0x3001, that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF.

            I will rerun the test somewhere else than /dev/shm, to see if the MDEV-14425 "fake PMEM" could be involved here.

            marko Marko Mäkelä added a comment - Thank you, mleich . The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page: {0x34, 0x1, 0x34, 0x19, 0x6} I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage: #0 recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290 #1 recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826 #2 0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199 #3 0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014 It may be notable that the parsing buffer is log_sys.buf+0x3001 , that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF . I will rerun the test somewhere else than /dev/shm , to see if the MDEV-14425 "fake PMEM" could be involved here.

            The "fake PMEM" is not the culprit (and we did test buffer wrap-around in MDEV-14425), because I can reproduce this also as follows:

            mkdir /run/user/$UID/var
            rm -fr var
            ln -sf /run/user/$UID/var var
            ./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
            

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93

            innodb.recovery_memory 'innodb,release'  w65 [ 14 fail ]
                    Test ended at 2023-08-01 17:22:19
             
            CURRENT_TEST: innodb.recovery_memory
            mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
            

            marko Marko Mäkelä added a comment - The "fake PMEM" is not the culprit (and we did test buffer wrap-around in MDEV-14425 ), because I can reproduce this also as follows: mkdir /run/user/ $UID /var rm -fr var ln -sf /run/user/ $UID /var var . /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,} 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 innodb.recovery_memory 'innodb,release' w65 [ 14 fail ] Test ended at 2023-08-01 17:22:19   CURRENT_TEST: innodb.recovery_memory mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

            In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000:

            bytes description
            0x34, 0x1, 0x0, 0x75, 0xb2 WRITE, page 1:0
            0x35, 0x1, 0x58, 0x84, 0x39, 0xb1 WRITE, page 1:0x58
            0xc3, 0x8, 0x4, 0xff MEMSET, same page
            0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
            0xc3, 0x4, 0x4, 0xff MEMSET, same page
            0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
            0xc3, 0x4, 0x4, 0xff MEMSET, same page
            0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
            0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2 WRITE, same page
            0xc4, 0x0, 0x80, 0x0, 0xff MEMSET, same page
            0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe WRITE, page 1:0
            0x34, 0x1, 0x0, 0x3d, 0x37 WRITE, page 1:0
            0x12, 0x1, 0x34 INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00)
            0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34 WRITE, page 1:0x58
            0x34, 0x1, 0x34, 0x19, 0x6 WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record)
            0xb3, 0x2a, 0x4, 0xb2 … WRITE, same page …

            This bug is something that must have been introduced by MDEV-14425.

            marko Marko Mäkelä added a comment - In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000 : bytes description 0x34, 0x1, 0x0, 0x75, 0xb2 WRITE, page 1:0 0x35, 0x1, 0x58, 0x84, 0x39, 0xb1 WRITE, page 1:0x58 0xc3, 0x8, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xc3, 0x4, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xc3, 0x4, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2 WRITE, same page 0xc4, 0x0, 0x80, 0x0, 0xff MEMSET, same page 0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe WRITE, page 1:0 0x34, 0x1, 0x0, 0x3d, 0x37 WRITE, page 1:0 0x12, 0x1, 0x34 INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00) 0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34 WRITE, page 1:0x58 0x34, 0x1, 0x34, 0x19, 0x6 WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record) 0xb3, 0x2a, 0x4, 0xb2 … WRITE, same page … This bug is something that must have been introduced by MDEV-14425 .
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Affects Version/s 10.11 [ 27614 ]
            Affects Version/s 11.0 [ 28320 ]
            Affects Version/s 11.1 [ 28549 ]
            Assignee Matthias Leich [ mleich ] Marko Mäkelä [ marko ]
            Labels corruption recovery corruption recovery rr-profile-analyzed
            Priority Critical [ 2 ] Blocker [ 1 ]

            I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed:

            diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt
            index 7207fd27a42..87c20c9abad 100644
            --- a/mysql-test/suite/innodb/t/recovery_memory.opt
            +++ b/mysql-test/suite/innodb/t/recovery_memory.opt
            @@ -1 +1 @@
            ---innodb_buffer_pool_size=1073741824
            +--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m
            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 66b763fdf2e..19b25a0e0f4 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf
               const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start,
                                          size_t len)
               {
            -    if (!len)
            -      return ptr;
                 const size_t s(*this - start);
                 ut_ad(s + len <= srv_page_size);
                 if (!log_sys.is_encrypted())
            @@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
                     store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE);
                     if (UNIV_UNLIKELY(rlen != 0))
                       goto record_corrupted;
            +      copy_if_needed:
            +        cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
                     break;
                   case EXTENDED:
                     if (UNIV_UNLIKELY(!rlen))
            @@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
                     break;
                   case OPTION:
                     if (rlen == 5 && *l == OPT_PAGE_CHECKSUM)
            -        {
            -          cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
            -          break;
            -        }
            +          goto copy_if_needed;
                     /* fall through */
                   case RESERVED:
                     continue;
            

            Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.

            marko Marko Mäkelä added a comment - I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed: diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt index 7207fd27a42..87c20c9abad 100644 --- a/mysql-test/suite/innodb/t/recovery_memory.opt +++ b/mysql-test/suite/innodb/t/recovery_memory.opt @@ -1 +1 @@ ---innodb_buffer_pool_size=1073741824 +--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 66b763fdf2e..19b25a0e0f4 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start, size_t len) { - if (!len) - return ptr; const size_t s(*this - start); ut_ad(s + len <= srv_page_size); if (!log_sys.is_encrypted()) @@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists) store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE); if (UNIV_UNLIKELY(rlen != 0)) goto record_corrupted; + copy_if_needed: + cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen); break; case EXTENDED: if (UNIV_UNLIKELY(!rlen)) @@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists) break; case OPTION: if (rlen == 5 && *l == OPT_PAGE_CHECKSUM) - { - cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen); - break; - } + goto copy_if_needed; /* fall through */ case RESERVED: continue; Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.

            The above patch (with the fix of MDEV-31354 additionally applied) worked for me.

            ./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
            

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            innodb.recovery_memory 'innodb,release'  w60 [ 100 pass ]  14458
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 157475.278 of 1635 seconds executing testcases
             
            Completed: All 10000 tests were successful.
            

            I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.

            marko Marko Mäkelä added a comment - The above patch (with the fix of MDEV-31354 additionally applied) worked for me. . /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,} 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches … innodb.recovery_memory 'innodb,release' w60 [ 100 pass ] 14458 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 157475.278 of 1635 seconds executing testcases   Completed: All 10000 tests were successful. I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            innodb.recovery_memory 'innodb,release'  w73 [ 100 pass ]  22498
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 166775.724 of 1760 seconds executing testcases
             
            Completed: All 10000 tests were successful.
            

            Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827) was encountered:

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            innodb.recovery_memory 'innodb,release'  w4 [ 89 fail ]  Found warnings/errors in server log file!
                    Test ended at 2023-08-02 12:21:21
            line
            2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
            2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
            2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
            2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
            2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
            

            marko Marko Mäkelä added a comment - 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches … innodb.recovery_memory 'innodb,release' w73 [ 100 pass ] 22498 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 166775.724 of 1760 seconds executing testcases   Completed: All 10000 tests were successful. Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827 ) was encountered: 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches innodb.recovery_memory 'innodb,release' w4 [ 89 fail ] Found warnings/errors in server log file! Test ended at 2023-08-02 12:21:21 line 2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt! 2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-08-02 10:28:57.0 2023-08-02 10:28:57.101
            marko Marko Mäkelä made changes -
            Fix Version/s 10.9.8 [ 29015 ]
            Fix Version/s 10.10.6 [ 29017 ]
            Fix Version/s 10.11.5 [ 29019 ]
            Fix Version/s 11.0.3 [ 28920 ]
            Fix Version/s 11.1.2 [ 28921 ]
            Fix Version/s 11.2.1 [ 29034 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.