Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32968

InnoDB fails to restore tablespace first page from doublewrite buffer when page is empty

Details

    Description

      InnoDB fails to find the space id to restore the page from doublewrite
      buffer. It is not easy to find the space id from consecutive pages when
      we don't have page size information (fsp_flags also 0)

      Patch to repeat the failure:

      diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test
      index 89e6577b434..7eedfc7e636 100644
      --- a/mysql-test/suite/innodb/t/doublewrite.test
      +++ b/mysql-test/suite/innodb/t/doublewrite.test
      @@ -65,8 +65,9 @@ my $page_size = $ENV{INNODB_PAGE_SIZE};
       my $page;
       do "$ENV{MTR_SUITE_DIR}/../innodb/include/crc32.pl";
       open(FILE, "+<", $fname) or die;
      -sysseek(FILE, ($page_size/2), 0);
      -syswrite(FILE, chr(0) x ($page_size/2));
      +#sysseek(FILE, ($page_size/2), 0);
      +#syswrite(FILE, chr(0) x ($page_size/2));
      +syswrite(FILE, chr(0) x $page_size);
       sysseek(FILE, 3*$page_size, 0);
       sysread(FILE, $page, $page_size)==$page_size||die "Unable to read $name\n";
       sysseek(FILE, 3*$page_size, 0)||die "Unable to seek $fname\n";
      

      error message in log file:

      2023-12-07 22:32:59 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=49621,49621
      2023-12-07 22:32:59 0 [Note] InnoDB: Header page consists of zero bytes in datafile: ./test/t1.ibd, Space ID:0, Flags: 0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:1024. Pages to analyze:64
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 1024. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:2048. Pages to analyze:64
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 2048. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:4096. Pages to analyze:40
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 4096. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:8192. Pages to analyze:20
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 8192. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:16384. Pages to analyze:10
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 16384. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:32768. Pages to analyze:5
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 32768. Possible space_id count:0
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size:65536. Pages to analyze:2
      2023-12-07 22:32:59 0 [Note] InnoDB: Page size: 65536. Possible space_id count:0
      2023-12-07 22:32:59 0 [ERROR] InnoDB: Datafile './test/t1.ibd' is corrupted. Cannot determine the space ID from the first 64 pages.
      2023-12-07 22:32:59 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/t1.ibd.
      2023-12-07 22:32:59 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
      

      Solution:

      Find the 0th pages in doublewrite buffer
      Extract the space id and flags from page0
      Use flags to find page_size information and read consecutive pages to match space id
      If all pages are matching with doublewrite buffer page0 then write into the datafile.

      Attachments

        Issue Links

          Activity

            This is a good start, but I think that we need some more validation of the doublewrite buffer pages.

            marko Marko Mäkelä added a comment - This is a good start, but I think that we need some more validation of the doublewrite buffer pages.

            I see that some extensive changes to srv_start() are necessary because a stricter validation of the doublewrite buffer requires that the log checkpoint be determined first.

            Some of these changes will conflict with MDEV-14425. Please create a 10.11 based version of the fix as well.

            marko Marko Mäkelä added a comment - I see that some extensive changes to srv_start() are necessary because a stricter validation of the doublewrite buffer requires that the log checkpoint be determined first. Some of these changes will conflict with MDEV-14425 . Please create a 10.11 based version of the fix as well.

            Preliminary results of RQG testing on
            origin/10.11-MDEV-32968 73961f01f921a2ddc11f67dc6ba29f474d990466 2024-01-08T18:39:40+05:30

            I hit non rare the following
            1. Start server and have it under DDL/DML load by several concurrent connections
            2. SIGKILL the DB server
            3. Restart attempt which fails with

            [rr 3000913 3742]mariadbd: /data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl:238: uint32_t page_get_page_no(const ib_page_t*): Assertion `page == page_align((ib_page_t*) page)' failed.
            (rr) bt
            #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=92470683770368) at ./nptl/pthread_kill.c:44
            #1  __pthread_kill_internal (signo=6, threadid=92470683770368) at ./nptl/pthread_kill.c:78
            #2  __GI___pthread_kill (threadid=92470683770368, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
            #3  0x00002445732a9476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
            #4  0x000024457328f7f3 in __GI_abort () at ./stdlib/abort.c:79
            #5  0x000024457328f71b in __assert_fail_base (fmt=0x244573444150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5588b9f27060 "page == page_align((ib_page_t*) page)", 
                file=0x5588b9f0ecf0 "/data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl", line=238, function=<optimized out>) at ./assert/assert.c:92
            #6  0x00002445732a0e96 in __GI___assert_fail (assertion=0x5588b9f27060 "page == page_align((ib_page_t*) page)", file=0x5588b9f0ecf0 "/data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl", line=238, 
                function=0x5588b9f27088 "uint32_t page_get_page_no(const ib_page_t*)") at ./assert/assert.c:101
            #7  0x00005588b97e1fc1 in page_get_page_no (page=page@entry=0x5588bc0e6000 "") at /data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl:238
            #8  0x00005588b97e5429 in recv_dblwr_t::find_first_page (this=0x5588ba6bffd0 <recv_sys+2320>, name=0x5588bc0090c0 "./test/t1.ibd", file=...) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4918
            #9  0x00005588b9a834b3 in Datafile::validate_for_recovery (this=this@entry=0x7fff186c6b70) at /data/Server/10.11-MDEV-32968/storage/innobase/fsp/fsp0file.cc:439
            #10 0x00005588b9a69b70 in fil_ibd_load (space_id=30, filename=0x7fff186c6e80 "./test/t1.ibd", space=@0x7fff186c6e48: 0x0) at /data/Server/10.11-MDEV-32968/storage/innobase/fil/fil0fil.cc:2500
            #11 0x00005588b97ed7c4 in fil_name_process (name=name@entry=0x59286b5aa090 "./test/t1.ibd\260\001!", len=len@entry=13, space_id=<optimized out>, space_id@entry=30, ftype=FILE_MODIFY, lsn=lsn@entry=9765516, 
                if_exists=if_exists@entry=false) at /usr/include/c++/11/bits/basic_string.h:194
            #12 0x00005588b97fdecb in recv_sys_t::parse<recv_buf, false> (this=this@entry=0x5588ba6bf6c0 <recv_sys>, l=..., if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3095
            #13 0x00005588b97fe188 in recv_sys_t::parse_mtr<false> (if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3130
            #14 0x00005588b98009d0 in recv_sys_t::parse_pmem<false> (if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3141
            #15 0x00005588b97ee3d6 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4123
            #16 0x00005588b97ef329 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4606
            #17 0x00005588b992d5d9 in srv_start (create_new_db=<optimized out>) at /data/Server/10.11-MDEV-32968/storage/innobase/srv/srv0start.cc:1457
            #18 0x00005588b973af0b in innodb_init (p=<optimized out>) at /data/Server/10.11-MDEV-32968/storage/innobase/handler/ha_innodb.cc:4205
            #19 0x00005588b940e530 in ha_initialize_handlerton (plugin=0x5588bbf73610) at /data/Server/10.11-MDEV-32968/sql/handler.cc:655
            #20 0x00005588b91204d4 in plugin_do_initialize (plugin=plugin@entry=0x5588bbf73610, state=@0x7fff186cbbbc: 4) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1453
            #21 0x00005588b9125a92 in plugin_initialize (tmp_root=tmp_root@entry=0x7fff186cc160, plugin=plugin@entry=0x5588bbf73610, argc=argc@entry=0x5588ba781280 <remaining_argc>, argv=argv@entry=0x5588bbec06b0, 
                options_only=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1506
            #22 0x00005588b9126aca in plugin_init (argc=argc@entry=0x5588ba781280 <remaining_argc>, argv=<optimized out>, flags=0) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1764
            #23 0x00005588b8fc4029 in init_server_components () at /data/Server/10.11-MDEV-32968/sql/mysqld.cc:5224
            #24 0x00005588b8fc7bdc in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/mysqld.cc:5842
            #25 0x00005588b8fbbd66 in main (argc=<optimized out>, argv=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/main.cc:34
            (rr)
            

            sdp:/data1/results/1704727765/CR-2097$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio mariadbd-1 # Fate of the server till SIGKILL
            sdp:/data1/results/1704727765/CR-2097$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio # Restart attempt

            mleich Matthias Leich added a comment - Preliminary results of RQG testing on origin/10.11- MDEV-32968 73961f01f921a2ddc11f67dc6ba29f474d990466 2024-01-08T18:39:40+05:30 I hit non rare the following 1. Start server and have it under DDL/DML load by several concurrent connections 2. SIGKILL the DB server 3. Restart attempt which fails with [rr 3000913 3742]mariadbd: /data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl:238: uint32_t page_get_page_no(const ib_page_t*): Assertion `page == page_align((ib_page_t*) page)' failed. (rr) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=92470683770368) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=92470683770368) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=92470683770368, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00002445732a9476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x000024457328f7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x000024457328f71b in __assert_fail_base (fmt=0x244573444150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5588b9f27060 "page == page_align((ib_page_t*) page)", file=0x5588b9f0ecf0 "/data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl", line=238, function=<optimized out>) at ./assert/assert.c:92 #6 0x00002445732a0e96 in __GI___assert_fail (assertion=0x5588b9f27060 "page == page_align((ib_page_t*) page)", file=0x5588b9f0ecf0 "/data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl", line=238, function=0x5588b9f27088 "uint32_t page_get_page_no(const ib_page_t*)") at ./assert/assert.c:101 #7 0x00005588b97e1fc1 in page_get_page_no (page=page@entry=0x5588bc0e6000 "") at /data/Server/10.11-MDEV-32968/storage/innobase/include/page0page.inl:238 #8 0x00005588b97e5429 in recv_dblwr_t::find_first_page (this=0x5588ba6bffd0 <recv_sys+2320>, name=0x5588bc0090c0 "./test/t1.ibd", file=...) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4918 #9 0x00005588b9a834b3 in Datafile::validate_for_recovery (this=this@entry=0x7fff186c6b70) at /data/Server/10.11-MDEV-32968/storage/innobase/fsp/fsp0file.cc:439 #10 0x00005588b9a69b70 in fil_ibd_load (space_id=30, filename=0x7fff186c6e80 "./test/t1.ibd", space=@0x7fff186c6e48: 0x0) at /data/Server/10.11-MDEV-32968/storage/innobase/fil/fil0fil.cc:2500 #11 0x00005588b97ed7c4 in fil_name_process (name=name@entry=0x59286b5aa090 "./test/t1.ibd\260\001!", len=len@entry=13, space_id=<optimized out>, space_id@entry=30, ftype=FILE_MODIFY, lsn=lsn@entry=9765516, if_exists=if_exists@entry=false) at /usr/include/c++/11/bits/basic_string.h:194 #12 0x00005588b97fdecb in recv_sys_t::parse<recv_buf, false> (this=this@entry=0x5588ba6bf6c0 <recv_sys>, l=..., if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3095 #13 0x00005588b97fe188 in recv_sys_t::parse_mtr<false> (if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3130 #14 0x00005588b98009d0 in recv_sys_t::parse_pmem<false> (if_exists=if_exists@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:3141 #15 0x00005588b97ee3d6 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4123 #16 0x00005588b97ef329 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-32968/storage/innobase/log/log0recv.cc:4606 #17 0x00005588b992d5d9 in srv_start (create_new_db=<optimized out>) at /data/Server/10.11-MDEV-32968/storage/innobase/srv/srv0start.cc:1457 #18 0x00005588b973af0b in innodb_init (p=<optimized out>) at /data/Server/10.11-MDEV-32968/storage/innobase/handler/ha_innodb.cc:4205 #19 0x00005588b940e530 in ha_initialize_handlerton (plugin=0x5588bbf73610) at /data/Server/10.11-MDEV-32968/sql/handler.cc:655 #20 0x00005588b91204d4 in plugin_do_initialize (plugin=plugin@entry=0x5588bbf73610, state=@0x7fff186cbbbc: 4) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1453 #21 0x00005588b9125a92 in plugin_initialize (tmp_root=tmp_root@entry=0x7fff186cc160, plugin=plugin@entry=0x5588bbf73610, argc=argc@entry=0x5588ba781280 <remaining_argc>, argv=argv@entry=0x5588bbec06b0, options_only=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1506 #22 0x00005588b9126aca in plugin_init (argc=argc@entry=0x5588ba781280 <remaining_argc>, argv=<optimized out>, flags=0) at /data/Server/10.11-MDEV-32968/sql/sql_plugin.cc:1764 #23 0x00005588b8fc4029 in init_server_components () at /data/Server/10.11-MDEV-32968/sql/mysqld.cc:5224 #24 0x00005588b8fc7bdc in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/mysqld.cc:5842 #25 0x00005588b8fbbd66 in main (argc=<optimized out>, argv=<optimized out>) at /data/Server/10.11-MDEV-32968/sql/main.cc:34 (rr) sdp:/data1/results/1704727765/CR-2097$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio mariadbd-1 # Fate of the server till SIGKILL sdp:/data1/results/1704727765/CR-2097$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio # Restart attempt

            origin/10.11-MDEV-32968 55c83f5cc89a923d2f67744381bd550383ffb7c2 2024-01-11T18:05:57+05:30
            behaved well in RQG testing.

            mleich Matthias Leich added a comment - origin/10.11- MDEV-32968 55c83f5cc89a923d2f67744381bd550383ffb7c2 2024-01-11T18:05:57+05:30 behaved well in RQG testing.

            People

              thiru Thirunarayanan Balathandayuthapani
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.