Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11799

InnoDB can abort if the doublewrite buffer contains a bad and a good copy

Details

    • 10.1.21

    Description

      This came up when testing MDEV-11623:

      # 2017-01-14T19:58:03 [2066] --------- Contents of /home/buildbot/upgrade-10.1-to-10.1/current1_1/mysql.err -------------
      2017-01-14 19:58:02 139960389887872 [Note] /home/buildbot/build/sql/mysqld (mysqld 10.1.21-MariaDB-debug) starting as process 2233 ...
      2017-01-14 19:58:02 139960389887872 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=debian-jessie-amd64' or '--log-bin=mysql-bin' to avoid this problem.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: innodb_page_size has been changed from default value 16384 to 8192.
      2017-01-14 19:58:02 7f4b11526780 InnoDB: innodb-page-size has been changed from the default value 16384 to 8192.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The InnoDB memory heap is disabled
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Compressed tables use zlib 1.2.8
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using Linux native AIO
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using generic crc32 instructions
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Initializing buffer pool, size = 128.0M
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Completed initialization of buffer pool
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 0 name ./ibdata1 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967280 name ./ib_logfile0 key_id 0 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967281 name arch_log_space key_id 0 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Highest supported file format is Barracuda.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The log sequence numbers 6373461 and 6373461 in ibdata files do not match the log sequence number 6373471 in the ib_logfiles!
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Database was not shutdown normally!
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Starting crash recovery.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Reading tablespace information from the .ibd files...
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 11 name test/t6 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 15 name test/oltp1 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 18 name test/oltp7 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 9 name test/t4 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 17 name test/oltp5 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 6 name test/t1 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 16 name test/oltp3 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 10 name test/t5 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 7 name test/t2 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 8 name test/t3 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 14 name test/t9 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 12 name test/t7 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 13 name test/t8 key_id 1 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0.
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Restoring possible half-written data pages 
      2017-01-14 19:58:02 139960389887872 [Note] InnoDB: from the doublewrite buffer...
      InnoDB: Database page corruption or a failed
      InnoDB: file read of space 0 page 674.
      InnoDB: Trying to recover it from the doublewrite buffer.
      InnoDB: Dump of corresponding page in doublewrite buffer:
      

      The last quoted line is displayed by the function buf_dblwr_process():

                                      } else if (buf_page_is_corrupted(true,
                                                                       page,
                                                                       zip_size)) {
                                              fprintf(stderr,
                                                      "InnoDB: Dump of the page:\n");
                                              buf_page_print(
                                                      read_buf, zip_size,
                                                      BUF_PAGE_PRINT_NO_CRASH);
                                              fprintf(stderr,
                                                      "InnoDB: Dump of"
                                                      " corresponding page"
                                                      " in doublewrite buffer:\n");
      

      The logic is flawed, because nothing prevents us from having multiple copies of the same page in the doublewrite buffer. If the server was killed during a write to the doublewrite page, then restarted, and killed again during a wite to the data page so that the doublewrite buffer still contains the corrupted copy from the first kill, recovery would unnecessarily abort the server process, making the database inaccessible.

      Attachments

        Issue Links

          Activity

            Pushed to 10.1.21. I think that we should do it in 10.0 as well, so I am not closing this yet.

            marko Marko Mäkelä added a comment - Pushed to 10.1.21. I think that we should do it in 10.0 as well, so I am not closing this yet.

            I think that a slight improvement to the logic is needed. Currently, buf_dblwr_process() is choosing the first valid-looking copy of the page that it encounters, even though there could be newer copies (with later LSN). Restoring a too old copy of the page may cause information loss and corruption of the database. buf_dblwr_process() should restore a valid copy of the page with the latest LSN, similar to how fil_user_tablespace_restore_page() does it.

            marko Marko Mäkelä added a comment - I think that a slight improvement to the logic is needed. Currently, buf_dblwr_process() is choosing the first valid-looking copy of the page that it encounters, even though there could be newer copies (with later LSN). Restoring a too old copy of the page may cause information loss and corruption of the database. buf_dblwr_process() should restore a valid copy of the page with the latest LSN, similar to how fil_user_tablespace_restore_page() does it.

            While investigating this further, I noticed a bug in recv_dblwr_t::find_page() which is used by fil_user_tablespace_restore_page() for restoring the first page of each tablespace, if that page is corrupted.
            The function recv_dblwr_t::find_page() is only looking for the page with the biggest LSN matching the tablespace ID and the page number. If that page is corrupted but there is a slightly older but good copy of the page, we will ruin our changes to recover from the crash.

            The fix of this issue in 10.1.21 was incomplete, and more work will be needed.

            marko Marko Mäkelä added a comment - While investigating this further, I noticed a bug in recv_dblwr_t::find_page() which is used by fil_user_tablespace_restore_page() for restoring the first page of each tablespace, if that page is corrupted. The function recv_dblwr_t::find_page() is only looking for the page with the biggest LSN matching the tablespace ID and the page number. If that page is corrupted but there is a slightly older but good copy of the page, we will ruin our changes to recover from the crash. The fix of this issue in 10.1.21 was incomplete, and more work will be needed.

            Patch is in bb-10.2-MDEV-11799

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.2- MDEV-11799

            Thank you, the high-level idea of the fix looks good. I spotted some further problems and potential for improvement.
            I think that we will also need a 10.4 version of this for testing the MDEV-23233 scenario.

            This fix must be stress-tested with RQG (killing and restarting the server during a DML-only workload, to avoid hitting DDL-related problems that would hopefully some day be addressed by MDEV-17567). Testing with Mariabackup is not going to be fruitful, because Mariabackup is not using the doublewrite buffer at all.

            I do not think that this can be meaningfully tested with MTR. Only for the MDEV-23233 scenario where we would wrongly replace a valid page_compressed page with an invalid uncompressed one, we could write a test, but only starting with 10.3, because the exact scenario depends on MDEV-16328.

            marko Marko Mäkelä added a comment - Thank you, the high-level idea of the fix looks good. I spotted some further problems and potential for improvement. I think that we will also need a 10.4 version of this for testing the MDEV-23233 scenario. This fix must be stress-tested with RQG (killing and restarting the server during a DML-only workload, to avoid hitting DDL-related problems that would hopefully some day be addressed by MDEV-17567 ). Testing with Mariabackup is not going to be fruitful, because Mariabackup is not using the doublewrite buffer at all. I do not think that this can be meaningfully tested with MTR. Only for the MDEV-23233 scenario where we would wrongly replace a valid page_compressed page with an invalid uncompressed one, we could write a test, but only starting with 10.3, because the exact scenario depends on MDEV-16328 .

            I merged this up to 10.4 and tested 10.4 with the MDEV-23231 data directory.

            marko Marko Mäkelä added a comment - I merged this up to 10.4 and tested 10.4 with the MDEV-23231 data directory.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.