Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30132

Crash after recovery, with InnoDB: Tried to read ... bytes at offset

    XMLWordPrintable

Details

    Description

      We got a data directory and an rr trace of failed recovery:

      2022-11-28 14:25:35 0 [Note] InnoDB: Starting a batch to recover 504 pages from redo log.
      2022-11-28 14:25:36 0 [Note] InnoDB: Starting a batch to recover 578 pages from redo log.
      2022-11-28 14:25:37 0 [Note] InnoDB: Starting final batch to recover 295 pages from redo log.
      ...
      2022-11-28 14:25:37 0 [Note] InnoDB: log sequence number 31381348 (memory-mapped); transaction id 27971
      ...
      2022-11-28 14:25:39 0 [Note] /data/Server_bin/bb-10.11-new-innodb-defaults_asan_Og/bin/mysqld: ready for connections.
      ...
      2022-11-28 14:25:40 6 [Warning] InnoDB: 8192 bytes should have been read. Only 512 bytes read. Retrying for the remaining bytes.
      ...
      2022-11-28 14:25:40 6 [Warning] InnoDB: Retry attempts for reading partial data failed.
      2022-11-28 14:25:40 6 [ERROR] InnoDB: File (null) was not found
      2022-11-28 14:25:40 6 [ERROR] [FATAL] InnoDB: Tried to read 8192 bytes at offset 335872, but was only able to read 512.Cannot read from file. OS error number 2.
      

      The fix of MDEV-30069 turns out to be ineffective after all. Before recovery was attempted, there was a file cool_down/t7.ibd that consists of 0x58000 NUL bytes. Recovery would actually shrink the file here:

      #0  0x00005643699d96ef in fil_space_extend_must_retry (space=..., node=..., 
          size=..., success=...)
          at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:550
      #1  0x00005643699da7cb in fil_space_t::prepare_acquired (this=...)
          at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:667
      #2  0x00005643699daae4 in fil_space_t::acquire_and_prepare (this=...)
          at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:707
      #3  0x000056436935aba3 in fil_space_t::acquire (this=...)
          at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/log/log0recv.cc:956
      #4  recv_sys_t::recover_deferred (this=..., p=..., name=..., free_block=...)
          at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/log/log0recv.cc:954
      

      We must invoke a special form of fil_space_t::acquire(), or we must set fil_space_t::committed_size based on the actual file size, so that the file will not be shrunk.

      Furthermore, the intentional crash during a file read is against the spirit of MDEV-13542. The read error must be propagated up the call stack.

      I can reproduce the crash on MariaDB 10.9 (anything later than 10.8 will do) as follows:

      tar xJf data.tar.xz
      mariadbd --innodb-buffer-pool-size=4m --innodb-page-size=8k --innodb-log-file-size=100663296 --datadir "$(pwd)"/data --plugin-dir="$BUILDDIR"/plugin/file_key_management --plugin-load-add=file_key_management.so --file-key-management-filename="$(pwd)"/encryption_keys.txt --innodb-undo-tablespaces=3 &
      echo 'check table cool_down.t7 extended;'|mariadb -uroot
      

      10.9 9abd70e390bd8ef2e8347677a2aedd00151f0712

      ...
      2022-11-30  9:08:29 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2106076
      2022-11-30  9:08:29 0 [Note] InnoDB: Starting a batch to recover 159 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 255 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 348 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 372 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 429 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 410 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting a batch to recover 478 pages from redo log.
      2022-11-30  9:08:30 0 [Note] InnoDB: Opened 3 undo tablespaces
      2022-11-30  9:08:30 0 [Note] InnoDB: Starting final batch to recover 146 pages from redo log.
      ...
      Version: '10.9.5-MariaDB-debug'  socket: '/dev/shm/mariadb.sock'  port: 3306  Source distribution
      2022-11-30  9:08:54 3 [Warning] InnoDB: 8192 bytes should have been read. Only 512 bytes read. Retrying for the remaining bytes.
      ...
      2022-11-30  9:08:54 3 [Warning] InnoDB: Retry attempts for reading partial data failed.
      2022-11-30  9:08:54 3 [ERROR] InnoDB: File (null) was not found
      2022-11-30  9:08:54 3 [ERROR] [FATAL] InnoDB: Tried to read 8192 bytes at offset 335872, but was only able to read 512.Cannot read from file. OS error number 2.
      

      If I use a larger innodb_buffer_pool_size so that everything will be recovered in a single batch, the database will not crash.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.