[MDEV-30132] Crash after recovery, with InnoDB: Tried to read ... bytes at offset - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
Fix Version/s: 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3, 10.11.2
Component/s: Data Definition - Alter Table, Data Definition - Create Table, Storage Engine - InnoDB
Labels:

Description

We got a data directory and an rr trace of failed recovery:

2022-11-28 14:25:35 0 [Note] InnoDB: Starting a batch to recover 504 pages from redo log.

2022-11-28 14:25:36 0 [Note] InnoDB: Starting a batch to recover 578 pages from redo log.

2022-11-28 14:25:37 0 [Note] InnoDB: Starting final batch to recover 295 pages from redo log.

...

2022-11-28 14:25:37 0 [Note] InnoDB: log sequence number 31381348 (memory-mapped); transaction id 27971

...

2022-11-28 14:25:39 0 [Note] /data/Server_bin/bb-10.11-new-innodb-defaults_asan_Og/bin/mysqld: ready for connections.

...

2022-11-28 14:25:40 6 [Warning] InnoDB: 8192 bytes should have been read. Only 512 bytes read. Retrying for the remaining bytes.

...

2022-11-28 14:25:40 6 [Warning] InnoDB: Retry attempts for reading partial data failed.

2022-11-28 14:25:40 6 [ERROR] InnoDB: File (null) was not found

2022-11-28 14:25:40 6 [ERROR] [FATAL] InnoDB: Tried to read 8192 bytes at offset 335872, but was only able to read 512.Cannot read from file. OS error number 2.

The fix of ~~MDEV-30069~~ turns out to be ineffective after all. Before recovery was attempted, there was a file cool_down/t7.ibd that consists of 0x58000 NUL bytes. Recovery would actually shrink the file here:

#0  0x00005643699d96ef in fil_space_extend_must_retry (space=..., node=...,

    size=..., success=...)

    at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:550

#1  0x00005643699da7cb in fil_space_t::prepare_acquired (this=...)

    at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:667

#2  0x00005643699daae4 in fil_space_t::acquire_and_prepare (this=...)

    at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/fil/fil0fil.cc:707

#3  0x000056436935aba3 in fil_space_t::acquire (this=...)

    at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/log/log0recv.cc:956

#4  recv_sys_t::recover_deferred (this=..., p=..., name=..., free_block=...)

    at /data/Server/bb-10.11-new-innodb-defaults/storage/innobase/log/log0recv.cc:954

We must invoke a special form of fil_space_t::acquire(), or we must set fil_space_t::committed_size based on the actual file size, so that the file will not be shrunk.

Furthermore, the intentional crash during a file read is against the spirit of ~~MDEV-13542~~. The read error must be propagated up the call stack.

I can reproduce the crash on MariaDB 10.9 (anything later than 10.8 will do) as follows:

tar xJf data.tar.xz

mariadbd --innodb-buffer-pool-size=4m --innodb-page-size=8k --innodb-log-file-size=100663296 --datadir "$(pwd)"/data --plugin-dir="$BUILDDIR"/plugin/file_key_management --plugin-load-add=file_key_management.so --file-key-management-filename="$(pwd)"/encryption_keys.txt --innodb-undo-tablespaces=3 &

echo 'check table cool_down.t7 extended;'|mariadb -uroot

10.9 9abd70e390bd8ef2e8347677a2aedd00151f0712
...
2022-11-30 9:08:29 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2106076
2022-11-30 9:08:29 0 [Note] InnoDB: Starting a batch to recover 159 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 255 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 348 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 372 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 429 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 410 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Starting a batch to recover 478 pages from redo log.
2022-11-30 9:08:30 0 [Note] InnoDB: Opened 3 undo tablespaces
2022-11-30 9:08:30 0 [Note] InnoDB: Starting final batch to recover 146 pages from redo log.
...
Version: '10.9.5-MariaDB-debug' socket: '/dev/shm/mariadb.sock' port: 3306 Source distribution
2022-11-30 9:08:54 3 [Warning] InnoDB: 8192 bytes should have been read. Only 512 bytes read. Retrying for the remaining bytes.
...
2022-11-30 9:08:54 3 [Warning] InnoDB: Retry attempts for reading partial data failed.
2022-11-30 9:08:54 3 [ERROR] InnoDB: File (null) was not found
2022-11-30 9:08:54 3 [ERROR] [FATAL] InnoDB: Tried to read 8192 bytes at offset 335872, but was only able to read 512.Cannot read from file. OS error number 2.

If I use a larger innodb_buffer_pool_size so that everything will be recovered in a single batch, the database will not crash.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

encryption_keys.txt
0.4 kB
2022-11-30 07:09
data.tar.xz
4.34 MB
2022-11-30 07:09

Issue Links

relates to

MDEV-13542 Crashing on a corrupted page is unhelpful

Closed

MDEV-24626 Remove synchronous write of page0 and flushing file during file creation

Closed

MDEV-30069 InnoDB: Trying to write ... bytes at ... outside the bounds of the file ...

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2022-11-30 07:15

Updated:: 2024-07-07 19:49

Resolved:: 2022-11-30 12:14

Time Tracking

Estimated:

Remaining:

Logged:

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.