[MDEV-27593] Crashing on I/O error is unhelpful Created: 2022-01-24  Updated: 2023-12-08  Resolved: 2023-08-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.13, 10.8.1
Fix Version/s: 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug Priority: Critical
Reporter: Daniel Black Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: None
Environment:

fc35 + uring


Issue Links:
Blocks
blocks MDEV-26970 MariaDB InnoDB Plugin Failure - NUC/C... Confirmed
is blocked by MDEV-13542 Crashing on a corrupted page is unhel... Closed
Duplicate
is duplicated by MDEV-32011 'aio write' returned OS error 222. Ca... Closed
Relates
relates to MDEV-29610 uring tmpfs InnoDB: IO Error: 125 dur... Stalled
relates to MDEV-28751 mariadb-install-db fails writing ibda... Open
relates to MDEV-28857 innodb assertion cb->m_err == DB_SUCC... Closed
relates to MDEV-28859 MariaDB Assert Crash Using mysqlbackup Closed
relates to MDEV-32972 mysqld got exception 0x80000003 Closed

 Description   

In a case of filling up a disk with space an assertion was generated.

Like MDEV-25760 fatal error with code would be better.

2022-01-24  8:21:51 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-01-24  8:21:51 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/dd/ib_buffer_pool
2022-01-24  8:21:51 0 [Note] InnoDB: Buffer pool(s) load completed at 220124  8:21:51
2022-01-24  8:21:51 0 [Note] sql/mysqld: ready for connections.
Version: '10.8.0-MariaDB'  socket: '/tmp/mysql.sock'  port: 0  Source distribution
2022-01-24 12:04:01 4 [Warning] Aborted connection 4 to db: 'test' user: 'dan' host: 'localhost' (Got an error writing communication packets)
2022-01-24 12:09:33 0x7f7a8dffb640  InnoDB: Assertion failure in file /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc line 3596
InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
InnoDB: We intentionally generate a memory trap.

backtrace

$ coredumpctl debug
           PID: 16931 (mysqld)
           UID: 1000 (dan)
           GID: 1000 (dan)
        Signal: 6 (ABRT)
     Timestamp: Mon 2022-01-24 12:09:33 AEDT (4min 3s ago)
  Command Line: sql/mysqld --no-defaults --skip-networking --datadir=/mnt/dd --verbose
    Executable: /home/dan/repos/build-mariadb-server-10.8/sql/mariadbd
 
Core was generated by `sql/mysqld --no-defaults --skip-networking --datadir=/mnt/dd --verbose'.
Program terminated with signal SIGABRT, Aborted.
 
gdb) bt full
#0  0x00007f7ad93df84c in __pthread_kill_implementation () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f7ad93926a6 in raise () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f7ad937c865 in abort () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000000000c9ecfb in ut_dbg_assertion_failed (expr=0x1152a4f "cb->m_err == DB_SUCCESS", file=<optimized out>, line=<optimized out>, line@entry=3596) at /home/dan/repos/mariadb-server-10.8/storage/innobase/ut/ut0dbg.cc:60
No locals.
#4  0x0000000000c037d0 in io_callback (cb=<optimized out>) at /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc:3596
        request = <optimized out>
#5  0x0000000000d6588d in tpool::task_group::execute (this=0x2348930, t=0x23613d0) at /home/dan/repos/mariadb-server-10.8/tpool/task_group.cc:55
        lk = {_M_device = 0x2348960, _M_owns = false}
#6  0x0000000000d63976 in tpool::thread_pool_generic::worker_main (this=0x22dd700, thread_var=0x22ed2a0) at /home/dan/repos/mariadb-server-10.8/tpool/tpool_generic.cc:549
        task = 0x4223
#7  0x00007f7ad964e5c4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#8  0x00007f7ad93dda87 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#9  0x00007f7ad9462640 in clone3 () from /lib64/libc.so.6
No symbol table info available.
(gdb) up
#1  0x00007f7ad93926a6 in raise () from /lib64/libc.so.6
(gdb) up
#2  0x00007f7ad937c865 in abort () from /lib64/libc.so.6
(gdb) up
#3  0x0000000000c9ecfb in ut_dbg_assertion_failed (expr=0x1152a4f "cb->m_err == DB_SUCCESS", file=<optimized out>, line=<optimized out>, line@entry=3596) at /home/dan/repos/mariadb-server-10.8/storage/innobase/ut/ut0dbg.cc:60
60		abort();
(gdb) up
#4  0x0000000000c037d0 in io_callback (cb=<optimized out>) at /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc:3596
3596	  ut_a(cb->m_err == DB_SUCCESS);
(gdb) p cb
$1 = <optimized out>
(gdb) p cb->m_err
value has been optimized out
(gdb) list
3591	
3592	extern void fil_aio_callback(const IORequest &request);
3593	
3594	static void io_callback(tpool::aiocb *cb)
3595	{
3596	  ut_a(cb->m_err == DB_SUCCESS);
3597	  const IORequest &request= *static_cast<const IORequest*>
3598	    (static_cast<const void*>(cb->m_userdata));
3599	
3600	  /* Return cb back to cache*/
(gdb) up
#5  0x0000000000d6588d in tpool::task_group::execute (this=0x2348930, t=0x23613d0) at /home/dan/repos/mariadb-server-10.8/tpool/task_group.cc:55
55	        t->m_func(t->m_arg);
(gdb) p *t
$2 = (tpool::task) {_vptr$task = 0x1174848 <vtable for tpool::task+16>, m_func = 0xc03740 <io_callback(tpool::aiocb*)>, m_arg = 0x2361378, m_group = 0x2348930}
(gdb) p (tpool::aiocb *) t->m_arg
$3 = (tpool::aiocb *) 0x2361378
(gdb) p *((tpool::aiocb *) t->m_arg)
$4 = {<iovec> = {iov_base = 0x7f7ab47f8000, iov_len = 2097152}, m_fh = 12, m_opcode = tpool::aio_opcode::AIO_PWRITE, m_offset = 1048576, m_buffer = 0x7f7ab47f8000, m_len = 2097152, m_callback = 0xc03740 <io_callback(tpool::aiocb*)>, m_group = 0x2348930, m_ret_len = 0, m_err = 28, m_internal = 0x0, m_internal_task = {_vptr$task = 0x1174848 <vtable for tpool::task+16>, m_func = 0xc03740 <io_callback(tpool::aiocb*)>, m_arg = 0x2361378, m_group = 0x2348930}, m_userdata = '\000' <repeats 16 times>, "\310\065V\002\000\000\000\000\031\000\000\000\000\000\000"}



 Comments   
Comment by Daniel Black [ 2022-01-24 ]

PR 1995.

The file descriptor to name is a bit of work wlad mentioned in a former PR. So I'm hoping a number is sufficient for now.

Comment by Marko Mäkelä [ 2022-01-24 ]

I think that we should distinguish page read and write errors.

On page write error, we do not have much other choice than to eventually terminate the process. I think that the minimum necessary handling is that we simply will leave the page dirty in the buffer pool, which will correctly prevent a log checkpoint from occurring.

Page read errors should be handled in the same way as checksum mismatch: evict the page frame from the buffer pool, and possibly propagate the error some way to the caller (as noted in MDEV-13542).

I think that at least the write side can be improved. There is no need to immediately crash the server on a page write failure.

Comment by Daniel Black [ 2022-02-18 ]

We should retry a corrupted page read for 100 times and then give up. For the system tablespace we will intentionally crash the server. That is not optimal, and maybe also the 100 rereads are not optimal, now that HDDs are getting as rare as hen’s teeth.

Comment by Marko Mäkelä [ 2022-02-18 ]

I can repeat crashes with io_uring, but not with libaio, when running the following:

./mtr --parallel=auto --repeat=100 encryption.innodb-checksum-algorithm{,,,}{,,}

On NVMe instead of /dev/shm, this took a bit longer time to crash. The cb->m_err was always 195 (not 28). I repeated it on 5.17-rc3 and Debian’s 5.16-0-1-amd64 (5.16.7).

Comment by Daniel Black [ 2022-07-12 ]

NFS note from lwn article, EIO is accessing a file descriptor when its NFS lock has been lost.

Comment by suresh ramagiri [ 2023-06-26 ]

One of our customers running on MariaDB 10.5.10, hit the same assertion reported here, disk usage wise, it was not full, but can see I/O errors in the dmesg

error logs:
======
2023-06-23 09:22:50 0x7fc2e96a4700 InnoDB: Assertion failure in file /home/jenkins/workspace/DEV-Branch/RPM-Builds/label/rhel-7/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/os/os0file.cc line 3890
InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
InnoDB: We intentionally generate a memory trap.

dmesg:
=======
[Fri Jun 23 08:55:20 2023] blk_update_request: I/O error, dev fd0, sector 0
[Fri Jun 23 09:25:16 2023] blk_update_request: I/O error, dev fd0, sector 0
[Fri Jun 23 09:25:19 2023] blk_update_request: I/O error, dev fd0, sector 0
[Fri Jun 23 09:25:22 2023] blk_update_request: I/O error, dev fd0, sector 0
[Fri Jun 23 09:55:14 2023] blk_update_request: I/O error, dev fd0, sector 0
[Fri Jun 23 09:55:17 2023] blk_update_request: I/O error, dev fd0, sector 0

It looks me I/O error is related to the floppy drive, which is not there and still trying to refer and getting the I/O error.
However, from our side how best we can handle and avoid the assertion is something we should understand and fix it, is want I feel. Looking forward for your comments on this.

Comment by Marko Mäkelä [ 2023-06-26 ]

While 10.5 only uses libaio and not liburing, fixing MDEV-29610 could be helpful.

Comment by Marko Mäkelä [ 2023-08-01 ]

This is conceptually related to MDEV-13542.

When a page write fails, we should do everything like we do for a successful page write, except mark the page as clean (or remove it from buf_pool.flush_list or buf_pool.LRU). A subsequent iteration of buf_flush_page_cleaner() should retry the write later. If all subsequent writes fail, then buf_flush_page_cleaner() will fail to make any progress, and all writer threads will eventually be blocked in log_free_check() because the log checkpoint cannot be advanced to make room in the circular write-ahead log ib_logfile0.

When an asynchronous page read fails, we can simply discard the corrupted page from the buffer pool. If the page is requested via buf_page_get_low(), then a synchronous read will be invoked via buf_page_read(). In MDEV-13542, that code path was already fixed so that I/O errors and checksum mismatch are being treated in a similar way.

Generated at Thu Feb 08 09:54:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.