[MDEV-17482] InnoDB fails to say which fatal error fsync() returned Created: 2018-10-17  Updated: 2020-12-08  Resolved: 2019-03-18

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB
Affects Version/s: 10.2.15
Fix Version/s: 10.2.23, 10.3.14, 10.4.4

Type: Bug Priority: Critical
Reporter: Seonghwan Kim Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: galera, innodb
Environment:

IBM Cloud Env.
Linux
3-node Galera Cluster


Attachments: Text File error_log-18.10.17.txt    
Issue Links:
Relates
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-21215 Random InnoDB: fsync() returned 5 usi... Closed

 Description   

Hi,
Today, one node of our Galera cluster crashed.
I don't know why it happened.
The crashed node was partitioned from the cluster, and some time later (28 minutes) it crashed.

I think it may be a bug in the InnoDB engine.
I have attached the error log.

Thank you.



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-03-18 ]

fsync() failed, but the log does not contain any error code, so it is not clear why. Did you check that the file system has enough space? Have you seen this crash more than once, or is this the only time?
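For reference, the free space on the file system that holds the data directory can be checked programmatically with POSIX statvfs(). This is a minimal sketch; the helper name available_bytes() is hypothetical and is not part of InnoDB.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper (not InnoDB code): return the number of bytes
   available to unprivileged writers on the file system containing
   `path`, or -1 if statvfs() fails (errno is left set by statvfs()). */
int64_t available_bytes(const char *path)
{
    struct statvfs st;
    if (statvfs(path, &st) != 0)
        return -1;
    /* f_bavail is counted in units of the fragment size f_frsize. */
    return (int64_t)st.f_bavail * (int64_t)st.f_frsize;
}
```

Note that a reading taken after the crash only shows the current state; it cannot prove how much space was free at the moment fsync() failed.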

Comment by Seonghwan Kim [ 2019-03-18 ]

I think there was enough disk space on that node at that time.
As far as I know, this is the only time the crash has happened.

Comment by Marko Mäkelä [ 2019-03-18 ]

The Linux manual page for fsync(2) lists the following possible errors, any of which could cause the intentional crash in InnoDB:

EBADF
fd is not a valid open file descriptor.
ENOSPC
Disk space was exhausted while synchronizing.
EROFS, EINVAL
fd is bound to a special file (e.g., a pipe, FIFO, or socket) which does not support synchronization.
ENOSPC, EDQUOT
fd is bound to a file on NFS or another filesystem which does not allocate space at the time of a write(2) system call, and some previous write failed due to insufficient storage space.

Since MariaDB Server 10.2.17, EIO also results in an intentional crash.
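As a sketch of how a log message could name these errors, the following hypothetical function (fsync_errno_name() is an illustrative name, not an InnoDB function) maps the errno values listed above, plus EIO, to their symbolic names. EDQUOT is guarded with #ifdef, on the assumption that it may be missing on some non-Linux platforms.

```c
#include <errno.h>

/* Hypothetical helper: map errno values that fsync() may return
   (per the fsync(2) manual page, plus EIO) to their symbolic names.
   Returns "unknown" for any other value. */
const char *fsync_errno_name(int err)
{
    switch (err) {
    case EBADF:  return "EBADF";   /* fd is not a valid open descriptor */
    case EIO:    return "EIO";     /* I/O error during synchronization */
    case ENOSPC: return "ENOSPC";  /* disk space exhausted */
    case EROFS:  return "EROFS";   /* special file without sync support */
    case EINVAL: return "EINVAL";  /* special file without sync support */
#ifdef EDQUOT
    case EDQUOT: return "EDQUOT";  /* an earlier write exceeded the quota */
#endif
    default:     return "unknown";
    }
}
```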

Theoretically, once MDEV-13542 has been finally fixed, we might handle fsync(), write, or allocation failures on user data files a little more gracefully: flag the affected index or table as corrupted, and mark the table read-only.

If a write to the redo log fails, I think that the only reasonable options are to kill the server (like we do now) or to make all InnoDB tables read-only.
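The policy outlined in the two paragraphs above can be sketched as a small decision function. Everything here is hypothetical: the enum names and choose_fsync_failure_action() are illustrative only. The sketch models just the "kill the server" option for redo log failures, not the alternative of making all InnoDB tables read-only.

```c
#include <stdbool.h>

/* Hypothetical classification of the file whose fsync() failed. */
enum file_role { ROLE_USER_DATA, ROLE_REDO_LOG };

/* Hypothetical outcome of a failed fsync(). */
enum fsync_failure_action {
    ACTION_MARK_READ_ONLY, /* flag index/table corrupted, table read-only */
    ACTION_KILL_SERVER     /* intentional crash (current behaviour) */
};

/* A failure on a user data file degrades only that table, and only
   if per-table corruption handling is available; a redo log failure
   always stops the server. */
enum fsync_failure_action
choose_fsync_failure_action(enum file_role role, bool graceful_handling_available)
{
    if (role == ROLE_REDO_LOG || !graceful_handling_available)
        return ACTION_KILL_SERVER;
    return ACTION_MARK_READ_ONLY;
}
```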

For now, I will only change the diagnostics so that before killing the server, InnoDB will say which error code was returned by fsync().
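A minimal sketch of this kind of diagnostic, assuming a plain POSIX environment, is shown below. It is not the actual InnoDB patch; fsync_with_diagnostics() is a hypothetical name. It retries when fsync() is interrupted by a signal (EINTR). On any other failure, it logs the numeric errno and its strerror() text and returns the code so the caller can then abort.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: call fsync(), retrying on EINTR. On any other
   failure, report which error occurred before returning it.
   Returns 0 on success, otherwise the errno value. */
int fsync_with_diagnostics(int fd, const char *name)
{
    for (;;) {
        if (fsync(fd) == 0)
            return 0;
        int err = errno;        /* save before another call can clobber it */
        if (err == EINTR)
            continue;           /* interrupted by a signal: try again */
        fprintf(stderr, "fsync() on %s returned %d (%s)\n",
                name, err, strerror(err));
        return err;
    }
}
```

A caller that keeps the current "kill the server" behaviour would write something like `if (fsync_with_diagnostics(fd, path) != 0) abort();`, so the error code appears in the log before the crash.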

Generated at Thu Feb 08 08:36:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.