[MDEV-19319] InnoDB: Assertion failure in thread
Created: 2019-04-24  Updated: 2021-12-08  Resolved: 2021-12-08

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - InnoDB
Affects Version/s: 5.5.60
Fix Version/s: N/A

Type: Bug
Priority: Minor
Reporter: Leo Kirchner
Assignee: Marko Mäkelä
Resolution: Won't Fix
Votes: 0
Labels: None
Environment:

Virtual Red Hat Enterprise Linux 7.6
2 GB RAM, 1 CPU (x64)


Attachments: Text File mariadb.log    

 Description   

When using the PHPIPAM software, mysqld restarts whenever the database is under even a little load. I think no data has been lost, but I am not sure. I have attached the mariadb.log file; one occurrence of this issue starts at line 10140 and continues to the end of the file.



 Comments   
Comment by Marko Mäkelä [ 2019-04-24 ]

To make this more searchable: mariadb.log contains the following:

Version: '5.5.60-MariaDB'  socket: '/data/mysql/mysql.sock'  port: 3306  MariaDB Server
190408  6:48:49  InnoDB: Error: the OS said file flush did not succeed
190408  6:48:49  InnoDB: Operating system error number 5 in a file operation.
InnoDB: Error number 5 means 'Input/output error'.

Which file system is this running on? Are there any related messages in the system log or in the output of sudo dmesg? Have you run a file system check?

kevg mentioned today that InnoDB sometimes ignores errors from fsync(), but this seems to be the opposite case: here the error was detected and the server was stopped.
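
The message "Operating system error number 5" is the raw errno value from the flush call (typically fsync() on Linux). When the kernel cannot complete the underlying write, that call fails with errno 5, EIO. The following minimal Python sketch shows how that errno reaches a caller and how the number maps to the message text. It is not InnoDB code: flush_file and describe are hypothetical helpers, and the exact message text assumes Linux with glibc.

```python
import errno
import os
import tempfile


def flush_file(fd: int) -> int:
    """Hypothetical helper: fsync() `fd`; return 0 on success, else the errno.

    The errno is the same number that InnoDB prints as
    'Operating system error number N'.
    """
    try:
        os.fsync(fd)
    except OSError as e:
        return e.errno
    return 0


def describe(err: int) -> str:
    # Same shape as the InnoDB log line; the text comes from the C library.
    return f"Error number {err} means '{os.strerror(err)}'."


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"redo log record")
        f.flush()  # push Python's buffer to the OS before fsync()
        print(flush_file(f.fileno()))  # 0 when the disk accepts the write
    print(describe(errno.EIO))  # on Linux with glibc: ... 'Input/output error'.
```

On a healthy disk the flush returns 0; on a disk that rejects the write, as in the dmesg output reported below, the same call would return errno.EIO.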

I think that the only viable alternative to crashing would be to make the database instance read-only. But that could be even harder to pull off than more robust handling of read errors (MDEV-13542). And some users could still prefer a crash to a degraded level of service, so that they can quickly switch over to a ‘healthy’ replica.
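
The trade-off between crashing and degrading to read-only can be sketched as a policy switch around the redo log flush. This is a minimal Python illustration under stated assumptions, not InnoDB's design: RedoLog, Policy, ServerStopped and ReadOnlyError are hypothetical names, and the injected flush function stands in for fsync().

```python
import enum
import errno
from typing import Callable, List


class Policy(enum.Enum):
    ABORT = "abort"          # stop the server so that a healthy replica can take over
    READ_ONLY = "read_only"  # keep serving reads, refuse all further writes


class ServerStopped(Exception):
    """Models an intentional crash after a failed redo log flush."""


class ReadOnlyError(Exception):
    """Raised for writes attempted after the instance went read-only."""


class RedoLog:
    """Hypothetical redo log; `flush` stands in for fsync() and may raise OSError."""

    def __init__(self, flush: Callable[[], None], policy: Policy) -> None:
        self._flush = flush
        self.policy = policy
        self.read_only = False
        self.durable: List[bytes] = []  # records whose flush succeeded

    def commit(self, record: bytes) -> None:
        if self.read_only:
            raise ReadOnlyError("instance is read-only after an I/O error")
        try:
            self._flush()
        except OSError as e:
            if self.policy is Policy.ABORT:
                # Durability of this commit cannot be guaranteed: stop hard.
                raise ServerStopped(f"redo log flush failed: errno {e.errno}") from e
            # Degrade instead: this commit fails and later writes are refused.
            self.read_only = True
            raise
        self.durable.append(record)


if __name__ == "__main__":
    def failing_fsync() -> None:
        raise OSError(errno.EIO, "Input/output error")

    log = RedoLog(failing_fsync, Policy.READ_ONLY)
    try:
        log.commit(b"row change")
    except OSError:
        pass
    print(log.read_only)  # True
```

With ABORT, the failed commit stops everything, which suits setups that fail over to a replica. With READ_ONLY, the failing commit is still reported as an error, but readers keep working. In both cases a commit whose flush failed is never treated as durable.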

Comment by Leo Kirchner [ 2019-04-25 ]

sudo dmesg shows this four times:

[288957.956246] sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[288957.956262] sd 2:0:0:0: [sda] Sense Key : Hardware Error [current]
[288957.956267] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[288957.956274] sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 f6 56 a8 00 00 08 00
[288957.956282] blk_update_request: critical target error, dev sda, sector 16144040

The timestamps do not correspond to the MariaDB errors, though. The file system is XFS. The system log does not show any related errors either.
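
The CDB in these messages can be decoded to check that it describes the same location as the blk_update_request line. In a SCSI WRITE(10) command, byte 0 is the opcode (0x2a), bytes 2 to 5 hold the big-endian logical block address, and bytes 7 and 8 hold the big-endian transfer length in logical blocks. A minimal Python sketch follows; decode_write10 is a hypothetical helper, and the 512-byte logical block size is an assumption, consistent with the LBA matching the kernel's sector number, which Linux reports in 512-byte units.

```python
import struct
from typing import NamedTuple


class Write10(NamedTuple):
    opcode: int
    flags: int
    lba: int              # logical block address
    group: int
    transfer_length: int  # number of logical blocks to write
    control: int


def decode_write10(cdb_hex: str) -> Write10:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10 or raw[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Big-endian: opcode, flags, 4-byte LBA, group, 2-byte length, control.
    return Write10(*struct.unpack(">BBIBHB", raw))


if __name__ == "__main__":
    BLOCK_SIZE = 512  # assumption: 512-byte logical blocks
    cdb = decode_write10("2a 00 00 f6 56 a8 00 00 08 00")
    print(cdb.lba)                           # 16144040, same as the kernel's sector
    print(cdb.transfer_length * BLOCK_SIZE)  # 4096 bytes
    print(cdb.lba * BLOCK_SIZE)              # byte offset 8265748480
```

Under that assumption, each entry is a failed 4 KiB write at byte offset 8265748480 (about 7.7 GiB) of /dev/sda, rejected by the device with a Hardware Error sense key.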

Comment by Leo Kirchner [ 2019-04-25 ]

Now the process has crashed and does not restart. systemd shows the following errors:

Apr 25 10:53:30 phpipam systemd: Starting MariaDB database server...
Apr 25 10:53:30 phpipam mariadb-prepare-db-dir: Database MariaDB is probably initialized in /data/mysql already, nothing is done.
Apr 25 10:53:30 phpipam mariadb-prepare-db-dir: If this is not the case, make sure the /data/mysql is empty before running mariadb-prepare-db-dir.
Apr 25 10:53:30 phpipam mysqld_safe: 190425 10:53:30 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Apr 25 10:53:30 phpipam mysqld_safe: 190425 10:53:30 mysqld_safe Starting mysqld daemon with databases from /data/mysql
Apr 25 10:53:31 phpipam systemd: mariadb.service: control process exited, code=exited status=1
Apr 25 10:53:31 phpipam systemd: Failed to start MariaDB database server.
Apr 25 10:53:31 phpipam systemd: Unit mariadb.service entered failed state.
Apr 25 10:53:31 phpipam systemd: mariadb.service failed.

The mariadb.log file shows exactly the same errors as before.

When I remove the redo log files from the MariaDB data directory, the service starts up correctly.

Comment by Marko Mäkelä [ 2019-04-29 ]

It looks like we cannot rule out a hardware failure. You could also check how many errors are reported by the following command:

sudo smartctl -a /dev/sda

If you remove the InnoDB redo logs (or use innodb_force_recovery=6), you should expect arbitrary corruption. That is only acceptable as a last resort for dumping the data when no recent backup exists, and even then I would not blindly trust the consistency of the data. If the buffer pool contained few modified (dirty) pages before the restart, the data might be close to consistent.

Comment by Marko Mäkelä [ 2021-12-08 ]

I think it is perfectly reasonable to kill the server if a write to the redo log fails. In theory, the server could be made read-only instead, but that would take considerable effort for relatively little gain.

Generated at Thu Feb 08 08:50:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.