[MDEV-9771] server doesn't start because of one crashed table Created: 2016-03-21  Updated: 2016-04-22  Resolved: 2016-04-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.23
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Pavel Dvorak Assignee: Jan Lindström (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

debian 8.3


Attachments: Text File mariadb_error.log     HTML File syslog    

 Description   

server crashed and then wouldn't start
we didn't know why and last we deleted one table and it started again
why is this a fatal error? thanks



 Comments   
Comment by Elena Stepanova [ 2016-03-21 ]

Hi,

I see two variations of a fatal data error in your log:

2016-03-18 20:38:22 7f2793865780  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
160318 20:38:22 [ERROR] InnoDB: File (unknown): 'read' returned OS error 71. Cannot continue operation

repeated a number of times between 160318 20:38:18 and 160318 21:26:58;
then there was a more successful (although very messy) startup at 160318 21:26:58, it got the server up and somewhat running, but it went down later anyway with

2016-03-18 21:45:38 7f0b2d6d1700  InnoDB: Operating system error number 17 in a file operation.
InnoDB: Error number 17 means 'File exists'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
160318 21:45:38 [ERROR] InnoDB: Cannot create file './gps_DAITE/data.ibd'
...
160318 21:52:01 [ERROR] InnoDB: Failed to find tablespace for table '"gps_DAITE"."data"' in the cache. Attempting to load the tablespace with space id 4578136.
160318 21:52:01 [ERROR] InnoDB: In file './gps_DAITE/data.ibd', tablespace id and flags are 4578186 and 0, but in the InnoDB data dictionary they are 4578136 and 0. Have you moved InnoDB .ibd files around without using the commands DISCARD TABLESPACE and IMPORT TABLESPACE? Please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2016-03-18 21:52:01 7f0b392da700  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
160318 21:52:01 [ERROR] InnoDB: Could not find a valid tablespace file for 'gps_DAITE/data'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
160318 21:52:01 [Warning] InnoDB: Orphan table encountered during DROP DATABASE. This is possible if 'gps_DAITE/data.frm' was lost.
160318 21:52:01 [Warning] InnoDB: Missing gps_DAITE/data.ibd file for table gps_DAITE/data.
2016-03-18 21:53:10 7f0b2fffa700  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
160318 21:53:10 [ERROR] InnoDB: File ./gps_DAITE/data.ibd: 'open' returned OS error 71. Cannot continue operation

Then there was a relatively successful (but also not clean) startup at 160318 21:55:10, and even normal shutdown.
And then there was the last startup at 160318 22:00:40, again on a fairly broken datadir, but it worked.

So, which of the two "error 71" variations are you referring to, and which one did you solve by removing a table (and what do you mean by that – that you removed an *.ibd file?

Also might worth paying attention:

160318 21:37:00 [Warning] InnoDB: Failed to set O_DIRECT on file /mnt/ramdisk/#sql78ce_3c5_0.ibd: CREATE: Invalid argument, continuing anyway. O_DIRECT is known to result in 'Invalid argument' on Linux on tmpfs, see MySQL Bug#26662.

Comment by Pavel Dvorak [ 2016-03-21 ]

i think gps_DAITE errors occured when we tried to reimport the database and the server was running

problem was this 160318 20:38:22 [ERROR] InnoDB: File (unknown): 'read' returned OS error 71. Cannot continue operation
there's unknown so we didn't know which file to remove
i think trese was a file strarting with #
so we deleted it and after that it started

Comment by Elena Stepanova [ 2016-03-21 ]

Thanks.
I don't see any sign of wsrep in the log, so I assume it wasn't 10.0.23-galera, but regular 10.0.23, and change the version accordingly. Please correct me if I'm wrong.

jplindst, there is a choice of things to look at in the error log.
Apparently the problem started with the disk space issue, then there was an assertion failure, then these OS errors, then remaining inconsistency.

While it's questionable if the engine should actually start with a badly corrupted data (without force_recovery) – and it's still corrupted even after the file was removed – still, obviously things go wrong there, the whole "File (unknown)" to begin with; so, I'm assigning it to you to see what and how should be fixed here.

Comment by Pavel Dvorak [ 2016-03-21 ]

yes it's regular version
maybe it should not start but i think it should have said which table/file causes problems
the startup was messy but i think there was no data loss except that table

Comment by Jan Lindström (Inactive) [ 2016-03-22 ]

Hmm, most likely the file (unknown) is the system tablespace as it has no table name, this is not a bug it is by design. InnoDB has protection that it will not start by default on corrupted datadir, instead you need to provide force_recovery option. Error log contains several different errors, out of disk space, long semaphore wait during rename table (known problem), O_DIRECT errors (known problem), etc. Can you start your database know or do you still need help to investigate the issue? If you need investigation, I would need your database.

Comment by Pavel Dvorak [ 2016-03-22 ]

yes db is running now
out of disk space was caused by tmp partition we made it bigger
long semaphore was caused by disk space error or it's something else?

so if i understand that correctly i save safely delete all temp files starting with "#"?

Comment by Elena Stepanova [ 2016-03-22 ]

jplindst, I would vote for better diagnostics in the error log, including replacing "unknown" with something meaningful, and maybe some hints on what to do with the problem (like in other cases when InnoDB helpfully suggests to use innodb_force_recovery).

Comment by Jan Lindström (Inactive) [ 2016-04-22 ]

The diagnostics that can be easily added for GA product is already there. Adding additional diagnostics is possible but not possible to do exhaustive and all cases cover fix. For example fixing this (unknow) file is not possible on all cases easily.

Comment by Jan Lindström (Inactive) [ 2016-04-22 ]

commit 298e1d3f71cce89c83645ea9cc1eade4bdaf4917
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Fri Apr 22 11:36:10 2016 +0300

Improve error diagnostics on I/O errors. If node->name is NULL
try to use space->name instead.

Generated at Thu Feb 08 07:37:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.