[MDEV-25414] Server crashed and impossible start it Created: 2021-04-14  Updated: 2021-04-16  Resolved: 2021-04-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Upgrades
Affects Version/s: 10.5.9
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: stanley h Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: crash, innodb, need_feedback, pdo
Environment:

CentOS 7


Attachments: Text File _session.log     JPEG File mysql.JPG     Text File mysql_error.log    

 Description   

I have very very simple identical query with one WRERE which will return only one row result. After random time from 1 minute to few hours with same query will return nothing, but that row is in table. My PHP 7.4.xx PDO client will not throw any exception, no error, no warning, nothing. Just return 0 rowCount.
Otherwise it works well and always return requested row.
At the moment when this happen, I send my ping() command which is like "SELECT 1+1" and it will return 2 without problem.

options:
PDO::ATTR_PERSISTENT => false,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_EMULATE_PREPARES => false,
PDO::MYSQL_ATTR_FOUND_ROWS => true,

queries after connections:
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
SET CHARACTER SET utf8mb4;
SET SESSION sql_mode = "NO_BACKSLASH_ESCAPES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";
SET collation_connection = utf8mb4_unicode_ci;
SET collation_server = utf8mb4_unicode_ci;

By me query is not soo important as the fact.
CREATE TABLE IF NOT EXISTS `datas`
(
`id` varchar(32) NOT NULL,
`access` int(10) unsigned DEFAULT NULL,
`data` text NULL,
PRIMARY KEY (`session_id`)
) ROW_FORMAT = DYNAMIC
ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE = utf8mb4_unicode_ci;

I made all possible debugs which I can do but nothing found.
Today I tried create test with some procedure which will select same row and after some correct calls my DB crashed and its impossible start it.

My test with Win10 machine is everything OK.



 Comments   
Comment by Marko Mäkelä [ 2021-04-14 ]

What happened between the following lines in mysql_error.log?

2021-04-13  8:04:52 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-04-13  8:04:52 0 [Note] Server socket created on IP: '::'.
2021-04-13 13:37:18 0 [Note] mysqld.exe: Aria engine: starting recovery

It looks like the 10.4.13 server was killed in action and then a 10.5.9 server was started up on the data files.

The InnoDB redo log format was changed in MDEV-12353, and we removed support for the old redo log format. That is why InnoDB tells you that it cannot start up:

2021-04-13 13:37:18 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.13.
2021-04-13 13:37:18 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2021-04-13 13:37:18 0 [Note] InnoDB: Starting shutdown...
2021-04-13 13:37:18 0 [ERROR] Plugin 'InnoDB' init function returned error.
2021-04-13 13:37:18 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2021-04-13 13:37:18 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-04-13 13:37:18 0 [ERROR] Unknown/unsupported storage engine: InnoDB
2021-04-13 13:37:18 0 [ERROR] Aborting
InnoDB: using atomic writes.

The last line that I quoted is somewhat curious. Was it really displayed by the 10.5.9 server, after the startup already failed? In any case, that message would be removed in MDEV-23076.

Maybe you could check the system and kernel logs for what happened during that day. I would think that upgrades should be planned and executed carefully, with a clean shutdown of the old server, possibly backing up all data, and then starting the new server.

Comment by Marko Mäkelä [ 2021-04-14 ]

The contents of mysql_error.log suggests that after a couple upgrade attempts, the ib_logfile* files were deleted and the 10.5.9 server started up. That is a recipe for disaster. Any changes that had not been written from the buffer pool of the killed 10.4.13 server would be lost. It is possible that some links between data pages would be broken. InnoDB has never been designed to deal with that; it loves to crash in various ways. For example, if an index tree split occurred and only a part of the affected pages had been written to the data files, some pointers between B+ tree pages would be incorrect. Likewise, the DB_TRX_ID,DB_ROLL_PTR system columns in clustered index leaf page records could point to something that is not a valid undo log record, because the undo log page and the clustered index leaf page were not written to the data files in sync.

I believe that anyone should achieve the same effect (in the worst case, loss of all InnoDB data) by the following:

  1. Start the server.
  2. Start some workload on the server.
  3. Kill the server.
  4. Delete the log files.
  5. Restart the server.

The redo and undo logs are the glue that make InnoDB crash-safe and ACID.

I have been thinking to remove the dangerous InnoDB practice of automatically creating a new log file if it is missing on startup. For this to work, we would have to provide means for creating an empty log with a specified log sequence number (LSN) for those (rare) use cases where it is needed. That could be done in a later major release as part of MDEV-14425 (which would be a hopefully final change to the redo log file format).

Comment by Vladislav Vaintroub [ 2021-04-14 ]

MariaDB's own installation, on Windows, never upgrades in place. It does make use of the fact that different versions install into different directories
https://mariadb.com/kb/en/upgrading-mariadb-on-windows/

XAMPP might have its own idea on how to upgrade, which is not really compatible with what we are doing. You should not have been able to upgrade without an orderly shutdown, and there is no single orderly shutdown in the server error log.
The message "2021-04-13 13:39:29 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.13."
is accurate, and after that you should not have been able to start again in a new version. Hence
There is a strong indicator that the redo log was deleted somewhere between 2021-04-13 13:39:29 and 2021-04-13 13:45:22. Whether you did that yourself, or whatever procedure you used, it is a no-do, and whatever happens after that, can't really be fixed by us.

To reiterate, whenever you see "Upgrade after a crash is not supported" , it means that there was no shutdown. A shutdown leaves some shutdown messages in the log, as you can see, if you issue a "SHUTDOWN" command e.g in mysql.exe, or if you shut down the Windows service, or if you use mysqladmin to do shutdown, etc

Comment by stanley h [ 2021-04-14 ]

OK, soo why my select query not return row which exists in database? Why its connection alive and connector not reporting any errors? Why its happen only sometimes and not everytimes? And how to resolve it or how get more info about db server at that moment to analyze it because it looks like everything is OK?

Comment by stanley h [ 2021-04-14 ]

Looks like similar problem is reported here.
https://bugs.php.net/bug.php?id=80730

Comment by Marko Mäkelä [ 2021-04-15 ]

standus, I am closing this as "not a bug", because I do not think that this is a bug in MariaDB Server or the upgrade that has been implemented in MariaDB. We are not responsible for packaging that has been created by others.

It is not a bug that if you delete the InnoDB write-ahead log (ib_logfile*) or cause recovery to be skipped (by setting innodb_force_recovery=6), various bad things can happen.

InnoDB is just one of the many storage engines that MariaDB Server supports. It is a design decision that the failure to start up one storage engine will not prevent the entire server from starting up. Oracle made a different design decision in MySQL 5.7, making InnoDB a mandatory storage engine, that is, refusing to start up the entire server if InnoDB cannot be started. In MariaDB Server 10.2, I made an effort to make the InnoDB startup more fault-tolerant, so that on any failure, the subsystem will be closed properly and the rest of the server can start up.

It appears that you are using XAMPP and that the upgrade procedure that you followed did not perform an orderly shutdown of the old server. It appears to me that the proper place to report this would be https://community.apachefriends.org or an XAMPP bug reporting system, if there is one.

Comment by stanley h [ 2021-04-15 ]

OK, I understand that crash could be my fault but another problem which I described exists on CentOS.

Comment by stanley h [ 2021-04-16 ]

I found another important information. Every day 1 hour after midnight server has no queries active. Why?

Generated at Thu Feb 08 09:37:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.