I filed MDEV-12103 for fixing MySQL Bug#80788 in MariaDB.
This request is about increasing the signal-to-noise ratio of the error log output of crash recovery. The InnoDB crash recovery consists of a few tasks:
- Apply the redo log.
- If a DDL transaction was active when the server was killed, roll it back.
- Drop any incompletely created indexes.
- Start a background thread to roll back any recovered transactions.
- Start accepting connections.
I think that as part of this ticket, we should concentrate on the redo log apply. The background rollback can be somewhat noisy too, but it can be fixed separately.
The redo log apply does not have a concept of transactions. Instead, it only deals with mini-transactions, redo log records (measured in LSN or bytes) and data file pages.
If the last redo log checkpoint was far away, crash recovery may comprise multiple passes. Usually one pass will be enough. Each pass comprises the following steps:
- Read and parse the redo log records, putting them into recv_sys->addr_hash table by (space_id,page_number), in a linked list, ordered by LSN. This should be fast, as it only is sequential I/O.
- Take a random bucket from recv_sys->addr_hash, and read a range of pages starting from the (space_id,page_number) of that bucket.
- When the read for a given page completes, apply all buffered records for the page, and schedule the page for writing.
- If this is the last pass, apply the change buffer to the page if applicable (and write redo log about this).
- Write the pages back to the data files.
- Empty the recv_sys->addr_hash.
Because the I/O is asynchronous, it is somewhat challenging to provide a meaningful progress measure.
I think that we should consider a progress update that is taking place every 5 to 60 seconds (maybe introduce a parameter --innodb-log-recovery-reports= for this). This progress counter could expose the following information:
- Size of redo log scanned so far (last checkpoint LSN, scanned-up-to-LSN, and whether the end of the log (end of last pass) was reached). A percentage cannot be reported, because we do not know where the log ends until we reach the end.
- Number of redo log passes so far. (Or emit a message every time a new pass is started.)
- Number of pages to process in the current pass (number of distinct pages in recv_sys->addr_hash)
- Number of pages that have been read in the current pass (<= the above)
- Number of pages that have been written in the current pass (<= the above)
I would not necessarily expose the change buffer merges separately. If that is needed, it could be done separately. We do know in advance how many pages need a change buffer merge, because that information is available in the change buffer bitmap pages.
On a related note: When the log files are missing on startup and they have just been created by the very same InnoDB startup run, also in that case InnoDB will display messages about doing crash recovery, even though there is nothing to recover. This is the case at least in 10.2, maybe also 10.0 and 10.1.