Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6
-
None
Description
Issue description:
MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.
MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:
create database testdb;
|
use testdb;
|
CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
DELIMITER $$
|
CREATE PROCEDURE generateredo(count int) |
BEGIN
|
DECLARE counter INT DEFAULT 1; |
WHILE counter <= count DO
|
insert into t1 values(counter,counter);
|
update t1 set col2=counter+1 where col1=counter; |
delete from t1 where col1=counter;
|
SET counter = counter + 1; |
END WHILE;
|
END$$
|
DELIMITER ;
|
 |
call testdb.generateredo(10000000); |
The mysql error log when recovering:
2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found |
2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: |
2021-12-20 18:55:50 0 [Note] cf=default |
2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 |
2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 |
2021-12-20 18:55:50 0 [Note] cf=__system__ |
2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 |
2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 |
2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables |
2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer |
2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. |
2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 |
2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 |
2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions |
2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO |
2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 |
2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo |
2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 |
2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. |
2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables |
2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... |
2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions |
2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed |
2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 |
2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-20 18:55:50 0 [Note] Starting table crash recovery... |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-20 18:55:50 0 [Note] Crash table recovery finished. |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. |
2021-12-20 18:55:50 0 [ERROR] Aborting |
The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.
Some investigation:
Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice
- Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292
sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]
sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]
sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]
sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]
sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]
sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]
??:0(__libc_start_main)[0x14f07bff20ba]
- Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298
sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]
sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]
sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]
??:0(__libc_start_main)[0x1542b650e0ba]
??:0(_start)[0x55e575e4764a]
However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.
- Before the commit, in the second call of ha_recover(), no actual recover was done because trx_sys.rw_trx_hash.iterate_no_dups() iterate doesn't have any items - https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980 .
- However after this commit, the second call of ha_recover() also tries to do some recovery which eventually caused the error of:
https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756[ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
Tried a quick fix
I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:
diff --git a/sql/log.cc b/sql/log.cc
|
index 41eed80ef66..ddda2e5d330 100644 |
--- a/sql/log.cc
|
+++ b/sql/log.cc
|
@@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, |
fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error |
|
/* finds xids when root is not NULL */ |
- if (do_xa && ha_recover(&xids, &mem_root)) |
- goto err1; |
+ //if (do_xa && ha_recover(&xids, &mem_root)) |
+ // goto err1; |
|
/*
|
Scan the binlog for XIDs that need to be committed if still in the |
@@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, |
|
if (do_xa) |
{
|
+ if (ha_recover(&xids)) |
+ goto err2; |
if (binlog_checkpoint_found) |
{
|
#ifndef HAVE_REPLICATION
|
Error log before and after this fix
Before the fix, the error was in the second call of ha_recover():
2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 |
2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. |
2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-21 20:07:45 0 [Note] Starting table crash recovery... |
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 20:07:45 0 [Note] Crash table recovery finished. |
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. |
2021-12-21 20:07:45 0 [ERROR] Aborting |
After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:
2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 |
2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-21 1:00:20 0 [Note] Starting table crash recovery... |
2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 1:00:20 0 [Note] Crash table recovery finished. |
2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. |
2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. |
2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 |
2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. |
Could you please investigate whether we should update the function or other conditions need to be considered?