Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6
-
None
Description
Issue description:
MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.
MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:
create database testdb;
|
use testdb;
|
CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
DELIMITER $$
|
CREATE PROCEDURE generateredo(count int) |
BEGIN
|
DECLARE counter INT DEFAULT 1; |
WHILE counter <= count DO
|
insert into t1 values(counter,counter);
|
update t1 set col2=counter+1 where col1=counter; |
delete from t1 where col1=counter;
|
SET counter = counter + 1; |
END WHILE;
|
END$$
|
DELIMITER ;
|
|
call testdb.generateredo(10000000); |
The mysql error log when recovering:
2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found |
2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: |
2021-12-20 18:55:50 0 [Note] cf=default |
2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 |
2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 |
2021-12-20 18:55:50 0 [Note] cf=__system__ |
2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 |
2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 |
2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables |
2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer |
2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. |
2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 |
2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 |
2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions |
2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO |
2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 |
2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo |
2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 |
2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. |
2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables |
2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... |
2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions |
2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed |
2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 |
2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-20 18:55:50 0 [Note] Starting table crash recovery... |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-20 18:55:50 0 [Note] Crash table recovery finished. |
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. |
2021-12-20 18:55:50 0 [ERROR] Aborting |
The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.
Some investigation:
Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice
- Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292
sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]
sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]
sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]
sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]
sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]
sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]
??:0(__libc_start_main)[0x14f07bff20ba]
- Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298
sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]
sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]
sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]
??:0(__libc_start_main)[0x1542b650e0ba]
??:0(_start)[0x55e575e4764a]
However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.
- Before the commit, in the second call of ha_recover(), no actual recover was done because trx_sys.rw_trx_hash.iterate_no_dups() iterate doesn't have any items - https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980 .
- However after this commit, the second call of ha_recover() also tries to do some recovery which eventually caused the error of:
https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756[ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
Tried a quick fix
I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:
diff --git a/sql/log.cc b/sql/log.cc
|
index 41eed80ef66..ddda2e5d330 100644 |
--- a/sql/log.cc
|
+++ b/sql/log.cc
|
@@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, |
fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error |
|
/* finds xids when root is not NULL */ |
- if (do_xa && ha_recover(&xids, &mem_root)) |
- goto err1; |
+ //if (do_xa && ha_recover(&xids, &mem_root)) |
+ // goto err1; |
|
/*
|
Scan the binlog for XIDs that need to be committed if still in the |
@@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, |
|
if (do_xa) |
{
|
+ if (ha_recover(&xids)) |
+ goto err2; |
if (binlog_checkpoint_found) |
{
|
#ifndef HAVE_REPLICATION
|
Error log before and after this fix
Before the fix, the error was in the second call of ha_recover():
2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 |
2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. |
2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-21 20:07:45 0 [Note] Starting table crash recovery... |
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 20:07:45 0 [Note] Crash table recovery finished. |
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. |
2021-12-21 20:07:45 0 [ERROR] Aborting |
After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:
2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 |
2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool |
2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog |
2021-12-21 1:00:20 0 [Note] Starting table crash recovery... |
2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... |
2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery |
2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows |
2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery |
2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB |
2021-12-21 1:00:20 0 [Note] Crash table recovery finished. |
2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. |
2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. |
2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 |
2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. |
Could you please investigate whether we should update the function or other conditions need to be considered?
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Priority | Major [ 3 ] | Minor [ 4 ] |
Affects Version/s | 10.6 [ 24028 ] |
Summary | MariaDB 10.6 failed to do crash recovery due to changes in https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981 | MariaDB 10.6 failed to do crash recovery, potential regression in commit 6c39eaeb |
Description |
h3. Issue description: *MariaDB 10.6.5 fails to recover from crash, regression in commit [https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981]* *Failed to recover the database from an EBS data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
h3. Issue description:
*MariaDB 10.6.5 fails to recover from crash, regression in commit [https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981]* *Failed to recover the database from a data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
Description |
h3. Issue description:
*MariaDB 10.6.5 fails to recover from crash, regression in commit [https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981]* *Failed to recover the database from a data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
h3. Issue description:
MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1. *MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
Description |
h3. Issue description:
MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1. *MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
h3. Issue description:
MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in [6c39eaeb1|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981]. *MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:* {code:java} create database testdb; use testdb; CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1; DELIMITER $$ CREATE PROCEDURE generateredo(count int) BEGIN DECLARE counter INT DEFAULT 1; WHILE counter <= count DO insert into t1 values(counter,counter); update t1 set col2=counter+1 where col1=counter; delete from t1 where col1=counter; SET counter = counter + 1; END WHILE; END$$ DELIMITER ; call testdb.generateredo(10000000); {code} The mysql error log when recovering: {code:java} 2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found 2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start: 2021-12-20 18:55:50 0 [Note] cf=default 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] cf=__system__ 2021-12-20 18:55:50 0 [Note] write_buffer_size=67108864 2021-12-20 18:55:50 0 [Note] target_file_size_base=67108864 2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1 2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO 2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728 2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state. 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266 2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active. 2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267 2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-20 18:55:50 0 [Note] Starting table crash recovery... 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [Note] Crash table recovery finished. 2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery 2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-20 18:55:50 0 [ERROR] Aborting {code} *The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.* h3. Some investigation: Based on the implementation of function {+}{*}init_server_components{*}(){+}, during the recovery, [+ha_recover()+|https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2703] function is called *twice* * Stack trace of the {*}first call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292] {code:java} sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057] sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd] sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931] sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda] sql/mysqld.cc:5289(init_server_components())[0x5599475098ce] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55] ??:0(__libc_start_main)[0x14f07bff20ba] {code} * Stack trace of the {*}second call{*}: [https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298] {code:java} sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b] sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df] sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55] ??:0(__libc_start_main)[0x1542b650e0ba] ??:0(_start)[0x55e575e4764a] {code} *However, in [the commit|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981] the +ha_recover()+ function call was moved from the [end of function|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8L10686-L10687] +TC_LOG_BINLOG::recover+ to [the beginning|https://github.com/MariaDB/server/commit/6c39eaeb126328e7813b146ecf652d51e4508981#diff-484732279f81ad1700aa2f7f19305fffa88d8891ad52af5b103a2a9ee9f766b8R11174-R11177].* * Before the commit, in the second call of {+}ha_recover(){+}, no actual recover was done because +trx_sys.rw_trx_hash.iterate_no_dups()+ iterate doesn't have any items - [https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980] . * However after this commit, the second call of +ha_recover()+ also tries to do some recovery which eventually caused the error of: [https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756] {code:java} [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. {code} h3. Tried a quick fix *I tried a simple fix of moving the +ha_recover()+ back to the end of +recover()+ then the issue was fixed:* {code:java} diff --git a/sql/log.cc b/sql/log.cc index 41eed80ef66..ddda2e5d330 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error /* finds xids when root is not NULL */ - if (do_xa && ha_recover(&xids, &mem_root)) - goto err1; + //if (do_xa && ha_recover(&xids, &mem_root)) + // goto err1; /* Scan the binlog for XIDs that need to be committed if still in the @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, if (do_xa) { + if (ha_recover(&xids)) + goto err2; if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION {code} *Error log before and after this fix* Before the fix, the error was in the second call of ha_recover(): {code:java} 2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557 2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 20:07:45 0 [Note] Starting table crash recovery... 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [Note] Crash table recovery finished. 2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021-12-21 20:07:45 0 [ERROR] Aborting{code} After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call: {code:java} 2021-12-21 1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557 2021-12-21 1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021-12-21 1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021-12-21 1:00:20 0 [Note] Starting table crash recovery... 2021-12-21 1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions... 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021-12-21 1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021-12-21 1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021-12-21 1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021-12-21 1:00:20 0 [Note] Crash table recovery finished. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-12-21 1:00:20 0 [Note] Server socket created on IP: '::'. 2021-12-21 1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221 1:00:20 2021-12-21 1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections. {code} Could you please investigate whether we should update the function or other conditions need to be considered? |
Priority | Minor [ 4 ] | Major [ 3 ] |
Fix Version/s | 10.6 [ 24028 ] |
Assignee | Andrei Elkin [ elkin ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Fix Version/s | 10.6.8 [ 27506 ] | |
Fix Version/s | 10.7.4 [ 27504 ] | |
Fix Version/s | 10.8.3 [ 27502 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |