Details
Bug
Status: Open
Major
Resolution: Unresolved
10.6
None
Description
This issue is a bug in the atomic/crash-safe DDL feature.
The idea behind atomic DDL is that crash recovery checks whether a DDL had already been written to the binlog before the crash. If so, the DDL is completed; if not, the DDL is aborted and rolled back. This is supposed to keep the state of the table(s) and of the binlog in sync.
The implementation is that the DDL log code uses the binlog scan that happens during crash recovery to build a hash of all the DDL statements contained in the binlog. The problem is that nothing in the code guarantees that the scan starts early enough in the binlog to include all pending DDLs.
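The recovery decision described above can be illustrated with a small toy model. This is only a sketch, not MariaDB code: `BinlogFile`, `collect_ddl_xids`, `recover_pending_ddl` and the file-index scan start are hypothetical names and simplifications introduced here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model (assumption, not MariaDB code): a binlog file is reduced to
// the XIDs of the DDL statements that were written to it.
struct BinlogFile {
  std::string name;
  std::vector<uint64_t> ddl_xids;
};

// Collect the DDL XIDs seen by a recovery scan that starts at file index
// scan_start and runs to the end of the binlog.
std::unordered_set<uint64_t> collect_ddl_xids(
    const std::vector<BinlogFile> &binlog, size_t scan_start) {
  std::unordered_set<uint64_t> seen;
  for (size_t i = scan_start; i < binlog.size(); ++i)
    seen.insert(binlog[i].ddl_xids.begin(), binlog[i].ddl_xids.end());
  return seen;
}

enum class RecoveryAction { Complete, Rollback };

// A pending DDL is completed if its XID was found by the scan,
// otherwise it is rolled back.
RecoveryAction recover_pending_ddl(const std::unordered_set<uint64_t> &seen,
                                   uint64_t pending_xid) {
  return seen.count(pending_xid) ? RecoveryAction::Complete
                                 : RecoveryAction::Rollback;
}
```

In this model, a DDL with XID 7 written to the first of two files is completed when the scan starts at index 0, but rolled back when the scan starts at index 1, even though the statement is in the binlog. That is the kind of mismatch this issue describes.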
The following test case demonstrates the problem. By running a binlog rotation in parallel with a RENAME TABLE right around the point of a server crash, the server ends up in a state where the RENAME TABLE is in the binlog but the table has not been renamed. Such an inconsistency could, for example, break replication. The test case requires a small patch to the code to insert a debug_sync point:
diff --git a/sql/sql_rename.cc b/sql/sql_rename.cc
index 421c7198c10..4b8c23d8978 100644
--- a/sql/sql_rename.cc
+++ b/sql/sql_rename.cc
@@ -182,6 +182,7 @@ bool mysql_rename_tables(THD *thd, TABLE_LIST *table_list, bool silent,
     thd->binlog_xid= thd->query_id;
     ddl_log_update_xid(&ddl_log_state, thd->binlog_xid);
     binlog_error= write_bin_log(thd, TRUE, thd->query(), thd->query_length());
+    DEBUG_SYNC(thd, "ddl_log_rename_after_binlog");
     if (binlog_error)
       error= 1;
     thd->binlog_xid= 0;

Test case:
--source include/have_debug_sync.inc
--source include/have_binlog_format_mixed.inc

CREATE TABLE t1 (a INT PRIMARY KEY);
connect (con1,localhost,root,,);
SET debug_sync= 'ddl_log_rename_after_binlog SIGNAL ready WAIT_FOR ever';
send RENAME TABLE t1 TO t2;
--connection default
SET debug_sync= 'now WAIT_FOR ready';
FLUSH BINARY LOGS;
--source include/wait_for_binlog_checkpoint.inc

--let $shutdown_timeout=0
--source include/restart_mysqld.inc

SHOW BINLOG EVENTS IN 'master-bin.000001';
SHOW BINLOG EVENTS IN 'master-bin.000002';
query_vertical
SHOW CREATE TABLE t2;

DROP TABLE t2;

Output:
CREATE TABLE t1 (a INT PRIMARY KEY);
connect con1,localhost,root,,;
SET debug_sync= 'ddl_log_rename_after_binlog SIGNAL ready WAIT_FOR ever';
RENAME TABLE t1 TO t2;
connection default;
SET debug_sync= 'now WAIT_FOR ready';
FLUSH BINARY LOGS;
# restart
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000001 4 Format_desc 1 256 Server ver: 11.4.10-MariaDB-debug-log, Binlog ver: 4
master-bin.000001 256 Gtid_list 1 285 []
master-bin.000001 285 Binlog_checkpoint 1 329 master-bin.000001
master-bin.000001 329 Gtid 1 371 GTID 0-1-1
master-bin.000001 371 Query 1 482 use `test`; CREATE TABLE t1 (a INT PRIMARY KEY)
master-bin.000001 482 Gtid 1 524 GTID 0-1-2
master-bin.000001 524 Query 1 621 use `test`; RENAME TABLE t1 TO t2
master-bin.000001 621 Rotate 1 669 master-bin.000002;pos=4
SHOW BINLOG EVENTS IN 'master-bin.000002';
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000002 4 Format_desc 1 256 Server ver: 11.4.10-MariaDB-debug-log, Binlog ver: 4
master-bin.000002 256 Gtid_list 1 299 [0-1-2]
master-bin.000002 299 Binlog_checkpoint 1 343 master-bin.000001
master-bin.000002 343 Binlog_checkpoint 1 387 master-bin.000002
SHOW CREATE TABLE t2;
main.elenst [ fail ]
Test ended at 2026-01-21 13:59:44

CURRENT_TEST: main.elenst
mysqltest: At line 18: query 'SHOW CREATE TABLE t2' failed: ER_NO_SUCH_TABLE (1146): Table 'test.t2' doesn't exist

As a fix, I suggest that the DDL log save the server's current GTID position, along with the query XID, prior to binlogging. Then, during crash recovery, the DDL log code should do its own binlog scan from the saved GTID position to look for the query XID. This ensures that the binlog scan starts from exactly the point in the binlog that is needed. It will also integrate well with the new binlog-in-engine code (which normally doesn't need to scan the whole binlog file after a crash).
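The suggested fix might look roughly like the following toy model. It is a sketch under stated assumptions, not a proposed patch: the binlog is reduced to a flat sequence of query XIDs, an index into that sequence stands in for the GTID position, and `DdlLogEntry`, `ddl_log_begin` and `ddl_was_binlogged` are hypothetical names that do not exist in the server.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption, not MariaDB code): the binlog is a flat sequence
// of query XIDs, and an index into it stands in for a GTID position.
struct DdlLogEntry {
  uint64_t xid;       // query XID of the pending DDL
  size_t binlog_pos;  // hypothetical field: position saved before binlogging
};

// Called before the DDL is written to the binlog: remember where the
// binlog currently ends, together with the query XID.
DdlLogEntry ddl_log_begin(const std::vector<uint64_t> &binlog, uint64_t xid) {
  return DdlLogEntry{xid, binlog.size()};
}

// Called during crash recovery: the DDL log does its own scan, starting
// at the saved position, and looks for the query XID.
bool ddl_was_binlogged(const std::vector<uint64_t> &binlog,
                       const DdlLogEntry &entry) {
  for (size_t i = entry.binlog_pos; i < binlog.size(); ++i)
    if (binlog[i] == entry.xid)
      return true;
  return false;
}
```

Because the starting point is stored with the DDL log entry itself, the result in this model does not depend on where the general recovery scan happens to begin, so a rotation between binlogging and the crash would not hide the statement.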