Details
Bug
Status: Closed
Major
Resolution: Fixed
10.6
Can result in unexpected behaviour
Description
An ALTER TABLE that converts a table to the MERGE or MRG_MyISAM engine is binlogged incorrectly. Its GTID event is not marked as DDL, so it is scheduled incorrectly in parallel replication. MDEV-21107 is one possible symptom of this.
--source include/have_binlog_format_mixed.inc
CREATE TABLE t (i1 int, i2 int, pk int) ;
CREATE TABLE t3 LIKE t ;
ALTER TABLE t3 ENGINE = MERGE UNION (t1,t2);
The GTID for the ALTER in the binlog is missing the "ddl" mark:
#251020 13:02:56 server id 1 end_log_pos 668 CRC32 0xa81c998f GTID 0-1-3
#251020 13:02:56 server id 1 end_log_pos 787 CRC32 0x68bf0315 Query thread_id=5 exec_time=0 error_code=0 xid=26
ALTER TABLE t3 ENGINE = MERGE UNION (t1,t2)
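To spot this symptom in a longer binlog dump, something like the following could be used. This is only a sketch: the function names are hypothetical, it assumes the "ddl" mark appears as a separate word after the GTID on the header line, and it guesses at DDL by checking a few leading keywords.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// True if a binlog dump header line describes a GTID event.
static bool is_gtid_line(const std::string &line) {
  return !line.empty() && line[0] == '#' &&
         line.find(" GTID ") != std::string::npos;
}

// True if the GTID header carries "ddl" as a separate word after the GTID.
static bool has_ddl_mark(const std::string &line) {
  std::size_t pos = line.find(" GTID ");
  if (pos == std::string::npos) return false;
  std::istringstream words(line.substr(pos + 6));
  std::string w;
  words >> w;  // the GTID itself, e.g. 0-1-3
  while (words >> w)
    if (w == "ddl") return true;
  return false;
}

// Assumption: a statement counts as DDL if it starts with one of these keywords.
static bool looks_like_ddl(const std::string &stmt) {
  static const char *keywords[] = {"ALTER ", "CREATE ", "DROP ",
                                   "RENAME ", "TRUNCATE "};
  std::string up;
  for (char c : stmt) up += (char)std::toupper((unsigned char)c);
  for (const char *k : keywords)
    if (up.rfind(k, 0) == 0) return true;  // prefix match
  return false;
}

// Returns every GTID header line that lacks the "ddl" mark although a DDL
// statement follows it before the next GTID header.
std::vector<std::string> find_unmarked_ddl(const std::vector<std::string> &lines) {
  std::vector<std::string> suspicious;
  std::string pending;  // most recent GTID header without a "ddl" mark
  for (const std::string &l : lines) {
    if (is_gtid_line(l)) {
      pending = has_ddl_mark(l) ? "" : l;
    } else if (!l.empty() && l[0] != '#' && !pending.empty() &&
               looks_like_ddl(l)) {
      suspicious.push_back(pending);
      pending.clear();
    }
  }
  return suspicious;
}
```

Fed the three lines of output above, this reports the `GTID 0-1-3` header; once the header ends in `ddl`, nothing is reported.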
The ddl mark is decided by the DID_DDL bit of THD_TRANS::m_unsafe_rollback_flags.
For the ALTER, this gets set in mysql_execute_command():
if (stmt_causes_implicit_commit(thd, CF_IMPLICIT_COMMIT_BEGIN)) {
  ...
  thd->transaction->stmt.mark_trans_did_ddl();
However, for the ALTER to MERGE, the flag gets cleared again before binlogging, causing the ddl flag to be lost on the GTID event. This happens in THD_TRANS::reset():
void reset() {
  ...
  m_unsafe_rollback_flags= 0;
This gets called through trans_commit_stmt() and trans_commit_implicit() from mysql_alter_table():
/*
  We do not copy data for MERGE tables. Only the children have data.
  MERGE tables have HA_NO_COPY_ON_ALTER set.
*/
if (!(new_table->file->ha_table_flags() & HA_NO_COPY_ON_ALTER))
  ...
else {
  ...
  if (trans_commit_stmt(thd) || trans_commit_implicit(thd))
This seems to explain why the bug happens: the HA_NO_COPY_ON_ALTER code branch drops the DID_DDL flag, and the GTID event is then binlogged later without the ddl mark.
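The sequence can be reproduced in a stripped-down standalone model. This is not the server code: `ThdTransModel` and `gtid_gets_ddl_mark()` are hypothetical stand-ins, the bit value of `DID_DDL` is an assumption, and the commit calls are reduced to a bare `reset()`.

```cpp
#include <cassert>

// Simplified stand-in for THD_TRANS, keeping only the flag handling
// described in this report.
struct ThdTransModel {
  enum : unsigned { DID_DDL = 1u };  // actual bit value is an assumption
  unsigned m_unsafe_rollback_flags = 0;

  void mark_trans_did_ddl() { m_unsafe_rollback_flags |= DID_DDL; }
  bool trans_did_ddl() const { return (m_unsafe_rollback_flags & DID_DDL) != 0; }
  void reset() { m_unsafe_rollback_flags = 0; }
};

// Models the order of events for one ALTER TABLE and returns whether the
// GTID event would end up with the ddl mark.
bool gtid_gets_ddl_mark(bool no_copy_on_alter) {
  ThdTransModel stmt;
  stmt.mark_trans_did_ddl();    // as in mysql_execute_command()
  if (no_copy_on_alter)
    stmt.reset();               // reached via trans_commit_stmt() /
                                // trans_commit_implicit() in the MERGE branch
  return stmt.trans_did_ddl();  // what the binlogging step would see
}
```

In this model `gtid_gets_ddl_mark(false)` returns true and `gtid_gets_ddl_mark(true)` returns false, matching the missing mark seen for the ALTER to MERGE.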
But I am not sure what the right way to fix it is. Should mysql_alter_table() save and restore the flag across the resetting of the THD_TRANS? Should mysql_alter_table() maybe mark DID_DDL unconditionally at the end? Should the Gtid_log_event constructor be changed to detect DDL in another way? Could losing the DID_DDL flag have other consequences elsewhere in the code?
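For illustration only, the first option (save and restore the flag across the reset) might look roughly like this in a standalone model. `ThdTransModel` and `KeepDidDdl` are hypothetical names, the bit value is an assumption, and nothing here claims to be the fix that was actually applied.

```cpp
#include <cassert>

// Simplified stand-in for THD_TRANS (hypothetical, not the server class).
struct ThdTransModel {
  enum : unsigned { DID_DDL = 1u };  // actual bit value is an assumption
  unsigned m_unsafe_rollback_flags = 0;

  void mark_trans_did_ddl() { m_unsafe_rollback_flags |= DID_DDL; }
  bool trans_did_ddl() const { return (m_unsafe_rollback_flags & DID_DDL) != 0; }
  void reset() { m_unsafe_rollback_flags = 0; }
};

// RAII guard: remembers whether DID_DDL was set when the guard is created
// and sets it again when the guard goes out of scope.
class KeepDidDdl {
  ThdTransModel &trans_;
  bool had_ddl_;
public:
  explicit KeepDidDdl(ThdTransModel &trans)
    : trans_(trans), had_ddl_(trans.trans_did_ddl()) {}
  ~KeepDidDdl() { if (had_ddl_) trans_.mark_trans_did_ddl(); }
  KeepDidDdl(const KeepDidDdl &) = delete;
  KeepDidDdl &operator=(const KeepDidDdl &) = delete;
};

// Models the MERGE branch with the guard placed around the commit calls.
bool ddl_mark_survives_commit() {
  ThdTransModel stmt;
  stmt.mark_trans_did_ddl();
  {
    KeepDidDdl guard(stmt);
    stmt.reset();  // stands in for trans_commit_stmt()/trans_commit_implicit()
  }                // guard re-applies DID_DDL here
  return stmt.trans_did_ddl();
}
```

A guard keeps the restore next to the commit calls and also covers early returns, but it does not answer whether other readers of the flag expect it to be cleared after the commit.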
I filed this as a separate bug to give a clear description of the problem. It is the root cause of MDEV-21107, but that bug has a long discussion that mixes several different issues.
Issue Links
- relates to MDEV-21107 Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed in finish_event_group (Confirmed)