[MDEV-33268] IO Thread Can Write Gtid_list_log_event Mid-transaction into Relay Log Created: 2024-01-17  Updated: 2024-01-25

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug Priority: Major
Reporter: Brandon Nesterenko Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-9670 server_id mysteriously set to 0 in bi... Closed

 Description   

If the IO thread has received ignored events (for example, events filtered out by IGNORE_DOMAIN_IDS from CHANGE MASTER TO) and is then stopped while it is in the middle of logging a transaction to the relay log (i.e. the transaction's start event was logged, but not its end event), the function write_ignored_events_info_to_relay_log() in the IO thread's stop logic will write a Gtid_list_log_event in the middle of that transaction. There is a DBUG_ASSERT that tries to catch this, but it uses the wrong condition and should also be updated:

diff --git a/sql/slave.cc b/sql/slave.cc
index f4d76e447cd..b4f3e829070 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -2687,7 +2687,7 @@ static void write_ignored_events_info_to_relay_log(THD *thd, Master_info *mi)
     }
     if (rli->ign_gtids.count())
     {
-      DBUG_ASSERT(!rli->is_in_group());         // Ensure no active transaction
+      DBUG_ASSERT(!mi->events_queued_since_last_gtid);         // Ensure no active transaction
       glev= new Gtid_list_log_event(&rli->ign_gtids,
                                     Gtid_list_log_event::FLAG_IGN_GTIDS);
       rli->ign_gtids.reset();
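
The difference between the two assertion conditions can be illustrated with a small standalone model. This is only a sketch, not MariaDB code: IoThreadState and SqlThreadState are hypothetical stand-ins for Master_info and Relay_log_info, and it assumes that events_queued_since_last_gtid is non-zero exactly while the IO thread has queued part of a transaction, and that is_in_group() reflects what the SQL thread is applying rather than what the IO thread has written.

```cpp
// Hypothetical, simplified model; none of these types are MariaDB's real classes.
enum class EventKind { Gtid, Row, Commit };

struct IoThreadState {              // stands in for Master_info (mi)
  unsigned events_queued_since_last_gtid = 0;
};

struct SqlThreadState {             // stands in for Relay_log_info (rli)
  bool in_group = false;            // true only while the SQL thread applies a group
};

// Assumed bookkeeping: the counter restarts at a GTID event, counts every
// queued event of the transaction, and drops back to zero once the
// transaction's end event has been queued.
void io_queue_event(IoThreadState &mi, EventKind ev) {
  if (ev == EventKind::Gtid)
    mi.events_queued_since_last_gtid = 0;
  ++mi.events_queued_since_last_gtid;
  if (ev == EventKind::Commit)
    mi.events_queued_since_last_gtid = 0;
}

// Old condition: looks at the SQL thread's state.
bool old_check_passes(const SqlThreadState &rli) {
  return !rli.in_group;
}

// Proposed condition: looks at the IO thread's own state.
bool new_check_passes(const IoThreadState &mi) {
  return mi.events_queued_since_last_gtid == 0;
}
```

In this model, if the IO thread has queued a GTID event and a row event while the SQL thread is idle, old_check_passes() still returns true even though the relay log ends mid-transaction, whereas new_check_passes() returns false until the commit event is queued.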

Andrei suggested a couple of possible fixes:

Anything that would cause the SQL thread to stop and retry later from the beginning of the group.
Maybe an Incident event.

Should be converted to an error.
The Gtid_list_log_event (Glle) should not be logged, and perhaps the right way is to log a ROLLBACK instead in this case.
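
The second suggestion could look roughly like the following standalone sketch. It is not the actual patch: IoState, StopResult, write_ignored_info_on_stop() and the string-based relay log are all hypothetical, and keeping the ignored GTIDs pending after the rollback is an assumption made only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical outcome codes for the IO thread's stop logic.
enum class StopResult { NothingToWrite, GtidListWritten, RolledBackPartialGroup };

// Hypothetical, simplified IO thread state; the relay log is modelled as a
// list of event names.
struct IoState {
  unsigned events_queued_since_last_gtid = 0;
  std::vector<std::string> ignored_gtids;
  std::vector<std::string> relay_log;
};

StopResult write_ignored_info_on_stop(IoState &io) {
  if (io.ignored_gtids.empty())
    return StopResult::NothingToWrite;

  if (io.events_queued_since_last_gtid != 0) {
    // Mid-transaction: do not write the Gtid list here. Terminate the
    // partial group with a ROLLBACK and report the condition to the caller
    // instead of relying on a debug-only assertion.
    io.relay_log.push_back("ROLLBACK");
    io.events_queued_since_last_gtid = 0;
    return StopResult::RolledBackPartialGroup;  // ignored GTIDs stay pending
  }

  // Between transactions: writing the Gtid list is safe.
  io.relay_log.push_back("GTID_LIST(ignored)");
  io.ignored_gtids.clear();
  return StopResult::GtidListWritten;
}
```

Returning a distinct result for the mid-transaction case lets the caller decide whether to raise an error, which is closer to the "converted to an error" suggestion than an assertion that only fires in debug builds.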

