[MDEV-7850] Extend GTID Binlog Events with Thread Id Created: 2015-03-26  Updated: 2024-02-06

Status: In Progress
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 11.5

Type: New Feature Priority: Major
Reporter: Kolbe Kegel (Inactive) Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: Preview_11.4, triage, verified

Issue Links:
Duplicate
is duplicated by MDEV-15875 the lost of "BEGIN" in mariadb server... Closed
PartOf
is part of MDEV-28906 MySQL 8.0 desired compatibility Open
Problem/Incident
causes MDEV-33048 Assertion `info->end_of_file - (info-... Closed

 Description   

It seems that MySQL 5.6 includes the thread id in the binary log, which can be very handy. It would be nice if MariaDB also did this.

From MariaDB 10.1.3:

#150326 12:13:53 server id 1  end_log_pos 407   GTID 0-1-55 trans
/*!100101 SET @@session.skip_parallel_replication=0*//*!*/;
/*!100001 SET @@session.gtid_domain_id=0*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=55*//*!*/;
BEGIN
/*!*/;
# at 407
# at 449
#150326 12:13:53 server id 1  end_log_pos 449   Table_map: `test`.`t1` mapped to number 23
#150326 12:13:53 server id 1  end_log_pos 487   Write_rows: table id 23 flags: STMT_END_F
 
BINLOG '
cVoUVRMBAAAAKgAAAMEBAAAAABcAAAAAAAEABHRlc3QAAnQxAAIDAwAC
cVoUVRcBAAAAJgAAAOcBAAAAABcAAAAAAAEAAv/8AQAAAAMAAAA=
'/*!*/;
### INSERT INTO `test`.`t1`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=3 /* INT meta=0 nullable=1 is_null=0 */
# at 487
#150326 12:13:53 server id 1  end_log_pos 514   Xid = 13
COMMIT/*!*/;
# at 514

From MySQL 5.6.22:

#150326 11:52:25 server id 1  end_log_pos 192 CRC32 0xcf566f36  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1427395945/*!*/;
SET @@session.pseudo_thread_id=1/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150326 11:52:25 server id 1  end_log_pos 238 CRC32 0x85143a10  Table_map: `test`.`t1` mapped to number 72
# at 238
#150326 11:52:25 server id 1  end_log_pos 282 CRC32 0xc79c4fe7  Write_rows: table id 72 flags: STMT_END_F
 
BINLOG '
aVUUVRMBAAAALgAAAO4AAAAAAEgAAAAAAAEABHRlc3QAAnQxAAIDAwACEDoUhQ==
aVUUVR4BAAAALAAAABoBAAAAAEgAAAAAAAEAAgAC//wBAAAAAQAAAOdPnMc=
'/*!*/;
### INSERT INTO `test`.`t1`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1 /* INT meta=0 nullable=1 is_null=0 */
# at 282
#150326 11:52:25 server id 1  end_log_pos 313 CRC32 0xd6b11791  Xid = 15
COMMIT/*!*/;
# at 313



 Comments   
Comment by Kolbe Kegel (Inactive) [ 2015-03-26 ]

This is incredibly useful if a user finds strange binary log entires and needs to correlate them with the general log or audit log (or slow query log, or...).

Comment by Kolbe Kegel (Inactive) [ 2015-03-26 ]

Perhaps this is only a display issue in mysqlbinlog, where for some reason it simply skips showing information about the "Query" event? It only occurs for ROW-based events. The MariaDB behavior for other events is the same as MySQL... maybe it's related to changes in mysqlbinlog to facilitate display of GTID information?

#150326 12:31:54 server id 1  end_log_pos 511   Query   thread_id=3     exec_time=0     error_code=0
use `test`/*!*/;
SET TIMESTAMP=1427398314/*!*/;
SET @@session.pseudo_thread_id=3/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
DROP TABLE `t1` /* generated by server */
/*!*/;
# at 511

Comment by Elena Stepanova [ 2015-03-26 ]

We lost it somewhere in 10.0, 5.5 works all right.
Assigning to knielsen because he can probably find out faster than I where and why it got lost.

Scenario

Scenario:
- connect to the server
- run create table
- run insert
- run insert
- disconnect 
- connect to the server
- run insert

Binlog from 10.0

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150327  0:00:24 server id 2  end_log_pos 248 	Start: binlog v 4, server v 10.0.17-MariaDB-debug-log created 150327  0:00:24 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
# at 248
#150327  0:00:24 server id 2  end_log_pos 273 	Gtid list []
# at 273
#150327  0:00:24 server id 2  end_log_pos 316 	Binlog checkpoint wheezy-64-bin.000001
# at 316
#150327  0:00:32 server id 2  end_log_pos 354 	GTID 0-2-1
/*!100001 SET @@session.gtid_domain_id=0*//*!*/;
/*!100001 SET @@session.server_id=2*//*!*/;
/*!100001 SET @@session.gtid_seq_no=1*//*!*/;
# at 354
#150327  0:00:32 server id 2  end_log_pos 440 	Query	thread_id=4	exec_time=1	error_code=0
use `test`/*!*/;
SET TIMESTAMP=1427403632/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
create table t1 (i int)
/*!*/;
# at 440
#150327  0:00:35 server id 2  end_log_pos 478 	GTID 0-2-2
/*!100001 SET @@session.gtid_seq_no=2*//*!*/;
BEGIN
/*!*/;
# at 478
# at 519
#150327  0:00:35 server id 2  end_log_pos 519 	Table_map: `test`.`t1` mapped to number 70
#150327  0:00:35 server id 2  end_log_pos 558 	Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=1
### INSERT INTO `test`.`t1`
### SET
###   @1=2
# at 558
#150327  0:00:35 server id 2  end_log_pos 585 	Xid = 6
COMMIT/*!*/;
# at 585
#150327  0:00:37 server id 2  end_log_pos 623 	GTID 0-2-3
/*!100001 SET @@session.gtid_seq_no=3*//*!*/;
BEGIN
/*!*/;
# at 623
# at 664
#150327  0:00:37 server id 2  end_log_pos 664 	Table_map: `test`.`t1` mapped to number 70
#150327  0:00:37 server id 2  end_log_pos 703 	Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=3
### INSERT INTO `test`.`t1`
### SET
###   @1=4
# at 703
#150327  0:00:37 server id 2  end_log_pos 730 	Xid = 7
COMMIT/*!*/;
# at 730
#150327  0:00:42 server id 2  end_log_pos 768 	GTID 0-2-4
/*!100001 SET @@session.gtid_seq_no=4*//*!*/;
BEGIN
/*!*/;
# at 768
# at 809
#150327  0:00:42 server id 2  end_log_pos 809 	Table_map: `test`.`t1` mapped to number 70
#150327  0:00:42 server id 2  end_log_pos 848 	Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=5
### INSERT INTO `test`.`t1`
### SET
###   @1=6
# at 848
#150327  0:00:42 server id 2  end_log_pos 875 	Xid = 13
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Binlog from 5.5

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150327  0:01:51 server id 3  end_log_pos 245 	Start: binlog v 4, server v 5.5.42-MariaDB-debug-log created 150327  0:01:51 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
# at 245
#150327  0:01:57 server id 3  end_log_pos 331 	Query	thread_id=2	exec_time=1	error_code=0
use `test`/*!*/;
SET TIMESTAMP=1427403717/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
create table t1 (i int)
/*!*/;
# at 331
#150327  0:01:59 server id 3  end_log_pos 399 	Query	thread_id=2	exec_time=0	error_code=0
SET TIMESTAMP=1427403719/*!*/;
BEGIN
/*!*/;
# at 399
# at 440
#150327  0:01:59 server id 3  end_log_pos 440 	Table_map: `test`.`t1` mapped to number 33
#150327  0:01:59 server id 3  end_log_pos 479 	Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=1
### INSERT INTO `test`.`t1`
### SET
###   @1=2
# at 479
#150327  0:01:59 server id 3  end_log_pos 506 	Xid = 7
COMMIT/*!*/;
# at 506
#150327  0:02:02 server id 3  end_log_pos 574 	Query	thread_id=2	exec_time=0	error_code=0
SET TIMESTAMP=1427403722/*!*/;
BEGIN
/*!*/;
# at 574
# at 615
#150327  0:02:02 server id 3  end_log_pos 615 	Table_map: `test`.`t1` mapped to number 33
#150327  0:02:02 server id 3  end_log_pos 654 	Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=3
### INSERT INTO `test`.`t1`
### SET
###   @1=4
# at 654
#150327  0:02:02 server id 3  end_log_pos 681 	Xid = 8
COMMIT/*!*/;
# at 681
#150327  0:02:17 server id 3  end_log_pos 749 	Query	thread_id=3	exec_time=0	error_code=0
SET TIMESTAMP=1427403737/*!*/;
BEGIN
/*!*/;
# at 749
# at 790
#150327  0:02:17 server id 3  end_log_pos 790 	Table_map: `test`.`t1` mapped to number 33
#150327  0:02:17 server id 3  end_log_pos 829 	Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=5
### INSERT INTO `test`.`t1`
### SET
###   @1=6
# at 829
#150327  0:02:17 server id 3  end_log_pos 856 	Xid = 15
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Comment by Kristian Nielsen [ 2015-03-26 ]

The thread_id is part of query events.

In MySQL, and MariaDB <= 5.5, transactions start with a BEGIN query event.

From MariaDB 10.0, a GTID event is used instead of the BEGIN query event. That
is why the thread id is there for MySQL row-based binlog, but not for MariaDB
10.0.

The thread_id was not included in the GTID to save space in the binlog, as it
is not used by the server (I was not aware of it being useful for debugging).

Elena, I'm closing as "not a bug", but feel free to do something else with it
if you find that more appropriate. I hope the above at least explains what is
going on.

Comment by Elena Stepanova [ 2015-03-26 ]

Yes, thanks, it's clear now.

I'm reopening it as I beg you to reconsider (if possible, for 10.0, otherwise at least for 10.1).
Not having thread_id is very bad for investigation, both while searching for events generated by a particular thread, and while trying to understand the concurrent flow. I used it plenty of times before, it's a miracle I didn't notice it disappear (actually, not a miracle, because I tend to dump all replication-related investigation on you).
I can convert it to a feature request if you so prefer.

Comment by Kolbe Kegel (Inactive) [ 2015-03-26 ]

Yes, I agree. We should add this back ASAP. It's not acceptable to omit thread_id from the binlog.

Comment by Elena Stepanova [ 2015-07-03 ]

knielsen, serg, monty,
Please, please, could it be implemented? It's really difficult to analyze problems without it.
I"m setting it to be fixed in 10.0/10.1 to make sure it's back on the radar, but if it cannot go to 10.0, please feel free to adjust the value.

Comment by Kristian Nielsen [ 2015-07-05 ]

Why not instead put the GTID into the slow log / general log / audit log?
That would seem more useful, and still allows to cross-reference.
And it does not penalise the performance of normal production systems.

Comment by Elena Stepanova [ 2015-07-05 ]

I'm okay with that as well, as long as it's safe for whatever popular tools exist for working with these logs. (I don't know what those tools will be, if any, as I use my hand-made ones; but I assume if there are some, kolbe will know).
We know for sure that having thread_id in the binary log won't break anything, because it was there before, and partially still is.

knielsen,
On the second thought – but it's impossible?
At least in the general log, queries appear before they are executed, so they can't have GTID yet? And making them appear afterwards would be a disaster, because currently it's often the only way to find out which query the server crashed upon.
Adding them to the slow log or audit log only does not make any sense, the whole point is to have the complete sequence.

Comment by Kolbe Kegel (Inactive) [ 2015-07-06 ]

knielsen, GTID also included in other logs would be good, but it doesn't resolve the issue entirely. GTIDs in the binary log do not help to determine which statements were executed by a particular thread, do they? Being able to see the full sequence of actions executed by a single thread is an important use case.

And Elena's point about the general log is crucial. It is not possible to get GTIDs into the general log; it seems like the only chance is to use thread_id from the binlog to compare with thread_id from the general log.

Comment by Sergei Golubchik [ 2016-06-15 ]

Isn't it impossible to fix now? If only Query_log_event has thread_id, then the only way to see the thread_id is to have at least one Query_log_event in the log. If the transaction consists entirely of row events — binlog will not have thread_id. There is no way to change it without changing binlog event format, like, introducing new event type GTID_WITH_THREAD_ID_EVENT.

Comment by Andrei Elkin [ 2020-12-01 ]

ralf.gebhardt: from the pov of drop-in-replacement it's the bug. By FR I meant that the maria users never saw the required 'feature' in maria's GTID. Which of the two views on the ticket should be taken - is a good question, thank you!

Comment by Sergei Golubchik [ 2020-12-02 ]

Elkin, where do you plan to store the thread_id?

Comment by Andrei Elkin [ 2020-12-02 ]

serg Into Gtid_log_event::{flags,data}. (The new format would be readable/processable without any error by OLD programs, as usual). I've not made it yet fully clear for myself whether the flag/data should be set ON/IN always, or we do that optional, still I lean towards the optional.

Comment by Sachin Setiya (Inactive) [ 2021-04-16 ]

code is in this branch bb-10.3-MDEV-21117-7820

Comment by Andrei Elkin [ 2021-04-21 ]

The patch is generally good, but I had notes to the testing part, please follow up on github.

Comment by Sachin Setiya (Inactive) [ 2021-04-23 ]

Hi Elkin, Updated patch location https://github.com/MariaDB/server/commit/f89e3d48a2fc1cd4ac5be79d99afb1d2f3d6e144

Comment by Andrei Elkin [ 2021-12-27 ]

Richard, sort of. The developer left the company and some of his uncompleted work may not
in proper state. I'll update this one which is on its way to finish. Thanks for the heads up!

Comment by Andrei Elkin [ 2022-02-28 ]

I am taking over to complete.

Comment by Ian Gilfillan [ 2023-03-14 ]

Elkin is there an update on this issue? The fix version should at least probably be updated

Comment by Andrei Elkin [ 2023-03-14 ]

greenman, thanks for reminding on it! The progress stoled near the end. We need to find a couple of days time slot for completion. The current sprint looks very hard though as myself am going to have over a week off.

Comment by Brandon Nesterenko [ 2023-07-11 ]

Howdy Andrei!

This is ready for review as PR 2693

Comment by Andrei Elkin [ 2023-09-06 ]

Review will be completed within few days.

Comment by Kristian Nielsen [ 2023-10-25 ]

This surely isn't for stable release, let alone ancient 10.4.
It changes the format of the binlog file, the GTID event is extended. That's not appropriate for a stable release.
This should go to a preview release and then 11.4.

- Kristian.

Comment by Michael Widenius [ 2023-10-26 ]

knielsen: Yes, this will be added to 11.4, not 10.4.
I agree that we should not extend events in older releases, except if it cannot be avoidable because a serious crashing/data corruption bug.
This is not such a case.

Comment by Andrei Elkin [ 2023-10-31 ]

Review is done, its notes are incorportated to bb-10.5-andrei.
This branch needs rebasing onto 11.4.

Comment by Brandon Nesterenko [ 2023-11-30 ]

Hi Andrei!

This is ready for review: PR-2886

Comment by Roel Van de Paar [ 2023-12-18 ]

Created MDEV-33048 Assertion `info->end_of_file - (info->append_read_pos-info->write_buffer) == (res= inline_mysql_file_tell("/test/bb-11.4-MDEV-7850_dbg/mysys/mf_iocache2.c", 151, info->file, (myf) (0)))' failed.

Comment by Roel Van de Paar [ 2024-01-20 ]

OK to push. w/ Thanks to bnestere

Comment by Brandon Nesterenko [ 2024-01-22 ]

Pushed into 11.4 as c37b2087b4a

Comment by Michael Widenius [ 2024-01-24 ]

The patch has to be reverted as it is not backward and forward compatible.
This is a requirement for us to be able to use backport it to earlier releases if needed

Comment by Andrei Elkin [ 2024-01-24 ]

monty, in what sense it not compatible? Do you have a scenario in mind? (just in case this is just a mysqlbinlog feature to print extra information, think of `-v` too).

Comment by Brandon Nesterenko [ 2024-01-25 ]

FWIW, I manually tested replicating to/from various versions of MariaDB (and MySQL 5.7, IIRC) earlier in the development process with this patch, and I didn't see anything broken. Then just now re-tested 11.4->10.6 and 10.6->11.4 and it still seemed to work.

Comment by Michael Widenius [ 2024-01-27 ]

Have you tested with MySQL 5.6 or MariaDB 5.5 ?

In Query_log_event::begin_event() we have the code:

DBUG_ASSERT(data_len == LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN ||
data_len == LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2);
if (data_len != LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN &&
data_len != LOG_EVENT_HEADER_LEN + GTID_HEADER_LEN + 2)
return 1;

The new code adds 9 to event length of gtid-events.

Assume we have the following replication setup:
master (11.4) -> slave 11.3 (with shorter gtid-events) -> slave 5.5 (without gtid)

I assume this scenario could be possible if the gtid event from 11.4 will be saved in the slave binary log as such:
The slave 11.3 will, when sending events to the 5.5 slave, read a long gtid-event and then try to convert it to a begin event for the 5.5 slave, which will fail as the data length does not match to the above assert.

Has this scenario been tested or do we know that the 11.3 slave will write a short binlog event to it's binary log?

Comment by Michael Widenius [ 2024-01-29 ]

kolbe We have now an implementation for thread_id in the binary log
How this should work on the master is easy (just store the thread_id from the thread the executed the query)
However on the slave, there is an open question:
What should thread_id be in the slave's binary log for event that comes from a master?
The current implementation stores the slave threads thread_id in the events that the slave generates. I have been told this is also what MySQL is doing.
This means that if the slave has only one applier thread, all events will have the thread_id. (Which I do not find very useful)
The other option is that we preserver the original thread_id (and maybe also the internal commit_id (= query_id)) from the master.
This would make it possible to track a binary log event back to the original master (thanks to server_id) and to the master's binary log.

What would you find to be the more useful value for the thread_id on the slave?

Comment by Roel Van de Paar [ 2024-02-01 ]

kolbek (using your personal account) ^ FYI

Comment by Roel Van de Paar [ 2024-02-01 ]

Given the changes, this will require re-testing once reviewed.

Comment by Brandon Nesterenko [ 2024-02-05 ]

Reverted from 11.4 to re-target 11.5 for ample time to re-test with thread_id preservation from master.

Generated at Thu Feb 08 07:22:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.