[MDEV-17156] Local transactions on a Slave don't update GTID's gtid_current_pos after RESET MASTER on Slave (master_use_gtid value is not relevant) Created: 2018-09-07  Updated: 2020-08-25  Resolved: 2018-10-31

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.33
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Claudio Nanni Assignee: Andrei Elkin
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-10279 gtid_current_pos is not updated with ... Open
relates to MDEV-17368 RESET MASTER doesn't completely reset... Closed
relates to MDEV-17369 Document how to reset gtid_current_pos Closed
relates to MDEV-17370 Create method to purge a specific dom... Closed
relates to MDEV-17853 Document that gtid_binlog_pos can lag... Closed
relates to MDEV-17867 Make RESET MASTER throw GTID-related ... Open
relates to MDEV-16834 GTID current_pos easily breaks replic... Closed
relates to MDEV-20122 Deprecate MASTER_USE_GTID=Current_Pos... Closed

 Description   

Server A = Master id=10133
Server B = Slave id=20133

{{MASTER_USE_GTID = slave_pos|current_pos }} (not relevant)

Writing transactions on both A and B properly updates the global variable gtid_current_pos on the Slave (B).

If I issue RESET MASTER on the Slave, gtid_current_pos on the Slave stops being updated.

I am aware of Kristian's comment on https://jira.mariadb.org/browse/MDEV-10279 but I still am not convinced.

MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| gtid_binlog_pos        |            |
| gtid_binlog_state      |            |
| gtid_current_pos       | 0-10133-40 |
| gtid_domain_id         | 0          |
| gtid_ignore_duplicates | OFF        |
| gtid_slave_pos         | 0-10133-40 |
| gtid_strict_mode       | OFF        |
+------------------------+------------+
7 rows in set (0.00 sec)
 
MariaDB [test]> insert into a values (22);
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| gtid_binlog_pos        | 0-20133-1  |
| gtid_binlog_state      | 0-20133-1  |
| gtid_current_pos       | 0-10133-40 |
| gtid_domain_id         | 0          |
| gtid_ignore_duplicates | OFF        |
| gtid_slave_pos         | 0-10133-40 |
| gtid_strict_mode       | OFF        |
+------------------------+------------+
7 rows in set (0.00 sec)

Look what happens when the sequence number of the local transaction becomes higher than the sequence number of the latest transaction applied as Slave (gtid_slave_pos):

Generate ~20 transactions on the Slave

MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| gtid_binlog_pos        | 0-20133-23 |
| gtid_binlog_state      | 0-20133-23 |
| gtid_current_pos       | 0-10133-40 |
| gtid_domain_id         | 0          |
| gtid_ignore_duplicates | OFF        |
| gtid_slave_pos         | 0-10133-40 |
| gtid_strict_mode       | OFF        |
+------------------------+------------+
7 rows in set (0.00 sec)

Generate another ~20 transactions on the Slave

MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| gtid_binlog_pos        | 0-20133-40 |
| gtid_binlog_state      | 0-20133-40 |
| gtid_current_pos       | 0-10133-40 |
| gtid_domain_id         | 0          |
| gtid_ignore_duplicates | OFF        |
| gtid_slave_pos         | 0-10133-40 |
| gtid_strict_mode       | OFF        |
+------------------------+------------+
7 rows in set (0.00 sec)

And one more...

MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| gtid_binlog_pos        | 0-20133-41 |
| gtid_binlog_state      | 0-20133-41 |
| gtid_current_pos       | 0-20133-41 |
| gtid_domain_id         | 0          |
| gtid_ignore_duplicates | OFF        |
| gtid_slave_pos         | 0-10133-40 |
| gtid_strict_mode       | OFF        |
+------------------------+------------+
7 rows in set (0.01 sec)

Note that gtid_current_pos has now changed from 0-10133-40 to 0-20133-41.

Clearing gtid_slave_pos "solves" this, but of course that has its own consequences.

From what I see, a locally generated transaction with a lower sequence number is effectively ignored until its sequence number surpasses the one in gtid_slave_pos.

I don't know whether this is intended behaviour.

The manual page says: https://mariadb.com/kb/en/library/gtid/#gtid_current_pos

[1]This variable is the GTID of the last change to the database for each replication domain. Such changes can either be master events (ie. local changes made by user or application), or replicated events originating from another master server.

[2]For each replication domain, if the server ID of the corresponding GTID in @@gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos, then the GTID from @@gtid_binlog_pos will be used. Otherwise the GTID from @@gtid_slave_pos will be used for that domain.

[3]Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave. This value is used as the starting point of replication when a slave is configured with CHANGE MASTER TO master_use_gtid=current_pos.

[4]The value is read-only, but it is updated whenever a DML or DDL statement is written to the binary log and/or replicated by a slave thread.

Paragraph [2] describes exactly this behaviour, so it appears to be documented (even if the rationale is not clear to me):
"and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos",

but paragraph [3] says:
"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

That is not true: gtid_current_pos does not contain the most recent GTID executed on the server, at least not until the local sequence number is greater than the one in gtid_slave_pos.
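The rule quoted in paragraph [2] can be sketched for a single replication domain as follows. This is only an illustrative model of the documented wording, not the server's code: the `Gtid` type and the `current_pos` function are hypothetical names, and an empty gtid_binlog_pos is assumed to fall under the "otherwise" branch.

```python
from typing import NamedTuple, Optional


class Gtid(NamedTuple):
    """A GTID in the domain-server-sequence form, e.g. 0-10133-40."""
    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        domain, server, seq = (int(part) for part in text.split("-"))
        return cls(domain, server, seq)

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def current_pos(binlog_pos: Optional[Gtid], slave_pos: Gtid, own_server_id: int) -> Gtid:
    """Model of paragraph [2] for one domain (hypothetical helper).

    binlog_pos wins only if it was generated by this server AND its
    sequence number is strictly higher than the one in slave_pos.
    Assumption: an empty binlog_pos (None) means slave_pos is used.
    """
    if (
        binlog_pos is not None
        and binlog_pos.server_id == own_server_id
        and binlog_pos.seq_no > slave_pos.seq_no
    ):
        return binlog_pos
    return slave_pos


if __name__ == "__main__":
    slave_pos = Gtid.parse("0-10133-40")
    # gtid_binlog_pos values taken from the console output in this report.
    for text in ("", "0-20133-1", "0-20133-23", "0-20133-40", "0-20133-41"):
        binlog_pos = Gtid.parse(text) if text else None
        print(f"binlog_pos={text or '(empty)'} -> current_pos={current_pos(binlog_pos, slave_pos, 20133)}")
```

Fed with the gtid_binlog_pos values shown above and server_id 20133, the model keeps returning the gtid_slave_pos value up to and including sequence number 40 and switches to the local GTID only at 41, which is the behaviour observed on the Slave.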

I don't know whether only the documentation needs to be updated or there is something else.



 Comments   
Comment by Andrei Elkin [ 2018-09-17 ]

claudio.nanni, hello.

To
>but paragraph [3] says:
>"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

'The most recent' is defined in terms of the GTID sequence number. As the master and slave share the domain (0), the most recent
was the master's 40 all the way until the slave generated the 41st.
I hope this clears up your confusion.

Andrei

PS.

to
> {{MASTER_USE_GTID = slave_pos|current_pos }} (not relevant)
Actually, when {{ MASTER_USE_GTID = slave_pos }} it really does not matter whether the slave's own GTIDs have overtaken the master's replicated GTID. But it does matter when the mode is {{ current_pos }}.

Comment by Claudio Nanni [ 2018-09-18 ]

To
>but paragraph [3] says:
>"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."
> 'The most recent' is defined in terms of the GTID sequence number. As the master and slave share the domain (0), the most recent
> was the master's 40 all the way until the slave generated the 41st.
> I hope this clears up your confusion.

I still think it's wrong.
Moreover, I think this is also due to a flaw in the GTID design.
The slaves are aware of the latest sequence number applied by the Master, but no one is aware of the latest sequence number applied by any Slave. In a simple setup, the Master doesn't know (it has no feedback mechanism for it) that the Slave generated a sequence number.
The fact that gtid_current_pos only starts being updated once the local sequence number is higher than the latest sequence number received from the Master (gtid_slave_pos) does not make sense to me.

Read again:
"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

By definition that variable must be updated, and it makes no sense that it isn't. What's the logic?
Moreover, this Slave may never be a slave of that Master again.

Wouldn't it be simpler and more logical to just always update gtid_current_pos with local transactions?

Comment by Andrei Elkin [ 2018-09-21 ]

gtid_current_pos is updated this way because it holds 'the most recent' value in the logical time sense.
So it remained at 40 until the slave server itself (acting as a "master") generated 41, and only then did the clock step forward.
The GTID positions current_pos and slave_pos are really clocks, so they can't go backward.

As for gtid_current_pos specifically, KN admitted that CHANGE MASTER's MASTER_USE_GTID = current_pos has confused many people.
On the other hand, gtid_slave_pos should be comparatively easy to understand.
What practical purpose do you really need gtid_current_pos for?

Comment by Claudio Nanni [ 2018-10-23 ]

Andrei,

> On the other hand, gtid_slave_pos should be comparatively easy to understand.
> What practical purpose do you really need gtid_current_pos for?

Yes, gtid_slave_pos would indeed be an easier choice, but MaxScale uses gtid_current_pos for its failover mechanism; maybe there is a valid reason for them to use gtid_slave_pos.

Comment by Andrei Elkin [ 2018-10-31 ]

claudio.nanni, salute.

I suggest we sum up what the docs say and how that matches our observations.

1. This report's synopsis states:

gtid_current_pos is not updated until a GTID arrives with gtid_seq_no > max(d), where d is a GTID domain present in gtid_current_pos.

Such behavior matches the docs when 'recent' is understood in the logical time sense.

2. Notice that gtid_binlog_state may show changes, but only because
this array collects GTIDs per server. The union of its GTIDs relating to any domain in gtid_current_pos can produce a value that at most matches, or is less than,
the one in the latter array.

3. There is an unreported observation that, despite gtid_strict_mode = ON,
the slave is able to update gtid_binlog_state even though those updates
have gtid_seq_no < max(d), with max(d) = 41 in your example.
This is benign by the definition of gtid_strict_mode, as the strict mode rules
apply exclusively to the binlogged set of GTIDs. So once they are removed (RESET MASTER), local GTIDs lower than 41 are accepted again.
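Point 3 can be illustrated with a toy model. It assumes, as described above, that strict mode only compares a new GTID against the GTIDs currently recorded in the binlog state, and that RESET MASTER empties that state. The class, method and exception names are hypothetical and do not correspond to server internals.

```python
from typing import Dict, Tuple


class GtidOrderError(Exception):
    """Raised by this model when strict mode rejects an out-of-order GTID."""


class BinlogStateModel:
    """Toy model: last binlogged seq_no per (domain_id, server_id) pair."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.state: Dict[Tuple[int, int], int] = {}

    def max_seq_no(self, domain_id: int) -> int:
        """Highest binlogged sequence number in a domain, 0 if none."""
        return max(
            (seq for (dom, _srv), seq in self.state.items() if dom == domain_id),
            default=0,
        )

    def log(self, domain_id: int, server_id: int, seq_no: int) -> None:
        """Record a GTID; in strict mode it must exceed the domain maximum."""
        if self.strict and seq_no <= self.max_seq_no(domain_id):
            raise GtidOrderError(
                f"{domain_id}-{server_id}-{seq_no} is not above "
                f"{self.max_seq_no(domain_id)} in domain {domain_id}"
            )
        self.state[(domain_id, server_id)] = seq_no

    def reset_master(self) -> None:
        """Assumed effect of RESET MASTER in this model: forget all GTIDs."""
        self.state.clear()


if __name__ == "__main__":
    model = BinlogStateModel(strict=True)
    model.log(0, 10133, 40)          # replicated GTID written to the binlog
    try:
        model.log(0, 20133, 1)       # lower seq_no in the same domain
    except GtidOrderError as exc:
        print("rejected:", exc)
    model.reset_master()             # binlogged GTID set is gone
    model.log(0, 20133, 1)           # the same GTID is accepted now
    print(model.state)
```

In this model the check never consults gtid_slave_pos, which is why a low local sequence number passes once the binlog state has been cleared.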

All in all, I don't see anything that sticks out as questionable, though
I admit we may challenge the design decisions from the perspective of intuitive understanding and ease of use. For instance, I am myself unhappy to find out that
gtid_strict_mode is actually "gtid_binlog_strict_mode". I really thought it should apply generally, including to --skip-log-bin or --log-slave-updates=0 slaves, which it does not.

Regardless of our dissatisfaction with this part, the current issue is
not a bug in my opinion and should be addressed by extending our documentation
on the three items referred to:

gtid_current_pos - it should be stressed that it is a per-domain array
gtid_binlog_state - it is a per-server-id array
gtid_strict_mode - it makes sense only for a binlogging slave

Feel free to revert the issue status should you have more feedback.

Cheers,

Andrei

Comment by Andrei Elkin [ 2018-10-31 ]

The reported execution traces match the definitions of the slave GTID state descriptors, as
extensively commented above. Specifically, gtid_current_pos is updated correctly.
