[MDEV-20122] Deprecate MASTER_USE_GTID=Current_Pos to favor new MASTER_DEMOTE_TO_SLAVE option - Jira

Geoff Montee (Inactive) created issue - 2019-07-22 22:45

Geoff Montee (Inactive) made changes - 2019-07-22 22:56

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-16834~~

Geoff Montee (Inactive) made changes - 2019-07-23 00:28

Link

This issue relates to ~~MDEV-17156~~

Geoff Montee (Inactive) made changes - 2019-07-23 00:28

Link

This issue relates to MDEV-10279

Andrei Elkin added a comment - 2019-07-23 07:40 - edited

GeoffMontee, thanks for the report and the analysis. We might consider your proposals. However, let me first copy-paste a mail from the ~~MDEV-18404~~ discussion with knielsen about 'current_pos': its goal, its semantics and, de facto, a recommendation not to use it.

Could we work around this case by switching to slave_pos instead of elaborating on
current_pos? Could you please consider that first? [While I am on vacation, I can read mails anyway. Feel free to escalate the issue if really necessary so my colleagues can start working on it before I am back.]

Quote, unquote:
The @@gtid_current_pos exists for one sole purpose. This is to let the user
promote a slave as the new master and attach the old master as a slave to
the new master.
By using master_use_gtid=current_pos, the exact same command can be used to
attach a slave to the new master, regardless of whether that slave was
previously a slave or a master:

CHANGE MASTER TO master_host=new_promoted_master

If not using gtid_current_pos (ie master_use_gtid=slave_pos), then to let
the old master become a slave of the new master, the old master's position
must explicitly be set:

SET GLOBAL gtid_slave_pos=@@gtid_binlog_pos

This is because for efficiency reasons, the master doesn't update the
mysql.gtid_slave_pos in each commit.

So now we can see why only GTIDs with the server's own server_id should
contribute to @@gtid_current_pos. If a GTID was replicated from another
server, that GTID will appear in the @@gtid_slave_pos. If the GTID
originated on this server, it will appear in @@gtid_binlog_pos. The
@@gtid_current_pos is the @@gtid_slave_pos extended with GTIDs originating
on this server, hence only GTIDs with our own server id.

Normally, every GTID in the binlog with a different server id than our own
will already be in the @@gtid_slave_pos as well - since it originated on
another server and was replicated to this server.

Thus, in the normal case, where user did not play tricks with the binlog and
slave state, extending the @@gtid_current_pos as suggested in ~~MDEV-18404~~ has
no effect - the GTIDs are already in the @@gtid_slave_pos, so
@@gtid_current_pos is unaffected.

And in case the user deliberately modified the state, it should be up to the
user to decide what goes into @@gtid_slave_state and @@gtid_binlog_state.
For example, the ~~MDEV-18404~~ change would make it impossible on a server to
remove a replicated GTID from the @@gtid_current_pos if --log-slave-updates
(without the drastic RESET MASTER).

Another problem is that the server cannot reliably compare GTIDs with
distinct server ids to decide which one is the most recent. There is no
guarantee that sequence numbers are monotonic across different server ids.
Thus the ~~MDEV-18404~~ method could create completely invalid positions in
some setups where @@gtid_strict_mode=0 and replication domains are not
strictly maintained.

I don't see the value in ~~MDEV-18404~~. If the user is updating
@@gtid_binlog_state (itself a very drastic operation), and wants a specific
GTID to go into the slave position - just update the @@gtid_slave_pos with
the desired GTID, don't leave the server with an inconsistent replication
state.

And finally, let me reiterate: I consider the @@gtid_current_pos a design
mistake. Better to just transfer the @@gtid_binlog_pos to the
@@gtid_slave_pos only at the point where an old master is turned into a
slave. This can be done manually already, and it would be simple to
implement automatic support for this with an extra option for CHANGE MASTER.
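The rule described in the quoted mail (gtid_current_pos is gtid_slave_pos extended, per domain, with newer GTIDs that carry the server's own server_id) can be sketched as a small Python model. This is a toy under stated assumptions, not MariaDB's implementation: positions are simplified to one GTID per domain, and the helper names (parse_pos, format_pos, current_pos, demote) are hypothetical. demote() stands for the manual SET GLOBAL gtid_slave_pos=@@gtid_binlog_pos transfer mentioned in the mail.

```python
"""Toy model of MariaDB GTID positions (illustration, not server code)."""
from typing import Dict, Tuple

# A position holds one GTID per domain: domain_id -> (server_id, seq_no).
Pos = Dict[int, Tuple[int, int]]


def parse_pos(text: str) -> Pos:
    """Parse a position such as '0-1-1,1-2-5' (one GTID per domain assumed)."""
    pos: Pos = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server, seq = (int(x) for x in item.split("-"))
        pos[domain] = (server, seq)
    return pos


def format_pos(pos: Pos) -> str:
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items()))


def current_pos(slave_pos: Pos, binlog_pos: Pos, own_server_id: int) -> Pos:
    """gtid_slave_pos extended with GTIDs that originated on this server."""
    result = dict(slave_pos)
    for domain, (server, seq) in binlog_pos.items():
        if server != own_server_id:
            continue  # replicated GTIDs are expected to be in slave_pos already
        known = slave_pos.get(domain)
        if known is None or seq > known[1]:
            result[domain] = (server, seq)
    return result


def demote(binlog_pos: Pos) -> Pos:
    """Manual transfer done when an old master becomes a slave."""
    return dict(binlog_pos)
```

With Server_2's values from the test in this thread (server_id 2, gtid_slave_pos 0-2-2, gtid_binlog_pos 0-2-3) the model gives 0-2-3, and for an old master with a stale gtid_slave_pos, demote() produces the same position that current_pos would have presented.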


Geoff Montee (Inactive) added a comment - 2019-07-23 08:42 - edited

Hi Elkin,

> While I am on vacation, I can read mails anyway. Feel free to escalate the issue if really necessary so my colleagues can start working on it before I am back.

Thanks for the response! This issue isn't really urgent. I hope you enjoy your vacation!

> However, let me first copy-paste a mail from the ~~MDEV-18404~~ discussion with Kristian Nielsen about 'current_pos': its goal, its semantics and, de facto, a recommendation not to use it.

I'm not quite sure how ~~MDEV-18404~~ is relevant to this specific issue, but I appreciate you sharing Kristian's comments on that issue. My reasoning for submitting ~~MDEV-18404~~ was only related to my interest in finding a way to back up and restore GTID state using Mariabackup. I agree that it would not generally be a good idea to manually try out the steps in ~~MDEV-18404~~. However, I was trying to determine if it would be feasible for Mariabackup to back up and restore a server's gtid_binlog_pos value without backing up all of the binary logs. Mariabackup already backs up and restores a server's gtid_slave_pos, since it is stored in an InnoDB table. The full details are in MDEV-18405. But anyway, after finding out from Kristian that gtid_current_pos intentionally excludes transactions from gtid_binlog_pos that don't have the server's own server_id component, I mentioned in MDEV-18405 that if we want to back up and restore gtid_binlog_pos, then it would probably make more sense to restore its value to gtid_slave_pos. I see Kristian's perspective, and I don't have any issue with it.

Regardless, I understand the purpose of current_pos, and I understand the semantics. I also understand that it is very easy for users to accidentally break a slave using current_pos. Currently, if a slave is using current_pos, then the slave doesn't do anything to try to detect if the user has done any unsafe operations that may cause the slave to break.

If we want to continue to support current_pos, then I am just suggesting that the slave should try to detect if the user has done any unsafe operations that may cause the slave to break. Maybe it could write a warning to the error log. Maybe the warning should suggest that the user may want to switch to slave_pos instead.

> Could we work around this case by switching to slave_pos instead of elaborating on
> current_pos? Could you please consider that first?

Yes, I always recommend using slave_pos rather than current_pos.

Our Mariabackup documentation on how to build a slave also recommends using slave_pos.

https://mariadb.com/kb/en/library/setting-up-a-replication-slave-with-mariabackup/#gtids

However, a lot of users are already using current_pos for whatever reason.

> And finally, let me reiterate: I consider the @@gtid_current_pos a design
> mistake. Better to just transfer the @@gtid_binlog_pos to the
> @@gtid_slave_pos only at the point where an old master is turned into a
> slave. This can be done manually already, and it would be simple to
> implement automatic support for this with an extra option for CHANGE MASTER.

Do we have plans to remove current_pos or change the way it works?


Julien Fritsch made changes - 2019-10-28 10:39

Assignee: Andrei Elkin [ elkin ] → Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]

Julien Fritsch made changes - 2019-10-28 10:39

Assignee: Ralf Gebhardt [ ralf.gebhardt@mariadb.com ] → Andrei Elkin [ elkin ]

Julien Fritsch made changes - 2020-05-18 07:27

Priority: Major [ 3 ] → Critical [ 2 ]

Sachin Setiya (Inactive) made changes - 2020-08-20 13:33

Assignee: Andrei Elkin [ elkin ] → Sachin Setiya [ sachin.setiya.007 ]

Sujatha Sivakumar (Inactive) made changes - 2020-10-15 04:22

Assignee: Sachin Setiya [ sachin.setiya.007 ] → Sujatha Sivakumar [ sujatha.sivakumar ]

Sujatha Sivakumar (Inactive) made changes - 2020-10-16 07:35

Status: Open [ 1 ] → In Progress [ 3 ]

Andrei Elkin added a comment - 2020-10-16 11:24

GeoffMontee, howdy.

Let's first settle our opinions on the semantics of MASTER_USE_GTID=current_pos.

Like slave_pos, it is a connection mode that presents the
slave's GTID state to the master. That is, on the slave server it only
affects the slave IO thread. The current_pos mode makes the
IO thread regard gtid_current_pos as the slave's GTID state. More
specifically, the IO thread takes a snapshot of gtid_current_pos at
connect time and presents it to the master, which validates the
slave's state.

If gtid_current_pos is later updated locally, after the validation has
succeeded, it is fair to claim that the local update does not affect
the current slave connection. As for the inconsistency itself, it will
be caught (though not necessarily instantly) when the slave executes
events, provided gtid_strict_mode is set.

Notice too that the preferred slave_pos mode is also vulnerable
to this issue in a multi-source scenario: a second source, playing
the role of the local connection, desynchronizes the slave_pos state.

Personally, I prefer this interpretation of a "dumb", simple IO thread that
is not concerned with which GTIDs it carries in.

Secondly, it might be useful for the slave to learn about potential inconsistencies, though.
A watching mechanism should write to the error log any online changes to gtid_current_pos or gtid_slave_pos made
locally on the slave, or through a second source, in the domains of concern.
E.g. when a replication source is defined as
CHANGE MASTER ... do_domain_ids = (d1,d2), those would be domains d1 or d2.

I would limit this watcher to gtid_strict_mode = ON.

As for its technical implementation, we're considering having the IO threads
do the marking, while local transaction handlers and slave appliers
do the checking and warning. This method would address the natural
interest in knowing exactly when the slave state gets screwed up.
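The proposed watcher could look roughly like the following hypothetical sketch: a replication connection marks the domains it replicates (for example from DO_DOMAIN_IDS), and the commit path for local transactions and for other connections warns when it writes into a marked domain, only while gtid_strict_mode is ON. GtidWatcher, mark and check are invented names for illustration; nothing in this thread implies MariaDB exposes such an API.

```python
"""Hypothetical sketch of the proposed GTID watcher (not a MariaDB feature)."""
import logging
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

log = logging.getLogger("gtid_watcher")


@dataclass
class GtidWatcher:
    strict_mode: bool  # the watcher is only active with gtid_strict_mode = ON
    owners: Dict[int, str] = field(default_factory=dict)  # domain_id -> connection
    warnings: List[str] = field(default_factory=list)

    def mark(self, connection: str, do_domain_ids: Iterable[int]) -> None:
        """IO-thread side: a connection marks the domains it replicates."""
        for domain in do_domain_ids:
            self.owners[domain] = connection

    def check(self, gtid: str, origin: str) -> bool:
        """Commit side (local transaction, or applier of another connection).

        `origin` is the connection that produced the transaction, or "local"
        for a transaction executed directly on the slave. Returns True when a
        warning was logged.
        """
        if not self.strict_mode:
            return False
        domain = int(gtid.split("-")[0])
        owner = self.owners.get(domain)
        if owner is None or owner == origin:
            return False  # unmarked domain, or the owning connection itself
        msg = (f"GTID {gtid} from '{origin}' changes domain {domain}, "
               f"which is replicated by connection '{owner}'")
        self.warnings.append(msg)
        log.warning(msg)
        return True


if __name__ == "__main__":
    watcher = GtidWatcher(strict_mode=True)
    watcher.mark("m1", [1, 2])       # e.g. CHANGE MASTER 'm1' TO ... DO_DOMAIN_IDS=(1,2)
    watcher.check("1-1-7", "m1")     # replicated normally: no warning
    watcher.check("1-2-1", "local")  # local write into domain 1: warning logged
```

Keeping the marking on the IO side and the check on the commit side mirrors the split described above, so the warning is emitted at the moment the slave state diverges rather than at the next reconnect.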

GeoffMontee, feel free to remove the SI association if it's no longer relevant to the customer.

Cheers,
Andrei


Sujatha Sivakumar (Inactive) added a comment - 2020-10-20 07:32

Hello GeoffMontee,

The current issue is observed with "GTID_STRICT_MODE=off".
I tried to reproduce ~~MDEV-20122~~ with "GTID_STRICT_MODE=on":

Enable circular replication between master and slave.
Do 'CREATE TABLE t' on the master and 'INSERT INTO t' on the slave.
The following state is achieved.

Master:
========

MariaDB [test]> show global variables like '%gtid%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| gtid_binlog_pos        | 0-2-2       |
| gtid_binlog_state      | 0-1-1,0-2-2 |
| gtid_current_pos       | 0-2-2       |
| gtid_domain_id         | 0           |
| gtid_ignore_duplicates | OFF         |
| gtid_slave_pos         | 0-2-2       |
| gtid_strict_mode       | ON          |
| wsrep_gtid_domain_id   | 0           |
| wsrep_gtid_mode        | OFF         |
+------------------------+-------------+
9 rows in set (0.01 sec)

Slave:
======

MariaDB [test]> show global variables like '%gtid%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| gtid_binlog_pos        | 0-2-2       |
| gtid_binlog_state      | 0-1-1,0-2-2 |
| gtid_current_pos       | 0-2-2       |
| gtid_domain_id         | 0           |
| gtid_ignore_duplicates | OFF         |
| gtid_slave_pos         | 0-2-2       |
| gtid_strict_mode       | ON          |
| wsrep_gtid_domain_id   | 0           |
| wsrep_gtid_mode        | OFF         |
+------------------------+-------------+
9 rows in set (0.01 sec)

Now do 'STOP SLAVE' on 'Server_2' and execute 'CHANGE MASTER TO' with 'MASTER_USE_GTID=current_pos'.

Case 1:
======
With circular replication in effect, ~~MDEV-20122~~ will never occur. Replication will be smooth with both 'current_pos' and 'slave_pos',
as both servers are in sync.

Case 2: [No circular replication between master and slave, i.e. the slave becomes the new 'master' and its 'slave' is using 'current_pos'.]
=======

MariaDB [test]> start slave;
Query OK, 0 rows affected (0.01 sec)

MariaDB [test]> insert into t values (30);
Query OK, 1 row affected (0.00 sec)

Please note: "gtid_binlog_pos" got updated.

MariaDB [test]> show global variables like '%gtid%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| gtid_binlog_pos        | 0-2-3       |
| gtid_binlog_state      | 0-1-1,0-2-3 |
| gtid_current_pos       | 0-2-3       |
| gtid_domain_id         | 0           |
| gtid_ignore_duplicates | OFF         |
| gtid_slave_pos         | 0-2-2       |
| gtid_strict_mode       | ON          |
| wsrep_gtid_domain_id   | 0           |
| wsrep_gtid_mode        | OFF         |
+------------------------+-------------+
9 rows in set (0.01 sec)

As long as the master is muted/silent, the slave works fine.
Now do a DML on the master and observe that the slave stops.

MariaDB [test]> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: root
                  Master_Port: 16000
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 842
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 658
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table: test.t_ignored1
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1950
                   Last_Error: An attempt was made to binlog GTID 0-1-3 which would create an out-of-order sequence number with existing GTID 0-2-3, and gtid strict mode is enabled.

The slave stops with an error upon processing the first GTID received from the master; it doesn't have to reconnect to observe the discrepancy.
Hence there is no bug when 'GTID_STRICT_MODE=ON'.
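The check that stops the slave in Case 2 can be modelled in a few lines. This sketch assumes the rule implied by the error text: with gtid_strict_mode on, a GTID may only be binlogged if its sequence number is higher than the last (highest) one already logged in its domain. OutOfOrderGtid, last_in_domain and binlog_gtid are hypothetical names; this is not the server's code.

```python
"""Toy model of the gtid_strict_mode out-of-order check (not server code)."""
from typing import Dict, Tuple


class OutOfOrderGtid(Exception):
    """Stands in for error 1950 in this model."""


def last_in_domain(binlog_state: str) -> Dict[int, Tuple[int, int]]:
    """Map domain_id -> (server_id, seq_no) with the highest seq_no found
    in a gtid_binlog_state string such as '0-1-1,0-2-3'."""
    last: Dict[int, Tuple[int, int]] = {}
    for item in filter(None, (p.strip() for p in binlog_state.split(","))):
        domain, server, seq = (int(x) for x in item.split("-"))
        if domain not in last or seq > last[domain][1]:
            last[domain] = (server, seq)
    return last


def binlog_gtid(binlog_state: str, gtid: str, strict_mode: bool = True) -> None:
    """Raise if logging `gtid` would break the strict ordering of its domain."""
    domain, _server, seq = (int(x) for x in gtid.split("-"))
    prev = last_in_domain(binlog_state).get(domain)
    if strict_mode and prev is not None and seq <= prev[1]:
        raise OutOfOrderGtid(
            f"An attempt was made to binlog GTID {gtid} which would create an "
            f"out-of-order sequence number with existing GTID "
            f"{domain}-{prev[0]}-{prev[1]}, and gtid strict mode is enabled.")
```

With the slave's state from Case 2 (gtid_binlog_state 0-1-1,0-2-3), the master's GTID 0-1-3 is rejected. A GTID in a different domain, such as 1-2-1, passes this check, so strict mode alone cannot flag that kind of divergence.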

Please let us know your thoughts.


Sujatha Sivakumar (Inactive) made changes - 2020-10-20 07:33

Assignee: Sujatha Sivakumar [ sujatha.sivakumar ] → Geoff Montee [ geoffmontee ]

Julien Fritsch made changes - 2020-10-20 07:50

Labels

need_feedback

Andrei Elkin added a comment - 2020-10-20 08:02

GeoffMontee, to add to the latest update from Sujatha on gtid_strict_mode: in your bug description the slave applier may not run, as the master is muted. In such a scenario the strict mode error won't show up, so on reconnect the slave would hit the error from the description instead.
To me, this is more of an inconvenience than a critical issue.

As for non-strict mode, I bet you would never rate that as critical either.


Geoff Montee (Inactive) added a comment - 2020-10-20 20:11

Hi Elkin,

> Personally, I prefer this interpretation of a "dumb", simple IO thread that is not concerned with which GTIDs it carries in.

You know more than I do about the GTID implementation, but I personally disagree. The IO thread currently seems a bit too dumb regarding GTIDs.

The IO thread doesn't seem quite so "dumb" in other areas. As far as I know, the IO thread filters out events that contain the slave's server_id. I think the IO thread also handles filtering for IGNORE_SERVER_IDS, DO_DOMAIN_IDS, and IGNORE_DOMAIN_IDS. If the IO thread already reads the server_id and gtid_domain_id from each event, it does not seem like it would be unreasonable to also read the GTID from the event, and then to compare that GTID to the local values.
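The suggestion above can be sketched as a hypothetical IO-side filter: the per-event server_id and domain checks listed in the comment, plus a comparison of the incoming sequence number with the locally known position for that domain. IoFilter, classify and the returned labels are invented for illustration; the real IO thread's filtering code is not shown here and may differ.

```python
"""Hypothetical IO-side event filter with a GTID comparison (illustration only)."""
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class IoFilter:
    own_server_id: int
    ignore_server_ids: Set[int] = field(default_factory=set)  # IGNORE_SERVER_IDS
    do_domain_ids: Set[int] = field(default_factory=set)      # empty = every domain
    ignore_domain_ids: Set[int] = field(default_factory=set)  # IGNORE_DOMAIN_IDS
    local_pos: Dict[int, int] = field(default_factory=dict)   # domain_id -> seq_no

    def classify(self, gtid: str) -> str:
        """Decide what the IO thread would do with an incoming GTID event."""
        domain, server_id, seq_no = (int(x) for x in gtid.split("-"))
        # Existing-style filters: first by server_id, then by domain.
        if server_id == self.own_server_id or server_id in self.ignore_server_ids:
            return "skip: server_id"
        if self.do_domain_ids and domain not in self.do_domain_ids:
            return "skip: domain"
        if domain in self.ignore_domain_ids:
            return "skip: domain"
        # Proposed extra step: compare with the slave's local position.
        known = self.local_pos.get(domain)
        if known is not None and seq_no <= known:
            return "warn: not newer than local position"
        return "queue"
```

Since the domain and server_id are already parsed for filtering, the extra comparison only needs the local sequence number per domain.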

> Secondly, it might be useful for the slave to learn about potential inconsistencies, though.
> A watching mechanism should write to the error log any online changes to gtid_current_pos or gtid_slave_pos made
> locally on the slave, or through a second source, in the domains of concern.
> E.g. when a replication source is defined as
> CHANGE MASTER ... do_domain_ids = (d1,d2), those would be domains d1 or d2.
>
> I would limit this watcher to gtid_strict_mode = ON.

That sounds like it could be a useful way to solve problems like this.

> feel free to remove the SI association if it's no longer relevant to the customer.

No comment on that. You'll have to ask nicklamb or ccalender.


Geoff Montee (Inactive) added a comment - 2020-10-20 20:21

Hi sujatha.sivakumar,

Your test case with gtid_strict_mode=ON proves that the slave raises an error in the case where an "out-of-order sequence number" is written to the binary log. However, this test case does not prove that setting gtid_strict_mode=ON can prevent the slave's GTIDs from getting out of sync with the master's GTIDs, because the slave's GTIDs can become inconsistent without raising an "out-of-order sequence number".

For example, if you had set gtid_domain_id=1 on the slave, then the slave's local transaction would have been written to the binary log with GTID 1-2-1. This would not raise an "out-of-order sequence number" error, so gtid_strict_mode would not notice the inconsistency. In this case, the slave would only notice the inconsistent GTID position after the IO thread is stopped and restarted.
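The domain-1 scenario above can be replayed in a toy model. Assumptions, all illustrative: the slave has server_id 2 and log_slave_updates (so replicated GTIDs also land in its binlog), strict mode compares only within a domain, gtid_current_pos follows the per-domain own-server_id rule from the quoted mail, and a simplified master-side check (every requested GTID must exist in the master's binlog history) stands in for the real validation. All function and variable names are hypothetical.

```python
"""Toy replay of the gtid_domain_id=1 scenario (illustration, not server code)."""
from typing import Dict, List, Set, Tuple

Pos = Dict[int, Tuple[int, int]]  # domain_id -> (server_id, seq_no)


def fmt(pos: Pos) -> List[str]:
    return [f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items())]


def strict_ok(binlog_pos: Pos, domain: int, seq: int) -> bool:
    """Strict mode only compares against the last GTID in the same domain."""
    last = binlog_pos.get(domain)
    return last is None or seq > last[1]


def current_pos(slave_pos: Pos, binlog_pos: Pos, own_id: int) -> Pos:
    result = dict(slave_pos)
    for d, (s, n) in binlog_pos.items():
        if s == own_id and (d not in slave_pos or n > slave_pos[d][1]):
            result[d] = (s, n)
    return result


def missing_on_master(requested: Pos, master_history: Set[str]) -> List[str]:
    """Simplified validation: every requested GTID must exist on the master."""
    return [g for g in fmt(requested) if g not in master_history]


SLAVE_ID = 2
slave_pos: Pos = {0: (1, 2)}
binlog_pos: Pos = {0: (1, 2)}
master_history = {"0-1-1", "0-1-2"}

# 1. Local write on the slave with gtid_domain_id=1 -> GTID 1-2-1.
assert strict_ok(binlog_pos, 1, 1)
binlog_pos[1] = (SLAVE_ID, 1)

# 2. The master commits 0-1-3; the applier's strict check on domain 0 passes.
master_history.add("0-1-3")
assert strict_ok(binlog_pos, 0, 3)
slave_pos[0] = binlog_pos[0] = (1, 3)

# 3. The IO thread reconnects and presents a snapshot of its position.
for mode, pos in (("current_pos", current_pos(slave_pos, binlog_pos, SLAVE_ID)),
                  ("slave_pos", slave_pos)):
    print(mode, fmt(pos), "missing on master:", missing_on_master(pos, master_history))
# current_pos ['0-1-3', '1-2-1'] missing on master: ['1-2-1']
# slave_pos ['0-1-3'] missing on master: []
```

Strict mode raises nothing in steps 1 and 2; the divergence only surfaces in step 3, when current_pos presents 1-2-1, a GTID the master never logged.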


Geoff Montee (Inactive) made changes - 2020-10-20 20:21

Labels

need_feedback

Sujatha Sivakumar (Inactive) made changes - 2020-10-21 06:38

Assignee

Geoff Montee [ geoffmontee ]

Sujatha Sivakumar [ sujatha.sivakumar ]

Julien Fritsch made changes - 2020-10-22 13:54

Priority

Critical [ 2 ]

Major [ 3 ]

Julien Fritsch made changes - 2020-11-06 15:58

Fix Version/s

10.1 [ 16100 ]

Julien Fritsch made changes - 2020-11-30 15:57

Priority

Major [ 3 ]

Critical [ 2 ]

Andrei Elkin added a comment - 2020-12-01 13:08 - edited

GeoffMontee: regarding your correct observation that
> IO thread filters out events

Notice that while doing so, the IO thread is not concerned with out-of-order GTIDs; by design, that is left to the applier thread. It's fair to say that what the IO thread does is maintain the integrity of the replicated gtid domain configuration. (Consistency, to which the initial replication gtid [this bug's immediate worry] belongs, is therefore, imo, the applier's burden.)

If at all possible, I suggest we don't refine anything related to gtid_current_pos.

Andrei Elkin made changes - 2020-12-07 13:32

Labels

gtid_current_pos

Julien Fritsch made changes - 2020-12-21 17:21

Labels

gtid_current_pos

gtid_current_pos need_feedback

Julien Fritsch made changes - 2021-03-05 19:08

Labels

gtid_current_pos need_feedback

gtid_current_pos

Andrei Elkin made changes - 2021-03-19 10:37

Assignee

Sujatha Sivakumar [ sujatha.sivakumar ]

Andrei Elkin [ elkin ]

Andrei Elkin added a comment - 2021-05-27 13:02 - edited

julien.fritsch, GeoffMontee, (esa.korhonen): I suggest (as I have suggested in this comment) that we start deprecating CHANGE MASTER ... master_use_gtid=current_pos (and then the related gtid_current_pos) in 10.6, and that's what we'll do in this ticket.

Another task for 10.7 should be reported (by me) to complete the deprecation, which means replacing gtid_current_pos in all features that use it.

Andrei Elkin made changes - 2021-05-27 13:02

Assignee

Andrei Elkin [ elkin ]

Geoff Montee [ geoffmontee ]

Andrei Elkin made changes - 2021-05-27 13:03

Labels

gtid_current_pos

gtid_current_pos need_feedback

Geoff Montee (Inactive) made changes - 2021-06-03 00:00

Assignee

Geoff Montee [ geoffmontee ]

Andrei Elkin [ elkin ]

Geoff Montee (Inactive) made changes - 2021-06-03 00:00

Labels

gtid_current_pos need_feedback

gtid_current_pos

Andrei Elkin added a comment - 2021-06-10 17:52 - edited

GeoffMontee, esa.korhonen, knielsen: While deprecating the current behaviour of the CHANGE MASTER ... master_use_gtid=current_pos option, which computes the effective slave gtid state dynamically at START SLAVE time, we could salvage the syntax part.
What would you think of making master_use_gtid=current_pos compute the new value of gtid_slave_pos at the time CHANGE MASTER is executed?
That is, the CHANGE MASTER option would imply SET GLOBAL gtid_slave_pos = value, where value is computed according to the current specification as a "constrained" union of the slave and binlog gtid states.

It'd be great to decide on this step in order to formulate a meaningful deprecation message.

Also, START SLAVE would regard gtid_slave_pos as the single source of the slave gtid state.

My use case is, obviously, an ex-master that is demoted to the slave role.
As you can see, this change also covers this issue's complaint.
Once CHANGE MASTER ... master_use_gtid=current_pos has settled the server's slave gtid state, the server is free to create local gtids, which won't mess with the slave gtid state anymore.
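A minimal Python sketch of the proposal, assuming the "constrained" union follows the documented rule for @@gtid_current_pos: per domain, the gtid_binlog_pos GTID is used only if it carries the server's own server_id and is newer than gtid_slave_pos; otherwise the gtid_slave_pos GTID is kept. The function names are hypothetical, positions are modelled as plain dicts, and server_id=2 for the slave in the issue description is inferred from its local GTID 4-2-1.

```python
# Hypothetical sketch of computing gtid_slave_pos once, at CHANGE MASTER time.
# Positions are dicts {domain_id: (server_id, seq_no)}.

def parse_gtid_list(text):
    pos = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server_id, seq_no = (int(x) for x in item.split("-"))
        pos[domain] = (server_id, seq_no)
    return pos


def format_gtid_list(pos):
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items()))


def constrained_union(binlog_pos, slave_pos, server_id):
    """Per domain, use the binlog GTID only if this server generated it
    (matching server_id) and it is newer than the slave position;
    otherwise keep the slave position for that domain."""
    result = dict(slave_pos)
    for domain, (srv, seq) in binlog_pos.items():
        current = slave_pos.get(domain)
        if srv == server_id and (current is None or seq > current[1]):
            result[domain] = (srv, seq)
    return result


def change_master_current_pos(state, server_id):
    """Compute the union once and store it in gtid_slave_pos; START SLAVE
    would then read gtid_slave_pos only."""
    merged = constrained_union(parse_gtid_list(state["gtid_binlog_pos"]),
                               parse_gtid_list(state["gtid_slave_pos"]),
                               server_id)
    state["gtid_slave_pos"] = format_gtid_list(merged)


# Slave values from the issue description; server_id=2 is inferred.
slave = {"gtid_binlog_pos": "1-1-95,3-1-1,4-2-1",
         "gtid_slave_pos": "1-1-95,3-1-1"}
change_master_current_pos(slave, server_id=2)
print(slave["gtid_slave_pos"])   # 1-1-95,3-1-1,4-2-1

slave["gtid_binlog_pos"] = "1-1-95,3-1-1,4-2-2"   # a later local transaction
print(slave["gtid_slave_pos"])   # still 1-1-95,3-1-1,4-2-1
```

The design point is that the merge happens once: a later local transaction moves gtid_binlog_pos but not the stored gtid_slave_pos, so it can no longer leak into the position requested at START SLAVE.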

Andrei Elkin made changes - 2021-06-10 17:55

Assignee

Andrei Elkin [ elkin ]

Geoff Montee [ geoffmontee ]

Geoff Montee (Inactive) added a comment - 2021-06-10 18:37

Hi Elkin,

What would think of turning master_use_gtid=current_pos to compute the new value to gtid_slave_pos at the time of executing CHANGE MASTER? That is the CM's option would imply a SET GLOBAL gtid_slave_pos = value, where value is computed according to the current specification as a "constrained" union of the slave and binlog gtid states.

That sounds good to me. It simplifies how the slave threads handle GTID tracking, but it still maintains the advantages of the master_use_gtid=current_pos syntax.

Kristian Nielsen added a comment - 2021-06-11 07:45 - edited

Hi Andrei,

The idea has a lot of appeal; it feels like a much nicer semantics for master_use_gtid=current_pos. It means that as a host changes role from master to slave, it will use its master position (with local changes) as the starting point for replicating as a slave. That's a much better match for what current_pos was intended to do when I originally implemented it.

I see a problem with the proposal as stated (if I understood it correctly). The problem is that "host changes role from master to slave" is not always what a CHANGE MASTER command means.
CHANGE MASTER is used to switch a master to become a slave, but it is also used in many other situations - to change a slave (that was never a master) to another master, to change the credentials on the master, to configure ssl, etc. etc.

If any CHANGE MASTER command was to magically change the current gtid_position with local transactions, we are back to the problems that START SLAVE had in this respect.

I'm not sure there currently is a well-defined way - from the point of view of the server - to know that the user is switching a master to become a slave.

One possibility is to add an explicit option to CHANGE MASTER that says "this is a master becoming a slave": CHANGE MASTER TO master_demote_to_slave=1 or something (I can't think of a better name off the top of my head). This could then imply the master_use_gtid=current_pos semantics you suggested, and possibly imply other unrelated semantics that are useful for the "master becomes a slave" case.

I think that's one way to keep the much better semantics of your proposal and avoid magic gtid_pos changes on unrelated CHANGE MASTER commands. Though it's not as clean as the server just doing the right thing (i.e., if the user forgets the option to CHANGE MASTER, then the slave just starts from the wrong position).

Kristian.

Julien Fritsch made changes - 2021-06-15 09:35

Assignee

Geoff Montee [ geoffmontee ]

Andrei Elkin [ elkin ]

Andrei Elkin added a comment - 2021-06-15 17:01 - edited

knielsen, howdy! Yours is a nice refinement.
Indeed, a new option that states the user's intent explicitly has
a clear advantage. I'll proceed from here to work through all major use cases of the role transition. Thank you!

As far as this task is concerned, agreement has been reached, then:
master_use_gtid = current_pos is to be deprecated.
Its purpose of facilitating failover will be captured by a new master_demote_to_slave = <bool> option.

Andrei Elkin made changes - 2021-06-15 17:02

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]

Sujatha Sivakumar (Inactive) made changes - 2021-08-11 09:56

Assignee

Andrei Elkin [ elkin ]

Sujatha Sivakumar [ sujatha.sivakumar ]

Sujatha Sivakumar (Inactive) added a comment - 2021-09-15 05:08

Hello julien.fritsch

The deprecation warning is implemented. I will request a review.

Sujatha Sivakumar (Inactive) added a comment - 2021-09-15 12:04

Hello Andrei,

Please review the following changes.

https://github.com/MariaDB/server/commit/47476b09638f6c3a57ee40d318be7a98cda9c83d

http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-sujatha

Thank you.

Sujatha Sivakumar (Inactive) made changes - 2021-09-15 12:04

Assignee	Sujatha Sivakumar [ sujatha.sivakumar ]	Andrei Elkin [ elkin ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Andrei Elkin added a comment - 2021-09-27 11:28

The patch looks good, though the warning should be introduced starting in 10.7.
I'll push the commit after double-checking with serg about 10.7.

Andrei Elkin made changes - 2021-09-27 11:28

Status

In Review [ 10002 ]

Stalled [ 10000 ]

Andrei Elkin made changes - 2021-09-27 12:08

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.6 [ 24028 ]

Andrei Elkin added a comment - 2021-09-27 16:41

ralf.gebhardt@mariadb.com, according to Serg, no:

Regarding deprecation policies, is the upcoming 10.7.1 good enough for us to deprecate CHANGE MASTER TO ... master_use_gtid = an-enum-value, that is, to deprecate current_pos?

serg 2:40 PM
no, there was no preview release with this deprecation, so it cannot be in 10.7.1 anymore

Sergei Golubchik made changes - 2021-10-29 11:55

Priority

Critical [ 2 ]

Major [ 3 ]

Sergei Golubchik made changes - 2021-12-06 21:35

Workflow

MariaDB v3 [ 98419 ]

MariaDB v4 [ 143600 ]

Julien Fritsch made changes - 2021-12-14 13:26

Priority

Major [ 3 ]

Critical [ 2 ]

Andrei Elkin made changes - 2022-02-16 17:05

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Andrei Elkin made changes - 2022-04-25 15:18

Fix Version/s		10.10 [ 27530 ]
Fix Version/s	10.8 [ 26121 ]

Andrei Elkin made changes - 2022-04-25 15:18

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Brandon Nesterenko added a comment - 2022-06-06 21:38

Howdy Andrei!

I have updated Sujatha's patch which deprecates master_use_gtid=current_pos for 10.10 and it is ready for review:
Patch 57a7c5c
BB bb-10.10-MDEV-20122-deprecate-current-pos

Andrei Elkin added a comment - 2022-06-08 12:58

Review of the deprecation part of this two-part work is requested.

Andrei Elkin made changes - 2022-06-08 12:58

Status

Stalled [ 10000 ]

In Review [ 10002 ]

Andrei Elkin added a comment - 2022-06-08 13:04

Review is done as a commit to the feature branch:
57a7c5c4ee6..172508da770 HEAD -> bb-10.10-MDEV-20122-deprecate-current-pos

(The review commits may become irrelevant to the feature after the eventual approval, in which case they are to be discarded.)

Andrei Elkin made changes - 2022-06-08 13:04

Assignee	Andrei Elkin [ elkin ]	Brandon Nesterenko [ JIRAUSER48702 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Brandon Nesterenko made changes - 2022-06-13 20:19

Summary

With MASTER_USE_GTID=current_pos, slave's I/O thread only checks gtid_current_pos when thread is first started

Deprecate MASTER_USE_GTID=Current_Pos to favor new MASTER_DEMOTE_TO_SLAVE option

Brandon Nesterenko made changes - 2022-06-13 23:27

Description

When a slave is configured to replicate with "MASTER_USE_GTID=current_pos", the slave uses its value of gtid_current_pos to replicate from the master.

https://mariadb.com/kb/en/library/change-master-to/#master_use_gtid

https://mariadb.com/kb/en/library/gtid/#gtid_current_pos

The value of gtid_current_pos includes GTIDs from both gtid_slave_pos and gtid_binlog_pos:

https://mariadb.com/kb/en/library/gtid/#gtid_slave_pos

https://mariadb.com/kb/en/library/gtid/#gtid_binlog_pos

Since both gtid_slave_pos and gtid_binlog_pos are used, this means that the position takes into account both local transactions and replicated transactions. This can be somewhat problematic, since it means that executing a single local transaction on the slave can end up breaking replication, due to the fact that the local transaction would cause the slave's GTID position to become inconsistent with the master's GTID position. However, in my opinion, this makes sense, given the design of the GTID functionality. To prevent this specific issue, if a slave is using "MASTER_USE_GTID=current_pos", then it should have read_only=ON set.

However, the more problematic issue is that MariaDB will not alert users to the inconsistent GTID position until the slave threads are restarted. If the slave is running smoothly, then the slave threads may not be restarted for weeks or months.

The root cause of this appears to be that the slave's I/O thread only initializes its local value of gtid_current_pos when the thread is first started in start_slave_threads():

https://github.com/MariaDB/server/blob/mariadb-10.4.6/sql/slave.cc#L1400

This means that if a local transaction is executed on the slave, then the slave won't notice that its GTID position is inconsistent with the master until the slave threads are restarted.
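The snapshot behaviour can be reduced to a small hypothetical Python model. It deliberately ignores the fact that the real I/O thread position also advances as events arrive from the master; it only shows that a local commit on the slave is not folded in until the thread restarts. SlaveIOThreadModel and start_pos are illustrative names, not server identifiers.

```python
# Hypothetical model of the root cause: the I/O thread copies the position
# once when it starts and does not pick up later local commits.

class SlaveIOThreadModel:
    def __init__(self, server):
        self.server = server     # shared "global variables" of the slave
        self.start_pos = None    # plays the role of mi->gtid_current_pos

    def start(self):
        # The snapshot is taken only here, at thread start.
        self.start_pos = self.server["gtid_current_pos"]

    def stop(self):
        self.start_pos = None


server = {"gtid_current_pos": "1-1-95,3-1-1,4-2-1"}
io_thread = SlaveIOThreadModel(server)
io_thread.start()

# CREATE DATABASE slave_db on the slave advances the global position...
server["gtid_current_pos"] = "1-1-95,3-1-1,4-2-2"
print(io_thread.start_pos)   # ...but the snapshot still ends in 4-2-1

# Only a restart picks up 4-2-2, which the master then rejects.
io_thread.stop()
io_thread.start()
print(io_thread.start_pos)
```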

For example, let's say that I have a master and a slave.

The master's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name | Value |
+------------------------+--------------------+
| gtid_binlog_pos | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id | 3 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | 1-1-95,3-1-1,4-2-1 |
| gtid_strict_mode | OFF |
| wsrep_gtid_domain_id | 0 |
| wsrep_gtid_mode | OFF |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

The slave's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name | Value |
+------------------------+--------------------+
| gtid_binlog_pos | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id | 4 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | 1-1-95,3-1-1 |
| gtid_strict_mode | OFF |
| wsrep_gtid_domain_id | 0 |
| wsrep_gtid_mode | OFF |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

And let's say that the slave is configured to use "MASTER_USE_GTID=current_pos":

{noformat}
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.30.0.105', MASTER_USER='maxscale', MASTER_PASSWORD='password', MASTER_USE_GTID=current_pos;
Query OK, 0 rows affected (0.009 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.045 sec)
{noformat}

And the slave is initially replicating normally:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

But then let's say that we execute a local transaction on the slave. We can see that the slave's gtid_binlog_pos changes:

{noformat}
MariaDB [(none)]> CREATE DATABASE slave_db;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name | Value |
+------------------------+--------------------+
| gtid_binlog_pos | 1-1-95,3-1-1,4-2-2 |
| gtid_binlog_state | 1-1-95,3-1-1,4-2-2 |
| gtid_current_pos | 1-1-95,3-1-1,4-2-2 |
| gtid_domain_id | 4 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | 1-1-95,3-1-1 |
| gtid_strict_mode | OFF |
| wsrep_gtid_domain_id | 0 |
| wsrep_gtid_mode | OFF |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

But at first, the slave doesn't actually notice that its position is inconsistent with the master:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

The slave only notices when the slave threads are restarted:

{noformat}
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.002 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.005 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State:
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000001
                 Relay_Log_Pos: 4
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: No
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 296
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 1236
                 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 4-2-2, which is not in the master's binlog'
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-2,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}
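A heavily simplified, hypothetical Python model of the master's refusal: it only flags a requested GTID whose sequence number lies beyond the master's last sequence number in the same domain, which is the situation in the output above. The real master performs other checks that are not modelled here (for example, for domains it has never seen), and gtids_missing_on_master is an illustrative name.

```python
# Hypothetical, heavily simplified model of why the master answers with
# error 1236 once the restarted slave asks for 4-2-2. Not MariaDB source.

def parse_gtid_list(text):
    pos = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server_id, seq_no = (int(x) for x in item.split("-"))
        pos[domain] = (server_id, seq_no)
    return pos


def gtids_missing_on_master(master_binlog_pos, requested_pos):
    """Return requested GTIDs whose sequence number is beyond the master's
    last sequence number in the same domain."""
    master = parse_gtid_list(master_binlog_pos)
    missing = []
    for domain, (srv, seq) in sorted(parse_gtid_list(requested_pos).items()):
        last = master.get(domain)
        if last is not None and seq > last[1]:
            missing.append(f"{domain}-{srv}-{seq}")
    return missing


master_pos = "1-1-95,3-1-1,4-2-1"
for requested in ("1-1-95,4-2-1,3-1-1", "1-1-95,4-2-2,3-1-1"):
    missing = gtids_missing_on_master(master_pos, requested)
    if missing:
        print(f"connecting slave requested to start from GTID {missing[0]}, "
              "which is not in the master's binlog")
    else:
        print("OK")
```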

I think the slave should warn the user about this, so that users can be aware of inconsistent positions, even when the slave threads are not restarted.

For example, here's one potential fix:

If a slave has "MASTER_USE_GTID=current_pos" set, then the slave's I/O thread could periodically compare the thread's local value of gtid_current_pos (i.e. mi->gtid_current_pos) to the slave's global value of gtid_binlog_pos. If the global value of gtid_binlog_pos contains GTIDs that are greater than the GTIDs in the thread's local value of gtid_current_pos (i.e. mi->gtid_current_pos), then the slave could write a warning to the error log. If gtid_strict_mode were enabled, then maybe the warning could be changed to an error.
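A minimal Python sketch of the suggested periodic check, under stated assumptions: the I/O thread's cached position (mi->gtid_current_pos above) is taken to advance as events arrive, so any domain where gtid_binlog_pos is further ahead presumably reflects a local transaction. find_drift, periodic_check, and the log message wording are hypothetical.

```python
# Hypothetical sketch of the proposed periodic consistency check.

def parse_gtid_list(text):
    pos = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server_id, seq_no = (int(x) for x in item.split("-"))
        pos[domain] = (server_id, seq_no)
    return pos


def find_drift(io_thread_pos, binlog_pos):
    """GTIDs in gtid_binlog_pos that are newer than, or absent from, the
    I/O thread's cached position (presumably local transactions)."""
    cached = parse_gtid_list(io_thread_pos)
    drift = []
    for domain, (srv, seq) in sorted(parse_gtid_list(binlog_pos).items()):
        known = cached.get(domain)
        if known is None or seq > known[1]:
            drift.append(f"{domain}-{srv}-{seq}")
    return drift


def periodic_check(io_thread_pos, binlog_pos, strict_mode, log):
    """One iteration of the check: a warning, or an error in strict mode."""
    drift = find_drift(io_thread_pos, binlog_pos)
    if drift:
        level = "ERROR" if strict_mode else "Warning"
        log(f"[{level}] gtid_binlog_pos contains GTIDs not seen by the "
            f"slave I/O thread: {','.join(drift)}")
    return drift


# Gtid_IO_Pos and gtid_binlog_pos after CREATE DATABASE slave_db.
messages = []
periodic_check("1-1-95,4-2-1,3-1-1", "1-1-95,3-1-1,4-2-2",
               strict_mode=False, log=messages.append)
print(messages)
```

Running the check on a timer would surface the 4-2-2 drift immediately, instead of weeks later at the next slave restart.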

Description update after problem discussion:

This work deprecates Current_Pos as an option to CHANGE MASTER TO MASTER_USE_GTID while also adding a safe replacement option, MASTER_DEMOTE_TO_SLAVE=<bool>. Specifically, the use case of Current_Pos is to transition a master to become a slave; however, it can break replication state, because gtid_current_pos is actively updated from both gtid_binlog_pos and gtid_slave_pos.

MASTER_DEMOTE_TO_SLAVE changes this use case by forcing users to set Using_Gtid=Slave_Pos and merging gtid_binlog_pos into gtid_slave_pos once at CHANGE MASTER TO time. Note that if gtid_slave_pos is more recent than gtid_binlog_pos (as in the case of chain replication), the replication state should be preserved.
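One plausible reading of the new option, sketched in Python. The merge rule (take the binlog GTID per domain unless gtid_slave_pos already has an equal or higher sequence number) and the rejection of current_pos are assumptions based on the text above, not the committed implementation; the function, its parameter names, and the chain-replication values are hypothetical. It also reflects Kristian's point from the discussion: a plain CHANGE MASTER without the flag leaves gtid_slave_pos alone.

```python
# Hypothetical model of CHANGE MASTER TO ... MASTER_DEMOTE_TO_SLAVE=1.

def parse_gtid_list(text):
    pos = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server_id, seq_no = (int(x) for x in item.split("-"))
        pos[domain] = (server_id, seq_no)
    return pos


def format_gtid_list(pos):
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items()))


def change_master(state, master_use_gtid=None, master_demote_to_slave=False):
    if master_demote_to_slave:
        if master_use_gtid and master_use_gtid.lower() == "current_pos":
            raise ValueError("MASTER_DEMOTE_TO_SLAVE requires Slave_Pos")
        merged = parse_gtid_list(state["gtid_slave_pos"])
        for domain, (srv, seq) in parse_gtid_list(state["gtid_binlog_pos"]).items():
            current = merged.get(domain)
            # Keep gtid_slave_pos where it is already at least as recent.
            if current is None or seq > current[1]:
                merged[domain] = (srv, seq)
        state["gtid_slave_pos"] = format_gtid_list(merged)  # merged once
        state["using_gtid"] = "Slave_Pos"
    elif master_use_gtid is not None:
        # Credentials, SSL, host changes etc. do not touch gtid_slave_pos.
        state["using_gtid"] = master_use_gtid


old_master = {"gtid_binlog_pos": "1-1-95,3-1-1",
              "gtid_slave_pos": "1-1-90",
              "using_gtid": "Current_Pos"}
change_master(old_master, master_demote_to_slave=True)
print(old_master["gtid_slave_pos"], old_master["using_gtid"])
```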

Then, MASTER_USE_GTID=Current_Pos is deprecated in favor of using Slave_Pos in combination with MASTER_DEMOTE_TO_SLAVE=1.

==========================
Original Description:
==========================

When a slave is configured to replicate with "MASTER_USE_GTID=current_pos", the slave uses its value of gtid_current_pos to replicate from the master.

https://mariadb.com/kb/en/library/change-master-to/#master_use_gtid

https://mariadb.com/kb/en/library/gtid/#gtid_current_pos

The value of gtid_current_pos includes GTIDs from both gtid_slave_pos and gtid_binlog_pos:

https://mariadb.com/kb/en/library/gtid/#gtid_slave_pos

https://mariadb.com/kb/en/library/gtid/#gtid_binlog_pos

Since both gtid_slave_pos and gtid_binlog_pos are used, this means that the position takes into account both local transactions and replicated transactions. This can be somewhat problematic, since it means that executing a single local transaction on the slave can end up breaking replication, due to the fact that the local transaction would cause the slave's GTID position to become inconsistent with the master's GTID position. However, in my opinion, this makes sense, given the design of the GTID functionality. To prevent this specific issue, if a slave is using "MASTER_USE_GTID=current_pos", then it should have read_only=ON set.

However, the more problematic issue is that MariaDB will not alert users to the inconsistent GTID position until the slave threads are restarted. If the slave is running smoothly, then the slave threads may not be restarted for weeks or months.

The root cause of this appears to be that the slave's I/O thread only initializes its local value of gtid_current_pos when the thread is first started in start_slave_threads():

https://github.com/MariaDB/server/blob/mariadb-10.4.6/sql/slave.cc#L1400

This means that if a local transaction is executed on the slave, then the slave won't notice that its GTID position is inconsistent with the master until the slave threads are restarted.

For example, let's say that I have a master and a slave.

The master's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id         | 3                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1,4-2-1 |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

The slave's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id         | 4                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1       |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

And let's say that the slave is configured to use "MASTER_USE_GTID=current_pos":

{noformat}
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.30.0.105', MASTER_USER='maxscale', MASTER_PASSWORD='password', MASTER_USE_GTID=current_pos;
Query OK, 0 rows affected (0.009 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.045 sec)
{noformat}

And the slave is initially replicating normally:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

Now suppose we execute a local transaction on the slave. The slave's gtid_binlog_pos, and with it gtid_current_pos, advances in domain 4 from 4-2-1 to 4-2-2:

{noformat}
MariaDB [(none)]> CREATE DATABASE slave_db;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-2 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-2 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-2 |
| gtid_domain_id         | 4                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1       |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

At first, however, the slave does not notice that its position is now inconsistent with the master's: Gtid_IO_Pos still shows 4-2-1, and both slave threads keep running:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

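The slave only notices when the slave threads are restarted. At that point the I/O thread requests GTID 4-2-2, which the master has never logged, and fails with error 1236: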

{noformat}
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.002 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.005 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State:
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000001
                 Relay_Log_Pos: 4
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: No
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 296
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 1236
                 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 4-2-2, which is not in the master's binlog'
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-2,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}
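The failure above can be mimicked with a deliberately simplified model of the master's side. This is not how the master actually searches its binlog (the real check handles more cases and has more than one error message); the sketch only asks, per domain, whether the master has logged the requested sequence number or a later one. first_unknown_gtid and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// One GTID, written domain_id-server_id-seq_no.
struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// At most one GTID per replication domain.
using GtidPos = std::map<std::uint32_t, Gtid>;

// Parses a position string such as "1-1-95,4-2-2,3-1-1".
GtidPos parse_gtid_pos(const std::string& text) {
  GtidPos pos;
  std::istringstream list(text);
  std::string item;
  while (std::getline(list, item, ',')) {
    if (item.empty()) continue;
    std::istringstream in(item);
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    if (!(in >> g.domain_id >> dash1 >> g.server_id >> dash2 >> g.seq_no) ||
        dash1 != '-' || dash2 != '-') {
      throw std::invalid_argument("malformed GTID: " + item);
    }
    pos[g.domain_id] = g;
  }
  return pos;
}

// Simplified stand-in for the master's lookup (an assumption, not server
// code): a requested GTID is "unknown" if the master has never logged its
// domain, or has only logged lower sequence numbers in that domain.
std::optional<Gtid> first_unknown_gtid(const GtidPos& requested,
                                       const GtidPos& master_binlog_state) {
  for (const auto& entry : requested) {
    const Gtid& r = entry.second;
    auto it = master_binlog_state.find(r.domain_id);
    if (it == master_binlog_state.end() || it->second.seq_no < r.seq_no) {
      return r;
    }
  }
  return std::nullopt;
}
```

Against the master's gtid_binlog_state of 1-1-95,3-1-1,4-2-1, the Gtid_IO_Pos from before the local transaction (1-1-95,4-2-1,3-1-1) passes, while the position requested after the restart (1-1-95,4-2-2,3-1-1) yields 4-2-2, the GTID named in the error message.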

I think the slave should warn the user about this, so that an inconsistent position is noticed even if the slave threads are never restarted.

For example, here's one potential fix:

If a slave has "MASTER_USE_GTID=current_pos" set, its I/O thread could periodically compare its local copy of gtid_current_pos (i.e. mi->gtid_current_pos) with the slave's global gtid_binlog_pos. If gtid_binlog_pos contains GTIDs that are newer than those in mi->gtid_current_pos, the slave could write a warning to the error log. If gtid_strict_mode is enabled, the warning could perhaps be raised to an error.
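The proposed check could look roughly like the following C++ sketch. It is a hypothetical model, not code from sql/slave.cc: io_thread_pos stands in for mi->gtid_current_pos (which, as far as I can tell, is what SHOW SLAVE STATUS reports as Gtid_IO_Pos) and binlog_pos for the global gtid_binlog_pos; check_position_drift, Severity and DriftReport are invented names. It flags every domain in which gtid_binlog_pos is ahead of the I/O thread's copy, and escalates from warning to error when gtid_strict_mode is on.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One GTID, written domain_id-server_id-seq_no.
struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// At most one GTID per replication domain.
using GtidPos = std::map<std::uint32_t, Gtid>;

// Parses a position string such as "1-1-95,3-1-1,4-2-2".
GtidPos parse_gtid_pos(const std::string& text) {
  GtidPos pos;
  std::istringstream list(text);
  std::string item;
  while (std::getline(list, item, ',')) {
    if (item.empty()) continue;
    std::istringstream in(item);
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    if (!(in >> g.domain_id >> dash1 >> g.server_id >> dash2 >> g.seq_no) ||
        dash1 != '-' || dash2 != '-') {
      throw std::invalid_argument("malformed GTID: " + item);
    }
    pos[g.domain_id] = g;
  }
  return pos;
}

enum class Severity { kNone, kWarning, kError };

struct DriftReport {
  Severity severity = Severity::kNone;
  std::vector<Gtid> ahead;  // binlog GTIDs newer than the I/O thread's copy
};

// Hypothetical periodic check: compare the I/O thread's cached position with
// the live gtid_binlog_pos, domain by domain.
DriftReport check_position_drift(const GtidPos& io_thread_pos,
                                 const GtidPos& binlog_pos, bool strict_mode) {
  DriftReport report;
  for (const auto& entry : binlog_pos) {
    const Gtid& b = entry.second;
    auto it = io_thread_pos.find(b.domain_id);
    if (it == io_thread_pos.end() || b.seq_no > it->second.seq_no) {
      report.ahead.push_back(b);
    }
  }
  if (!report.ahead.empty()) {
    report.severity = strict_mode ? Severity::kError : Severity::kWarning;
  }
  return report;
}
```

With the values from this report (Gtid_IO_Pos 1-1-95,4-2-1,3-1-1 against gtid_binlog_pos 1-1-95,3-1-1,4-2-2), the check flags GTID 4-2-2 as a warning, or as an error with strict_mode set, while the slave threads are still running. How often to run it, and how to word the error-log message, are left open here; the proposal only says "periodically".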

Brandon Nesterenko made changes - 2022-06-13 23:28

Description

Description update after problem discussion:

This work deprecates Current_Pos as a value of MASTER_USE_GTID in CHANGE MASTER TO, and adds a safe replacement option, MASTER_DEMOTE_TO_SLAVE=<bool>. The intended use case of Current_Pos is to turn a master into a slave; however, it can break the replication state, because gtid_current_pos is continuously updated from both gtid_binlog_pos and gtid_slave_pos.

MASTER_DEMOTE_TO_SLAVE replaces this use case: it requires Using_Gtid=Slave_Pos and merges gtid_binlog_pos into gtid_slave_pos once, at CHANGE MASTER TO time. Note that if gtid_slave_pos is more recent than gtid_binlog_pos (as in the case of chain replication), the replication state should be preserved.

Then, MASTER_USE_GTID=Current_Pos is deprecated in favor of using Slave_Pos in combination with MASTER_DEMOTE_TO_SLAVE=1.
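A minimal C++ sketch of the one-time merge described above, under one plausible reading: for each domain, whichever of the gtid_binlog_pos and gtid_slave_pos entries has the higher sequence number ends up in gtid_slave_pos, so a gtid_slave_pos entry that is already ahead (the chain-replication case) is kept. This is not the server's implementation; demote_merge and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// One GTID, written domain_id-server_id-seq_no.
struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// At most one GTID per replication domain.
using GtidPos = std::map<std::uint32_t, Gtid>;

// Parses a position string such as "1-1-95,3-1-1".
GtidPos parse_gtid_pos(const std::string& text) {
  GtidPos pos;
  std::istringstream list(text);
  std::string item;
  while (std::getline(list, item, ',')) {
    if (item.empty()) continue;
    std::istringstream in(item);
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    if (!(in >> g.domain_id >> dash1 >> g.server_id >> dash2 >> g.seq_no) ||
        dash1 != '-' || dash2 != '-') {
      throw std::invalid_argument("malformed GTID: " + item);
    }
    pos[g.domain_id] = g;
  }
  return pos;
}

// Formats a position as a comma-separated list, ordered by domain_id.
std::string format_gtid_pos(const GtidPos& pos) {
  std::string out;
  for (const auto& entry : pos) {
    const Gtid& g = entry.second;
    if (!out.empty()) out += ',';
    out += std::to_string(g.domain_id) + "-" + std::to_string(g.server_id) +
           "-" + std::to_string(g.seq_no);
  }
  return out;
}

// Assumed merge for MASTER_DEMOTE_TO_SLAVE: keep gtid_slave_pos, and let each
// gtid_binlog_pos GTID replace its domain's entry only if it is newer.
GtidPos demote_merge(const GtidPos& slave_pos, const GtidPos& binlog_pos) {
  GtidPos merged = slave_pos;
  for (const auto& entry : binlog_pos) {
    const Gtid& b = entry.second;
    auto it = merged.find(b.domain_id);
    if (it == merged.end() || b.seq_no > it->second.seq_no) {
      merged[b.domain_id] = b;
    }
  }
  return merged;
}
```

For example, merging a gtid_binlog_pos of 1-1-95,3-1-1,4-2-2 into a gtid_slave_pos of 1-1-95,3-1-1 gives 1-1-95,3-1-1,4-2-2, and because the merge happens once, later local transactions no longer shift the position the I/O thread requests.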

==========================
Original Description:
==========================

When a slave is configured to replicate with "MASTER_USE_GTID=current_pos", the slave uses its value of gtid_current_pos to replicate from the master.

https://mariadb.com/kb/en/library/change-master-to/#master_use_gtid

https://mariadb.com/kb/en/library/gtid/#gtid_current_pos

The value of gtid_current_pos includes GTIDs from both gtid_slave_pos and gtid_binlog_pos:

https://mariadb.com/kb/en/library/gtid/#gtid_slave_pos

https://mariadb.com/kb/en/library/gtid/#gtid_binlog_pos

Because gtid_current_pos is built from both gtid_slave_pos and gtid_binlog_pos, it reflects local transactions as well as replicated ones. This can be problematic: a single local transaction executed on the slave can break replication, because it makes the slave's GTID position inconsistent with the master's. However, in my opinion, this makes sense given the design of the GTID functionality. To prevent this specific issue, a slave that uses "MASTER_USE_GTID=current_pos" should have read_only=ON set.

However, the more serious problem is that MariaDB does not alert the user to the inconsistent GTID position until the slave threads are restarted, and on a slave that is running smoothly, that may not happen for weeks or months.

The root cause of this appears to be that the slave's I/O thread only initializes its local value of gtid_current_pos when the thread is first started in start_slave_threads():

https://github.com/MariaDB/server/blob/mariadb-10.4.6/sql/slave.cc#L1400

This means that if a local transaction is executed on the slave, then the slave won't notice that its GTID position is inconsistent with the master until the slave threads are restarted.

For example, let's say that I have a master and a slave.

The master's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id         | 3                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1,4-2-1 |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

The slave's GTID position:

{noformat}
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-1 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-1 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-1 |
| gtid_domain_id         | 4                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1       |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

And let's say that the slave is configured to use "MASTER_USE_GTID=current_pos":

{noformat}
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.30.0.105', MASTER_USER='maxscale', MASTER_PASSWORD='password', MASTER_USE_GTID=current_pos;
Query OK, 0 rows affected (0.009 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.045 sec)
{noformat}

And the slave is initially replicating normally:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

Now suppose we execute a local transaction on the slave. The slave's gtid_binlog_pos, and with it gtid_current_pos, advances in domain 4 from 4-2-1 to 4-2-2:

{noformat}
MariaDB [(none)]> CREATE DATABASE slave_db;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE '%gtid%';
+------------------------+--------------------+
| Variable_name          | Value              |
+------------------------+--------------------+
| gtid_binlog_pos        | 1-1-95,3-1-1,4-2-2 |
| gtid_binlog_state      | 1-1-95,3-1-1,4-2-2 |
| gtid_current_pos       | 1-1-95,3-1-1,4-2-2 |
| gtid_domain_id         | 4                  |
| gtid_ignore_duplicates | OFF                |
| gtid_pos_auto_engines  |                    |
| gtid_slave_pos         | 1-1-95,3-1-1       |
| gtid_strict_mode       | OFF                |
| wsrep_gtid_domain_id   | 0                  |
| wsrep_gtid_mode        | OFF                |
+------------------------+--------------------+
10 rows in set (0.001 sec)
{noformat}

At first, however, the slave does not notice that its position is now inconsistent with the master's: Gtid_IO_Pos still shows 4-2-1, and both slave threads keep running:

{noformat}
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000002
                 Relay_Log_Pos: 717
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 1035
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-1,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

The slave only notices when the slave threads are restarted:

{noformat}
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.002 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.005 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State:
                   Master_Host: 172.30.0.105
                   Master_User: maxscale
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 376
                Relay_Log_File: ip-172-30-0-96-relay-bin.000001
                 Relay_Log_Pos: 4
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: No
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 376
               Relay_Log_Space: 296
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 1236
                 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 4-2-2, which is not in the master's binlog'
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Current_Pos
                   Gtid_IO_Pos: 1-1-95,4-2-2,3-1-1
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
{noformat}

I think the slave should warn the user about this, so that users are aware of inconsistent positions even when the slave threads are not restarted.

For example, here's one potential fix:

If a slave has "MASTER_USE_GTID=current_pos" set, then the slave's I/O thread could periodically compare the thread's local value of gtid_current_pos (i.e. mi->gtid_current_pos) to the server's global value of gtid_binlog_pos. If gtid_binlog_pos contains GTIDs that are greater than the corresponding GTIDs in mi->gtid_current_pos, then the slave could write a warning to the error log. If gtid_strict_mode is enabled, the warning could perhaps be raised to an error.
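The proposed check could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not MariaDB server code (a real fix would live in the C++ slave I/O thread). It assumes positions are comma-separated `domain-server_id-seq_no` lists compared per domain by sequence number, and that a domain missing from the thread's local position counts as sequence number 0. The function names `parse_gtid_pos`, `domains_ahead` and `check_positions` are hypothetical. The example reuses the Gtid_IO_Pos value from the first SHOW SLAVE STATUS output (1-1-95,4-2-1,3-1-1). The gtid_binlog_pos value (1-1-95,3-1-1,4-2-2) is an assumption based on the error message seen after the restart.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers illustrating the proposed warning; not MariaDB server code.


def parse_gtid_pos(pos: str) -> Dict[int, Tuple[int, int]]:
    """Parse 'D-S-N,D-S-N,...' into {domain_id: (server_id, seq_no)}."""
    result: Dict[int, Tuple[int, int]] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty positions such as ""
        domain, server, seqno = (int(part) for part in item.split("-"))
        result[domain] = (server, seqno)
    return result


def domains_ahead(io_current_pos: str, binlog_pos: str) -> List[str]:
    """Return GTIDs from binlog_pos that are newer than the I/O thread's local
    position in the same domain (assumption: a missing domain counts as seq_no 0)."""
    local = parse_gtid_pos(io_current_pos)
    binlog = parse_gtid_pos(binlog_pos)
    ahead: List[str] = []
    for domain, (server, seqno) in sorted(binlog.items()):
        local_seqno = local.get(domain, (0, 0))[1]
        if seqno > local_seqno:
            ahead.append(f"{domain}-{server}-{seqno}")
    return ahead


def check_positions(io_current_pos: str, binlog_pos: str,
                    gtid_strict_mode: bool) -> Optional[str]:
    """Build the error-log line the I/O thread might write, or None if consistent."""
    ahead = domains_ahead(io_current_pos, binlog_pos)
    if not ahead:
        return None
    # Per the proposal: a warning normally, an error under gtid_strict_mode.
    level = "ERROR" if gtid_strict_mode else "Warning"
    return (f"[{level}] gtid_binlog_pos contains {','.join(ahead)}, which is ahead of "
            f"the I/O thread's gtid_current_pos ({io_current_pos})")


if __name__ == "__main__":
    # Values from the example above; the gtid_binlog_pos value is assumed.
    print(check_positions("1-1-95,4-2-1,3-1-1", "1-1-95,3-1-1,4-2-2",
                          gtid_strict_mode=False))
```

With these inputs only domain 4 is flagged (4-2-2 in the binlog versus 4-2-1 in the I/O thread's position), which is exactly the GTID the master later rejects once the slave threads are restarted.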

Brandon Nesterenko made changes - 2022-06-14 03:31

Status	Stalled [ 10000 ]	In Progress [ 3 ]

Brandon Nesterenko made changes - 2022-06-14 03:40

Status	In Progress [ 3 ]	In Testing [ 10301 ]

Brandon Nesterenko made changes - 2022-06-14 03:40

Status	In Testing [ 10301 ]	Stalled [ 10000 ]

Brandon Nesterenko added a comment - 2022-06-14 03:42

Hi Andrei! The latest commits in PR-2155 are ready for review.

Brandon Nesterenko made changes - 2022-06-14 03:42

Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Andrei Elkin [ elkin ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Andrei Elkin made changes - 2022-06-14 11:25

Link		This issue relates to ~~MDEV-28839~~ [ ~~MDEV-28839~~ ]

Andrei Elkin added a comment - 2022-06-14 13:10

Approved, as the latest patch implements the requirements.

Andrei Elkin made changes - 2022-06-14 13:10

Assignee	Andrei Elkin [ elkin ]	Brandon Nesterenko [ JIRAUSER48702 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Brandon Nesterenko made changes - 2022-06-15 00:14

Status	Stalled [ 10000 ]	In Testing [ 10301 ]

Brandon Nesterenko made changes - 2022-06-15 00:14

Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Angelique Sklavounos [ JIRAUSER50741 ]

Brandon Nesterenko added a comment - 2022-06-15 00:15 - edited

Hi angelique.sklavounos!

I am also re-assigning this ticket to you for testing. The preview branch is preview-10.10-gtid.

Brandon Nesterenko made changes - 2022-06-21 21:48

Link		This issue relates to TODO-3496 [ TODO-3496 ]

Angelique Sklavounos (Inactive) added a comment - 2022-07-21 12:12

OK to push

Angelique Sklavounos (Inactive) made changes - 2022-07-21 12:12

Status	In Testing [ 10301 ]	Stalled [ 10000 ]

Angelique Sklavounos (Inactive) made changes - 2022-07-21 12:12

Assignee	Angelique Sklavounos [ JIRAUSER50741 ]	Brandon Nesterenko [ JIRAUSER48702 ]

Brandon Nesterenko made changes - 2022-07-26 02:42

Status	Stalled [ 10000 ]	In Progress [ 3 ]

Brandon Nesterenko added a comment - 2022-07-26 02:42

Howdy Andrei!

This is ready for a final round of review before pushing into 10.10:

https://github.com/MariaDB/server/pull/2199

Brandon Nesterenko made changes - 2022-07-26 02:42

Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Andrei Elkin [ elkin ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Andrei Elkin added a comment - 2022-07-26 17:26

Approved on GH.

Andrei Elkin made changes - 2022-07-26 17:26

Assignee	Andrei Elkin [ elkin ]	Brandon Nesterenko [ JIRAUSER48702 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Brandon Nesterenko added a comment - 2022-07-30 14:12

Pushed into 10.10 as 90c3b28.

Brandon Nesterenko made changes - 2022-07-30 14:12

Fix Version/s		10.10.0 [ 27912 ]
Fix Version/s	10.10 [ 27530 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Brandon Nesterenko made changes - 2023-02-13 14:04

Link		This issue relates to ~~MDEV-30647~~ [ ~~MDEV-30647~~ ]

Ralf Gebhardt made changes - 2023-03-21 10:44

Labels	gtid_current_pos	Preview_10.10 gtid_current_pos

Andrey Khizhnyakov added a comment - 2023-06-13 09:01

Good afternoon. Please tell me, is this bug present in MariaDB 10.4.12?

Alice Sherepa added a comment - 2023-06-13 10:26

andreitech, this was added in 10.10.0, so the feature is not present in any earlier version (10.3, 10.4, etc., including 10.4.12).

Ralf Gebhardt made changes - 2023-06-13 12:43

Affects Version/s	10.2.25 [ 23408 ]
Affects Version/s	10.3.16 [ 23410 ]
Affects Version/s	10.4.6 [ 23412 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Kfir Itzhak added a comment - 2023-12-05 13:00

Hi,

Please do not deprecate master_use_gtid=current_pos. I use it for Active<->Active replication, and I believe many others do as well, so please do not remove that feature.

Brandon Nesterenko made changes - 2023-12-08 17:27

Link		This issue split to ~~MDEV-32976~~ [ ~~MDEV-32976~~ ]

Brandon Nesterenko added a comment - 2023-12-08 17:29

Hi mastertheknife!

Thanks for your input here. We've discussed it and filed ~~MDEV-32976~~ to remove the deprecation status of the option.

Ralf Gebhardt made changes - 2024-02-20 13:35

Link		This issue causes ~~MDEV-31768~~ [ ~~MDEV-31768~~ ]

Roel Van de Paar made changes - 2024-05-03 08:24

Link		This issue causes MDEV-34064 [ MDEV-34064 ]

Roel Van de Paar made changes - 2024-05-03 08:27

Link		This issue relates to ~~MDEV-31768~~ [ ~~MDEV-31768~~ ]

Jira Automation (IT) made changes - 2024-07-04 03:50

Zendesk Related Tickets		125514

Brandon Nesterenko made changes - 2024-07-25 13:17

Link		This issue is duplicated by ~~MDEV-16800~~ [ ~~MDEV-16800~~ ]