[MDEV-20122] Deprecate MASTER_USE_GTID=Current_Pos to favor new MASTER_DEMOTE_TO_SLAVE option Created: 2019-07-22 Updated: 2023-12-08 Resolved: 2022-07-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Fix Version/s: | 10.10.0 |
| Type: | Task | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Brandon Nesterenko |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | Preview_10.10, gtid_current_pos | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Description |
|
====================================== This work deprecates Current_Pos as an option to CHANGE MASTER TO MASTER_USE_GTID while also adding a safe replacement option MASTER_DEMOTE_TO_SLAVE=<bool>. Specifically, the use case of Current_Pos is to transition a master to become a slave; however, this can break replication state due to actively updating gtid_current_pos with gtid_binlog_pos and gtid_slave_pos. MASTER_DEMOTE_TO_SLAVE changes this use case by forcing users to set Using_Gtid=Slave_Pos and merging gtid_binlog_pos into gtid_slave_pos once at CHANGE MASTER TO time. Note that if gtid_slave_pos is more recent than gtid_binlog_pos (as in the case of chain replication), the replication state should be preserved. Then, MASTER_USE_GTID=Current_Pos is deprecated in favor of using Slave_Pos in combination with MASTER_DEMOTE_TO_SLAVE=1. ========================== When a slave is configured to replicate with "MASTER_USE_GTID=current_pos", the slave uses its value of gtid_current_pos to replicate from the master. https://mariadb.com/kb/en/library/change-master-to/#master_use_gtid https://mariadb.com/kb/en/library/gtid/#gtid_current_pos The value of gtid_current_pos includes GTIDs from both gtid_slave_pos and gtid_binlog_pos: https://mariadb.com/kb/en/library/gtid/#gtid_slave_pos https://mariadb.com/kb/en/library/gtid/#gtid_binlog_pos Since both gtid_slave_pos and gtid_binlog_pos are used, this means that the position takes into account both local transactions and replicated transactions. This can be somewhat problematic, since it means that executing a single local transaction on the slave can end up breaking replication, due to the fact that the local transaction would cause the slave's GTID position to become inconsistent with the master's GTID position. However, in my opinion, this makes sense, given the design of the GTID functionality. To prevent this specific issue, if a slave is using "MASTER_USE_GTID=current_pos", then it should have read_only=ON set. However, the more problematic issue is that MariaDB will not alert users to the inconsistent GTID position until the slave threads are restarted. If the slave is running smoothly, then the slave threads may not be restarted for weeks or months. The root cause of this appears to be that the slave's I/O thread only initializes its local value of gtid_current_pos when the thread is first started in start_slave_threads(): https://github.com/MariaDB/server/blob/mariadb-10.4.6/sql/slave.cc#L1400 This means that if a local transaction is executed on the slave, then the slave won't notice that its GTID position is inconsistent with the master until the slave threads are restarted. For example, let's say that I have a master and a slave. The master's GTID position:
The slave's GTID position:
And let's say that the slave is configured to use "MASTER_USE_GTID=current_pos":
And the slave is initially replicating normally:
But then let's say that we execute a local transaction on the slave. We can see that the slave's gtid_binlog_pos changes:
But at first, the slave doesn't actually notice that its position is inconsistent with the master:
The slave only notices when the slave threads are restarted:
I think the slave should warn the user about this, so that users can be aware of inconsistent positions, even when the slave threads are not restarted. For example, here's one potential fix: If a slave has "MASTER_USE_GTID=current_pos" set, then the slave's I/O thread could periodically compare the thread's local value of gtid_current_pos (i.e. mi->gtid_current_pos) to the slave's global value of gtid_binlog_pos. If the global value of gtid_binlog_pos contains GTIDs that are greater than the GTIDs in the thread's local value of gtid_current_pos (i.e. mi->gtid_current_pos), then the slave could write a warning to the error log. If gtid_strict_mode were enabled, then maybe the warning could be changed to an error. |
| Comments |
| Comment by Andrei Elkin [ 2019-07-23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GeoffMontee, thanks for the report and analysis done. We might consider your proposals. However, let me first copy-paste a mail pertaining to Could we work this case around with switching to slave_pos instead of elaborating Quote, unquote: CHANGE MASTER TO master_host=new_promoted_master If not using gtid_current_pos (ie master_use_gtid=slave_pos), then to let SET GLOBAL gtid_slave_pos=@@gtid_binlog_pos This is because for efficiency reasons, the master doesn't update the So now we can see why only GTIDs with the servers own server_id should Normally, every GTID in the binlog with a different server id than our own Thus, in the normal case, where user did not play tricks with the binlog and And in case the user deliberately modified the state, it should be up to the Another problem is that the server cannot reliably compare GTIDs with I don't see the value in *And finally, let me reiterate: I consider the @@gtid_current_pos a design | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-07-23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Elkin,
Thanks for the response! This issue isn't really urgent. I hope you enjoy your vacation!
I'm not quite sure how Regardless, I understand the purpose of current_pos, and I understand the semantics. I also understand that it is very easy for users to accidentally break a slave using current_pos. Currently, if a slave is using current_pos, then the slave doesn't do anything to try to detect if the user has done any unsafe operations that may cause the slave to break. If we want to continue to support current_pos, then I am just suggesting that the slave should try to detect if the user has done any unsafe operations that may cause the slave to break. Maybe it could write a warning to the error log. Maybe the warning should suggest that the user may want to switch to slave_pos instead.
Yes, I always recommend to use slave_pos, rather than current_pos. Our Mariabackup documentation on how to build a slave also recommends to use slave_pos. https://mariadb.com/kb/en/library/setting-up-a-replication-slave-with-mariabackup/#gtids However, a lot of users are already using current_pos for whatever reason.
Do we have plans to remove current_pos or change the way it works? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2020-10-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GeoffMontee, howdy. Let's first settle our opinions on the Semantics of MASTER_USE_GTID=current_pos. Like slave_pos it's a form of connection mode that presents a When later, after the successful validation is done, Notice too, that the preferred slave_pos mode is also vulnerable Personally I prefer this interpretation of a "dumb" simple IO that Secondly, to learn by Slave about potential inconsistency might be useful though. I would limit this watcher to gtid_strict_mode = ON. We're considering its technical implementation as IO:s would do GeoffMontee, feel free to remove the SI association if it's no longer relevant to the customer. Cheers, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sujatha Sivakumar (Inactive) [ 2020-10-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello GeoffMontee Current issue is observed in case of "GTID_STRICT_MODE=off". Enable circular replication between master-slave. Master:
Slave:
Now do 'STOP SLAVE' on 'Server_2'. Execute 'CHANGE MASTER TO' with 'MASTER_USE_GTID=current_pos' Case 1: Case 2: [No circular replication between master and slave. i.e slave becomes new 'master' and its 'slave' is using 'current_pos'
As long as Master is muted/slient, Slave works fine.
Slave stops with an error, upon processing the first GTID received from master, it doesn't have to reconnect to observe the discrepancy. Please let us know your thoughts. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2020-10-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GeoffMontee, to add up to latest update from Sujatha on gtid_strict_mode, in your bug description the slave applier may not run, as the master is muted. In such scenario the strict mode error won't show up, so the slave reconnect would see the description error instead. As to the non-strict mode I bet you would also never rate that as critical. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2020-10-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Elkin,
You know more than me about the GTID implementation, but I personally disagree. The IO thread currently seems a bit too dumb regarding GTIDs. The IO thread doesn't seem quite so "dumb" in other areas. As far as I know, the IO thread filters out events that contain the slave's server_id. I think the IO thread also handles filtering for IGNORE_SERVER_IDS, DO_DOMAIN_IDS, and IGNORE_DOMAIN_IDS. If the IO thread already reads the server_id and gtid_domain_id from each event, it does not seem like it would be unreasonable to also read the GTID from the event, and then to compare that GTID to the local values.
That sounds like it could be a useful way to solve problems like this.
No comment on that. You'll have to ask nicklamb or ccalender. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2020-10-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Your test case with gtid_strict_mode=ON proves that the slave raises an error in the case where an "out-of-order sequence number" is written to the binary log. However, this test case does not prove that setting gtid_strict_mode=ON can prevent the slave's GTIDs from getting out of sync with the master's GTIDs, because the slave's GTIDs can become inconsistent without raising an "out-of-order sequence number". For example, if you had set gtid_domain_id=1 on the slave, then the slave's local transaction would have been written to the binary log with GTID 1-2-1. This would not raise an "out-of-order sequence number" error, so gtid_strict_mode would not notice the inconsistency. In this case, the slave would only notice the inconsistent GTID position after the IO thread is stopped and restarted. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2020-12-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GeoffMontee: to a correct mentioning by you Notice that while doing so the IO thread is not concerned with out-of-order which By all possible I suggest we don't refine anything that relate to gtid_current_pos. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-05-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
julien.fritsch, GeoffMontee, (esa.korhonen) I suggest (have suggested in this comment)) to start deprecating CM..master_use_gtid=current_pos (and then the related gtid_current_pos) in 10.6 and that's what we'll do in this ticket. Another task for 10.7 should be reported (myself) to complete deprecation which means to replace gtid_current_pos in all features that use it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GeoffMontee, esa.korhonen,knielsen: While deprecating the current behaviour of dynamic (START SLAVE time) computation of the effective slave's gtid state by CM..master_use_gtid=current_pos option we could salvage the syntax part. I'd be great to decide on this step in order to formulate a meaningful deprecation message. Also START SLAVE would regard gtid_slave_pos as the single source of the slave gtid state. My use case is obviously an ex-master that is demoted to the slave role. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2021-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Elkin,
That sounds good to me. It simplifies how the slave threads handle GTID tracking, but it still maintains the advantages of the master_use_gtid=current_pos syntax. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2021-06-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Andrei, The idea has a lot of appeal, it feels like a much nicer semantics for master_use_gtid=current_pos. That it means that as a host changes role from master to slave, it will use its master position (with local changes) as the starting point for replicating as a slave. That's a much better semantics of what current_pos was intended to do when I originally implemented it. I see a problem with the proposal as stated (if I understood it correctly). The problem is that "host changes role from master to slave" is not always what a CHANGE MASTER command means. If any CHANGE MASTER command was to magically change the current gtid_position with local transactions, we are back to the problems that START SLAVE had in this respect. I'm not sure there currently is a well-defined way - from the point of view of the server - to know that the user is switching a master to become a slave. One possibility is to add an explicit option to CHANGE MASTER that says "this is a master becoming a slave". CHANGE MASTER TO master_demote_to_slave=1 or something (can't think of a better name at the top of my head). This could then imply the master_use_gtid=current_pos semantics you suggested, and possibly imply other unrelated semantics that is useful for the "master becomes a slave" case. I think that's one way to keep the much better semantics of your proposal and avoid magic gtid_pos changes on unrelated CHANGE MASTER command. Though it's not as clean as the server just doing the right thing (ie. if user forgets the option to CHANGE MASTER, then the slave just starts from the wrong position).
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-06-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
knielsen, howdy! Yours is a nice refinement. As this task is concerned the agreement is reached then. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sujatha Sivakumar (Inactive) [ 2021-09-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello julien.fritsch The deprecation warning is implemented. Will request for review. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sujatha Sivakumar (Inactive) [ 2021-09-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Andrei, Please review the following changes. https://github.com/MariaDB/server/commit/47476b09638f6c3a57ee40d318be7a98cda9c83d http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-sujatha Thank you. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-09-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The patch looks good though the warning should be made starting in 10.7. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-09-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
ralf.gebhardt@mariadb.com, according to Serg no
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2022-06-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Howdy Andrei! I have updated Sujatha's patch which deprecates master_use_gtid=current_pos for 10.10 and it is ready for review: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2022-06-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The deprecation part of a two part work is requested. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2022-06-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Review is done as a commit to the feature branch: (The review commits may become Irrelevant to the feature after the eventual approval, so to be discarded) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2022-06-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Andrei! The latest commits in PR-2155 are ready for review. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2022-06-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Approved, as the latest patch implements the requirements. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2022-06-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am also re-assigning this ticket to you for testing. The preview branch is preview-10.10-gtid. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Angelique Sklavounos (Inactive) [ 2022-07-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
OK to push | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2022-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Howdy Andrei! This is ready for a final round of review before pushing into 10.10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2022-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Approved on GH. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2022-07-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
pushed into 10.10 as 90c3b28 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrey Khizhnyakov [ 2023-06-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Good afternoon Please tell me, is this bug present in the version of mariadb 10.4.12? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2023-06-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
andreitech this was added in 10.10.0, so this feature is not present on all earlier versions, 10.3+,10.4+,etc (so also 10.4.12) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kfir Itzhak [ 2023-12-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, Please do not deprecate master_use_gtid=current_pos. I use it for Active<->Active replication and i believe many others as well, so please do not remove that feature. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brandon Nesterenko [ 2023-12-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi mastertheknife! Thanks for your input here. We've discussed it and filed MDEV-32976 to remove the deprecation status of the option. |