MariaDB Server / MDEV-29796

Auto-generated DELETE for MEMORY/HEAP table can break GTID-based replication

Details

    Description

      It's known (see MDEV-25607), even if not properly documented (see MDEV-29795), that after a server restart a DELETE query is written into the binary log for every HEAP table at the moment it is first accessed in any way, to reflect for replicas that the restart emptied it. It is written unconditionally, and if we restart a replica with, say, server_id = 2 in domain_id = 0, this extra transaction (logged nowhere else) generates an out-of-order GTID for the domain and then breaks replication in strict GTID mode. Moreover, replication can be broken at any time after the restart by a simple SELECT from a low-privileged user (like a SELECT from information_schema.tables that touches any MEMORY table), even on a slave with read_only = ON.
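
      A minimal reproduction sketch of the failure mode described above (table names, server ids and GTID values are illustrative; both servers use gtid_domain_id = 0 and gtid_strict_mode = ON, and the replica has the binary log enabled):

        -- On the replica (server_id = 2), immediately after a restart:
        SELECT @@global.gtid_binlog_pos;   -- e.g. 0-1-100, the last GTID received from the master
        SELECT * FROM test.mem_t;          -- first access binlogs "DELETE FROM test.mem_t"
        SELECT @@global.gtid_binlog_pos;   -- now e.g. 0-2-101, a GTID owned by the replica itself

        -- On the master (server_id = 1), any later write in domain 0:
        INSERT INTO test.other_t VALUES (1);   -- binlogged as 0-1-101

        -- With gtid_strict_mode = ON the replica then refuses to binlog the master's
        -- 0-1-101, because its binlog already contains 0-2-101 in domain 0
        -- (out-of-order sequence number), and replication stops.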

      While it seems to be by design, this behaviour breaks simple single-level replication setups with strict GTIDs. For the end user this looks like a bug that is hard to prevent and troubleshoot. We have to think about some other solution: maybe do NOT write these DELETEs to the binary log if read_only = ON (and the table is accessed by a user without the SUPER privilege?), use some special separate domain_id, or something else?


          Activity

            bnestere Brandon Nesterenko added a comment (edited)

            ParadoxV5, thanks for coming back to this!

            Imagine a Primary–Primary setup where both primaries write (with separate Domain IDs).

            This use case wouldn't lead to the same problem, as each primary would have a separate domain-id (as you note), and when one server would restart, it would log the DELETE using its unique domain-id and replication wouldn't break.

            I think the MDEV description still highlights the main use case:

            this behaviour breaks simple single level replication setups with strict GTIDs

            which would be any simple master->slave[->slave]+ topology where the slave isn't configured with its own domain_id (and uses strict GTID mode). And in such cases, simply viewing some memory table after restarting the slave would then break replication.

            Taking a quote from Serg's comment:

            We've established that if the MEMORY table on the slave is cleared because the server was restarted — this table cannot be part of the replication set, must've been ignored by the replication. Otherwise the replication will break because table's content on the slave will differ from the master's.

            I'd extend this to two cases:
            1. When the memory table is part of the active replication set, replication will naturally break because the content will differ.
            2. When the memory table is not part of the active replication set, then running a `SELECT` against this table will break replication, even though we don't actually know whether any updates to that table will ever come in.

            I think we need to fix the second case here: just because we are viewing some table, we shouldn't then binlog a DELETE. It may even be a query meant to check whether the master and slave tables are in sync in the first place.

            Another consideration is the case where the server was shut down as a slave but restarted to be a primary. Then we would want to binlog the DELETE.

            Now to the second part of Serg's comment:

            But this slave might be a master for some other slave or at least can write its own binlog (for fail-over or backup). In this case it can put the auto-delete into the next existing GTID that would've used this table anyway. Like INSERT...SELECT in SBR or any row event that uses this table in RBR.

            Here, I agree with Jimmy:

            I doubt hiding the desync is correct.

            I'd imagine we want to fail fast. Taking Serg's example of "But this slave might be a master for some other slave", if we hide a DELETE/TRUNCATE in the next modifying GTID and it gets replicated by the next slave, then both slaves are subjected to the de-sync. On the other hand, if we don't log it at all on the restarting slave, then when some replication error does arise, we can switch the 2nd slave in line to be the 1st slave in line so replication can continue.

            Additionally, regarding the point "can write its own binlog (for fail-over or backup)": if we inject the DELETE into some future transaction, that may further break fail-over/backup, as one could restore a server state with a correct memory table and then replay the binlog, which would DELETE/TRUNCATE the table.

            Ultimately, I think Elkin's suggestion (made in a meeting, so no quote) of using the startup option --init-rpl-role would suffice to determine whether or not we want to log the DELETE. A master would want to binlog the DELETE, whereas a slave would not. Then, if this table is updated on the master, the slave would error out due to data divergence. And if a server is to function as both master and slave, it would have its own domain-id and we would want to binlog the DELETE, so we would address this via documentation by mentioning that such users should use the option --init-rpl-role=MASTER.

            Note, though, that if a server is configured with --init-rpl-role=SLAVE and we skip binlogging the DELETE, we should write a warning into the error log so it is easier for users to track down.
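
            A sketch of how that could look in the server configuration, assuming the existing --init-rpl-role startup option (values MASTER/SLAVE) is reused for this purpose; the skip-the-DELETE behaviour itself is the proposal here, not something the server does today:

              [mysqld]
              # Pure replica: under the proposal, skip binlogging the restart DELETE for
              # MEMORY tables and only write a warning into the error log.
              init-rpl-role = SLAVE

              # Primary, or a relay that also acts as a primary in its own GTID domain:
              # keep binlogging the DELETE so downstream replicas stay consistent.
              # init-rpl-role = MASTER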

            ParadoxV5 Jimmy Hú added a comment

            Imagine a Primary–Primary setup where both primaries write (with separate Domain IDs).

            This use case wouldn't lead to the same problem, […]

            The problem is not about GTID going out of order, but that a difference between the primary and the intermediate is inherently unsafe for replication.
            This is independent of whether the data loss from restarting is binlogged or not.

            In contrast, it’s beneficial that the GTID-binlogged TRUNCATE causes GTID Strict Mode to fail replication fast.
            The alternative would lead to, after who knows how many successful transactions, mysterious SQL errors like the ER_KEY_NOT_FOUND mentioned a couple of years ago.

            After all were it to receive a replication event on a post-restart empty table it would just stop with a typical ER_KEY_NOT_FOUND or such. When Slave also serves a master role then the user would have to organize gtid domains in her replication chain.


            I think we need to fix the second case here, where just because we are viewing some table, we shouldn't then binlog some DELETE DDL. I mean, it may even be some query to check if the master and slave table are in sync in the first place.

            As my previous comment expressed, I prefer logging this TRUNCATE as soon as the MEMORY table is recreated, rather than delaying it as a side effect of a SELECT or INSERT.
            The SQL state would then synchronize with its binlog and downstream replicas promptly.


            A master would want to binlog the DELETE, whereas a slave would not.

            I believe we should log the TRUNCATE no matter the role.

            • my previous comment:

              Rebooting a MEMORY should be treated like any other DELETE or TRUNCATE.

            • Should a day come for this ‘slave’ to become a primary, whether by daisy-chaining or outright promotion, it might need a record of TRUNCATion around to synchronize its new replicas with.

            […], we should write a warning into the error log so it is easier to track down for users.

            Do the MEMORYs log any warnings when they’re reconstructed empty after a server restart?

            Some users reported this problem because they were unaware that restarting a MEMORY erases its data and binlogs a matching DELETE/TRUNCATE.
            After all, nothing solves user negligence.

            ParadoxV5 Jimmy Hú added a comment

            Do the MEMORYs log any warnings when they’re reconstructed empty after a server restart?

            Ran a quick MTR test; it doesn’t look like there are any.

            Elkin Andrei Elkin added a comment

            ParadoxV5 writes >

            The problem is not about GTID going out of order


            This is (must be) the main issue indeed.

            That is, such vandalistic automatic binlogging on a pure-slave-role server should not happen, regardless of which object (replicated, since it's a pure slave) causes it
            (Valerii, just to remind, makes a good point that a similar requirement must hold for a read-only master).

            Thanks bnestere for considering the role-based approach. The chain topology complication makes me think of init-rpl-role as a key-value pair: the key points to data over which the value assigns a replication role to the server. A key could be a gtid domain id or an arbitrary set of data elements (databases, tables, etc.).

            To put some examples, consider a middle slave in a chain.
            First let it be replicating along a single gtid domain 99. This server would obviously specify its role simply as <99, slave> (the fact that it appears as master is irrelevant).
            Now what if a middle slave replicates some other domain 7 on its own? It would have additionally the master role specified as <7,master>, right?
            So when it comes to restarting this mixed slave-master middle server: in its master role it would not binlog any DELETE on memory tables immediately (as discussed/suggested), but rather wait for the first modifying access, which would be binlogged in domain 7.
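
            A purely hypothetical sketch of that per-domain role assignment for the middle server (no such key-value syntax exists today; it only illustrates the <key, role> idea):

              [mysqld]
              gtid-domain-id = 7
              # hypothetical syntax: slave for the replicated domain 99,
              # master for the server's own domain 7
              # init-rpl-role = 99:SLAVE,7:MASTER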

            ParadoxV5 Jimmy Hú added a comment

            First let it be replicating along a single gtid domain 99. This server would obviously specify its role simply as <99, slave> (the fact that it appears as master is irrelevant).
            Now what if a middle slave replicates some other domain 7 on its own? It would have additionally the master role specified as <7,master>, right?
            ➺ Elkin

            This sounds like configuring the intermediate’s replication to filter with do_domain_ids=99 and binlog with --gtid-domain-id=7.
            And indeed, if the user configures this intermediate with a different Domain ID than the primary’s, then GTID replication is compatible and strict mode won’t complain. (Though they’ll have to handle conflicts themselves.)
            This applies for all modifications, including this auto-TRUNCATE.
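
            For concreteness, a sketch of that intermediate configuration using existing MariaDB options (the host name and the domain ids 99/7 come from the example above; everything else is illustrative):

              -- Replicate only the primary's domain 99 over this connection:
              CHANGE MASTER TO
                MASTER_HOST = 'primary.example.com',
                MASTER_USE_GTID = slave_pos,
                DO_DOMAIN_IDS = (99);

              -- Binlog the intermediate's own writes (including the restart DELETE/TRUNCATE)
              -- under its own domain, so strict GTID mode downstream sees no out-of-order GTIDs:
              SET GLOBAL gtid_domain_id = 7;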


            When the memory table is not part of the active replication set, then running a `SELECT` against this table will break replication when we don't actually know if any updates to that table would ever come in.
            ➺ bnestere

            Let’s put the specifics of this MDEV (MEMORY, auto-TRUNCATE timing) aside for now, and this limitation becomes:

            • When a table is not part of the active replication set, then modifying this table will break replication when we don't actually know if any updates to that table would ever come in.

            Shall we file a separate feature request about this?
            Is @@binlog_ignore_table := this.table sufficient, or should we start recognizing @@replicate_ignore_table := this.table as well?
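
            For reference, a minimal sketch of the existing replication-side filter referenced above, with test.mem_t as an illustrative table name (the binlog-side, table-level counterpart is the hypothetical part of the question):

              -- Existing slave-side filter: do not apply replicated changes to this table.
              -- (Replication filter variables generally require the slave SQL thread to be stopped.)
              STOP SLAVE;
              SET GLOBAL replicate_ignore_table = 'test.mem_t';
              START SLAVE;

              -- Hypothetical binlog-side counterpart from the question above (not implemented today):
              -- SET GLOBAL binlog_ignore_table = 'test.mem_t';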


            People

              ParadoxV5 Jimmy Hú
              valerii Valerii Kravchuk