[MDEV-29796] Auto-generated DELETE for MEMORY/HEAP table can break GTID-based replication Created: 2022-10-14 Updated: 2023-11-28 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - Memory |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9 |
| Fix Version/s: | 10.4, 10.5, 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Valerii Kravchuk | Assignee: | Sergei Golubchik |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Description |
|
It's known (see MDEV-25607) even if not properly documented (see While it seems to be by design, this behaviour breaks simple single level replication setups with strict GTIDs. For the end user this looks like a bug hard to prevent and troubleshoot. We have to think about some other solution, maybe do NOT write these DELETEs to the binary log if read_only = ON (and the table is accessed by the user without SUPER privilege?), use some special separate domain_id, something else? |
| Comments |
| Comment by Andrei Elkin [ 2022-10-18 ] |
|
valerii, I agree with the description. We should study how to block the unconditional DELETE. The read-only server may be always exempted from doing that. Let me redirect the question to serg (thanks in advance!). |
| Comment by Andrei Elkin [ 2022-10-18 ] |
|
Just a "binary" question is asked. |
| Comment by Sergei Golubchik [ 2022-10-18 ] |
|
What about this logic: if auto-DELETE on MEMORY tables won't be replicated, replication will go out of sync. It will not go out of sync only if these tables are not replicated (--replicate-ignore-tables or something). So, if MEMORY table is ignored, its cleanup should not be binlogged, otherwise it should be. |
| Comment by Andrei Elkin [ 2022-10-19 ] |
|
valerii, arguably the DELETE action is a kind of SUPER-user one. And it also makes sense to register in binlog any data change. So an isolated gtid domain for local updates deems the only way forward. How to make it error-free? Also notice a separate gtid domain may need to be assigned for updates over the MEMORY tables. |
| Comment by Sergei Golubchik [ 2022-10-20 ] |
|
--read-only cannot affect it. If the server is read-only, one shouldn't be able to create slave-only MEMORY tables in the first place. If CREATE TABLE was allowed, then DELETE should be too. If those tables are slave-side only and should not be replicated, they should be ignored, and auto-DELETE should respect it. new domain-id makes sense if these changes need to be replicated. If they don't — they need to be ignored, and then they don't need a domain-id |
| Comment by Andrei Elkin [ 2022-11-09 ] |
|
serg, valerii: it must be clear the current DELETE binlogging policy makes sense only for restarted masters. (A pure) Slave should not log DELETE on its own. After all were it to receive a replication event on a post-restart empty table it would just stop with a typical ER_KEY_NOT_FOUND or such. When Slave also serves a master role then the user would have to organize gtid domains in her replication chain. Sure this measure alone won't help if a memory table is replicated throughout the chain while a slave-master server in the middle gets restarted so ultimately the restarter should receive a partial memory-table backup at the handshake with master (I have not tried to find one, but we should have a separate ticket for that). --read-only then indeed is irrelevant to the pure slave which is supposed to never execute updates over replicated data/domains or if it does, at least it *must* not binlog them with gtids of the replication domains. Practically then, instead of deciding whether a server is the pure slave, why won't we extend the read-access-causes-no-DELETE over any server and only binlog DELETE when a server is about to write something into binlog (attempting to update a memory table or reads from it to update other tables)? Assigning Sergei to reply to this. |
| Comment by Sergei Golubchik [ 2022-11-11 ] |
|
I'm not sure I understand. Of course, DELETE is written to binlog only when a server is supposed to write to binlog. If it should never write anything to binlog, it has binlogging disabled, and then it won't write DELETE either. |
| Comment by Andrei Elkin [ 2022-11-11 ] |
|
serg, I verified the description that claims a mere SELECT from a MEMORY table causes My suggestion is to binlog DELETE before the very first write access, on both master and slave. |
| Comment by Sergei Golubchik [ 2022-11-17 ] |
|
What use case will it solve? An empty MEMORY table that is only selected from but is never written to? |
| Comment by Andrei Elkin [ 2022-11-18 ] |
|
(After self-reviewing to improve on few points) serg, first not just 'selected from' but merely touched, like with COM_FIELD_LIST. I have only one case in my mind. A listener/learned/slave process is subscribed for changes in the table.
So if it does so this replication setup is safe, else (the first event after the restart is not the DELETE) slave would error out volunteerly at once with a message like "Replicated event to the restarted slave is not an expected DELETE" (nothing gets binlogged of course for the table). |
| Comment by Andrei Elkin [ 2023-01-09 ] |
|
valerii,serg, (sorry for couple of typos, corrected) while the docs are made to discourage Memory table replication, we could do more to enforce As there might exist users who would insist to replicate it, @@skip_replication could be converted into an enum with the default automatic or smth value with its meaning to automatically evaluate to OFF. The taking risk users would have to set it ON explicitly. |
| Comment by Andrei Elkin [ 2023-01-10 ] |
|
maxmether, MDEV-25607 has a master side flavor. It's fine to auto-delete from Memory table on the restarting master, but the deletion can't be a query that may trigger a trigger (so TRUNCATE is safe). |
| Comment by Sergei Golubchik [ 2023-02-05 ] |
|
We've established that if the MEMORY table on the slave is cleared because the server was restarted — this table cannot be part of the replication set, must've been ignored by the replication. Otherwise the replication will break because table's content on the slave will differ from the master's. But this slave might be a master for some other slave or at least can write its own binlog (for fail-over or backup). In this case it can put the auto-delete into the next existing GTID that would've used this table anyway. Like INSERT...SELECT in SBR or any row event that uses this table in RBR. Meaning the auto-delete binlog write would happen not on the first MEMORY table access (like, SELECT) but on the first binlog-relevant table access. And it would not need a GTID |