[MDEV-32032] Is a server crash something that needs to be replicated? Created: 2023-08-28  Updated: 2023-09-01  Resolved: 2023-09-01

Status: Closed
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: 10.11.3, 10.3.39
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Gilles Reeves Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: crash_repair_table, repair, replication
Environment:

Linux Debian 10 (Primary), Linux Debian 12 (Replica)



 Description   

I just got a major server crash, needing to reboot everything.

This server is hosting a MariaDB primary database replicated elsewhere.
After the crash, three database tables got corrupted, needing REPAIR TABLE.
When repairing, the process list shows “Repair by sorting” during the repair.
This needed time, but it worked.

What I found very surprising was that the Replicas also began “Repair by sorting”, likely for the same tables.

Is it normal that the replication is so closely linked to the original that the damaged tables in the Primary are also considered damaged in the Replicas?



 Comments   
Comment by Alice Sherepa [ 2023-08-30 ]

yes, it is expected behavior that table statements are replicated. It is possible to switch off binary logging for the session and then do repair there and it will not be replicated then. (https://mariadb.com/kb/en/set-sql_log_bin/)

Comment by Gilles Reeves [ 2023-08-30 ]

Ok, of course, 'REPAIR TABLE' is a table statement.

But I'd say it is not a “normal” statement, meaning I don't think that any MariaDB/MySQL administrator would have any reason to include it in any regular processing, and it is not expected to make even a tiny change into the table contents.
I'd have thought that the 'REPAIR TABLE' statement could be simply excluded from replication.
Especially knowing that on very large table, it may take hours to complete (which was my case), even on a replicated database.

Maybe, as you said, I could disable binary logging during the repair.
But why having to do that where it could be simply avoided for the 'REPAIR TABLE' instruction?
This highly increases the risk to make mistakes that can damage replication, and this is an additional useless thing one need to think about, which could be avoided.

Comment by Sergei Golubchik [ 2023-09-01 ]

You can also do

REPAIR LOCAL TABLE ...

LOCAL keyword tells the server not to replicate the REPAIR, https://mariadb.com/kb/en/repair-table/

The main reason for replicating this is that REPAIR might recalculate table statistics and one used to want it to happen on the master and on the slave in sync, so that UPDATE/DELETE statements that use LIMIT without ORDER BY would produce the same result on both ends. This is the explanation from 2003, when REPAIR became replicated.

Now I'd say that UPDATE/DELETE with LIMIT without ORDER in the statement based replication is very fragile anyway, mixed or row-based replication would be a proper fix for it. But it's very difficult to change a 20yr old behavior, lots of users know and probably rely on it.

Comment by Gilles Reeves [ 2023-09-01 ]

Yes, it is, and many such statements could break replication.

If I really think about it, I think that actually, statement-based replication just shouldn't exist.

Or at least be reserved for experimental configurations, and certainly not be the default configuration.
Even the mixed configuration seems to me to be very questionable in nature.

Another situation I encountered is the following statement:

CREATE TABLE this_replicated_database.foobar
LIKE another_not_replicated_database.foobar;

...in case this_replicated_database is part of the replication and another_not_replicated_database IS NOT PART of the replication.

I had hoped to be able to do this in order to easily introduce external data into a new table in replication, when this external data already existed in another database.

But that didn't work, because apparently even in row-based replication this statement is replicated as is. As a result my replication was broken.

However, it would have seemed more logical to me, and more in the spirit of row-based replication, to replicate this statement as if the table's creation had been entered from scratch.

Now I have to say that I did this with MariaDB 10.3 as Primary server and MariaDB 10.11 as Replica server.

Maybe if I update all my MariaDB servers to version 10.11 it will work differently?

Generated at Thu Feb 08 10:28:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.