Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32032

Is a server crash something that needs to be replicated?

Details

    Description

      I just got a major server crash, needing to reboot everything.

      This server is hosting a MariaDB primary database replicated elsewhere.
      After the crash, three database tables got corrupted, needing REPAIR TABLE.
      When repairing, the process list shows “Repair by sorting” during the repair.
      This needed time, but it worked.

      What I found very surprising was that the Replicas also began “Repair by sorting”, likely for the same tables.

      Is it normal that the replication is so closely linked to the original that the damaged tables in the Primary are also considered damaged in the Replicas?

      Attachments

        Activity

          alice Alice Sherepa added a comment -

          yes, it is expected behavior that table statements are replicated. It is possible to switch off binary logging for the session and then do repair there and it will not be replicated then. (https://mariadb.com/kb/en/set-sql_log_bin/)

          alice Alice Sherepa added a comment - yes, it is expected behavior that table statements are replicated. It is possible to switch off binary logging for the session and then do repair there and it will not be replicated then. ( https://mariadb.com/kb/en/set-sql_log_bin/ )
          Gingko Gilles Reeves added a comment - - edited

          Ok, of course, 'REPAIR TABLE' is a table statement.

          But I'd say it is not a “normal” statement, meaning I don't think that any MariaDB/MySQL administrator would have any reason to include it in any regular processing, and it is not expected to make even a tiny change into the table contents.
          I'd have thought that the 'REPAIR TABLE' statement could be simply excluded from replication.
          Especially knowing that on very large table, it may take hours to complete (which was my case), even on a replicated database.

          Maybe, as you said, I could disable binary logging during the repair.
          But why having to do that where it could be simply avoided for the 'REPAIR TABLE' instruction?
          This highly increases the risk to make mistakes that can damage replication, and this is an additional useless thing one need to think about, which could be avoided.

          Gingko Gilles Reeves added a comment - - edited Ok, of course, 'REPAIR TABLE' is a table statement. But I'd say it is not a “normal” statement, meaning I don't think that any MariaDB/MySQL administrator would have any reason to include it in any regular processing, and it is not expected to make even a tiny change into the table contents. I'd have thought that the 'REPAIR TABLE' statement could be simply excluded from replication. Especially knowing that on very large table, it may take hours to complete (which was my case), even on a replicated database. Maybe, as you said, I could disable binary logging during the repair. But why having to do that where it could be simply avoided for the 'REPAIR TABLE' instruction? This highly increases the risk to make mistakes that can damage replication, and this is an additional useless thing one need to think about, which could be avoided.

          You can also do

          REPAIR LOCAL TABLE ...
          

          LOCAL keyword tells the server not to replicate the REPAIR, https://mariadb.com/kb/en/repair-table/

          The main reason for replicating this is that REPAIR might recalculate table statistics and one used to want it to happen on the master and on the slave in sync, so that UPDATE/DELETE statements that use LIMIT without ORDER BY would produce the same result on both ends. This is the explanation from 2003, when REPAIR became replicated.

          Now I'd say that UPDATE/DELETE with LIMIT without ORDER in the statement based replication is very fragile anyway, mixed or row-based replication would be a proper fix for it. But it's very difficult to change a 20yr old behavior, lots of users know and probably rely on it.

          serg Sergei Golubchik added a comment - You can also do REPAIR LOCAL TABLE ... LOCAL keyword tells the server not to replicate the REPAIR , https://mariadb.com/kb/en/repair-table/ The main reason for replicating this is that REPAIR might recalculate table statistics and one used to want it to happen on the master and on the slave in sync, so that UPDATE/DELETE statements that use LIMIT without ORDER BY would produce the same result on both ends. This is the explanation from 2003, when REPAIR became replicated. Now I'd say that UPDATE/DELETE with LIMIT without ORDER in the statement based replication is very fragile anyway, mixed or row-based replication would be a proper fix for it. But it's very difficult to change a 20yr old behavior, lots of users know and probably rely on it.
          Gingko Gilles Reeves added a comment -

          Yes, it is, and many such statements could break replication.

          If I really think about it, I think that actually, statement-based replication just shouldn't exist.

          Or at least be reserved for experimental configurations, and certainly not be the default configuration.
          Even the mixed configuration seems to me to be very questionable in nature.

          Another situation I encountered is the following statement:

          CREATE TABLE this_replicated_database.foobar
          LIKE another_not_replicated_database.foobar;
          

          ...in case this_replicated_database is part of the replication and another_not_replicated_database IS NOT PART of the replication.

          I had hoped to be able to do this in order to easily introduce external data into a new table in replication, when this external data already existed in another database.

          But that didn't work, because apparently even in row-based replication this statement is replicated as is. As a result my replication was broken.

          However, it would have seemed more logical to me, and more in the spirit of row-based replication, to replicate this statement as if the table's creation had been entered from scratch.

          Now I have to say that I did this with MariaDB 10.3 as Primary server and MariaDB 10.11 as Replica server.

          Maybe if I update all my MariaDB servers to version 10.11 it will work differently?

          Gingko Gilles Reeves added a comment - Yes, it is, and many such statements could break replication. If I really think about it, I think that actually, statement-based replication just shouldn't exist. Or at least be reserved for experimental configurations, and certainly not be the default configuration. Even the mixed configuration seems to me to be very questionable in nature. Another situation I encountered is the following statement: CREATE TABLE this_replicated_database.foobar LIKE another_not_replicated_database.foobar; ...in case this_replicated_database is part of the replication and another_not_replicated_database IS NOT PART of the replication. I had hoped to be able to do this in order to easily introduce external data into a new table in replication, when this external data already existed in another database. But that didn't work, because apparently even in row-based replication this statement is replicated as is. As a result my replication was broken. However, it would have seemed more logical to me, and more in the spirit of row-based replication, to replicate this statement as if the table's creation had been entered from scratch. Now I have to say that I did this with MariaDB 10.3 as Primary server and MariaDB 10.11 as Replica server. Maybe if I update all my MariaDB servers to version 10.11 it will work differently?

          People

            Unassigned Unassigned
            Gingko Gilles Reeves
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.