  MariaDB Server / MDEV-25395

server recovery hits replication event checksum error

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2 (EOL), 10.3 (EOL), 10.4 (EOL), 10.5
    • Fix Version/s: 10.5
    • Component/s: Replication, Server
    • Labels: None

    Description

      In the unlikely case of a crash while @@global.binlog_checksum is being changed from none to
      crc32, such that only the first of two Binlog_checkpoint_log_event events gets written to
      the crc32-rotated binlog file, the subsequent recovery hits a checksum verification error.

      How to repeat:

      set @@global.binlog_checksum=none; 
      set @@global.debug_dbug='d,crash_before_write_second_checkpoint_event';
      set @@global.binlog_checksum=crc32; # => CRASH
      

      When the server is restarted with --master-verify-checksum=1, the error log
      receives the following:

       [ERROR] Replication event checksum verification failed while reading from a log file
       [ERROR] Error in Log_event::read_log_event(): 'Replication event checksum verification failed while reading from a log file', data_len: 25, event_type: 163
      

      Nevertheless, the server ignores these errors and finishes initialization.
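For reference, the check that fails above can be sketched as follows. This is a minimal C++ illustration, not server code: it assumes that with binlog_checksum=crc32 each event ends in a 4-byte little-endian CRC-32 (the zlib polynomial) of all preceding bytes of that event. The names crc32_ieee and event_checksum_ok are hypothetical, and any event-type-specific special cases the server may apply are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Table-driven CRC-32 with the reflected polynomial 0xEDB88320
// (the same variant that zlib's crc32() computes).
static std::array<uint32_t, 256> make_crc_table() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    table[i] = c;
  }
  return table;
}

// Hypothetical helper: CRC-32 of len bytes starting at data.
uint32_t crc32_ieee(const uint8_t* data, size_t len) {
  static const std::array<uint32_t, 256> table = make_crc_table();
  uint32_t c = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i)
    c = table[(c ^ data[i]) & 0xFFu] ^ (c >> 8);
  return c ^ 0xFFFFFFFFu;
}

// Assumption: the checksum trailer is 4 bytes, little-endian.
constexpr size_t kChecksumLen = 4;

// Hypothetical helper: true if the stored trailer matches the CRC-32 of the
// event bytes that precede it. Special cases the server may apply are ignored.
bool event_checksum_ok(const uint8_t* ev, size_t ev_len) {
  if (ev_len < kChecksumLen) return false;
  const size_t body = ev_len - kChecksumLen;
  const uint32_t stored = uint32_t(ev[body]) | uint32_t(ev[body + 1]) << 8 |
                          uint32_t(ev[body + 2]) << 16 |
                          uint32_t(ev[body + 3]) << 24;
  return crc32_ieee(ev, body) == stored;
}
```

Flipping any byte of a valid event makes event_checksum_ok return false, which is the condition the error log reports above.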

      The simulation label is defined by the following patch:

      --- a/sql/log.cc
      +++ b/sql/log.cc
      @@ -6784,6 +6784,11 @@ void MYSQL_BIN_LOG::purge()
       
       void MYSQL_BIN_LOG::checkpoint_and_purge(ulong binlog_id)
       {
      +  DBUG_EXECUTE_IF("crash_before_write_second_checkpoint_event",
      +                  flush_io_cache(&log_file);
      +                  mysql_file_sync(log_file.file, MYF(MY_WME));
      +                  DBUG_SUICIDE(););
      +
         do_checkpoint_request(binlog_id);
         purge();
       }
      
      


          Activity

            Elkin Andrei Elkin added a comment -

            A patch has been made in the MDEV-21117 branch, which will be updated shortly with a few more
            commits dealing with this issue.

            Elkin Andrei Elkin added a comment -

            serg: the patch had to be refined to satisfy existing tests that were tolerant of checksum errors
            at recovery. 412e696fd2b implements the plan discussed on Slack.

            The server now stops when master-verify-checksum = 1, and the error messages contain
            the binlog offset of the corrupted event.
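The stop-or-ignore policy described in this comment can be sketched as below. This is a hypothetical C++ illustration, not the code from 412e696fd2b: scan_binlog, ScanResult and the log wording are invented, the per-event checksum test is passed in as a callback, and it assumes the binlog v4 layout (a 4-byte magic header, then events whose 19-byte common header stores the total event length, little-endian, at byte offset 9).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Assumed binlog v4 layout constants.
constexpr size_t kMagicLen = 4;
constexpr size_t kHeaderLen = 19;
constexpr size_t kEventLenOffset = 9;

// Hypothetical scan outcome.
struct ScanResult {
  bool ok = true;                // false if the scan was aborted
  size_t bad_offset = 0;         // binlog offset of the first corrupted event
  size_t bad_events = 0;         // corrupted events seen (aborted or ignored)
  std::vector<std::string> log;  // stand-in for the server error log
};

using ChecksumFn = std::function<bool(const uint8_t*, size_t)>;

// Walk the events; on a checksum mismatch either stop (verify == true, the
// patched behaviour described above) or log and carry on (the behaviour the
// description reports before the patch).
ScanResult scan_binlog(const std::vector<uint8_t>& file, bool verify,
                       const ChecksumFn& checksum_ok) {
  ScanResult r;
  size_t pos = kMagicLen;
  while (pos + kHeaderLen <= file.size()) {
    const uint8_t* p = file.data() + pos + kEventLenOffset;
    const uint32_t len = uint32_t(p[0]) | uint32_t(p[1]) << 8 |
                         uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
    // Stop at a truncated tail; this sketch does not treat it as an error.
    if (len < kHeaderLen || pos + len > file.size()) break;
    if (!checksum_ok(file.data() + pos, len)) {
      if (r.bad_events++ == 0) r.bad_offset = pos;
      r.log.push_back("checksum verification failed at offset " +
                      std::to_string(pos));
      if (verify) {
        r.ok = false;
        return r;
      }
    }
    pos += len;
  }
  return r;
}
```

With verify set to true the scan aborts at the first bad event and reports its offset; with false it only records the failure and continues.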

            Elkin Andrei Elkin added a comment -

            To,
            > fdle changes - ok, I've seen them in 21117.
            > cur_log/etc - clear.

            > What are the changes around prev_event_pos for?

            I replied:
            I think I moved it out, as a Recovery_context member, to satisfy
            builds compiled without HAVE_REPLICATION.

            Notice:

            int TC_LOG_BINLOG::recover()
            ...
            #ifdef HAVE_REPLICATION
              Recovery_context ctx;
            #endif
            
            



            serg Sergei Golubchik added a comment -

            412e696fd2bc is ok to push

            People

              Assignee: Elkin Andrei Elkin
              Reporter: Elkin Andrei Elkin
              Votes: 2
              Watchers: 5
