MariaDB Server / MDEV-29816

rpl.rpl_parallel_29322 occasionally fails in BB with [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0

Details

    Description

      10.6 44fd2c4b2

      rpl.rpl_parallel_29322 'mix'             w3 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2022-09-20 17:13:53
      line
      2022-09-20 17:13:52 12 [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0
      ^ Found warnings in /dev/shm/var/3/log/mysqld.1.err
      ok
      

        Activity

          knielsen Kristian Nielsen added a comment - - edited

          The root cause appears to be as follows:

          • In rare cases, the dump thread on the master survives for some time after STOP SLAVE has been executed on the slave.
          • The test case removes the old master-bin.000002, then copies in a new one.
          • If the old dump thread reads master-bin.000002 just at the point where the file has been created but is still of size 0, we get this error in the log.
          • The test is otherwise unaffected, because the slave connection to the old dump thread is already closed at this point.
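
          The window described above can be modelled outside the server. The following is a minimal Python sketch, not the server's code path: the helper names (read_binlog_magic, replace_binlog_non_atomically) are hypothetical, and the only fact taken from the binlog format is the 4-byte magic at the start of the file. It shows that a reader which opens the file between "created" and "copied" sees a short header read.

```python
import os
import shutil

# 4-byte magic at the start of a binlog file (0xfe followed by "bin").
BINLOG_MAGIC = b"\xfe\x62\x69\x6e"


def read_binlog_magic(path):
    """Return True if the file starts with the binlog magic.

    Raises IOError on a short read, which is what a reader hits when the
    file exists but is still of size 0.
    """
    with open(path, "rb") as f:
        header = f.read(len(BINLOG_MAGIC))
    if len(header) < len(BINLOG_MAGIC):
        raise IOError(f"short read of binlog header: got {len(header)} bytes")
    return header == BINLOG_MAGIC


def replace_binlog_non_atomically(src, dst, reader):
    """Mimic 'remove the old file, then copy in a new one'.

    reader(dst) is called in the window where dst has been created but
    nothing has been written yet; its result (or the IOError it raised)
    is returned so the caller can inspect what a concurrent reader saw.
    """
    if os.path.exists(dst):
        os.remove(dst)
    open(dst, "wb").close()  # file now exists with size 0
    try:
        observed = reader(dst)
    except IOError as exc:
        observed = exc
    shutil.copyfile(src, dst)  # contents arrive only now
    return observed
```

          Once the copy has finished, the same reader succeeds, which matches the observation that the test itself is unaffected and only the error log line is left behind.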

          I was not able to easily reproduce the condition where the dump thread survives for longer, but it seems clear that it can happen. The dump thread terminates when it tries to send an event to the slave over a TCP connection that has been closed. However, the close of the TCP socket (the TCP RESET packet) may be seen only after some delay, which in turn delays the stop of the dump thread.

          So I think the solution is to ensure the dump thread is gone before manipulating binlog files, or alternatively to just suppress this error in the log, with a suitable comment.
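
          The first option (wait for the dump thread to go away, then touch the files) could look roughly like the Python sketch below. It is illustrative only and is not the change that was pushed: count_dump_threads and replace_file are hypothetical callables supplied by the caller. In a real test, the count would presumably come from the master's process list (threads whose command is "Binlog Dump"), which is an assumption here.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns true or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def replace_binlog_when_safe(count_dump_threads, replace_file, timeout=30.0):
    """Call replace_file() only after no dump thread is left on the master.

    count_dump_threads: hypothetical callable returning the number of
        dump threads still running on the master.
    replace_file: hypothetical callable that removes the old binlog file
        and copies in the new one.
    """
    if not wait_until(lambda: count_dump_threads() == 0, timeout=timeout):
        raise TimeoutError(
            "dump thread still running; refusing to touch binlog files")
    replace_file()
```

          Raising on timeout rather than proceeding keeps the failure visible: if the dump thread never exits, the test fails at the wait instead of intermittently on an error-log warning.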

          knielsen Kristian Nielsen added a comment -

          Pushed to 10.5.

          Elkin Andrei Elkin added a comment -

          Thanks, knielsen, for working on this one!

          As a possible future enhancement for handling the "zombie" dump thread state, an idea arose during the analysis of MDEV-32551: engage the semi-sync ack thread. Since it already accepts the slave-stop message, its handling would just need to be extended to translate that message into actions, such as killing the respective dump thread.
          bnestere ^

          People

            knielsen Kristian Nielsen
            angelique.sklavounos Angelique Sklavounos (Inactive)