MariaDB Server / MDEV-29816

rpl.rpl_parallel_29322 occasionally fails in BB with [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0

Details

    Description

      10.6 44fd2c4b2

      rpl.rpl_parallel_29322 'mix'             w3 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2022-09-20 17:13:53
      line
      2022-09-20 17:13:52 12 [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0
      ^ Found warnings in /dev/shm/var/3/log/mysqld.1.err
      ok
      

        Activity

          angelique.sklavounos Angelique Sklavounos (Inactive) created issue -
          angelique.sklavounos Angelique Sklavounos (Inactive) made changes -
          Field Original Value New Value
          Fix Version/s 10.5 [ 23123 ]
          Fix Version/s 10.6 [ 24028 ]
          Fix Version/s 10.7 [ 24805 ]
          Fix Version/s 10.8 [ 26121 ]
          Fix Version/s 10.9 [ 26905 ]
          Fix Version/s 10.10 [ 27530 ]
          Fix Version/s 10.11 [ 27614 ]
          Affects Version/s 10.5 [ 23123 ]
          Affects Version/s 10.6 [ 24028 ]
          Affects Version/s 10.7 [ 24805 ]
          Affects Version/s 10.8 [ 26121 ]
          Affects Version/s 10.9 [ 26905 ]
          Affects Version/s 10.10 [ 27530 ]
          Affects Version/s 10.11 [ 27614 ]
          Assignee Angelique Sklavounos [ JIRAUSER50741 ] Andrei Elkin [ elkin ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 10.7 [ 24805 ]
          angelique.sklavounos Angelique Sklavounos (Inactive) made changes -
          Fix Version/s 11.0 [ 28320 ]
          Affects Version/s 11.0 [ 28320 ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 10.8 [ 26121 ]
          Elkin Andrei Elkin made changes -
          Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
          Elkin Andrei Elkin made changes -
          Component/s Replication [ 10100 ]
          Component/s Tests [ 10800 ]
          knielsen Kristian Nielsen added a comment - - edited

          The root cause appears to be as follows:

          • In rare cases, the dump thread on the master survives for some time after STOP SLAVE is executed on the slave.
          • The test case removes the old master-bin.000002, then copies in a new one.
          • If the old dump thread reads master-bin.000002 just at the point where the file has been created but is still of size 0, we get this error in the log.
          • The test is otherwise unaffected, because the slave connection to the old dump thread is already closed at this point.
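The window described in the bullets above can be illustrated with a small toy model. This is only a sketch, not server code: `read_header`, `simulate_race` and `HeaderReadError` are hypothetical names, and the reader simply checks the 4-byte magic at the start of a binlog file. It shows that a reader which opens the file after it was created but before any data was copied gets a short read, while the same read succeeds once the copy is complete.

```python
import os
import tempfile

# 4-byte magic number at the start of a binary log file (0xfe 'b' 'i' 'n').
BINLOG_MAGIC = b"\xfe" + b"bin"


class HeaderReadError(Exception):
    """Hypothetical error type for this sketch: the header could not be read."""


def read_header(path):
    """Return True if the file starts with the binlog magic.

    Raises HeaderReadError on a short read, e.g. when the file is empty.
    """
    with open(path, "rb") as f:
        header = f.read(len(BINLOG_MAGIC))
    if len(header) < len(BINLOG_MAGIC):
        raise HeaderReadError(f"short read: got {len(header)} bytes")
    return header == BINLOG_MAGIC


def simulate_race():
    """Read the header at the two interesting moments of the file copy."""
    results = []
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "master-bin.000002")

        # Moment 1: the new file exists but nothing has been copied yet.
        open(path, "wb").close()
        try:
            read_header(path)
            results.append("ok")
        except HeaderReadError:
            results.append("error")

        # Moment 2: the copy has completed (dummy payload after the magic).
        with open(path, "wb") as f:
            f.write(BINLOG_MAGIC + b"\x00" * 16)
        results.append("ok" if read_header(path) else "bad magic")
    return results
```

Calling `simulate_race()` performs one read in each state; only the read against the zero-size file fails, which matches the observation that the error is transient and does not affect the rest of the test.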

          I was not able to easily reproduce the condition where the dump thread survives for longer, but it seems clear that it can happen. The dump thread terminates when it tries to send an event to the slave on a TCP connection that is closed. However, the close of the TCP socket (TCP RESET packet) may be seen only after some delay, which in turn delays the termination of the dump thread.

          So I think the solution is to ensure the dump thread is gone before manipulating binlog files. Or alternatively just suppress this error in the log with a suitable comment.
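The first option, waiting for the dump thread to disappear before touching binlog files, amounts to a poll-with-timeout guard. The following is a minimal sketch under stated assumptions, not the actual fix: `wait_until`, `FakeMaster` and `replace_binlog_safely` are hypothetical names, and `FakeMaster` stands in for whatever check a real test would do (for example, looking for a `Binlog Dump` entry in the master's process list).

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeMaster:
    """Stand-in for the master: the dump thread vanishes after a few polls."""

    def __init__(self, polls_until_gone):
        self.remaining = polls_until_gone

    def dump_thread_count(self):
        # Report one dump thread until the countdown reaches zero.
        count = 1 if self.remaining > 0 else 0
        self.remaining = max(0, self.remaining - 1)
        return count


def replace_binlog_safely(master, replace_file, timeout=5.0):
    """Run replace_file() only once no dump thread is left on the master."""
    gone = wait_until(lambda: master.dump_thread_count() == 0,
                      timeout=timeout, interval=0.01)
    if not gone:
        raise TimeoutError("dump thread still running; binlog files left untouched")
    replace_file()
```

With `FakeMaster(3)` the guard polls a few times and then runs the replacement; with a master whose dump thread never exits, it raises `TimeoutError` instead of touching the files. The second option from the comment, suppressing the error, avoids the wait but leaves the race itself in place.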

          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 10.9 [ 26905 ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 10.10 [ 27530 ]
          knielsen Kristian Nielsen made changes -
          Assignee Brandon Nesterenko [ JIRAUSER48702 ] Kristian Nielsen [ knielsen ]

          knielsen Kristian Nielsen added a comment -

          Pushed to 10.5.
          knielsen Kristian Nielsen made changes -
          Fix Version/s 10.5.24 [ 29517 ]
          Fix Version/s 10.6.17 [ 29518 ]
          Fix Version/s 10.11.7 [ 29519 ]
          Fix Version/s 11.0.5 [ 29520 ]
          Fix Version/s 11.1.4 [ 29024 ]
          Fix Version/s 11.2.3 [ 29521 ]
          Fix Version/s 11.3.2 [ 29522 ]
          Fix Version/s 10.5 [ 23123 ]
          Fix Version/s 10.6 [ 24028 ]
          Fix Version/s 10.11 [ 27614 ]
          Fix Version/s 11.0 [ 28320 ]
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]
          Elkin Andrei Elkin added a comment -

          Thanks, knielsen, for working on this one!

          As a possible future enhancement for handling "zombie" dump threads, an idea arose during the analysis of MDEV-32551: engage the semi-sync ack thread. Since it already receives the slave-stop message, its handling only needs to be extended to translate that message into actions, such as killing the corresponding dump thread.
          bnestere ^


          People

            knielsen Kristian Nielsen
            angelique.sklavounos Angelique Sklavounos (Inactive)