MariaDB Server / MDEV-33178

Random slave replication error while the other slaves continue without error

Details

    Description

      Hello,

      We have 1 master and 9 slaves. All slaves connect to the same master and were started from the same snapshot using exactly the same process. Probably since the upgrade from 10.6 to 10.11, every few days a random slave stops with this error:

      Could not execute Write_rows_v1 event on table ??????.??????; Duplicate entry '8-7' for key 'id_section_2', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.001693, end_log_pos 899067800
      

      We use MIXED replication mode, no parallel replication, no wsrep.

      I believe it is a slave bug, because all the other slaves continue without a problem. I am unable to reproduce it on purpose.

      The error usually occurs in the same database and the same table, but this is probably just the result of another replication bug that happened earlier.

      I understand that this bug report does not contain all the required information, but I need your guidance to provide more relevant data. I can provide the binlog or relay log, for example.

          Activity

            cuchac Cuchac added a comment -

            Hi Marko,

            Both innodb_defragment and innodb_optimize_fulltext_only are OFF, so the rebuild should happen, and the theory of a corrupted index now seems less probable.

            Maybe the crash I experienced in MDEV-33260 has the same source as this bug: memory corruption. I will try to reproduce MDEV-33260 and get a decent stack trace. It seems we are quite blind here.

            The only idea I have is to issue the same statement as a SELECT instead of a DELETE (SELECT * FROM sections_relation WHERE id_section = '12'), force use of the index that does not return the correct values, run it every second, and wait for the moment it returns the invalid row that got deleted (440, 7, 17) instead of the correct output. That would give the exact timestamp when the database starts to return an invalid row for `WHERE id_section = '12'`. What do you think?
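
The polling idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested tool. The index name `id_section_2` is taken from the error message and may not be the index the DELETE actually used. `fetch_rows` is a hypothetical callable, supplied by the caller, that runs the query through whatever driver is in use (the `%s` placeholder style is also an assumption) and returns rows as dicts containing an `id_section` key.

```python
import time
from datetime import datetime, timezone

# Assumption: index name copied from the duplicate-key error; adjust as needed.
QUERY = (
    "SELECT * FROM sections_relation FORCE INDEX (id_section_2) "
    "WHERE id_section = %s"
)


def unexpected_rows(rows, id_section):
    """Return rows whose id_section differs from the value in the WHERE clause."""
    return [row for row in rows if str(row["id_section"]) != str(id_section)]


def watch(fetch_rows, id_section, interval=1.0, max_polls=None, sleep=time.sleep):
    """Poll until a mismatching row appears.

    fetch_rows(query, params) is a hypothetical caller-supplied function that
    executes the query and returns a list of dict rows.
    Returns (UTC timestamp, mismatching rows), or None if max_polls is reached.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        bad = unexpected_rows(fetch_rows(QUERY, (id_section,)), id_section)
        if bad:
            # First moment the forced index returned a row it should not have.
            return datetime.now(timezone.utc), bad
        polls += 1
        sleep(interval)
    return None
```

Running `watch(fetch_rows, "12")` on the affected slave and logging the returned timestamp would narrow down when the index starts returning the wrong row, which could then be matched against the relay log.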


            marko Marko Mäkelä added a comment -

            While it is possible that some code outside InnoDB could cause corruption of InnoDB data structures, I would really like you to try innodb_adaptive_flushing=OFF as well. As far as InnoDB is concerned, I think that it is a rather low-hanging fruit with unpredictable and potentially devastating consequences.

            cuchac Cuchac added a comment -

            OK, I turned off adaptive flushing. Will see.

            cuchac Cuchac added a comment -

            Hello, 5 days without the problem. So far so good. innodb_adaptive_flushing=OFF seems to have fixed the issue.

            marko Marko Mäkelä added a comment -

            cuchac, great, thank you. I am closing this as a duplicate of MDEV-33275.

            People

              Roel Roel Van de Paar
              cuchac Cuchac
