Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4726

Race in mysql-test/suite/rpl/t/rpl_gtid_stop_start.test

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.3
    • 10.0.9
    • None
    • None

    Description

      There's a race in mysql-test/suite/rpl/t/rpl_gtid_stop_start.test as follows:

      The rpl_gtid_stop_start test used SELECT COUNT(*) as a wait_condition when waiting for rows to appear from the replication stream. Since the table it is waiting on is MyISAM, rows appearing there do not guarantee in any way that the GTID information will have been updated by the slave thread (and can't guarantee that). This causes a race in the test which is especially exacerbated by making the gtid_slave_pos table InnoDB, since then the COMMIT that is done as part of recording the GTID is bound to take some tens of milliseconds.

      Attachments

        Issue Links

          Activity

            jeremycole Jeremy Cole added a comment -

            I attached a patch which changes the wait_condition to wait on the proper GTID to appear instead of waiting on a row count. I believe this to be the only actual way to wait without a race when using MyISAM for gtid_slave_pos; nonetheless this is safe and effective anyway.

            jeremycole Jeremy Cole added a comment - I attached a patch which changes the wait_condition to wait on the proper GTID to appear instead of waiting on a row count. I believe this to be the only actual way to wait without a race when using MyISAM for gtid_slave_pos; nonetheless this is safe and effective anyway.
            jeremycole Jeremy Cole added a comment -

            We've had another failing test this morning which is also due to using SELECT COUNT as a wait_condition. Really, all of the tests in the rpl suite should be reviewed to not use SELECT COUNT as a wait_condition, lest they may all flake eventually.

            jeremycole Jeremy Cole added a comment - We've had another failing test this morning which is also due to using SELECT COUNT as a wait_condition. Really, all of the tests in the rpl suite should be reviewed to not use SELECT COUNT as a wait_condition, lest they may all flake eventually.
            elenst Elena Stepanova added a comment - - edited

            I'm not quite sure Kristian meant it to be a MyISAM table, maybe it just happened this way. I'll check with him before modifying tests.

            elenst Elena Stepanova added a comment - - edited I'm not quite sure Kristian meant it to be a MyISAM table, maybe it just happened this way. I'll check with him before modifying tests.

            Hi Kristian,

            See the above – did you mean the tables to be MyISAM?
            If you did, I can modify the tests as Jeremy suggested. If you didn't, I can add explicit engine=InnoDB in all cases when there is no engine in the DDL.

            elenst Elena Stepanova added a comment - Hi Kristian, See the above – did you mean the tables to be MyISAM? If you did, I can modify the tests as Jeremy suggested. If you didn't, I can add explicit engine=InnoDB in all cases when there is no engine in the DDL.

            Thanks Jeremy for the nice analysis!

            I do not explicitly need the table to be MyISAM, on the other hand I did try
            in the various test cases to use a mix of MyISAM and InnoDB, to improve
            coverage of different combinations.

            The wait_condition.inc with COUNT was anyway meant as a workaround for not
            yet having proper MASTER_GTID_WAIT(), so would be good to fix.

            The best solution is probably to implement include/save_master_gtid_pos.inc
            and include/sync_with_master_gtid.inc, doing basically what Jeremy
            suggests. And then rewrite include/sync_with_master_gtid.inc when
            MASTER_GTID_WAIT() is implemented to use it. If noone beats me to it I can
            look into that after the parallel replication / MDEV-4506 stuff (which
            currently has priority).

            knielsen Kristian Nielsen added a comment - Thanks Jeremy for the nice analysis! I do not explicitly need the table to be MyISAM, on the other hand I did try in the various test cases to use a mix of MyISAM and InnoDB, to improve coverage of different combinations. The wait_condition.inc with COUNT was anyway meant as a workaround for not yet having proper MASTER_GTID_WAIT(), so would be good to fix. The best solution is probably to implement include/save_master_gtid_pos.inc and include/sync_with_master_gtid.inc, doing basically what Jeremy suggests. And then rewrite include/sync_with_master_gtid.inc when MASTER_GTID_WAIT() is implemented to use it. If noone beats me to it I can look into that after the parallel replication / MDEV-4506 stuff (which currently has priority).

            Should be fixed now. The synchronisation is now made with MASTER_GTID_WAIT() in the places where SELECT COUNT=N was used before. Let me know if I missed some places.

            knielsen Kristian Nielsen added a comment - Should be fixed now. The synchronisation is now made with MASTER_GTID_WAIT() in the places where SELECT COUNT =N was used before. Let me know if I missed some places.

            People

              knielsen Kristian Nielsen
              jeremycole Jeremy Cole
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.