MariaDB Server
MDEV-19076: rpl_parallel_temptable result mismatch '-33 optimistic'

Details

    Description

      The test now fails with the following trace:

      CURRENT_TEST: rpl.rpl_parallel_temptable
      --- /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.result    2018-05-28 09:18:44.574554596 +0300
      +++ /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.reject    2019-03-28 19:20:00.054911931 +0200
      @@ -194,7 +194,6 @@
       30    conservative
       31    conservative
       32    optimistic
      -33    optimistic
      
      

      The failure is observed in 10.3 and 10.4.
      It is unclear whether this is purely a test failure, though that is possible.

      How to find: http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=&dt=&bbnum=&typ=&info=&fail_name=rpl_parallel_temptable&fail_variant=&fail_info_short=&fail_info_full=&limit=100

      Attachments

        Activity

          Reproduced the issue on 10.4 with the following command:

          perl mysql-test-run.pl --verbose-restart --force --retry=3 --max-save-core=0 --max-save-datadir=1 --mem --parallel=4 --ps-protocol rpl.rpl_parallel_temptable --repeat=30

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment

          Root cause:

          In general, temporary tables are not safe for parallel replication. See the following comment in:

          Function: bool THD::open_temporary_table(TABLE_LIST *tl)

          /*
            Temporary tables are not safe for parallel replication. They were
            designed to be visible to one thread only, so have no table locking.
            Thus there is no protection against two conflicting transactions
            committing in parallel and things like that.

            So for now, anything that uses temporary tables will be serialised
            with anything before it, when using parallel replication.
          */
          if (rgi_slave &&
              rgi_slave->is_parallel_exec &&
              find_temporary_table(tl) &&
              wait_for_prior_commit())
            DBUG_RETURN(true);

          /*
            First check if there is a reusable open table available in the
            open table list.
          */
          if (find_and_use_tmp_table(tl, &table))
          {
            DBUG_RETURN(true);                          /* Error */
          }

          /*
            No reusable table was found. We will have to open a new instance.
          */
          if (!table && (share= find_tmp_table_share(tl)))
          {
            table= open_temporary_table(share, tl->get_table_name(), true);
          }

          Whenever temporary tables are in use, "wait_for_prior_commit" needs to be invoked to serialize them.

          At present, "wait_for_prior_commit" is invoked only in this case:

          if (rgi_slave &&
              rgi_slave->is_parallel_exec &&
              find_temporary_table(tl) &&
              wait_for_prior_commit())

          As per the above code, the wait happens only when "find_temporary_table(tl)" succeeds.

          But in some cases that condition fails; the code then looks for a reusable table and, if none is found, calls "open_temporary_table" as shown below.

          /*
            No reusable table was found. We will have to open a new instance.
          */
          if (!table && (share= find_tmp_table_share(tl)))
          {
            table= open_temporary_table(share, tl->get_table_name(), true);
          }

          After a successful open there is no call to "wait_for_prior_commit". Adding such a wait seems to solve the problem.

          More analysis is still needed on when the code reaches the "open_temporary_table" call above.

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - - edited

          The above issue is present from 10.2 onwards

          CURRENT_TEST: rpl.rpl_parallel_temptable
          --- /home/sujatha/bug_repo/test-10.2/mysql-test/suite/rpl/r/rpl_parallel_temptable.result 2019-05-01 19:18:39.399654778 +0530
          +++ /home/sujatha/bug_repo/test-10.2/mysql-test/suite/rpl/r/rpl_parallel_temptable.reject 2019-05-02 16:23:57.635450840 +0530
          @@ -194,7 +194,6 @@
          30 conservative
          31 conservative
          32 optimistic
          -33 optimistic
          include/stop_slave.inc
          SET GLOBAL slave_parallel_mode=@old_mode;
          include/start_slave.inc

          mysqltest: Result length mismatch

          • saving '/home/sujatha/bug_repo/test-10.2/bld/mysql-test/var/4/log/rpl.rpl_parallel_temptable-innodb,stmt/' to '/home/sujatha/bug_repo/test-10.2/bld/mysql-test/var/log/rpl.rpl_parallel_temptable-innodb,stmt/'

          Only 1 of 6 completed.

          Hence updating the affected versions to include 10.2.

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment
          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment

          Hello Andrei,

          Please review the fix for MDEV-19076. The patch is available at the following link:
          https://github.com/MariaDB/server/commit/84b928b9ed112a92c03c7aa06431fc8f9a7a1f2f

          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha
          Elkin Andrei Elkin added a comment - - edited

          sujatha.sivakumar Thanks for a good piece of analysis!
          While accepting your fixes on the slave side, please also consider ceasing to mark, on the master, a Query_log_event as transactional when its query refers to a temporary table, here:

          else if (is_transactional)
              flags2|= FL_TRANSACTIONAL;
          

          The master-side fixes alone are clearly not enough, since OLD -> NEW replication would still be vulnerable. Weigh for yourself whether the refinement is feasible; if it is, we should do it now as well.


          Hello Andrei,

          Thank you for the review comments.
          I have implemented the master side changes. Please review the new changes.
          I have tested the latest patch on buildbot.

          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - Hello Andrei, Thank you for the review comments. I have implemented the master side changes. Please review the new changes. I have tested the latest patch on buildbot. http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha
          Elkin Andrei Elkin added a comment -

          Sujatha,

          (copying my reply from bb-10.2 commit github page)

          The patch looks good. Please don't forget to formulate the commit message.

          Thanks for this piece of nice work!

          Andrei


          The fix for this issue is pushed into 10.2.25.

          Tested the 10.3-specific changes in buildbot.

          10.3 changes are:
          https://github.com/MariaDB/server/commit/701e8bcff2c21979477bac8e64386d5fa4657cb9

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment

          People

            sujatha.sivakumar Sujatha Sivakumar (Inactive)
            Elkin Andrei Elkin