[MDEV-19076] rpl_parallel_temptable result mismatch '-33 optimistic' Created: 2019-03-28  Updated: 2019-05-21  Resolved: 2019-05-20

Status: Closed
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.25, 10.3.16, 10.4.5

Type: Bug Priority: Major
Reporter: Andrei Elkin Assignee: Sujatha Sivakumar (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

The test now fails with the following trace:

CURRENT_TEST: rpl.rpl_parallel_temptable
--- /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.result    2018-05-28 09:18:44.574554596 +0300
+++ /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.reject    2019-03-28 19:20:00.054911931 +0200
@@ -194,7 +194,6 @@
 30    conservative
 31    conservative
 32    optimistic
-33    optimistic

which is noticed in 10.3 and 10.4.
Whether it's pure test failure it's unclear though possible.

Howtofind: http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=&dt=&bbnum=&typ=&info=&fail_name=rpl_parallel_temptable&fail_variant=&fail_info_short=&fail_info_full=&limit=100



 Comments   
Comment by Sujatha Sivakumar (Inactive) [ 2019-04-04 ]

Reproduced the issue on 10.4 with following command:

perl mysql-test-run.pl --verbose-restart --force --retry=3 --max-save-core=0 --max-save-datadir=1 --mem --parallel=4 --ps-protocol rpl.rpl_parallel_temptable --repeat=30

Comment by Sujatha Sivakumar (Inactive) [ 2019-04-30 ]

Root cause :

In general temporary tables are not safe for parallel replication. Please find the following comments from

Function: bool THD::open_temporary_table(TABLE_LIST *tl)

/*
Temporary tables are not safe for parallel replication. They were
designed to be visible to one thread only, so have no table locking.
Thus there is no protection against two conflicting transactions
committing in parallel and things like that.

So for now, anything that uses temporary tables will be serialised
with anything before it, when using parallel replication.
*/

if (rgi_slave &&
rgi_slave->is_parallel_exec &&
find_temporary_table(tl) &&
wait_for_prior_commit())
DBUG_RETURN(true);

/*
First check if there is a reusable open table available in the
open table list.
*/
if (find_and_use_tmp_table(tl, &table))

{ DBUG_RETURN(true); /* Error */ }

/*
No reusable table was found. We will have to open a new instance.
*/
if (!table && (share= find_tmp_table_share(tl)))

{ table= open_temporary_table(share, tl->get_table_name(), true); }

When ever there a temporary tables in use "wait_for_prior_commit" needs to be invoked to serialize them.

At present the "wait_for_prior_commit" is done only in this case

if (rgi_slave &&
rgi_slave->is_parallel_exec &&
find_temporary_table(tl) &&
wait_for_prior_commit())

As per the above code the wait happens in cases where "find_temporary_table(tl)" is successful.

But in some cases the condition fails in those cases code looks for reusable tables , if they are not found it does "open_temporary_table" as shown below.

/*
No reusable table was found. We will have to open a new instance.
*/
if (!table && (share= find_tmp_table_share(tl)))
{ table= open_temporary_table(share, tl->get_table_name(), true); }

After a successful open there is no call for "wait_for_prior_commit". Adding a wait seems to solve the problem.

Still more analysis needs to be done with respect to when code reaches above "open_temporary_table".

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-02 ]

The above issue is present from 10.2 onwards

CURRENT_TEST: rpl.rpl_parallel_temptable
— /home/sujatha/bug_repo/test-10.2/mysql-test/suite/rpl/r/rpl_parallel_temptable.result 2019-05-01 19:18:39.399654778 +0530
+++ /home/sujatha/bug_repo/test-10.2/mysql-test/suite/rpl/r/rpl_parallel_temptable.reject 2019-05-02 16:23:57.635450840 +0530
@@ -194,7 +194,6 @@
30 conservative
31 conservative
32 optimistic
-33 optimistic
include/stop_slave.inc
SET GLOBAL slave_parallel_mode=@old_mode;
include/start_slave.inc

mysqltest: Result length mismatch

  • saving '/home/sujatha/bug_repo/test-10.2/bld/mysql-test/var/4/log/rpl.rpl_parallel_temptable-innodb,stmt/' to '/home/sujatha/bug_repo/test-10.2/bld/mysql-test/var/log/rpl.rpl_parallel_temptable-innodb,stmt/'

Only 1 of 6 completed.

Hence updating the version affected to 10.2

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-02 ]

Hello Andrei,

Please review the fix for MDEV-19076

Patch is available at the following link:

https://github.com/MariaDB/server/commit/84b928b9ed112a92c03c7aa06431fc8f9a7a1f2f

http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha

Comment by Andrei Elkin [ 2019-05-07 ]

sujatha.sivakumar Thanks for a good piece of analysis!
While accepting your fixes to the slave side, please consider to cease marking on master as transactional Query_log_event whose query refers to a temp table, here

else if (is_transactional)
    flags2|= FL_TRANSACTIONAL;

The master fixes alone are clearly not enough due to OLD -> NEW replication would be still vulnerable. Weigh yourself if the refinement is feasible, and when yes we shall do it now as well.

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-13 ]

Hello Andrei,

Thank you for the review comments.
I have implemented the master side changes. Please review the new changes.
I have tested the latest patch on buildbot.

http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha

Comment by Andrei Elkin [ 2019-05-15 ]

Sujatha,

(copying my reply from bb-10.2 commit github page)

The patch looks good. Please don't forge to formulate the commit message.

Thanks for this piece of nice work!

Andrei

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-20 ]

The fix for this issue is pushed into 10.2.25.

Tested the 10.3 specific changes in build bot.

10.3 changes are:
https://github.com/MariaDB/server/commit/701e8bcff2c21979477bac8e64386d5fa4657cb9

Generated at Thu Feb 08 08:48:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.