[MDEV-20707] Missing memory barrier in parallel replication error handler in wait_for_prior_commit()

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.4 (EOL)
Fix Version/s: 10.4.11
Component/s: Replication
Labels:
- lock-free
- parallelslave

Description

The wakeup code and the wait code in parallel replication look as follows:

// Wakeup code in wait_for_commit::wakeup():
mysql_mutex_lock(&LOCK_wait_commit);
waitee= NULL;
this->wakeup_error= wakeup_error;
...

// Wait code in wait_for_prior_commit():
if (waitee)
  return wait_for_prior_commit2(thd);
else
{
  if (wakeup_error)
    my_error(ER_PRIOR_COMMIT_FAILED, MYF(0));
  ...
}

The waiter therefore runs a lock-free "fast path". Racing on the assignment
of NULL to the wait_for_commit::waitee variable is harmless in itself: if the
waiter still sees the old non-NULL value, it takes the slow path
(wait_for_prior_commit2()) and does proper locking.

However, the following race appears to be possible:

1. wakeup() sets waitee= NULL.
2. wait_for_prior_commit() sees waitee==NULL and wakeup_error==0, and
incorrectly returns without an error.
3. wakeup() sets wait_for_commit::wakeup_error, but too late.

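Simply swapping the two assignments in wakeup() (so that wakeup_error is
written before waitee) is not enough, because the compiler or the CPU may
still make the two stores, or the waiter's two loads, take effect in a
different order. A write-write memory barrier is needed between them in
wakeup(), and a corresponding read-read barrier is needed in
wait_for_prior_commit() between reading waitee and reading wakeup_error.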

With proper barriers, the waiter cannot see the write of waitee=NULL without
also seeing the write to wakeup_error. It therefore either returns the correct
wakeup_error (raising ER_PRIOR_COMMIT_FAILED if it is non-zero) or takes the
slow path with proper locking; both outcomes are correct.

Activity

Sujatha Sivakumar (Inactive) added a comment - 2019-10-14 11:00

Hello Andrei,

Please review the contributed changes for MDEV-20707.

Patch: https://github.com/MariaDB/server/commit/4eb6ea77e2051bf50b68567f82862f5726fd4bd7

BuildBot Testing: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.4-sujatha

Thank you.

Andrei Elkin added a comment - 2019-11-13 09:53

Thanks for working on it!

Sujatha Sivakumar (Inactive) added a comment - 2019-11-14 10:28

The fix is included in 10.4.11.

People

Assignee: Sujatha Sivakumar (Inactive)

Reporter: Sujatha Sivakumar (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 2019-10-01 09:33

Updated: 2020-06-15 08:51

Resolved: 2019-11-14 10:28

MariaDB Server