[MDEV-16690] node hang due to conflicting inserts into foreign key child table - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.2.16, 10.3.8
Fix Version/s: 10.3.11, 10.2.19
Component/s: Galera, Storage Engine - InnoDB
Labels: None

Description

A cluster node may enter an unresolved conflict state when two inserts with the same primary key are made into a table that has a foreign key constraint referencing a parent table. The inserts must be issued on separate cluster nodes, and there must be simultaneous writes (updates or deletes) to the referenced parent row.
As a result, the replication applier thread may end up in an unresolved conflict state, and the error log fills with messages of the form:

"WSREP: BF lock wait long"

followed by InnoDB monitor outputs

Issue Links

relates to

MDEV-17541 KILL QUERY during lock wait in FOREIGN KEY check causes hang

Closed

MDEV-18174 Galera node terminated due to foreign key constraint

Closed

Activity

Seppo Jaakola added a comment - 2018-07-05 07:09

Submitted a pull request that includes an mtr test reproducing this issue on the 10.2 and 10.3 HEAD versions.
The pull request fixes a race condition in row0ins.cc. Assigning this for review.

Seppo Jaakola added a comment - 2018-07-05 07:31

Please take a look at the fix in row0ins.cc

This is the earliest point in execution at which a hard error code in trx::error_state can be overwritten with DB_LOCK_WAIT. If 'err' still holds DB_LOCK_WAIT here, it is returned up through several levels of the call stack and finally assigned blindly to trx::error_state in row_ins_step() / error_handling.

The fix here is protected with the trx mutex; this may be redundant.
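
To illustrate the overwrite described in this comment, here is a minimal, self-contained sketch. It is not the code from the pull request and does not use InnoDB's real types: `DbErr`, `Trx`, `resolve_fk_check_error()` and `assign_error_state()` are hypothetical stand-ins, and `Deadlock` is just an example of a hard error. The sketch assumes the hard error is already recorded in the transaction's error state when the lock wait code comes back from the foreign key check.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for InnoDB error codes (illustrative only).
enum class DbErr { Success, LockWait, Deadlock };

// Hypothetical, simplified transaction object (not InnoDB's trx_t).
struct Trx {
    std::mutex mutex;                    // stands in for the trx mutex
    DbErr error_state = DbErr::Success;  // stands in for trx::error_state
};

// A "hard" error is anything other than success or a transient lock wait.
inline bool is_hard_error(DbErr e) {
    return e != DbErr::Success && e != DbErr::LockWait;
}

// Decide which code to return up the call stack after a foreign key check
// reported `err`. If the transaction already carries a hard error, return
// that instead of LockWait, so a later unconditional assignment to
// error_state cannot mask it.
inline DbErr resolve_fk_check_error(Trx& trx, DbErr err) {
    std::lock_guard<std::mutex> guard(trx.mutex);  // may be redundant
    if (err == DbErr::LockWait && is_hard_error(trx.error_state)) {
        return trx.error_state;
    }
    return err;
}

// Mimics the unconditional ("blind") assignment in the error-handling path.
inline void assign_error_state(Trx& trx, DbErr err) {
    std::lock_guard<std::mutex> guard(trx.mutex);
    trx.error_state = err;
}

// Usage: the hard error survives the round trip through the call stack.
inline DbErr demo_hard_error_survives() {
    Trx trx;
    trx.error_state = DbErr::Deadlock;  // hard error already recorded
    DbErr err = resolve_fk_check_error(trx, DbErr::LockWait);
    assign_error_state(trx, err);
    assert(trx.error_state == DbErr::Deadlock);
    return trx.error_state;
}
```

Without the check in `resolve_fk_check_error()`, the final assignment would store `LockWait` and the hard error would be lost, which is the kind of masking the comment describes.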

Marko Mäkelä added a comment - 2018-07-05 09:38

I like the solution, but I think that it can be cleaned up a little.

Marko Mäkelä added a comment - 2018-07-06 14:39

thiru, please check if trx->error_state can be modified by other threads than the one that is executing trx (I think not), and then merge (or cherry-pick) the fix to 10.2.

Marko Mäkelä added a comment - 2019-01-15 07:10

It looks like this has been fixed in MDEV-17541.

Marko Mäkelä added a comment - 2019-01-15 08:14

This issue was fixed as part of MDEV-17541.

People

Assignee: Marko Mäkelä

Reporter: Seppo Jaakola

Votes: 0

Watchers: 2

Dates

Created: 2018-07-04 19:48

Updated: 2019-01-15 08:14

Resolved: 2019-01-15 08:14

MariaDB Server