[MDEV-7929] record_gtid() for non-transactional event group calls wakeup_subsequent_commits() too early, causing slave hang

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.0.17, 10.1.3
Fix Version/s: 10.0.18, 10.1.4
Component/s: Replication
Labels:
- parallelslave
- replication

Description

This was found together with MDEV-7888, but it is a logically distinct bug,
so it is filed separately.

The parallel replication worker threads can hang in some cases with
non-transactional event groups. The symptom is that worker threads are stuck
in "waiting for prior transaction to start commit".

The problem occurs when record_gtid() runs at the end of a non-transactional
update. It then needs to create its own transaction to update the
mysql.gtid_slave_pos table, which causes ha_commit_trans() to call
wakeup_subsequent_commits(). This is wrong: the wakeup comes too early.

The hang then occurs because a following transaction thinks the prior
non-transactional event group is already done, so it deallocates the
corresponding group_commit_orderer object. A later worker thread then does
not get its wakeup, and the slave gets stuck.
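
The ordering problem can be illustrated with a small toy model. This is only a sketch and not MariaDB code: the EventGroup struct, the function names, and the defer_wakeup flag are all hypothetical, and the flag is merely one assumed way of correcting the ordering, not a description of the actual fix. The model shows that if the internal commit for mysql.gtid_slave_pos sends the wakeup, a follower can observe "wakeup sent" while the event group is still running.

```cpp
#include <cassert>

// Toy model only: names echo the report, but none of this is MariaDB code.
struct EventGroup {
    bool wakeup_sent = false;  // has the "subsequent commits" wakeup been sent?
    bool group_done = false;   // has the event group really finished?
};

// Hypothetical stand-in for the internal transaction that record_gtid()
// commits to update the GTID position table. When defer_wakeup is false,
// the commit path sends the wakeup itself, which models the reported bug.
void internal_gtid_commit(EventGroup &g, bool defer_wakeup) {
    if (!defer_wakeup)
        g.wakeup_sent = true;  // premature: the group is not done yet
}

// Real end of the event group: only now is it safe to wake up followers.
void finish_group(EventGroup &g) {
    g.group_done = true;
    g.wakeup_sent = true;
}

// Returns true if a follower could see the wakeup before the group is done.
// In the report, that is the window in which the follower wrongly
// deallocates the shared group_commit_orderer object.
bool follower_sees_premature_wakeup(bool defer_wakeup) {
    EventGroup g;
    internal_gtid_commit(g, defer_wakeup);
    bool premature = g.wakeup_sent && !g.group_done;
    finish_group(g);
    assert(g.wakeup_sent && g.group_done);  // wakeup always sent by the end
    return premature;
}
```

Calling follower_sees_premature_wakeup(false) models the buggy ordering and reports the premature wakeup; calling it with true models an ordering in which the wakeup is held back until the group has actually finished.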

Issue Links

relates to: MDEV-7888 ANALYZE TABLE does wakeup_subsequent_commits(), causing wrong binlog order and parallel replication hang (Closed)

People

Assignee: Kristian Nielsen

Reporter: Kristian Nielsen

Dates

Created: 2015-04-07 17:16

Updated: 2015-04-08 15:05

Resolved: 2015-04-08 15:05
