[MDEV-10067] Replication thread sporadically hanging on 'Table lock' state - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.0.22
Fix Version/s: N/A
Component/s: Locking, Replication
Labels:
- reproduction_missing

Bug Category:
Can result in hang or crash
Sprint:
10.2.1-3, 10.2.1-4, 10.2.1-5, 10.0.26

Description

From support issue.

A customer sporadically experiences an issue where the replication thread gets stuck in the 'Table lock' state. Other threads trying to access the same table then go into the 'Waiting for table level lock' or 'Waiting for table lock' state. The lock seems to last indefinitely.
When this occurs, a normal shutdown of the server is no longer possible.

The setup is quite complicated; it is the same setup as described in the related issues:

Master A <-> Master B <-> Master C
Master B and C each have 3 slaves.

The table lock issue occurred on master B. Master A receives most of the write traffic.

The SHOW OPEN TABLES command shows there is 1 table open:
Time: 2016-05-13 02:12:56

Database: customer_project
Table: LockedTableName
In_use: 1
Name_locked: 0

The processlist shows the SQL thread in 'Table lock' state:
18438 system user NULL Connect 4270 Table lock NULL 0.000

Sometimes the thread state is 'Waiting for table level lock' instead. It is also possible that the thread state changes from 'Waiting for table level lock' to 'Table lock' after the thread has been killed.
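
For anyone who wants to watch for this condition, here is a minimal sketch of filtering processlist rows for the states quoted above. It is not taken from the customer's setup: the `ProcessRow` type, the `stuck_on_table_lock` helper, the 600-second threshold, and the two `app` rows are hypothetical. Only the first row (thread 18438) comes from this report. The rows are hard-coded so the example runs without a server connection.

```python
from typing import Iterable, List, NamedTuple

# Thread states quoted in this report. This set is an assumption made for
# illustration, not an exhaustive list of server thread states.
TABLE_LOCK_STATES = {
    "Table lock",
    "Waiting for table level lock",
    "Waiting for table lock",
}


class ProcessRow(NamedTuple):
    """A reduced processlist row (hypothetical type, not a server API)."""
    thread_id: int
    user: str
    command: str
    time_s: int
    state: str


def stuck_on_table_lock(rows: Iterable[ProcessRow],
                        min_seconds: int = 600) -> List[ProcessRow]:
    """Return rows that have been in a table-lock state for at least min_seconds."""
    return [
        r for r in rows
        if r.state in TABLE_LOCK_STATES and r.time_s >= min_seconds
    ]


rows = [
    # Row taken from the processlist excerpt in this report.
    ProcessRow(18438, "system user", "Connect", 4270, "Table lock"),
    # Hypothetical rows, added only to exercise the filter.
    ProcessRow(20001, "app", "Query", 12, "Waiting for table level lock"),
    ProcessRow(20002, "app", "Query", 900, "Waiting for table metadata lock"),
]

for r in stuck_on_table_lock(rows):
    print(r.thread_id, r.state, r.time_s)
```

With the default threshold only the replication SQL thread is reported. The short-lived waiter falls below the threshold, and the metadata-lock waiter is deliberately left out of the state set.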

All tables are InnoDB. If a thread were waiting for a metadata lock (MDL), the thread state would be different.

Hopefully we can generate a core dump the next time this happens. I am adding the issue now in case other people experience it as well. I do not expect it to give you enough information to work on the issue yet.

This issue occurs independently of, or possibly before, the related issues ~~MDEV-9952~~ and ~~MDEV-9670~~. The issues do seem very much related (they occur in the same setup and usually not far apart in time). Hopefully this issue can provide some more ideas about the root cause.

Issue Links

relates to

MDEV-23888 Potential server hang on replication with InnoDB

Closed

MDEV-9670 server_id mysteriously set to 0 in binlog causes GTID & multisource replication to break

Closed

MDEV-9952 Strange out-of-order sequence errors on slave, unreleased table locks and possibly SQL thread position corruption

Closed

People

Assignee: Brandon Nesterenko

Reporter: Michaël de groot

Votes: 4

Watchers: 10

Dates

Created: 2016-05-14 01:04

Updated: 2025-10-26 17:44

Resolved: 2025-10-26 17:44

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

10m
