[MDEV-31167] parallel replication gets deadlocked on v10.11.2 with innodb

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.11.2
Fix Version/s: 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.11.3, 11.0.2, 11.1.1
Component/s: Replication, Storage Engine - InnoDB
Labels:
- replication
Environment:
Ubuntu 22.04 LTS, upstream MariaDB packages, NVMe RAID10 storage

Description

For years we have been running a master-slave parallel replication setup, and it has always worked fine.
We previously ran v10.1, v10.4 and v10.6 of mariadb and never saw this issue.

However, since upgrading to v10.11.2, the parallel replication process gets "stuck" every few days.
When this happens, the only way to recover is to `kill -9` the mariadb process.

We have two replicas. One runs continuously without interruption, and that one doesn't have the issue.
We use the other one for daily backups: we stop mariadb at midnight, make the backup (which takes about 7 to 8 hours to complete), and then start mariadb again.
This means that this server has to catch up on several hours' worth of binlogs, which seems to be what triggers the deadlock.

This is the output of "show slave status":

https://dpaste.org/5axfT

This is the output of "show processlist":

https://dpaste.org/Ub10M

This is the output of "show engine innodb status":

https://dpaste.org/KmP1b

The full backtrace of all mariadb threads is attached as a txt file to this ticket.

These are my relevant mariadb settings:

slave_parallel_threads = 16
slave_parallel_mode = optimistic
innodb_compression_default = ON

I spoke to montywi and knielsen on #maria on Libera Chat about this, and they recommended that I file a Jira ticket here.

Attachments

mariadbd_full_bt_all_threads.txt (217 kB, 2023-05-02 09:14)

Issue Links

duplicates: MDEV-29835 Partial server freeze (Closed)

Activity

Marko Mäkelä added a comment - 2023-05-02 11:20

In mariadbd_full_bt_all_threads.txt there are Thread 18 and Thread 23 holding a shared latch on the block descriptor 0x7f7eec802e60, both also waiting for a latch on the block 0x7f7eec8021e0. Thread 12 is waiting on an exclusive latch on the former block and holding an exclusive latch on the latter block. Thread 12 is violating the design rules, as noted in MDEV-29835. With the fix, it would have acquired an exclusive latch on the index, which would prevent other threads (such as Thread 18 and Thread 23 here) from acquiring any latches on non-leaf index pages.
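
The circular wait described in this comment can be illustrated with a small wait-for graph. The following is only a sketch for readers, not InnoDB code: the thread names and block addresses come from the comment, while the `holds` and `waits_for` tables and the helper functions `build_wait_graph` and `find_cycle` are hypothetical names introduced here. It also simplifies by ignoring latch modes, treating every holder of a block as blocking every other waiter on it.

```python
from typing import Dict, List, Optional, Set

# Latch state taken from the comment above: thread -> blocks it holds.
holds: Dict[str, Set[str]] = {
    "Thread 12": {"0x7f7eec8021e0"},  # exclusive latch
    "Thread 18": {"0x7f7eec802e60"},  # shared latch
    "Thread 23": {"0x7f7eec802e60"},  # shared latch
}

# Thread -> the block it is waiting to latch.
waits_for: Dict[str, str] = {
    "Thread 12": "0x7f7eec802e60",
    "Thread 18": "0x7f7eec8021e0",
    "Thread 23": "0x7f7eec8021e0",
}


def build_wait_graph(holds: Dict[str, Set[str]],
                     waits_for: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each waiting thread to the other threads holding the block it wants."""
    graph: Dict[str, Set[str]] = {thread: set() for thread in holds}
    for waiter, block in waits_for.items():
        for holder, blocks in holds.items():
            if holder != waiter and block in blocks:
                graph.setdefault(waiter, set()).add(holder)
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search; return one cycle as a list of threads, or None."""
    path: List[str] = []
    finished: Set[str] = set()

    def dfs(node: str) -> Optional[List[str]]:
        if node in path:
            # Reached a thread already on the current path: a circular wait.
            return path[path.index(node):] + [node]
        if node in finished:
            return None
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


cycle = find_cycle(build_wait_graph(holds, waits_for))
print(" -> ".join(cycle) if cycle else "no deadlock")
# prints: Thread 12 -> Thread 18 -> Thread 12
```

In this simplified model, each arrow reads "is waiting for a latch held by". Removing Thread 12's wait on 0x7f7eec802e60 from `waits_for` breaks the cycle.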

People

Assignee: Marko Mäkelä

Reporter: Jan Geboers

Votes: 0

Watchers: 3

Dates

Created: 2023-05-02 09:20

Updated: 2023-05-02 11:21

Resolved: 2023-05-02 11:21
