Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.11.2
Environment: Ubuntu 22.04 LTS, upstream mariadb packages, NVMe RAID10 storage
Description
For years we have been using a master-slave parallel replication setup, which has always worked fine.
We previously ran MariaDB v10.1, v10.4 and v10.6 and never saw this issue.
However, since upgrading to v10.11.2, the parallel replication process gets "stuck" every few days.
When this happens, the only solution is to `kill -9` the mariadb process.
We have 2 replicas. One runs continuously without being interrupted, and that one doesn't have the issue.
The other one, however, we use for making daily backups: we stop mariadb at midnight, make the backup (which takes about 7 - 8 hours to complete), and then start mariadb again.
Of course this means that this server has to catch up on several hours' worth of binlogs, which is what seems to trigger the deadlock.
This is the output of "show slave status":
This is the output of "show processlist":
This is the output of "show engine innodb status":
The full backtrace of all mariadb threads is attached as a txt file to this ticket.
These are my relevant mariadb settings:
slave_parallel_threads = 16
slave_parallel_mode = optimistic
innodb_compression_default = ON
I spoke to montywi and knielsen on #maria on Libera.Chat about this, and they recommended that I file a Jira ticket here.
Attachments
Issue Links
- duplicates MDEV-29835 Partial server freeze (Closed)
-
In mariadbd_full_bt_all_threads.txt, Thread 18 and Thread 23 are each holding a shared latch on the block descriptor 0x7f7eec802e60, and both are also waiting for a latch on the block 0x7f7eec8021e0. Thread 12 is holding an exclusive latch on the latter block and waiting for an exclusive latch on the former. Thread 12 is violating the design rules, as noted in MDEV-29835. With the fix, it would have acquired an exclusive latch on the index, which would prevent other threads (such as Thread 18 and Thread 23 here) from acquiring any latches on non-leaf index pages.
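To make the cycle described above easier to see, here is a minimal Python sketch that builds a wait-for graph from the latch state in this comment and searches it for a cycle. It is only an illustration, not InnoDB code: the `holds` and `waits` dictionaries are transcribed by hand from the comment, the helper names `build_wait_for_graph` and `find_cycle` are hypothetical, and the model assumes that every waiter conflicts with every other holder of the same block (true here, since each wait involves an exclusive latch on one side).

```python
from collections import defaultdict

# Block descriptors named in the comment above.
BLOCK_A = "0x7f7eec802e60"  # held (shared) by Thread 18 and Thread 23
BLOCK_B = "0x7f7eec8021e0"  # held (exclusive) by Thread 12

# Hand-transcribed latch state; illustrative only.
holds = {
    "Thread 12": {BLOCK_B},
    "Thread 18": {BLOCK_A},
    "Thread 23": {BLOCK_A},
}
waits = {
    "Thread 12": BLOCK_A,
    "Thread 18": BLOCK_B,
    "Thread 23": BLOCK_B,
}


def build_wait_for_graph(holds, waits):
    """Return {waiter: set of threads holding the block it waits for}."""
    graph = defaultdict(set)
    for waiter, block in waits.items():
        for holder, held in holds.items():
            # Simplifying assumption: any other holder blocks the waiter.
            if holder != waiter and block in held:
                graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of threads, or None."""
    visiting = []  # current DFS path
    done = set()   # nodes fully explored without finding a cycle

    def dfs(node):
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):]
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    graph = build_wait_for_graph(holds, waits)
    print(find_cycle(graph))
```

In this model Thread 12 waits on both Thread 18 and Thread 23, and each of them waits on Thread 12, so any one of the shared-latch holders is enough to close the cycle.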