[MDEV-31655] Parallel replication deadlock victim preference code erroneously removed Created: 2023-07-10  Updated: 2023-10-18  Resolved: 2023-08-15

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.4.30
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.9.8, 10.10.7, 10.11.6, 11.0.4

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-15278 rpl.rpl_parallel_optimistic failed in... Closed
Problem/Incident
causes MDEV-28776 rpl.rpl_mark_optimize_tbl_ddl fails w... Closed
Relates
relates to MDEV-24948 thd_need_wait_reports() hurts perform... Open

 Description   

Somewhere during the work on VATS in InnoDB, the code that calls thd_deadlock_victim_preference() was erroneously removed; later, the function itself was also erroneously removed as dead code.

commit 2fd3af44830e8df9d60f2e8a955f9ed17e744986
Author: sensssz <hjmsens@gmail.com>
Date:   Thu Dec 1 13:45:23 2016 -0500
 
    MDEV-11168: InnoDB: Failing assertion: !other_lock || wsrep_thd_is_BF(lock->trx->mysql_thd, FALSE) || wsrep_thd_is_BF(other_lock->trx->mysql_thd, FALSE)

commit 1513630d302932a90c94fef6803877f37f0e0f22
Author: Eugene Kosov <claprix@yandex.ru>
Date:   Mon Apr 9 17:21:21 2018 +0300
 
    remove dead code

This may lead to significantly more transaction retries in in-order parallel replication, since InnoDB may choose the wrong deadlock victim. If T1 and T2 must replicate in order and deadlock with each other, then choosing T1 as the deadlock victim is pointless, as T2 will be killed as a deadlock victim by parallel replication anyway.
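
As an illustration of the preference described above, here is a minimal sketch in Python. It is a toy model and not the MariaDB code: the names Txn, commit_seq, victim_preference and choose_victim are hypothetical, and the signature and return convention of the real thd_deadlock_victim_preference() are not reproduced here. The assumption is that each in-order parallel replication transaction carries a position in the fixed commit order, and that the transaction due to commit later is the preferred victim.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Txn:
    """Toy stand-in for a transaction (hypothetical, not a MariaDB type)."""
    name: str
    # Position in the fixed commit order of in-order parallel replication.
    # None means the transaction is not a parallel replication worker.
    commit_seq: Optional[int] = None


def victim_preference(a: Txn, b: Txn) -> Optional[Txn]:
    """Return the transaction preferred as deadlock victim, or None.

    Assumption: when both transactions must commit in a fixed order,
    the one that commits later is the better victim, because the
    replication layer would kill it anyway to let the earlier one proceed.
    """
    if a.commit_seq is None or b.commit_seq is None:
        return None  # not both ordered replication transactions
    if a.commit_seq == b.commit_seq:
        return None  # no ordering information to act on
    return a if a.commit_seq > b.commit_seq else b


def choose_victim(a: Txn, b: Txn,
                  fallback: Callable[[Txn, Txn], Txn]) -> Txn:
    """Use the replication preference if there is one, else the fallback."""
    preferred = victim_preference(a, b)
    return preferred if preferred is not None else fallback(a, b)
```

With T1 = Txn("T1", 1) and T2 = Txn("T2", 2), victim_preference picks T2 regardless of argument order. Without this preference, a storage engine's own heuristic (the fallback here) might pick T1, which only causes an extra retry.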

The lost code should be restored.
It looks like the InnoDB deadlock code in later MariaDB versions was changed substantially, so the restored code probably needs to be rewritten for merging to those later versions.



 Comments   
Comment by Marko Mäkelä [ 2023-08-03 ]

MDEV-24738 in MariaDB Server 10.6 indeed rewrote the InnoDB deadlock detector, and 10.6 included other substantial changes to locking too, such as MDEV-20612. Last but not least, the buggy innodb_lock_schedule_algorithm=VATS (MDEV-11039) was removed in MDEV-16664 due to correctness problems.

Can you please provide a 10.6 version of this as well?

Comment by Kristian Nielsen [ 2023-08-15 ]

Pushed to 10.4 (and another fix made appropriate for 10.6).

Comment by Ralf Gebhardt [ 2023-10-03 ]

Elkin, knielsen, since the comment says "Pushed to 10.4 (and another fix made appropriate for 10.6).", which MDEV is the one for MariaDB Server 10.6?

Comment by Kristian Nielsen [ 2023-10-03 ]

ralf.gebhardt, the MDEV is the same for 10.4 and 10.6: MDEV-31655.

The 10.4 commit is this: 900c4d692073ae51413d8f739977216a56663cbf
The 10.6 commit is this: 18acbaf416ea7a42edc0b2fc51084eacda4d074c

(There are two different commits because changes in 10.6 require completely different code for this fix.)

Hope this helps,

  • Kristian.