Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 10.6
Description
In MDEV-20612, MDEV-24671 and MDEV-24738, the concurrency of the InnoDB lock subsystem was improved, except for one aspect. Each time the refactored lock_wait() is invoked to handle a lock wait, a predicate must be evaluated. Not only is this predicate somewhat time-consuming to evaluate, but it also unnecessarily holds even when replication is not set up, which causes another bottleneck. The code in question is highlighted in the following patch:
```diff
diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -1758,7 +1758,7 @@ dberr_t lock_wait(que_thr_t *thr)
   is not being used. We want to be allow the user to skip
   lock_wait_rpl_report(). */
   const bool rpl= !(type_mode & LOCK_AUTO_INC) && trx->mysql_thd &&
-    innodb_deadlock_detect && thd_need_wait_reports(trx->mysql_thd);
+    innodb_deadlock_detect && false && thd_need_wait_reports(trx->mysql_thd);
 #endif
   timespec abstime;
   set_timespec_time_nsec(abstime, suspend_time.val * 1000);
```
If this condition holds (with the patch it will never hold), then lock_wait_rpl_report() has to be invoked. That function not only holds lock_sys.wait_mutex but also acquires exclusive lock_sys.latch while traversing the lock graph. This prevents any concurrent registration or removal of locks, even for locks that are not part of any conflict.
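To make the nature of this bottleneck concrete, here is a minimal standalone C++17 model. It is not MariaDB code: lock_sys_latch merely stands in for lock_sys.latch and the loops are arbitrary. It only shows that a single "reporter" thread that takes a global latch exclusively stalls every "worker" thread that needs the same latch in shared mode, regardless of whether their work is related.

```cpp
// Standalone model of the contention pattern described above (NOT MariaDB code).
#include <atomic>
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>

static std::shared_mutex lock_sys_latch;      // stands in for lock_sys.latch
static std::atomic<long> locks_registered{0};

// Hot path: registering or removing a lock only needs the latch in shared mode.
static void register_lock()
{
  std::shared_lock<std::shared_mutex> s(lock_sys_latch);
  ++locks_registered;
}

// Cold path: reporting the waits-for graph takes the latch exclusively;
// the sleep stands in for traversing the whole lock graph.
static void report_waits_for_graph()
{
  std::unique_lock<std::shared_mutex> x(lock_sys_latch);
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

int main()
{
  std::vector<std::thread> workers;
  for (int i= 0; i < 8; i++)
    workers.emplace_back([] { for (int j= 0; j < 100000; j++) register_lock(); });
  std::thread reporter([] { for (int i= 0; i < 20; i++) report_waits_for_graph(); });
  for (auto &t : workers)
    t.join();
  reporter.join();
  std::cout << "registered " << locks_registered << " locks\n";
  return 0;
}
```

In InnoDB the effect is the same, except that the exclusive holder is additionally walking the lock graph while holding lock_sys.wait_mutex.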
To highlight the impact of this, I created an extreme benchmark where 20 concurrent connections are accessing a table that contains 10 or 20 rows. To alleviate contention on the buffer pool page latches, I padded the PRIMARY KEY with some extra bytes, so that there should be at most a few records per index page. Here are the results:
| server | rows | tps (log_bin=0) | tps (log_bin=1) |
|---|---|---|---|
| plain | 10 | 824.57 | 819.14 |
| patched | 10 | 832.04 | 830.63 |
| plain | 20 | 3305.20 | 3324.75 |
| patched | 20 | 3361.45 | 3398.25 |
There is some fluctuation in the numbers due to the random nature of the contention and the short 60-second run time of the test. Most notably, for 20 rows, both log_bin=1 results appeared faster than the log_bin=0 result on the unmodified (plain) server. There were faster log_bin=0 runs; I chose the slowest ones.
At 10 rows and 20 threads, the call to thd_need_wait_reports() seems to incur a clear overhead even when --skip-log-bin is in effect.
I believe that this API should move from 'push' notification to 'pull' notification. That is, the predicate thd_need_wait_reports() must be removed, and instead, whenever the replication layer actually needs to know the waits-for relations, it could invoke a new function handlerton::report_waits(), which InnoDB would basically implement with lock_wait_rpl_report(). We could still invoke thd_rpl_deadlock_check() to report the graph.
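As a rough sketch of what such a "pull" interface could look like: only the member name report_waits() and the idea of reusing lock_wait_rpl_report() and thd_rpl_deadlock_check() come from this proposal; the struct, the signature and the parameter below are assumptions for illustration only.

```cpp
// Hypothetical sketch of the proposed "pull" interface; not an existing
// MariaDB API. The signature and all names other than report_waits(),
// lock_wait_rpl_report() and thd_rpl_deadlock_check() are made up.
class THD;   // opaque server-layer connection handle

struct handlerton_report_waits_sketch
{
  // Invoked by the replication layer only when it actually needs the
  // waits-for relations (for example, to choose a deadlock victim),
  // instead of InnoDB pushing them on every single lock wait.
  void (*report_waits)(THD *waiter);
};

// InnoDB would implement this essentially with lock_wait_rpl_report():
// acquire exclusive lock_sys.latch once, traverse the waits-for graph, and
// report each edge, for example by still calling thd_rpl_deadlock_check().
static void innobase_report_waits(THD *waiter)
{
  (void) waiter;   // graph traversal omitted; it lives in lock0lock.cc
}

static handlerton_report_waits_sketch innodb_report_waits_sketch=
{ innobase_report_waits };
```

With an interface along these lines, lock_wait() would no longer need to evaluate thd_need_wait_reports() or take the exclusive latch on every wait; the cost would be paid only when the replication layer actually asks for the graph.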
Issue Links
- relates to
  - MDEV-25016 Race condition between lock_sys_t::cancel() and page split or merge (Closed)
  - MDEV-31655 Parallel replication deadlock victim preference code errorneously removed (Closed)
  - MDEV-20612 Improve InnoDB lock_sys scalability (Closed)
  - MDEV-24671 Assertion failure in lock_wait_table_reserve_slot() (Closed)
  - MDEV-24738 Improve the InnoDB deadlock checker (Closed)