[MDEV-24948] thd_need_wait_reports() hurts performance Created: 2021-02-23 Updated: 2023-09-06 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | performance | ||
| Attachments: |
| Issue Links: |
| Description |
|
In
If this condition holds (with the patch, it will never hold), then lock_wait_rpl_report() will be invoked. That function not only holds lock_sys.wait_mutex but also acquires an exclusive lock_sys.latch while traversing the lock graph. This prevents any concurrent registration or removal of locks, even for locks that are not part of any conflict.

To highlight the impact of this, I created an extreme benchmark where 20 concurrent connections access a table that contains 10 or 20 rows. To alleviate contention on the buffer pool page latches, I padded the PRIMARY KEY with some extra bytes, so that there should be at most a few records per index page. Here are the results:
There is some fluctuation in the numbers due to the random nature of the contention and the short 60-second run time of the test. Most notably, both log_bin=1 cases for 20 rows on the unmodified (plain) server appeared faster than the log_bin=0 runs. There were faster log_bin=0 runs; I chose the slowest ones.

At 10 rows and 20 threads, the call to thd_need_wait_reports() seems to incur a clear overhead even when --skip-log-bin is in effect.

I believe that this API should move from 'push' notification to 'pull' notification. That is, the predicate thd_need_wait_reports() would be removed, and instead, whenever the replication layer actually needs the waits-for relations, it would invoke a new function handlerton::report_waits(), which InnoDB would essentially implement with lock_wait_rpl_report(). We could still invoke thd_rpl_deadlock_check() to report the graph. |
| Comments |
| Comment by Andrei Elkin [ 2021-08-04 ] |
|
Marko found out
|
| Comment by Marko Mäkelä [ 2021-08-04 ] |
|
MySQL Bug #32618301 INNODB SHOULD REPORT CHANGES IN WAIT-FOR GRAPH TO SERVER LAYER builds upon an earlier change, Bug #29882690 UNDETECTED DEADLOCK WITH GAP AND INSERT INTENTION LOCKS IS POSSIBLE. The ‘push’ notification interface thd_report_row_lock_wait() (which I think is a performance problem) appears to be unaffected by those changes. |
| Comment by Marko Mäkelä [ 2023-08-07 ] |
|
|
| Comment by Kristian Nielsen [ 2023-08-07 ] |
|
Thanks Marko for doing the benchmark! Following up on the discussion on the developers list, here is some further background on this, and ideas for how we can improve.

The thd_rpl_deadlock_check() is essential for optimistic and aggressive parallel replication. Here is how it works: we take a list of transactions to replicate, T1, T2, T3, ... If T1 and T2 modify the same table row, we may end up with a deadlock where T1 is waiting for a row lock held by T2, and T2 is waiting for T1 to commit. To break this deadlock, thd_rpl_deadlock_check() is used to report all cases where one THD waits on another. Whenever there is a T_i that waits on a T_j, i<j, we have a deadlock, and we resolve it by rolling back T_j so that T_i can commit and T_j can later be retried. This requires a push interface, as it is important for scalability to break the deadlock as soon as possible and allow the earlier transactions to complete.

The thd_need_wait_reports() function can be used to know which THDs need the call to thd_rpl_deadlock_check(). Ideally this could be used to incur the overhead only for the parallel replication threads that require it. Note that thd_need_wait_reports() is a function of the THD, not of the transaction; it need only be called once per THD, and the result can then be cached across transactions. But in the current code, thd_need_wait_reports() also returns true whenever the binlog is enabled, which then applies to all THDs.

I think the assumption was that if one thread needs to wait for another anyway, that is already an expensive operation, so the overhead of also reporting the wait to the server layer would be negligible. If this is not true, maybe the additional use cases should be reconsidered, as they are not essential the way they are for optimistic parallel replication. They are as follows:

1. When a transaction commits (to the binlog), it checks whether it ever waited on a row lock. If so, a flag FL_WAITED is logged in the GTID event.
This is used as a heuristic on the slave in optimistic (but not aggressive) mode to not apply that transaction in parallel, hoping to avoid a conflict. To my knowledge, this heuristic was never really tested to see how beneficial it is. I only remember the extensive benchmarks of Jean Francois Gagné, which I think found that aggressive mode was faster anyway. It might be worth questioning whether this FL_WAITED flag is still useful, or whether we could simply remove the functionality. Even if we keep it, the information that a transaction waited during execution is only needed at binlog commit time, so it could be replaced by a pull interface if InnoDB set a flag in the trx whenever a lock wait occurs.

2. The second use relates to conservative parallel replication, but in the binlog code on the master. The master marks in the GTID events in the binlog all transactions that group-committed together. A conservative-mode slave runs in parallel (only) those transactions that committed together and are thus (probably) free of conflicts. The --binlog-commit-wait-usec option can be used on the master to make transactions wait a bit before committing, hoping to get more transactions to group-commit and improve conservative-mode slave scalability. A heuristic then cuts the wait short: if some transaction T1 is ready to commit but waiting for --binlog-commit-wait-usec, and another transaction T2 needs to wait for T1, we commit T1 immediately, hoping to reduce the scalability loss on the master from --binlog-commit-wait-usec. Again, I am not sure how much this heuristic has ever been tested for effectiveness. Conservative mode is also less important now that optimistic mode exists, even more so the --binlog-commit-wait-usec option, which must be hard to tune without negatively affecting master performance. So this use case, too, could be reconsidered.
Even if we keep it, we should at least make thd_need_wait_reports() return true only if --binlog-commit-wait-usec is non-zero, which will rarely be the case. So in summary:
(As an aside, one thing InnoDB could do to improve concurrency is to buffer the list of THDs waited for in an array (while holding lock_sys.wait_mutex and lock_sys.latch), and then pass them to thd_rpl_deadlock_check() only after releasing those mutexes. But it would be better if we could avoid thd_rpl_deadlock_check() completely in the common case. Also, ensuring that the THDs do not go away during thd_rpl_deadlock_check() might be tricky.) |
| Comment by Marko Mäkelä [ 2023-09-05 ] |
|
knielsen, thank you very much for the explanation. For what it is worth, I was wondering if fixing
|