[MDEV-32353] lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap Created: 2023-10-05 Updated: 2023-10-13 Resolved: 2023-10-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, XA |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6.16 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Roel Van de Paar | Assignee: | Vladislav Lesin |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | debug, locking, regression | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
Leads to:
Bug confirmed present in: Bug (or feature/syntax) confirmed not present in: |
| Comments |
| Comment by Marko Mäkelä [ 2023-10-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
I can reproduce this with the merge of | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-10-09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
Some update. I am still analysing it, my goal is to create stable test case. But, more likely, it does not depend on What I see is that after "XA PREPARE" there is dict_stats_func() call by timer. It calls dict_stats_save_index_stat(), which, in turns, executes the following internal query:
During the query parsing node->has_clust_rec_x_lock is set to true in the following code:
Then row_upd_del_mark_clust_rec() is executed. It should set X-lock before delete-marking some record, but it did not do that, as node->has_clust_rec_x_lock was set:
node->has_clust_rec_x_lock is set because it's supposed that lock will be set during row_sel() execution. But, when row_sel() tries to set X-lock, it finds out that the transaction owns table X-lock, and skips record lock requesting. The table's X-lock is set in dict_stats_save() call with lock_table_for_trx(). It can be set, because "XA prepare" resets table's S and IS locks. Then, when INSERT statement of that internal procedure is executed, it invokes locking queue validation check during duplicates check. The validation sees that modified by one transaction record has explicit lock of another transaction. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-10-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
vlad.lesin, while analyzing | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-10-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko, yes, this is more likely the root case, "XA prepare" releases table's S and IS, but not X and IX locks. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-10-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
Roel, as we can see from the stack trace from the description, the assertion failures in background thread. dict_stats_save() execution in background thread is triggered with dict_stats_update_if_needed() call during "INSERT INTO t VALUES (1);" execution. So, to make it stable, we have to wait until dict_stats_save() function is executed after "INSERT INTO t VALUES (1);" is finished. For this purpose I added some debug sync point:
And used it in mtr test case:
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-10-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
The fact that I can't repeat it on 10.6 without In the above comment I wrongly supposed that "INSERT…SELECT" requests X-locks during "SELECT" subquery execution. I supposed that because S-locks should be released on "XA PREPARE", and I did not check locks type during debugging. De-facto S-locks are requested. As Marko noted, "INSERT…SELECT" requests IS but not IX table lock. But, if we modify the request to force X record locks requesting, like "INSERT…SELECT...FOR UPDATE", table IX lock will be requested during row_search_mvcc() execution for "SELECT...FOR UPDATE" subquery:
As we can see, the type of table lock is either IS or IX. Both of them conflict with X-lock, requested from dict_stats_save() call with lock_table_for_trx(). So, my answer is wrong. Even if INSERT…SELECT is only acquiring an IS but not IX table lock, the "SELECT" subquery is either acquiring IS table and S record locks, which must be released on "XA PREPARE", or IX table lock. So either dict_stats_save() acquires X-lock as there are no IS table and S record locks, as "XA PREPARE" released them, or it ignores DB_LOCK_WAIT and the next dict_stats_save() call is rescheduled by timer. In any cases there can't be the lock queue validation crash, as no internal queries are executed by dict_stats_save() when the XA still holds conflicting record locks. But what is the reason of the crash? There is logical error in lock_release_on_prepare_try():
So, if supremum bit is set, we only reset it and rebuild waiting queue even if !lock->is_rec_granted_exclusive_not_gap(). The other bits in the lock bitmap are just ignored. The correct logic is the same as in lock_release_on_prepare() and the following:
So the issue is really caused by the error in | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-10-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
10.5 is not affected, but test cases backport would be useful. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-10-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
The fix looks good to me. Would this fix |