[MDEV-24786] Assertion `!have_x()' failed in sux_lock<srw>::s_lock() [with srw = ssux_lock_low] Created: 2021-02-04 Updated: 2021-03-30 Resolved: 2021-03-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Virtual Columns |
| Affects Version/s: | 10.2.35, 10.2.36, 10.2.37, 10.3.26, 10.3.27, 10.3.28, 10.4.16, 10.4.17, 10.4.18, 10.5.7, 10.5.8, 10.5.9 |
| Fix Version/s: | 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | affects-tests, hang, rr-profile | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
rr profile is available. FMI, this is the test I used, at the moment it fails reasonably well at least on two different machines:
|
| Comments |
| Comment by Marko Mäkelä [ 2021-02-04 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
nikitamalyavin, as far as I can tell, the exclusive root page latch was acquired in the thread, and we failed to invoke mtr_t::commit() as part of some error handling:
All this time, I had a watchpoint set on the lock word, which remained as WRITER==1U<<31. Later, the same thread would fail because it notices that it is already holding a conflicting latch on the page. Note: before | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
nikitamalyavin, in row_upd_clust_step() I am seeing several goto exit_func that are not preceded by a call to mtr_t::commit(). One of them is the following, which was added by you in
I think that we need an assertion
at the end of the function. Please check that each code path in this function is actually committing the mini-transaction and thus releasing the page latches. | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2021-03-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
marko understood, thank you! I am not fluent in mtr stuff. Should I cover other goto's with mtr.commit() as well? Or maybe at exit_func add
? | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
I do not think that mtr_t::is_active() should ever be used outside debug assertions. It is a little unfortunate that mtr_t is not implementing a proper RAII pattern. Maybe we could add a destructor with a debug assertion !is_active() to catch similar bugs. I think that it is better to avoid code duplication, i.e., add a new label, like this.
All goto exit_func code paths have to be carefully evaluated to ensure that they will be calling mtr_t::commit() after the mtr_t::start(). | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2021-03-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
marko please review https://github.com/MariaDB/server/commit/c45a637a79f891a0f98295b4ff310b09445d8d55 | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-24 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
At least one more code path in row_upd_clust_step() is missing a call to mtr_t::commit():
Inside row_upd_del_mark_clust_rec() we are missing at least one call to mtr_t::commit():
Please check all code paths, like I requested yesterday. And please try to not duplicate code. goto is a lesser evil. | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2021-03-24 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I did, but missed that row_upd_del_mark_clust_rec has conditional returns. > And please try to not duplicate code. goto is a lesser evil | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2021-03-24 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I have fixed the return from row_upd_del_mark_clust_rec here: https://github.com/MariaDB/server/commit/9f032c19eb500068f82d3c49a97abf0f7b378b37 | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-24 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
I think that we must write a test case that repeats these failures. We must also run all tests and check the code coverage in InnoDB. I will try to do that. | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
I repeated the bug with RQG and rr, but creating a test case could require major effort. I tried using MDEV-24582 as a guideline, but it was not successful. The virtual column evaluation failure that caused the bug is on the column DB_ROW_HASH1 (
We have seem to have a unique index on a virtual BLOB column that depends on another virtual column. I tried to blindly create a test case, using MDEV-24582 for reference, but I failed to reproduce the bug. Here is one variation that I tried:
In my test case, DELETE would proceed to innobase_get_computed_value(), but the evaluation would succeed. Because this failure probably depends on other bugs on virtual columns, it might not be meaningful to spend more effort on writing a regression test. | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
I decided to refactor the code so that we basically have all code paths going to a single mtr_t::commit() (and some subroutines are calling mtr_t::start() if they need to call mtr_t::commit()). While doing that, I only found the two code paths that were broken in | |||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2021-03-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||
|
Test case for the originally reported assertion failure (on 10.6):
|