[MDEV-20483] trx_lock_t::table_locks is not a subset of trx_lock_t::trx_locks Created: 2019-09-03 Updated: 2020-08-07 Resolved: 2019-09-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.2.28, 10.3.19, 10.4.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | corruption, transactions | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
Testing a fix of We will need a test case for this, so that the bug can be found and fixed. To repeat the assertion failures, revert the following part of the
This should lead to occasional assertion failures on trx->lock.table_locks.empty() on transaction commit. |
| Comments |
| Comment by Matthias Leich [ 2019-09-03 ] | |||||||||||||||||||||||||||||||
|
Assertion `(trx)->lock.table_locks.empty()' failed | |||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2019-09-05 ] | |||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-09-06 ] | |||||||||||||||||||||||||||||||
|
This appears to be intentional, after all. While the trx_lock_t::table_locks is not empty, it is filled with NULL pointers. The test MDEV-20483-C.test
This patch checks that when trx_locks is empty and table_locks is not, then table_locks must consist of only NULL pointers. mleich, can you please retest the latest 10.3 with the above patch applied? If the assertion in this patch does not fail, then we can conclude that there is no bug after all. I am asking to retest, because only some callers of lock_table_dequeue() invoke also lock_trx_table_locks_remove(). So, I cannot rule out a bug yet. | |||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2019-09-09 ] | |||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||
| Comment by Thirunarayanan Balathandayuthapani [ 2019-09-12 ] | |||||||||||||||||||||||||||||||
|
The following patch removes the lock from table_locks vector. Previously lock_wait_timeout_thread doesn't remove
Above patch based on 10.2 (df4dee4b84ddc34799fa3a9648c142f12564597f) version. | |||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2019-09-17 ] | |||||||||||||||||||||||||||||||
|
When testing 10.2 + the patch above with RQG I got the usual fraction of known+open bugs. | |||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-09-17 ] | |||||||||||||||||||||||||||||||
|
Thanks, thiru! You forgot to remove the work-around from trx_t::release_locks() as part of your patch. Please remove that as well. Indeed, you added the only missing call. The one in lock_release() is intentionally missing, as the only caller lock_trx_release_locks() is invoking trx->lock.table_locks.clear(). As far as I can tell, based on a comment in lock_table(), the primary motivation for table_locks to exist is that while trx_locks can be modified by other threads (which can be converting implicit locks to explicit ones), table_locks is only accessed by the thread where the transaction is running. Despite this, lock_table_create() is holding trx->mutex (and lock_sys->mutex) while appending to table_locks. Given that the only extra locks in table_locks were cancelled wait locks, and that the only non-debug read access of table_locks is lock_table_has(), which ignores lock waits, there does not seem to be any correctness impact of this bug. |