[MDEV-14589] InnoDB should not lock a delete-marked record Created: 2017-12-05 Updated: 2023-11-30 Due: 2017-12-15 Resolved: 2017-12-11
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | None |
| Fix Version/s: | 10.2.12, 10.3.3, 10.1.34 |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael Widenius | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | upstream, upstream-5.5, upstream-wontfix |
| Issue Links: |
| Description |
When a transaction is able to delete-mark a record and commit the change, no active transaction can possibly have seen (and locked) the record. The record can be removed by the purge thread at any time, and it is logically equivalent to a gap (a non-existing record). However, InnoDB transactions currently do lock committed delete-marked records.

How to repeat: trx (1) has obtained an X-lock on the delete-marked committed record.

Suggested fix: The solution we will try to implement is that when the isolation level is READ COMMITTED, DELETE will ignore any delete-marked rows. This is the same as https://bugs.mysql.com/bug.php?id=19762
| Comments |
| Comment by Marko Mäkelä [ 2017-12-08 ] |

I think that we can easily fix one special case of this problem: at the READ COMMITTED isolation level, skip delete-marked records. It may be that at READ UNCOMMITTED we are already skipping delete-marked records. Fixing the full problem is challenging. It might become easier once we have a per-table partitioned undo log format that allows out-of-order purge.
| Comment by Marko Mäkelä [ 2017-12-11 ] |

The following test case demonstrates the problem.

The test will not hang if the BEGIN before the DELETE is omitted. I think that both at READ COMMITTED and READ UNCOMMITTED, it is safe to ignore committed delete-marked records without placing locks on them.
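The original test case is not included in this export. The following is a hypothetical sketch of the scenario described in this issue, with assumed table and connection names, showing how a session could end up waiting on a lock placed on a committed delete-marked record before the fix:

```sql
-- Hypothetical repro sketch (assumed names; not the actual test case).
-- Assumes the purge thread has not yet removed the delete-marked record.

CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
DELETE FROM t1 WHERE a = 1;   -- autocommit: the record is now
                              -- delete-marked and committed

-- connection con1:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
DELETE FROM t1 WHERE a = 1;   -- affects 0 rows, but before the fix it
                              -- X-locks the delete-marked record

-- connection con2:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
DELETE FROM t1 WHERE a = 1;   -- before the fix: waits for con1's lock;
                              -- after the fix: completes immediately
```

With the BEGIN in con1 omitted, its DELETE would commit and release its locks immediately, which matches the observation that the test does not hang in that case.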
| Comment by Marko Mäkelä [ 2017-12-11 ] |

This turns out to be an omission in MySQL Bug#3300: the function row_unlock_for_mysql() is only effective for clustered index records. It would be more efficient to avoid the lock acquisition on delete-marked records in row_search_mvcc() (née row_search_for_mysql()) in the first place. For clustered index records this is easy, because we can determine whether the DB_TRX_ID in the delete-marked record refers to an active transaction. For secondary index records, we look up the matching clustered index record and skip the lock if the clustered index record is not found or no longer refers to an active transaction.
| Comment by Marko Mäkelä [ 2017-12-11 ] |

On a related note, the ‘do not lock non-matching rows’ logic of MySQL Bug#3300 (also known as the ‘semi-consistent read’) only works on the clustered index. Lock conflicts will not be avoided on secondary index records.
| Comment by Marko Mäkelä [ 2017-12-11 ] |
| Comment by Jan Lindström (Inactive) [ 2017-12-11 ] |

It would be nice if the test included a case where another transaction tries to insert into that gap at an isolation level stricter than READ COMMITTED. But the actual fix looks correct to me and is OK to push.
| Comment by Marko Mäkelä [ 2017-12-11 ] |

The added test case innodb.lock_deleted now covers the READ UNCOMMITTED, READ COMMITTED, and REPEATABLE READ isolation levels. At REPEATABLE READ, the committed delete-marked secondary index record will be locked and will prevent an INSERT of the same record. The gap preceding the delete-marked record will not be locked. This seems wrong at the SERIALIZABLE isolation level (we should lock all visited records and prevent updates or inserts). But we already know that SERIALIZABLE is broken by design in other ways: if records were committed by a later-started transaction, InnoDB SERIALIZABLE will happily lock those records and treat them as READ COMMITTED.
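The REPEATABLE READ behaviour described above can be sketched as follows. This is a hypothetical illustration with assumed schema and statements, not the actual innodb.lock_deleted test:

```sql
-- Hypothetical sketch (assumed names): at REPEATABLE READ, a lock on a
-- committed delete-marked secondary index record blocks re-insertion.

CREATE TABLE t1 (pk INT PRIMARY KEY, b INT, KEY(b)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 10);
DELETE FROM t1 WHERE b = 10;    -- record delete-marked and committed,
                                -- assumed not yet purged

-- connection con1 (REPEATABLE READ is the default):
BEGIN;
DELETE FROM t1 WHERE b = 10;    -- 0 rows, but locks the committed
                                -- delete-marked secondary index record

-- connection con2:
INSERT INTO t1 VALUES (1, 10);  -- re-inserting the same record blocks
                                -- until con1 commits
INSERT INTO t1 VALUES (2, 9);   -- per the comment above, the gap
                                -- preceding the delete-marked record
                                -- is not locked, so this succeeds
```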
| Comment by Marko Mäkelä [ 2018-04-30 ] |

My claim that SERIALIZABLE is not quite serializable is my opinion. The Microsoft SQL Server documentation seems to define SERIALIZABLE in the same way as InnoDB:

Below is a copy of my comment on a blog post from February 2018: I discussed the SERIALIZABLE behaviour with my colleagues at MariaDB today.

The transaction of connection rr would see the old data of t1.a, independent of when the concurrent UPDATE was completed. It would then insert and commit this old data into t2.a, and the supposedly SERIALIZABLE transaction would see the old data in t2. This does not look serializable to me. Theoretically, we could introduce a new PARANOID_SERIALIZABLE isolation level that would return the ER_SNAPSHOT_TOO_OLD error (simply by looking up and read-locking the primary key record and checking whether its DB_TRX_ID is greater than the current transaction ID). Unfortunately, I think that it could return this error even in a case where the later-started-and-earlier-committed transaction was at the SERIALIZABLE isolation level and the execution would actually have been serializable.
| Comment by Marko Mäkelä [ 2018-05-29 ] |

On customer request, I backported this to MariaDB 10.1.34.