[MDEV-11215] Several locks taken to same record inside a transaction. Created: 2016-11-02  Updated: 2020-02-17

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Fix Version/s: None

Type: Task Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 1
Labels: upstream

Issue Links:
Relates
relates to MDEV-14479 Do not acquire InnoDB record locks wh... Closed
relates to MDEV-14638 Replace trx_sys_t::rw_trx_set with LF... Closed
relates to MDEV-14776 InnoDB Monitor output generated by sp... Closed
relates to MDEV-16232 Use fewer mini-transactions Stalled
relates to MDEV-16406 Refactor the InnoDB record locks Open
relates to MDEV-16617 After upgrading from 5.5.52 to 10.1.3... Closed
relates to MDEV-16675 Unnecessary explicit lock acquisition... Closed
relates to MDEV-17598 InnoDB index option for per-record tr... Open
Sprint: 5.5.55

 Description   

CREATE TABLE `tu`(`id` int(11), `a` int(11) DEFAULT NULL, `b` varchar(10) DEFAULT NULL, `c` varchar(10) DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`,`b`)) ENGINE=InnoDB DEFAULT CHARSET=latin1 STATS_PERSISTENT=0;
 
insert into tu values(1,1,'a','a'),(2,9999,'xxxx','x'),(3,10000,'b','b'),(4,4,'c','c');
 
set global innodb_status_output=ON;
set global innodb_status_output_locks=ON;
 
start transaction;
delete from tu where a = 9999 and b = 'xxxx';
show engine innodb status\G

In the output you'll see:

------------
TRANSACTIONS
------------
Trx id counter 636202
Purge done for trx's n:o < 636201 undo n:o < 0 state: running but idle
History list length 37
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 636201, ACTIVE 4 sec
3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 90 localhost root init
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 636201 lock mode IX
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
--------

Normal and expected so far, record X locks on the unique index and primary key, we deleted and could delete one and only one row. Now, repeat the same delete in the same transaction:

delete from tu where a = 9999 and b = 'xxxx';
show engine innodb status\G

and you'll see this beautiful set of locks:

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 636201, ACTIVE 202 sec
5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 92 localhost root init
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 636201 lock mode IX
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before rec
--------

Suggested fix:
Check if there is any good reason to ask for a gap X lock on a record in the secondary unique index when transaction already has the next key lock on it.

Alternatively, if this is just the way locks are reported, please, fix it so the output makes sense.

In any case, please, document in the manual what locks this kind of DELETE sets when it encountered a record already marked as deleted, and why.



 Comments   
Comment by Jan Lindström (Inactive) [ 2016-11-02 ]

http://bugs.mysql.com/bug.php?id=83640

Comment by Jan Lindström (Inactive) [ 2017-02-16 ]

Let's start from transaction:

CREATE TABLE `tu`(`id` int(11), `a` int(11) DEFAULT NULL, `b` varchar(10) DEFAULT NULL, `c` varchar(10) DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`,`b`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
insert into tu values(1,1,'a','a'),(2,9999,'xxxx','x'),(3,10000,'b','b'),(4,4,'c','c');
 
start transaction;
delete from tu where a = 9999 and b = 'xxxx';
delete from tu where a = 9999 and b = 'xxxx';
show engine innodb status;

Last line will output (if lock monitor is on) following :

---TRANSACTION 303 OS thread 140129523119872, ACTIVE 0 sec
5 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 1
MySQL thread id 2, OS thread handle 0x7f727272b700, query id 27 localhost root
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 303 lock mode IX
RECORD LOCKS space id 0 page no 309 n bits 72 index `u` of table `test`.`tu` trx id 303 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 8000270f; asc   ' ;;
 1: len 4; hex 78787878; asc xxxx;;
 2: len 4; hex 80000002; asc     ;;
 
RECORD LOCKS space id 0 page no 308 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 303 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000000000303; asc       ;;
 2: len 7; hex 04000001370110; asc     7  ;;
 3: len 4; hex 8000270f; asc   ' ;;
 4: len 4; hex 78787878; asc xxxx;;
 5: len 1; hex 78; asc x;;
 
RECORD LOCKS space id 0 page no 309 n bits 72 index `u` of table `test`.`tu` trx id 303 lock_mode X
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 8000270f; asc   ' ;;
 1: len 4; hex 78787878; asc xxxx;;
 2: len 4; hex 80000002; asc     ;;
 
RECORD LOCKS space id 0 page no 309 n bits 72 index `u` of table `test`.`tu` trx id 303 lock_mode X locks gap before rec
Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80002710; asc   ' ;;
 1: len 1; hex 62; asc b;;
 2: len 4; hex 80000003; asc     ;;

Lets investigate taken locks one by one:

  • First lock is taken to the index record when we search index 'u' for matching row.
  • Second lock is taken for clustered record matching the index record above.
  • Third X-lock is taken to the index record that we have already above marded to be deleted
  • Forth gap lock is taken for index record gap between xxxx and b on index 'u'

Now there will be no more lock records even if you add more delete-clauses to open transaction. Now first and second lock are trivially necessary and these are taken by the first delete. Rest of the locks are taken the second delete-clause.

Let's first assume that second delete-clause would be from different transaction.
Forth lock is needed for repeatable read to avoid concurrent transaction inserting a record to index 'u' between 9999,'xxxx' and 10000,'b'. Can you insert anything there, actually yes you can e.g. record 99999,'xxxxx' but that would not be result set of the delete clause. But this gap lock also protects from insert with record 9999,'xxxx'.

Finally, third lock is needed because we need to serialize after transaction marked the index-record to be deleted so we need to wait until it is committed.

Now, lets get back to original case where both delete-clauses are in the same transaction. Firstly, as secondary indexes do not contain TRX_ID system field we do not easily know which transaction has delete marked the secondary index record. Secondly, requested lock modes are either stronger (X-lock > X-lock rec not gap) or on different record (X-lock heap 3, gap lock heap 4).

Can we do better, absolutely we could add TRX_ID system field, note that we have already taken necessary index-record locks in the same transaction and not take additional locks. However, this is not feasible on GA-release. Additionally, we could investigate at second delete is there a possibility to insert a record that would cause result set to change, that would naturally decrease the performance of operations and decrease the concurrency as this investigation takes time and is done holding page latches. Again, not feasible on GA-release.

Comment by Jan Lindström (Inactive) [ 2017-02-16 ]

Both lock acquire and lock printout work's as designed, thus closing this issue. You may open a new bug for documenting the used lock modes in different SQL-clauses but doing that exhaustively while maintaining "easy" reading might not be very easy.

Comment by Jan Lindström (Inactive) [ 2017-02-16 ]

valerii please review my comment, is that clear enough?

Comment by Jan Lindström (Inactive) [ 2017-02-16 ]

greenman Can you check the documentation if we could improve it.

Comment by Marko Mäkelä [ 2017-02-16 ]

In addition to explicit locks, there are also implicit locks. A record is implicitly locked by a transaction if it was written or modified by that transaction. And if a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

For clustered index records, determining whether a record is implicitly locked is easy: the hidden DB_TRX_ID column will belong to an active (or XA PREPARE; not committed) transaction. For secondary index records it is very expensive because there is no per-record DB_TRX_ID but only a PAGE_MAX_TRX_ID. Therefore the function row_vers_impl_x_locked() has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a ‘death spiral’ of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a blog post 2 years ago I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS. As far as I can tell, fixing the bug would require introducing a per-record transaction ID, which would require some data dictionary changes, which would best wait for MDEV-11655.

Comment by Valerii Kravchuk [ 2017-02-16 ]

Thank you for detailed explanation. I still miss some details about that heap no 4 vs heap no 3 for the last gap lock. Why do we need to lock different gap for second DELETE?

Comment by Marko Mäkelä [ 2017-11-23 ]

valerii, let me answer by quoting the previously pasted output:

RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before rec

Page 3 is the clustered index root page in an .ibd file, and page 4 in this case is a secondary index root page. Because the table is so small, these root pages are also leaf pages (only one page in each index tree). Locks can only exist on leaf pages. And InnoDB does not have a concept of 'row lock'; it always is an 'index record lock'.

There may exist a performance problem that is demonstrated by the output. For page 4, there are 3 different lock bitmaps.
jplindst, please display the contents of these bitmaps. We would like to see the record heap numbers corresponding to the set bits in these bitmaps.

The first bitmap for page 4 "locks rec but not gap" would be for record-only locks (locking only the records whose heap_numbers correspond to the set bits in the bitmap).
The second bitmap for page 4 (with no extra qualifier) is for record-and-preceding-gap locks (locking the identified records, plus the gap between the preceding record and the identified record, excluding the said records).

Here is the first potential problem: If we have the same heap numbers in both bitmaps, then the "locks rec but not gap" bitmap is redundant and should have been removed. If it started as a single-record bitmap, it could even have been converted to the normal rec+gap bitmap when the gap lock was placed.

The third and last bitmap for page 4 ("locks gap before rec") is for gap-only locks. In this one, we could easily have the page supremum pseudo-record, which we should never have in the other bitmaps.

I think that we should consider the following alternatives:

  1. Introduce a separate lock object for "page supremum gap", and never add the supremum to the "locks gap before rec" bitmap.
  2. When placing a lock request, check if an existing lock bitmap can be converted to "locks rec but not gap"
  3. Have a single record lock bitmap, with 2 bits per heap number: one for "rec" and another for "preceding gap"
Comment by Marko Mäkelä [ 2017-11-23 ]

I am reopening this, because there certainly is some room for improvement.

Comment by Marko Mäkelä [ 2018-01-10 ]

During the development of MDEV-14638 we found out that apparently, if a transaction has acquired an implicit exclusive lock by inserting a record, on a subsequent locking read (such as updating the inserted record) it would convert that implicit lock into an explicit one. That can surely be avoided.

Comment by Marko Mäkelä [ 2018-03-08 ]

Another area for improvement is that InnoDB is unnecessarily creating explicit record locks when covering table locks already exist. That is, if the entire table is locked for reads, InnoDB would still acquire record locks for locking reads (such as INSERT…SELECT). Or if the entire table is locked for writes, UPDATE or SELECT…FOR UPDATE would still create record locks. For large transactions, the unnecessary record locks can fill the buffer pool and cause InnoDB to commit suicide:

2018-03-07 13:57:04 8000 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=128M. Starting the InnoDB Monitor to print diagnostics.
2018-03-07 13:57:09 8000 [ERROR] [FATAL] InnoDB: Over 95 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks, or review if innodb_buffer_pool_size=128M could be bigger.

Comment by Marko Mäkelä [ 2018-07-03 ]

In MDEV-16406 we could refactor the multiple record lock bitmaps per page into a single bitmap (multiple bits per record).

After MDEV-16232, also UPDATE and DELETE could mainly rely on implicit locks (page latch held between the lookup and the modification of a record), so the explicit locks would mostly be used when locking conflicts exist.

Comment by Marko Mäkelä [ 2018-07-03 ]

The unnecessarily created explicit lock objects can be causing warning messages in the error log, of the form "difficult to find free blocks".

Comment by Marko Mäkelä [ 2018-07-03 ]

Fixing MDEV-16675 Unnecessary explicit lock acquisition during UPDATE or DELETE
will reduce the amount of explicit locks, but not remove them completely.
We will still need fixes to MDEV-14479, MDEV-16232 and MDEV-16406.

Generated at Thu Feb 08 07:48:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.