[MDEV-18227] MyRocks-Gap-Lock: Lock escalation and updates to transaction's list of owned locks Created: 2019-01-14  Updated: 2019-01-15  Resolved: 2019-01-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-15603 Gap Lock support in MyRocks Stalled

 Description   

In TokuDB/PerconaFT range locking works as follows:

  • Each SQL table has a global "Lock Tree" (see locktree.h,cc, class locktree) which stores all locks that are currently held by all transactions.
  • Besides that, each transaction keeps a list of its own locks in each locktree
    in db_txn_struct_i(txn)->lt_map.

It is defined as follows:

struct txn_lt_key_ranges {
    toku::locktree *lt;
    toku::range_buffer *buffer;
};
 
...
    // maps a locktree to a buffer of key ranges that are locked.
    // it is protected by the txn_mutex, so hot indexing and a client
    // thread can concurrently operate on this txn.
    toku::omt<txn_lt_key_ranges> lt_map;

Lock escalation joins multiple locks into one in the global lock tree. Then it calls the escalation callback (which points to toku_db_txn_escalate_callback()).

void toku_db_txn_escalate_callback(TXNID txnid, 
  const toku::locktree *lt, 
  const toku::range_buffer &buffer, 
  void *extra) 

The third parameter of the function is the list of ranges that the transaction holds after the escalation. toku_db_txn_escalate_callback replaces the transaction's list of owned ranges with the provided list.
This way, lock escalation reduces memory usage in both the global lock table and in each transaction's list of owned locks.
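
As a rough illustration of the flow described above, the following self-contained C++ sketch models a lock tree, escalation, and a callback that replaces each transaction's list. All names in it (KeyRange, LockTree, escalate, demo_escalation) are hypothetical stand-ins rather than PerconaFT code, and the merge rule (join key-adjacent locks of the same owner) is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; not the PerconaFT types.
struct KeyRange { int64_t left; int64_t right; };   // inclusive [left, right]
using TxnId = uint64_t;
using RangeList = std::vector<KeyRange>;            // plays the role of range_buffer
struct OwnedRange { TxnId owner; KeyRange range; };

// Called once per transaction with the ranges it owns after escalation.
using EscalateCallback = std::function<void(TxnId, const RangeList&)>;

struct LockTree {
    std::vector<OwnedRange> locks;  // every lock held by every transaction

    // Simplified escalation: sort by left key, then join neighbouring locks
    // that belong to the same transaction into one covering range.
    // Assumes locks of different transactions never overlap.
    void escalate(const EscalateCallback& cb) {
        std::sort(locks.begin(), locks.end(),
                  [](const OwnedRange& a, const OwnedRange& b) {
                      return a.range.left < b.range.left;
                  });
        std::vector<OwnedRange> merged;
        for (const OwnedRange& l : locks) {
            if (!merged.empty() && merged.back().owner == l.owner) {
                KeyRange& last = merged.back().range;
                last.right = std::max(last.right, l.range.right);
            } else {
                merged.push_back(l);
            }
        }
        locks = merged;

        // Hand each transaction its new, shorter list of owned ranges.
        std::map<TxnId, RangeList> per_txn;
        for (const OwnedRange& l : locks) per_txn[l.owner].push_back(l.range);
        for (const auto& kv : per_txn) cb(kv.first, kv.second);
    }
};

// Ten point locks for transaction 1 and one for transaction 2, then escalate.
// Returns transaction 1's list of owned ranges after the callback ran.
RangeList demo_escalation() {
    LockTree lt;
    std::map<TxnId, RangeList> owned;   // per-transaction lists (like lt_map)
    for (int64_t k = 0; k < 100; k += 10) {
        lt.locks.push_back({1, {k, k}});
        owned[1].push_back({k, k});
    }
    lt.locks.push_back({2, {1000000, 1000000}});
    owned[2].push_back({1000000, 1000000});

    lt.escalate([&owned](TxnId txn, const RangeList& ranges) {
        owned[txn] = ranges;            // replace, do not append
    });
    assert(lt.locks.size() == 2);       // one range per transaction remains
    return owned[1];
}
```

In this sketch the callback overwrites the per-transaction list instead of appending to it, which is what shrinks memory on the transaction side as well as in the tree.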

One thing to watch out for is that lock escalation can happen in thread X while the transaction runs in thread Y.

So, access to db_txn_struct_i(txn)->lt_map (or its equivalent) must be synchronized.
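
A minimal sketch of such synchronization, assuming a plain mutex per transaction, might look as follows. The class and method names (TxnOwnedLocks, note_row_lock, replace_after_escalation) are hypothetical and only illustrate the idea; this is not the TokuDB or MyRocks implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

struct KeyRange { int64_t left; int64_t right; };   // inclusive [left, right]

// Hypothetical per-transaction list of owned locks, guarded by its own mutex.
class TxnOwnedLocks {
public:
    // Called by the transaction's own thread after it has acquired a lock.
    void note_row_lock(const KeyRange& r) {
        std::lock_guard<std::mutex> guard(mutex_);
        ranges_.push_back(r);
    }
    // Called by whichever thread happens to run lock escalation.
    void replace_after_escalation(std::vector<KeyRange> escalated) {
        std::lock_guard<std::mutex> guard(mutex_);
        ranges_ = std::move(escalated);
    }
    std::size_t num_ranges() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return ranges_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<KeyRange> ranges_;
};

// Single-threaded walk through both entry points; returns the final count.
std::size_t demo_guarded_updates() {
    TxnOwnedLocks txn;
    for (int64_t k = 0; k < 100; k += 10) txn.note_row_lock({k, k});
    assert(txn.num_ranges() == 10);
    txn.replace_after_escalation(std::vector<KeyRange>{KeyRange{0, 90}});
    return txn.num_ranges();
}
```

A mutex like this only makes each individual update of the list atomic. It does not by itself order "lock acquired in the lock tree" with the later "lock noted in the transaction's list", so an escalation can still run between those two steps.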



 Comments   
Comment by Sergei Petrunia [ 2019-01-14 ]

...and in TokuDB it is not fully synchronized.

Consider this example. Apply this patch:
https://gist.github.com/spetrunia/b8d3d24acb957e772539af384a36d98a

Start the server.

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t10 (
  pk int not null primary key,
  a int
) engine=tokudb;

Thread 2: insert something to disable STO:

begin;
insert into t10 values (1000*1000, 100500);

Thread 1: acquire 10 locks:

begin;
insert into t10 select a*10, a*10 from ten;

Start acquiring the 11th lock. Freeze the execution after we've got the lock, but before db_txn_note_row_lock has added it to the transaction's list of owned locks.

!touch /tmp/dbg1-range-lock-wait
insert into t10 values (12,12); -- this hangs

Thread 2:

!rm /tmp/dbg1-range-lock-wait

b toku::locktree_manager::check_current_lock_constraints

insert into t10 values (1000*1000+1, 100500);

Make check_current_lock_constraints return true, then observe this in the debugger:

toku::locktree::escalate
  num_extracted=19  // total ranges 
  num_range_buffers= 2 // total transactions that own ranges
 
toku_db_txn_escalate_callback 
  ranges.buffer._num_ranges=10 // the list of owned locks has 10 entries
  buffer._num_ranges=1  //  they will be replaced with one 

Continue and let Thread 2's insert complete.
Unfreeze Thread 1's INSERT:

b db_txn_note_row_lock
!touch /tmp/dbg1-range-lock-wait-done

and see how this call

    // add a new lock range to this txn's row lock buffer
    size_t old_mem_size = ranges.buffer->total_memory_size();
    ranges.buffer->append(left_key, right_key);

will add a lock on point "12"

(gdb) p *left_key
  $270 = {data = 0x7ffec40245d0, size = 5, ulen = 0, flags = 0}
(gdb) p *right_key
  $271 = {data = 0x7ffec40245d0, size = 5, ulen = 0, flags = 0}
(gdb) x/5cx left_key.data
  0x7ffec40245d0:	0x00	0x0c	0x00	0x00	0x00
(gdb) x/5cx right_key.data
  0x7ffec40245d0:	0x00	0x0c	0x00	0x00	0x00

into the post-escalation lock list:

(gdb) p *ranges.buffer
  $275 = {static MAX_KEY_SIZE = 65536, _arena = {_current_chunk = {buf = 0x7ffec4034d30 "", used = 18, size = 4096}, _other_chunks = 0x0, _n_other_chunks = 0, _size_of_other_chunks = 0, _footprint_of_other_chunks = 0}, _num_ranges = 1}
 
(gdb) set $ptr= ((char*)0x7ffec4034d30)
 
(gdb) x/10x ($ptr + sizeof(toku::range_buffer::record_header))
  0x7ffec4034d38:	0x00	0x00	0x00	0x00	0x00	0x00	0x5a	0x00
  0x7ffec4034d40:	0x00	0x00
(gdb) p 0x5a
  $293 = 90
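
The byte dumps above can be read with a small helper like the one below. The layout it assumes (one leading byte followed by a 4-byte little-endian integer) is inferred from these dumps only and is an assumption, as is the function name decode_int_key; only non-negative values are considered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a 5-byte key as it appears in the gdb dumps: one leading byte,
// then a 4-byte little-endian integer. The layout is an assumption
// inferred from the dumps; negative values are not handled.
uint32_t decode_int_key(const unsigned char* key, std::size_t size) {
    assert(size == 5);
    return  static_cast<uint32_t>(key[1])
         | (static_cast<uint32_t>(key[2]) << 8)
         | (static_cast<uint32_t>(key[3]) << 16)
         | (static_cast<uint32_t>(key[4]) << 24);
}
```

Under that assumption, left_key and right_key both decode to 12, and the ten bytes after the record header decode to a left key of 0 and a right key of 90.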

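So the race exists: a lock can be appended to the transaction's list of owned locks after escalation has already replaced that list. It is currently harmless, though.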
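
The interleaving from this comment can be replayed in a single thread with a small model. This is only a sketch: the names (escalate_single_owner, replay_interleaving) are hypothetical, and escalation is simplified to collapsing one owner's locks into a single covering range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct KeyRange { int64_t left; int64_t right; };   // inclusive [left, right]
using RangeList = std::vector<KeyRange>;

// Simplified escalation for one transaction whose locks are not interleaved
// with anyone else's: collapse them into a single covering range.
RangeList escalate_single_owner(const RangeList& held) {
    assert(!held.empty());
    KeyRange cover = held.front();
    for (const KeyRange& r : held) {
        cover.left = std::min(cover.left, r.left);
        cover.right = std::max(cover.right, r.right);
    }
    return RangeList{cover};
}

// Replays the steps from the comment in order and returns Thread 1's
// final list of owned locks.
RangeList replay_interleaving() {
    RangeList tree_locks;   // Thread 1's locks in the global lock tree
    RangeList txn_list;     // Thread 1's own list of owned locks

    // Thread 1: ten point locks, recorded in both places.
    for (int64_t k = 0; k < 100; k += 10) {
        tree_locks.push_back({k, k});
        txn_list.push_back({k, k});
    }
    // Thread 1: the 11th lock is in the tree, but the thread is frozen
    // before it notes the lock in its own list.
    const KeyRange eleventh{12, 12};
    tree_locks.push_back(eleventh);

    // Thread 2: triggers escalation; the callback replaces Thread 1's list.
    tree_locks = escalate_single_owner(tree_locks);
    txn_list = tree_locks;

    // Thread 1: resumes and appends to the post-escalation list.
    txn_list.push_back(eleventh);
    return txn_list;
}
```

In this model the final list holds two entries, the escalated range and the late point lock, and the point lock falls inside the escalated range, so the extra entry is redundant.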
