MariaDB Server / MDEV-15603

Gap Lock support in MyRocks




      (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

      Notes on how to use the PerconaFT Lock Tree:

      1. Data structures
      1.1 A Global Lock Tree Manager object
      1.2 A separate Lock Tree for each table
      1.3 Each transaction keeps track of the ranges it holds locks on
      2. Functions
      2.1 Initializing the Lock Manager
      2.2 Create Lock Tree for a table
      2.3 Getting a lock
      2.4 Releasing a lock.
      2.5 Releasing all of the transaction's locks

      1. Data structures

      1.1 A Global Lock Tree Manager object

      There needs to be a global locktree_manager.

      See PerconaFT/src/ydb-internal.h:

        struct __toku_db_env_internal {
          // (excerpt; other members omitted)
          toku::locktree_manager ltm;
        };

      1.2 A separate Lock Tree for each table

      TokuDB uses a separate Lock Tree for each table. It is stored in db->i->lt.

      1.3 Each transaction keeps track of the ranges it holds locks on

      Each transaction has a list of the ranges it is holding locks on. It is referred to like so:

        db_txn_struct_i(txn)->lt_map

      and is stored in this structure, together with a mutex to protect it:

        struct __toku_db_txn_internal {
            // (excerpt; other members omitted)
            // maps a locktree to a buffer of key ranges that are locked.
            // it is protected by the txn_mutex, so hot indexing and a client
            // thread can concurrently operate on this txn.
            toku::omt<txn_lt_key_ranges> lt_map;
            toku_mutex_t txn_mutex;
        };

      The mutex is needed because the list may be modified by the lock escalation process, which may be invoked from a different thread.
      (See toku_txn_destroy for how to free this structure.)
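
      The bookkeeping described in 1.3 can be sketched as a small standalone model. This is not PerconaFT code: ToyTxn, KeyRange, remember_lock and range_count are hypothetical names, std::map stands in for toku::omt, and an integer id stands in for a lock tree pointer. It only illustrates a mutex-protected map from lock tree to locked ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names exist in PerconaFT.
struct KeyRange {
    std::string left;   // inclusive
    std::string right;  // inclusive
};

class ToyTxn {
public:
    // Record that this transaction holds a lock on [left, right] in the
    // lock tree identified by lt_id.
    void remember_lock(int lt_id, std::string left, std::string right) {
        assert(left <= right);
        std::lock_guard<std::mutex> guard(mutex_);
        lt_map_[lt_id].push_back(KeyRange{std::move(left), std::move(right)});
    }

    // Number of ranges remembered for one lock tree (0 if none).
    std::size_t range_count(int lt_id) const {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = lt_map_.find(lt_id);
        return it == lt_map_.end() ? 0 : it->second.size();
    }

private:
    mutable std::mutex mutex_;                     // plays the role of txn_mutex
    std::map<int, std::vector<KeyRange>> lt_map_;  // plays the role of lt_map
};
```

      Taking the mutex in every accessor is what would let a second thread, such as lock escalation, edit the list safely.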

      2. Functions

      Most functions mentioned here are from ydb_txn.cc and ydb_row_lock.cc in storage/tokudb/PerconaFT/src/. This is TokuDB's layer above the Lock Tree.

      2.1 Initializing the Lock Manager


      2.2 Create Lock Tree for a table

      TokuDB does this when it opens a table's table_share. It is done like so:

              db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,

      Then, one needs to release it by calling release_lt on the Lock Tree Manager.

      After the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).
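
      The get_lt / release_lt lifecycle can be modelled with a reference count per lock tree. The sketch below is a single-threaded toy under that assumption, not the real toku::locktree_manager: ToyLockTree, ToyLockTreeManager and open_trees are hypothetical, and the real get_lt takes more arguments than the dictionary id used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-ins; these are not PerconaFT's classes.
struct ToyLockTree {
    std::uint64_t dict_id;
    int refcount;
};

class ToyLockTreeManager {
public:
    // Return the lock tree for dict_id, creating it on first use.
    // Every call takes one reference.
    ToyLockTree *get_lt(std::uint64_t dict_id) {
        auto it = trees_.find(dict_id);
        if (it == trees_.end()) {
            std::unique_ptr<ToyLockTree> lt(new ToyLockTree{dict_id, 0});
            it = trees_.emplace(dict_id, std::move(lt)).first;
        }
        it->second->refcount++;
        return it->second.get();
    }

    // Drop one reference; the tree is destroyed when the last one goes.
    void release_lt(ToyLockTree *lt) {
        assert(lt != nullptr && lt->refcount > 0);
        if (--lt->refcount == 0) {
            const std::uint64_t id = lt->dict_id;  // copy before the tree is freed
            trees_.erase(id);
        }
    }

    // Number of lock trees currently alive.
    std::size_t open_trees() const { return trees_.size(); }

private:
    std::map<std::uint64_t, std::unique_ptr<ToyLockTree>> trees_;
};
```

      If all keys are treated as one table, as the TODO in this section suggests for a start, every caller would pass the same id and share a single tree.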

      (TODO: this is easy to arrange if Toku locks are invoked from the MyRocks level. If they are invoked from RocksDB, it is harder, because RocksDB has no concept of tables or indexes. To start with, we can pretend all keys are in one table.)

      2.3 Getting a lock

      This function serves as an example:

      // Get a range lock.
      // Return when the range lock is acquired or the default lock tree timeout has expired.  
      int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
              toku::lock_request::type lock_type) {

      It is also possible to start an asynchronous lock request and then wait for it (see toku_db_start_range_lock, toku_db_wait_range_lock). We don't seem to have a use for this.

      Point locks are obtained by passing the same key as left_key and right_key.
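
      To make the range and point lock semantics concrete, here is a minimal sketch of exclusive range locking over string keys. It is a toy model and does not call PerconaFT: ToyRangeLocks, HeldRange, try_lock and try_lock_point are hypothetical names, both endpoints are assumed inclusive, and a conflict returns false immediately instead of waiting for a timeout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model, not PerconaFT's locktree.
struct HeldRange {
    std::uint64_t txnid;
    std::string left;   // inclusive
    std::string right;  // inclusive
};

class ToyRangeLocks {
public:
    // Try to take an exclusive lock on [left, right] for txnid.
    // Returns false if another transaction holds an overlapping range
    // (a real lock manager would wait or time out instead).
    bool try_lock(std::uint64_t txnid, const std::string &left,
                  const std::string &right) {
        assert(left <= right);
        for (const HeldRange &h : held_) {
            const bool overlaps = !(right < h.left || h.right < left);
            if (overlaps && h.txnid != txnid) return false;
        }
        held_.push_back(HeldRange{txnid, left, right});
        return true;
    }

    // A point lock is a range whose two endpoints are the same key.
    bool try_lock_point(std::uint64_t txnid, const std::string &key) {
        return try_lock(txnid, key, key);
    }

private:
    std::vector<HeldRange> held_;
};
```

      With this model, a point lock on a key inside another transaction's range is refused, which is the behaviour a gap lock is meant to provide.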

      2.4 Releasing a lock.

      TokuDB doesn't seem to release individual locks (all locks are held until the transaction either commits or is aborted).

      LockTree has a function to release locks from a specified range:

      locktree::release_locks(TXNID txnid, const range_buffer *ranges)

      Besides calling that, one will need to:

      • Wake up all waiting lock requests, because release_locks doesn't wake them up. There is a toku::lock_request::retry_all_lock_requests call which retries all pending requests. (This doesn't seem efficient, but maybe it is OK?)
      • Remove the released lock from the list of locks the transaction is holding (which is in db_txn_struct_i(txn)->lt_map). This is not essential, because that list is only used for releasing the locks when the transaction finishes.
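
      The release-then-retry sequence can be sketched as follows. This is a toy model with hypothetical names (ToyLockTable, Range, request, release), not the PerconaFT API, and it omits the per-transaction list. It shows why a release must be followed by an explicit retry of the pending requests, and why that retry touches every waiter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model; the names only mirror the steps in the text.
struct Range {
    std::uint64_t txnid;
    std::string left;   // inclusive
    std::string right;  // inclusive
};

class ToyLockTable {
public:
    // Grant the lock if no other transaction overlaps; otherwise queue it.
    bool request(const Range &r) {
        if (try_grant(r)) return true;
        pending_.push_back(r);
        return false;
    }

    // Drop the lock, then retry every pending request, because dropping
    // a lock does not wake the waiters by itself.
    void release(const Range &r) {
        held_.erase(std::remove_if(held_.begin(), held_.end(),
                                   [&r](const Range &h) {
                                       return h.txnid == r.txnid &&
                                              h.left == r.left &&
                                              h.right == r.right;
                                   }),
                    held_.end());
        retry_all_pending();
    }

    std::size_t held_count() const { return held_.size(); }
    std::size_t pending_count() const { return pending_.size(); }

private:
    bool try_grant(const Range &r) {
        assert(r.left <= r.right);
        for (const Range &h : held_) {
            const bool overlaps = !(r.right < h.left || h.right < r.left);
            if (overlaps && h.txnid != r.txnid) return false;
        }
        held_.push_back(r);
        return true;
    }

    // Every release re-examines every waiter, in arrival order.
    void retry_all_pending() {
        std::vector<Range> still_waiting;
        for (const Range &p : pending_) {
            if (!try_grant(p)) still_waiting.push_back(p);
        }
        pending_.swap(still_waiting);
    }

    std::vector<Range> held_;
    std::vector<Range> pending_;
};
```

      Retrying only the waiters whose range overlaps the released one would avoid the full scan, at the cost of indexing the pending requests by range.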

      2.5 Releasing all of the transaction's locks

      See PerconaFT/src/ydb_txn.cc:

      static void toku_txn_release_locks(DB_TXN *txn) {
          // Prevent access to the locktree map while releasing.
          // It is possible for lock escalation to attempt to
          // modify this data structure while the txn commits.
          size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
          for (size_t i = 0; i < num_ranges; i++) {
              txn_lt_key_ranges ranges;
              int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
              toku_db_release_lt_key_ranges(txn, &ranges);
          }
          // ... (rest of the function omitted)
      }
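
      The same loop can be expressed as a standalone model. ToyTree, ToyTxn, lock and release_all_locks below are hypothetical stand-ins rather than TokuDB types. The sketch assumes the mutex is held for the whole walk, as the comment in the excerpt suggests, and it counts ranges instead of storing them in a real lock tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lock tree: it only counts locked ranges.
struct ToyTree {
    std::size_t locked = 0;
    void release_ranges(std::size_t n) {
        assert(n <= locked);
        locked -= n;
    }
};

// Hypothetical stand-in for a transaction.
struct ToyTxn {
    using Range = std::pair<std::string, std::string>;
    std::mutex txn_mutex;
    std::map<ToyTree *, std::vector<Range>> lt_map;

    // Take a lock in the given tree and remember it in lt_map.
    void lock(ToyTree *lt, const std::string &left, const std::string &right) {
        std::lock_guard<std::mutex> guard(txn_mutex);
        lt_map[lt].push_back(Range(left, right));
        lt->locked++;
    }

    // Walk every (lock tree, ranges) entry and release it, holding the
    // mutex so that nothing else edits the map in the meantime.
    void release_all_locks() {
        std::lock_guard<std::mutex> guard(txn_mutex);
        for (auto &entry : lt_map) {
            entry.first->release_ranges(entry.second.size());
        }
        lt_map.clear();
    }
};
```

      Because the map groups ranges by lock tree, each tree is visited once per transaction, regardless of how many ranges it holds.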


              • Assignee:
                psergey Sergei Petrunia