Details

    Description

      (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

      Notes about how to use PerconaFT:

      1. Data structures
      1.1 A Global Lock Tree Manager object
      1.2 A separate Lock Tree for each table
      1.3 Each transaction keeps track of the ranges it holds locks on
      2. Functions
      2.1 Initializing the Lock Manager
      2.2 Create Lock Tree for a table
      2.3 Getting a lock
      2.4 Releasing a lock.
      2.5 Releasing all of the transaction's locks

      1. Data structures

      1.1 A Global Lock Tree Manager object

      There needs to be a global locktree_manager.

      See PerconaFT/src/ydb-internal.h,

        struct __toku_db_env_internal {
          toku::locktree_manager ltm;
      

      1.2 A separate Lock Tree for each table

      TokuDB uses a separate Lock Tree for each table; it is stored in db->i->lt.

      1.3 Each transaction keeps track of the ranges it holds locks on

      Each transaction has a list of ranges that it is holding locks on. It is referred to like so

        db_txn_struct_i(txn)->lt_map
      

      and is stored in this structure, together with a mutex to protect it:

        struct __toku_db_txn_internal {
            // maps a locktree to a buffer of key ranges that are locked.
            // it is protected by the txn_mutex, so hot indexing and a client
            // thread can concurrently operate on this txn.
            toku::omt<txn_lt_key_ranges> lt_map;
            toku_mutex_t txn_mutex;
      

      The mutex is needed because the list may be modified by the lock escalation process (which may be invoked from a different thread).
      (See toku_txn_destroy for how to free this)
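
      For illustration, here is a minimal sketch of the pattern used when recording a newly acquired range in that list. The helper name is made up, and it assumes txn_lt_key_ranges exposes its range buffer as ->buffer; the real logic lives in ydb_row_lock.cc.

        // Hypothetical helper (sketch only): remember that txn now holds
        // [left_key, right_key] in the range buffer of one locktree entry.
        static void note_range_locked(DB_TXN *txn, txn_lt_key_ranges *ranges,
                                      const DBT *left_key, const DBT *right_key) {
            // txn_mutex guards lt_map and its range buffers against concurrent
            // modification, e.g. by lock escalation running on another thread.
            toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);
            ranges->buffer->append(left_key, right_key);
            toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
        }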

      2. Functions

      Most functions mentioned here are from storage/tokudb/PerconaFT/src/ (ydb_txn.cc, ydb_row_lock.cc); this is TokuDB's layer above the Lock Tree.

      2.1 Initializing the Lock Manager

      TODO
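
      (Until this is written up properly, a rough sketch: judging by locktree/locktree.h, initialization amounts to calling create() on the manager, optionally passing callbacks for locktree creation/destruction and escalation, and destroy() on shutdown. Treat the exact signature below as an assumption to verify against the header.)

        // Sketch only: the callbacks may be NULL when no per-locktree
        // setup or lock escalation handling is needed.
        toku::locktree_manager ltm;
        ltm.create(nullptr /* lt_create_cb */,
                   nullptr /* lt_destroy_cb */,
                   nullptr /* lt_escalate_cb */,
                   nullptr /* extra passed to the callbacks */);
        // ... use ltm.get_lt() / release_lt() while the environment is open ...
        ltm.destroy();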

      2.2 Create Lock Tree for a table

      TokuDB does this when it opens a table's table_share. It is done like so:

              db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                   toku_ft_get_comparator(db->i->ft_handle),
                                                   &on_create_extra);
      

      Then, one needs to release it:

      db->dbenv->i->ltm.release_lt(db->i->lt);
      

      After the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty by then).

      (TODO: this is easy to arrange if Toku locks are invoked from the MyRocks level. But if they are invoked from inside RocksDB, this is harder, as RocksDB doesn't have any concept of tables or indexes. For a start, we can pretend all keys are in one table.)
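
      A sketch of the "one table" workaround. Everything here apart from get_lt/release_lt is hypothetical: the fixed dictionary id and the comparator over RocksDB keys are assumptions, and the DICTIONARY_ID initialization needs to be checked against the real type.

        // Sketch: a single locktree shared by all keys, created once at startup.
        static toku::locktree *g_range_lt = nullptr;

        void open_single_locktree(toku::locktree_manager *ltm,
                                  const toku::comparator &key_cmp) {
            DICTIONARY_ID dict_id;
            dict_id.dictid = 1;   // arbitrary fixed id, since there is only one "table"
            g_range_lt = ltm->get_lt(dict_id, key_cmp, nullptr /* on_create_extra */);
        }

        void close_single_locktree(toku::locktree_manager *ltm) {
            ltm->release_lt(g_range_lt);  // deleted once its reference count drops to zero
            g_range_lt = nullptr;
        }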

      2.3 Getting a lock

      This function serves as an example:

      // Get a range lock.
      // Return when the range lock is acquired or the default lock tree timeout has expired.
      int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
              toku::lock_request::type lock_type) {
      

      It is also possible to start an asynchronous lock request and then wait for it (see toku_db_start_range_lock, toku_db_wait_range_lock). It seems we don't have a use for this.

      Point locks are obtained by passing the same key as left_key and right_key.
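
      For example, a point write lock on a single key can be taken like this (a sketch; the wrapper function is made up, while toku_db_get_range_lock and toku::lock_request::WRITE are from the code above and the backtrace below):

        // Lock exactly one key by passing it as both ends of the range.
        static int lock_single_key(DB *db, DB_TXN *txn, const DBT *key) {
            return toku_db_get_range_lock(db, txn,
                                          key /* left_key */,
                                          key /* right_key */,
                                          toku::lock_request::WRITE);
        }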

      2.4 Releasing a lock.

      TokuDB doesn't seem to release individual locks (all locks are held until the transaction either commits or aborts).

      The Lock Tree has a function to release locks on a specified set of ranges:

      locktree::release_locks(TXNID txnid, const range_buffer *ranges)
      

      Besides calling that, one will also need to:

      • Wake up all waiting lock requests; release_locks doesn't wake them up. There is a toku::lock_request::retry_all_lock_requests call which retries all pending requests (which doesn't seem to be efficient... but maybe it is ok?).
      • Remove the released lock from the list of locks the transaction is holding (which is in db_txn_struct_i(txn)->lt_map). This is actually not essential, because that list is only used for releasing the locks when the transaction finishes.
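
      Putting the pieces together, releasing a single range early might look roughly like this (a sketch: the function is made up, the range_buffer create/append/destroy calls and the one-argument retry_all_lock_requests mirror what the surrounding TokuDB code appears to use, and the lt_map cleanup is left out, as it is not essential):

        // Sketch: release one range held by txnid in lt, then retry waiters.
        static void release_one_range(toku::locktree *lt, TXNID txnid,
                                      const DBT *left_key, const DBT *right_key) {
            toku::range_buffer ranges;
            ranges.create();
            ranges.append(left_key, right_key);

            lt->release_locks(txnid, &ranges);
            ranges.destroy();

            // release_locks() does not wake up waiters, so retry all
            // pending lock requests on this locktree explicitly.
            toku::lock_request::retry_all_lock_requests(lt);
        }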

      2.5 Releasing all of the transaction's locks

      See PerconaFT/src/ydb_txn.cc:

      static void toku_txn_release_locks(DB_TXN *txn) {
          // Prevent access to the locktree map while releasing.
          // It is possible for lock escalation to attempt to
          // modify this data structure while the txn commits.
          toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);
       
          size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
          for (size_t i = 0; i < num_ranges; i++) {
              txn_lt_key_ranges ranges;
              int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
              invariant_zero(r);
              toku_db_release_lt_key_ranges(txn, &ranges);
          }
       
          toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
      }
      

      Attachments

        1. screenshot-1.png (51 kB)
        2. screenshot-2.png (36 kB)
        3. screenshot-3.png (22 kB)


          Activity

            psergei Sergei Petrunia created issue -

            TokuDB's lock tree is here: storage/tokudb/PerconaFT/locktree. It locks
            ranges.

            (gdb) wher
              #0  toku::locktree::sto_try_acquire (this=0x7fff700342c0, prepared_lkr=0x7fffd4b6c390, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:291
              #1  0x00007ffff4d6eaa1 in toku::locktree::acquire_lock (this=0x7fff700342c0, is_write_request=true, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:380
              #2  0x00007ffff4d6eb73 in toku::locktree::try_acquire_lock (this=0x7fff700342c0, is_write_request=true, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0, big_txn=false) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:399
              #3  0x00007ffff4d6ec1a in toku::locktree::acquire_write_lock (this=0x7fff700342c0, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0, big_txn=false) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:412
              #4  0x00007ffff4d72dc4 in toku::lock_request::start (this=0x7fffd4b6c5b0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/lock_request.cc:165
              #5  0x00007ffff4d603aa in toku_db_start_range_lock (db=0x7fff700271e0, txn=0x7fff70060600, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, lock_type=toku::lock_request::WRITE, request=0x7fffd4b6c5b0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_row_lock.cc:211
              #6  0x00007ffff4d6022e in toku_db_get_range_lock (db=0x7fff700271e0, txn=0x7fff70060600, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, lock_type=toku::lock_request::WRITE) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_row_lock.cc:182
              #7  0x00007ffff4e31643 in c_set_bounds (dbc=0x7fff7005f000, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, pre_acquire=true, out_of_range_error=-30989) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_cursor.cc:714
              #8  0x00007ffff4d195df in ha_tokudb::prelock_range (this=0x7fff7002cdf8, start_key=0x7fff7002cee0, end_key=0x7fff7002cf00) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb.cc:5978
              #9  0x00007ffff4d19a31 in ha_tokudb::read_range_first (this=0x7fff7002cdf8, start_key=0x7fff7002cee0, end_key=0x7fff7002cf00, eq_range=false, sorted=true) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb.cc:6025
              #10 0x0000555555d761dc in handler::multi_range_read_next (this=0x7fff7002cdf8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:291
              #11 0x0000555555d763be in Mrr_simple_index_reader::get_next (this=0x7fff7002d3d8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:323
              #12 0x0000555555d7901a in DsMrr_impl::dsmrr_next (this=0x7fff7002d298, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:1399
              #13 0x00007ffff4d30b56 in ha_tokudb::multi_range_read_next (this=0x7fff7002cdf8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb_mrr_maria.cc:42
              #14 0x000055555601f3a2 in QUICK_RANGE_SELECT::get_next (this=0x7fff7002f800) at /home/psergey/dev-git/10.3-r2/sql/opt_range.cc:11454
              #15 0x0000555556030e64 in rr_quick (info=0x7fff700162b0) at /home/psergey/dev-git/10.3-r2/sql/records.cc:366
              #16 0x0000555555b3b03b in READ_RECORD::read_record (this=0x7fff700162b0) at /home/psergey/dev-git/10.3-r2/sql/records.h:73
              #17 0x0000555555c3e4a4 in join_init_read_record (tab=0x7fff700161e8) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:20227
              #18 0x0000555555c3c256 in sub_select (join=0x7fff700145b0, join_tab=0x7fff700161e8, end_of_records=false) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:19301
              #19 0x0000555555c3b821 in do_select (join=0x7fff700145b0, procedure=0x0) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:18844
            

            psergei Sergei Petrunia added a comment - Data collected so far: https://gist.github.com/spetrunia/c75b34d70aaea3b927e478557ff89ab5

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all of transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all of transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See {{PerconaFT/src/ydb-internal.h}}:
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table, {{db->i->lt}}.

            h3. 1.3 Each transaction keeps track of the ranges it is holding locks on

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so:
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is needed because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See {{toku_txn_destroy}} for how to free this structure.)
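
            Each element of {{lt_map}} pairs a Lock Tree with the buffer of key ranges locked in it. Roughly (recalled from {{ydb_row_lock.h}}, may not be verbatim):
            {code:cpp}
            // One lt_map entry: the locktree and the ranges this txn has locked in it.
            typedef struct txn_lt_key_ranges {
                toku::locktree *lt;
                toku::range_buffer *buffer;
            } txn_lt_key_ranges;
            {code}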

            h2. 2. Functions

            Most functions mentioned here are from {{storage/tokudb/PerconaFT/src/}} ({{ydb_txn.cc}}, {{ydb_row_lock.cc}}) - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO
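
            As a starting point, a minimal sketch of the initialization on the MyRocks side. It assumes {{locktree_manager::create()}} / {{destroy()}} take a create/destroy/escalation callback triple plus an extra pointer, the way TokuDB's {{ydb.cc}} uses them; the exact signatures and whether null callbacks are accepted need to be verified against {{locktree/locktree.h}}:
            {code:cpp}
            // Sketch only (assumptions noted above): one process-wide lock tree manager,
            // created at storage-engine startup and destroyed at shutdown.
            static toku::locktree_manager rocksdb_ltm;

            void range_locking_init(void) {
                // No on-create / on-destroy / escalation callbacks for the first cut.
                rocksdb_ltm.create(nullptr, nullptr, nullptr, nullptr);
            }

            void range_locking_shutdown(void) {
                // All lock trees must have been released (release_lt) before this point.
                rocksdb_ltm.destroy();
            }
            {code}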

            h3. 2.2 Create Lock Tree for a table

            TokuDB does this when it opens a table's table_share, like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            After the last {{release_lt}} call, the Lock Tree will be deleted (it is guaranteed to be empty at that point).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)
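
            To illustrate the "pretend all keys are in one table" idea from the TODO above, a rough sketch using the same {{get_lt}} / {{release_lt}} calls (the dictionary id, comparator argument and function names are placeholders, not existing code):
            {code:cpp}
            // Sketch: a single Lock Tree shared by the whole keyspace.
            static toku::locktree *shared_lt = nullptr;

            void open_shared_locktree(toku::locktree_manager *ltm, const toku::comparator &cmp) {
                DICTIONARY_ID dict_id;
                dict_id.dictid = 1;                      // fixed, made-up id for the single "table"
                shared_lt = ltm->get_lt(dict_id, cmp, nullptr /* on_create_extra */);
            }

            void close_shared_locktree(toku::locktree_manager *ltm) {
                ltm->release_lt(shared_lt);              // the empty tree is deleted on last release
                shared_lt = nullptr;
            }
            {code}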

            h3. 2.3 Getting a lock

            This function is a good example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). It seems we don't have a use for this.

            Point locks are obtained by passing the same key as {{left_key}} and {{right_key}}.
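
            For example, a point write lock on a single row key could be taken like this (a sketch; the wrapper function is hypothetical, only {{toku_db_get_range_lock}} and {{toku::lock_request::type}} come from the code above):
            {code:cpp}
            // Sketch: a point lock is just a range lock whose two endpoints are the same key.
            int lock_single_row(DB *db, DB_TXN *txn, const DBT *key) {
                int r = toku_db_get_range_lock(db, txn, key, key,
                                               toku::lock_request::type::WRITE);
                // r != 0 means the lock was not granted (e.g. lock wait timeout / deadlock).
                return r;
            }
            {code}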

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            The transaction will also need to remove them from its list of held locks (note: this is not strictly essential, because that list is only used to release the locks when the transaction finishes).
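
            A sketch of what releasing the locks on one range early could look like; the {{range_buffer}} calls ({{create}} / {{append}} / {{destroy}}) are recalled from memory and should be checked against {{locktree/range_buffer.h}}:
            {code:cpp}
            // Sketch: release the locks that txnid holds on [left_key, right_key] in this tree.
            // Removing the range from the transaction's own lt_map is omitted (see note above).
            void release_one_range(toku::locktree *lt, TXNID txnid,
                                   const DBT *left_key, const DBT *right_key) {
                toku::range_buffer ranges;
                ranges.create();
                ranges.append(left_key, right_key);
                lt->release_locks(txnid, &ranges);
                ranges.destroy();
            }
            {code}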

            h3. 2.5 Releasing all of the transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}

            The MDEV text now has a description of how to use the range locker from TokuDB.

            Other input: there is a big concern about regressions with respect to the current way of doing locking. Most likely, we will need to support both the current locking mode (where gap locking is not available for any transaction) and the range locking mode (where some transactions may take range locks in some circumstances, others take row locks, and both kinds of locks conflict with each other).

            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            Current locking code does "Snapshot Checking" (See PessimisticTransaction::ValidateSnapshot):

            When acquiring a point lock on $ROW_KEY, a transaction will check whether there were any changes made to $ROW_KEY after the transaction's snapshot was taken.

            This apparently cannot be efficiently done for range locks.

            But it seems to be also unnecessary. Here's why:

            Snapshot checking (ValidateSnapshot) is needed to prevent situations like this:

            trx1> start; allocate a snapshot 
             
            trx2> update value for $ROW_KEY_1; commit;
             
            trx1> update value for $ROW_KEY_1;   -- note that we are using a snapshot and
                                                 -- don't see trx2's changes
             
            trx1> commit; -- this overwrites changes by trx2.
            

            That is, this is an "optimistic-like" method to make sure that the transaction's snapshot has not been "made obsolete" by some other transaction.

            With Range Locking,

            • We can't have "ValidateSnapshot for ranges"
            • but we place locks on all records we read.

            Range locks would not prevent the above scenario between trx1 and trx2, as trx2 updates $ROW_KEY_1 before trx1 attempts to read it.

            However, when transactions use locking, we can assume that trx1 "happened after" trx2 has committed. (The only thing that would prevent this assumption would be that trx1 has read a value that trx2 is modifying. But in that case, trx1 would have held a read lock that would have prevented trx2 from making the modification).

            The only issue here is that trx1 must not use a snapshot that was created before trx2 has committed.

            To sum up: RangeLockingForReads

            • Does not need to use ValidateSnapshot
            • But must not use the snapshot from the beginning of the transaction. (That is, if we are reading data using snapshot S, then S must have been acquired
              after we have obtained a lock covering the rowkey we are reading. This is our guarantee that nobody has sneaked in an update).

            If we are holding all locks for the duration of the transaction, there is no problem with reading inconsistent data (the data will be the same as if we've used the snapshot made after the most-recently-modified row we've read)
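
            To make this ordering concrete, here is a minimal sketch using the RocksDB pessimistic-transaction API. GetRangeLock() is a hypothetical placeholder for the future range-locking entry point; SetSnapshot(), GetSnapshot() and Get() are existing Transaction calls.

            {code:cpp}
            #include <rocksdb/utilities/transaction_db.h>

            // Sketch only: take the range lock first, then the snapshot used for reading.
            // Any transaction that modified the locked range must have committed before
            // we obtained the lock, so the snapshot cannot miss its change and no
            // per-key ValidateSnapshot step is needed.
            void ReadWithRangeLock(rocksdb::Transaction* txn,
                                   rocksdb::ColumnFamilyHandle* cf,
                                   const rocksdb::Slice& start_key,
                                   const rocksdb::Slice& end_key,
                                   const rocksdb::Slice& key, std::string* value) {
              txn->GetRangeLock(cf, start_key, end_key);  // hypothetical range-lock call

              // Take (or re-take) the snapshot only after the lock is held.
              txn->SetSnapshot();

              rocksdb::ReadOptions opts;
              opts.snapshot = txn->GetSnapshot();
              rocksdb::Status s = txn->Get(opts, cf, key, value);  // read via the post-lock snapshot
              (void)s;  // error handling omitted in this sketch
            }
            {code}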


            1. If there is no snapshot taken before acquiring the lock, then even the existing code would not call ValidateSnapshot: https://github.com/facebook/rocksdb/blob/ea212e531696cab9cc8c2c3da49119b7888402ef/utilities/transactions/pessimistic_transaction.cc#L535
            2. MyRocks does allow transactions to explicitly take a snapshot at the very beginning, before any reads start. What happens in those cases?

            myabandeh Maysam Yabandeh added a comment

            If the transaction has already taken a snapshot at the beginning, perhaps we can get the implementation to guarantee that it would never call ::Get before RangeLockingForReads, and then upgrade the snapshot after the last call to RangeLockingForReads. This would be as if we had delayed the transaction's request to take the snapshot.

            The problem with this approach would be losing linearizability: if, for the two transactions, the client makes connections between their input/output outside the SQL engine, then it might get inconsistent results, as we did not actually take the snapshot at the wall-clock time at which we confirmed to the client that we did. For example, in this sequence of events running "from the same client session":

            K1=V1
            txn B starts
            txn B take snapshot
             
            txn A writes VA to K1
            txn A commits
             
            txn B reads K1
            

            The client expects Txn B to read V1 but we return VA. I think it should be fine since our supported isolation level is not linearizable anyway (it is not even serializable).

            myabandeh Maysam Yabandeh added a comment

            I've put up a tree here: https://github.com/spetrunia/mysql-5.6/tree/range-locking-fb-mysql-5.6.35

            Current status:

            • MyRocks has a read-only global variable @@rocksdb_use_range_locking which one can set in my.cnf
            • In addition to class TransactionLockMgr, RocksDB (a modified copy of it) includes another class which uses PerconaFT's locktree to provide locks (a rough sketch of such a wrapper follows this list).
            • Currently, it only does point, write-only locks.
            • The state is: it compiled, it worked for a basic example. Lots of details are still missing and in particular, the APIs are not final.
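
            For illustration, here is a rough skeleton of what such a wrapper might look like. The class name and structure are invented for this sketch (the actual class in the tree may differ); the lock_request create/set/start/wait calls mirror the ones TokuDB's ydb_row_lock.cc makes against the locktree.

            {code:cpp}
            #include <locktree/locktree.h>
            #include <locktree/lock_request.h>

            // Sketch only: a point-lock manager that maps each key to a single-point
            // range in a locktree. All locks are exclusive WRITE locks, matching the
            // "point, write-only locks" state described above.
            class RangeLockMgrSketch {
             public:
              explicit RangeLockMgrSketch(toku::locktree *lt) : lt_(lt) {}

              // Acquire an exclusive lock on one key for transaction 'txnid',
              // waiting at most wait_time_msec. Returns 0 on success.
              int TryLockPoint(TXNID txnid, const DBT *key, uint64_t wait_time_msec) {
                toku::lock_request request;
                request.create();
                // A point lock is a range lock whose endpoints are the same key.
                request.set(lt_, txnid, key, key, toku::lock_request::type::WRITE,
                            /* big_txn */ false);
                request.start();
                int r = request.wait(wait_time_msec);
                request.destroy();
                return r;
              }

             private:
              toku::locktree *lt_;
            };
            {code}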
            psergei Sergei Petrunia added a comment

            1. If there is no snapshot taken before acquiring the lock, then even the existing code would not call ValidateSnapshot: https://github.com/facebook/rocksdb/blob/ea212e531696cab9cc8c2c3da49119b7888402ef/utilities/transactions/pessimistic_transaction.cc#L535

            I am not sure when that happens (IIRC, in MyRocks a transaction would normally create/use a snapshot before it has written any data). Will check.

            psergei Sergei Petrunia added a comment

            I took the current patch (it uses locktree to do point locks, all locks are
            exclusive write locks under the hood, etc) and ran a benchmark.

            The benchmark compares the performance of the current locking system with the new locking system with varying number of client connections.

            sysbench ... --time=60 /usr/share/sysbench/oltp_write_only.lua  
            --table-size=1000000 --mysql_storage_engine=RocksDB --threads=$n run
            

            The results are:

            n_threads	current_locking_tps	new_locking_tps	new_to_current_ratio
            1	433.7	417.64	0.963
            2	585.28	553.67	0.946
            5	1358.33	1340.1	0.987
            10	2435.65	2423.49	0.995
            20	3968.21	3806.98	0.959
            40	5306.06	4975.17	0.938
            60	5913.78	5256.03	0.889
            80	6122.57	5607.66	0.916
            100	6280.9	5736.32	0.913
            120	6423.71	5631.45	0.877
            

            Plotting this (TPS vs. number of threads): see the attached screenshot-1.png.

            Plotting the slowdown ratio: see the attached screenshot-2.png.

            psergei Sergei Petrunia added a comment

            So

            • The difference is clearly visible
            • New locking is slower, and the difference grows as the number of threads grows.
            • Maybe it's because read locks are made write locks under the hood? (This can be checked by forcing the "old" locking to always use write locks.)
            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            pt-table-checksum works as follows:

            The table is broken into chunks. Then, for each chunk, the master computes the checksum like so:

            REPLACE INTO 
              percona.checksums(
                db, tbl, chunk, 
                chunk_index, lower_boundary, upper_boundary, 
                this_cnt, this_crc)
            SELECT 
              'test', 't10', '48', 
              'PRIMARY', '950358', '972636', -- boundaries
              COUNT(*) AS cnt,
              ... , --  here is a long expression to compute the row checksum
            FROM 
              test.t10 FORCE INDEX(PRIMARY)
            WHERE 
              ((pk >= '950358')) AND ((pk <= '972636')) /*checksum chunk*/
            

            This statement is replicated to the slave using SBR. That is, the slave will run it too, and compute the checksum of the data on the slave.

            Then, the master reads the checksum data:

            SELECT this_crc, this_cnt 
            FROM percona.checksums 
            WHERE db = 'test' AND tbl = 't10' AND chunk = '48';
            

            And saves it in master_crc column:

            UPDATE percona.checksums 
            SET 
              chunk_time = '0.455180', 
              master_crc = '691e28bc', 
              master_cnt = '22279' 
            WHERE 
              db = 'test' AND tbl = 't10' AND chunk = '48'
            

            This way, on the slave we will get

            • master_crc is the CRC value from the master
            • this_crc is the CRC value computed locally.

            The need for Gap Locking comes from Statement replication of REPLACE INTO ... SELECT. When executed on the slave, it should read the same data as it did on the master. For that, execution of REPLACE INTO ... SELECT FROM t1 on the master must prevent any concurrent transaction from modifying t1 and committing.
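
            As an illustration (not code from the patch), the chunk scanned by the checksum statement could be covered by a shared range lock via the toku_db_get_range_lock call described in this MDEV. The helper below and its arguments (the encoded PK endpoints and the DB/DB_TXN handles) are assumptions for the sketch, and the PerconaFT ydb headers are assumed to be available.

            {code:cpp}
            // Sketch only: lock the checksum chunk's primary-key range in shared (READ)
            // mode, so that concurrent writers to the chunk must wait until the
            // checksumming transaction commits.
            static int lock_checksum_chunk(DB *db, DB_TXN *txn,
                                           const void *lower_pk, uint32_t lower_len,
                                           const void *upper_pk, uint32_t upper_len) {
                DBT left_key, right_key;
                toku_fill_dbt(&left_key,  lower_pk, lower_len);   // e.g. encoded pk '950358'
                toku_fill_dbt(&right_key, upper_pk, upper_len);   // e.g. encoded pk '972636'
                // Conflicting WRITE lock requests on this range will block or time out.
                return toku_db_get_range_lock(db, txn, &left_key, &right_key,
                                              toku::lock_request::type::READ);
            }
            {code}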

            The pt-table-checksum code also contains a "LOCK IN SHARE MODE" query, but it does not seem to be used.


            InnoDB's equivalent of Snapshot Checking

            InnoDB also uses multi-versioning and locking for intended writes. It doesn't do SnapshotChecking, so it faces a similar problem with overwriting the changes that were made after the transaction's snapshot was taken but before the lock was acquired:

            1. trx1> start; allocate a snapshot 
             
            2. trx2> update value for $ROW_KEY_1; commit;
             
            3. trx1> update value for $ROW_KEY_1;
             
            4. trx1> commit;
            

            InnoDB solves this by having DML statements read the latest committed data, instead of the transaction's snapshot.

            This does look like a READ-COMMITTED isolation level:

            trx1> begin;
            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            trx2> update t1 set a=33 where pk=3; -- autocommit=1 here
            

            Transaction trx1 is reading from the snapshot:

            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            unless it's a FOR UPDATE (or DML) which will see the latest committed data:

            trx1> select * from t1 where pk=3 for update;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |   33 |
            +----+------+
            

            Regardless of that, further SELECTs will continue to read from the snapshot:

            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            DML will operate on the latest committed data:

            trx1> update t1 set a=a+1 where pk=3;
            Query OK, 1 row affected (0.00 sec)
            Rows matched: 1  Changed: 1  Warnings: 0
             
            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |   34 |
            +----+------+
            

            This behavior "breaks" the promise of REPEATABLE-READ on the master, but in return, the statement will have the same effect when it is run on the slave.

            Use in Range Locking in MyRocks

            Range Locking mode in MyRocks can use this approach too (a code sketch follows the list below):

            • DML statements and SELECT FOR UPDATE/LOCK IN SHARE MODE should read the latest committed data (this includes the unique key checks they do)
            • No Snapshot Checking is necessary.
            • Regular SELECTs should still read from the snapshot (This should happen even if the transaction is already holding a lock on the row. Even in this case, regular SELECT may return an out-of-date version of the row).
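
            Here is a rough C++ sketch of the two read paths under this scheme, using existing RocksDB transaction calls (GetForUpdate, Get, GetSnapshot); the split into LockingRead/SnapshotRead helpers is an assumption made for illustration, not existing MyRocks code.

            {code:cpp}
            #include <rocksdb/utilities/transaction_db.h>

            // Sketch only. Locking read used by DML / SELECT FOR UPDATE / LOCK IN SHARE
            // MODE: take the row lock and read the latest committed data (no snapshot
            // in ReadOptions), mirroring the InnoDB behaviour described above.
            rocksdb::Status LockingRead(rocksdb::Transaction* txn,
                                        rocksdb::ColumnFamilyHandle* cf,
                                        const rocksdb::Slice& key, std::string* value) {
              rocksdb::ReadOptions opts;                       // opts.snapshot == nullptr
              return txn->GetForUpdate(opts, cf, key, value);  // sees the latest committed value
            }

            // Plain SELECT: read through the transaction's snapshot, even if this
            // transaction already holds a lock on the row (it may therefore return an
            // out-of-date version of the row).
            rocksdb::Status SnapshotRead(rocksdb::Transaction* txn,
                                         rocksdb::ColumnFamilyHandle* cf,
                                         const rocksdb::Slice& key, std::string* value) {
              rocksdb::ReadOptions opts;
              opts.snapshot = txn->GetSnapshot();  // snapshot taken earlier in the transaction
              return txn->Get(opts, cf, key, value);
            }
            {code}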
            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            Currently failing tests:

            rocksdb.rqg_transactions
            rocksdb.rocksdb_deadlock_stress_rc
            rocksdb.rocksdb_deadlock_stress_rr
            rocksdb.deadlock_stats
            rocksdb.compact_deletes
            rocksdb.rocksdb_deadlock_detect_rc
            rocksdb.deadlock
            rocksdb.deadlock_tracking
            rocksdb.gap_lock_raise_error
            rocksdb.i_s_deadlock
            rocksdb.rocksdb_deadlock_detect_rr

            rocksdb.rqg_transactions 'range_locking'

            • Assertion failure in toku::treenode::remove

            rocksdb.compact_deletes 'range_locking'

            • Timed out; it was just hanging with no user activity (?)

            rocksdb.rocksdb_deadlock_detect_rc 'range_locking'

            • Lock wait timeout error

            rocksdb.rocksdb_deadlock_stress_rc 'range_locking'
            rocksdb.rocksdb_deadlock_stress_rr 'range_locking'

            • Lock wait timeout error

            rocksdb.deadlock 'range_locking'

            • 900 sec. timeout, several threads waiting for a lock

            rocksdb.deadlock_stats 'range_locking' - "mysqltest got signal 6" - a crash on the client (?)

            • still, the test seems to use deadlock detector.

            rocksdb.deadlock_tracking 'range_locking'

            • Lock wait timeout error.

            rocksdb.gap_lock_raise_error 'range_locking'

            • Lock wait timeout error.

            rocksdb.i_s_deadlock 'range_locking'

            • Lock wait timeout error.

            rocksdb.rocksdb_deadlock_detect_rr 'range_locking'

            • Lock wait timeout error.

            Currently, the tests pass.
            The rocksdb test suite now has three "combinations" - write_prepared, write_committed, and range_locking.
            Tests that assume point locking are disabled in 'range_locking' mode.
            There are also tests that specifically target range locking.

            psergei Sergei Petrunia added a comment

            Remaining issues:

• Reduce the transaction's list of acquired locks to reflect the actions of lock escalation.
            • Turn off snapshot validation.
psergei Sergei Petrunia added a comment

            Now the above is done and there are no known Gap-Lock-related test failures in the rocksdb test suite.

            • Also did some code cleanup in preparation for a pull request to RocksDB, but more cleanups will be needed.
psergei Sergei Petrunia added a comment
psergei Sergei Petrunia added a comment (edited)

            Also did a basic benchmark: ran sysbench oltp_read_write.lua for:

            • rocksdb_use_range_locking=1
            • rocksdb_use_range_locking=0
• the original tree that the range locking patch is currently based on.

SYSBENCH_BASE_ARGS=" --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=root \
  --time=60 \
  /usr/share/sysbench/oltp_read_write.lua --table-size=1000000"
SYSBENCH_CUR_ARGS="$SYSBENCH_BASE_ARGS --mysql_storage_engine=RocksDB"
sysbench $SYSBENCH_CUR_ARGS prepare

for threads in 1 10 20 40 ; do
  SYSBENCH_ALL_ARGS="$SYSBENCH_CUR_ARGS --threads=$threads"
  sysbench $SYSBENCH_ALL_ARGS run   # run step; presumably omitted from the pasted snippet
done
            

            Results:

            rangelocking=ON 
            1 307.74
            10 1576.26
            20 1819.30 
            40 1640.48 
            

            rangelocking=OFF
            1 307.58
            10 1579.74
            20 1838.34
            40 1620.53
            

            rangelocking-orig
            1 306.23
            10 1565.10
            20 1811.46
            40 1611.57
            


            In tabular form

threads	rangelocking=ON	rangelocking=OFF	rangelocking-orig
1	307.74	307.58	306.23
10	1576.26	1579.74	1565.1
20	1819.3	1838.34	1811.46
40	1640.48	1620.53	1611.57
            

psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - The pull request is at https://github.com/facebook/rocksdb/pull/5041

            Got a question about refreshing the iterator.

            Consider a query:

            update t1 set col1=col1+1000 where (pk between 3 and 7) or (pk between 10 and 15);
            

Suppose range locking is ON, the table has `PRIMARY KEY(pk)`, and the query is using the PK.

            It will do this:

  trx->get_range_lock([3; 7]);
  iter = trx->get_iterator(); // (1)
  // Use the iter to read the latest committed rows in the [3..7] range
  // (2)

  trx->get_range_lock([10; 15]);  // (3)
            

            Now, the iterator we created at point (1) is reading the snapshot of data taken at that moment.

We need to read the latest committed data (to be precise, we need to see everything that was committed into the 10..15 range before the get_range_lock call marked with (3) was run).

            We should call this:

              iter->Refresh();
            

But here the iterator is `rocksdb::BaseDeltaIterator`, which doesn't override Refresh(), so it falls back to rocksdb::Iterator::Refresh(), which is this:

              virtual Status Refresh() {
                return Status::NotSupported("Refresh() is not supported");
              }
            

            Does this mean

• The iterator I've got will return the latest data (and NOT the snapshot taken at the time the iterator was created, at (1)),
  or
• The iterator I've got doesn't support Refresh(), so I should destroy and re-create it?
psergei Sergei Petrunia added a comment

            An MTR testcase for iterator refresh:
            https://gist.github.com/spetrunia/7ead10923d40bf2d9baa960740733945

            Result of it:
            https://gist.github.com/spetrunia/915cdeeb033251a288ec88509bb04582#file-range-locking-iterator-refresh-result-sql-L22

It shows that the iterator sees the row that has been deleted. When it attempts to read the row, we get a "Got error 1 'NotFound:'" error.

            Now, let's remove the DELETE statement from the testcase:
            https://gist.github.com/spetrunia/ac3392e8279007eb15411872cbc43241
            the output: https://gist.github.com/spetrunia/33ce1b208109c8b0331fc54768de58ec

            30 5000

            The INSERT'ed row was not updated, so it was not visible to the iterator.

For the updated rows, the result looks as if the iterator saw the latest data?

            40 5100
            41 5100
            42 5100
            43 5100
            44 5100
            45 5100

            (or is this the result of extra GetForUpdate calls?)
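A sketch of that GetForUpdate theory (my own illustration, not the MyRocks code path; the /tmp path and the keys are made up): an iterator obtained from Transaction::GetIterator() keeps reading the state it pinned when it was created, while Transaction::GetForUpdate() with no snapshot in its ReadOptions locks the key and returns the latest committed value. If MyRocks issues such GetForUpdate calls for the rows it updates, that would explain why the updated rows look current.

  #include <cassert>
  #include <string>
  #include "rocksdb/utilities/transaction.h"
  #include "rocksdb/utilities/transaction_db.h"

  int main() {
    rocksdb::Options options;
    options.create_if_missing = true;
    rocksdb::TransactionDB* db = nullptr;
    rocksdb::Status s = rocksdb::TransactionDB::Open(
        options, rocksdb::TransactionDBOptions(), "/tmp/getforupdate_demo", &db);
    assert(s.ok());

    db->Put(rocksdb::WriteOptions(), "40", "5000");

    rocksdb::Transaction* txn = db->BeginTransaction(rocksdb::WriteOptions());
    rocksdb::ReadOptions read_options;                         // no snapshot set
    rocksdb::Iterator* iter = txn->GetIterator(read_options);  // pins the current state
    iter->Seek("40");
    assert(iter->Valid() && iter->value().ToString() == "5000");

    // Another "connection" commits a change after the iterator was created.
    db->Put(rocksdb::WriteOptions(), "40", "5100");

    // The iterator still returns the value it pinned at creation time...
    iter->Seek("40");
    assert(iter->value().ToString() == "5000");

    // ...but GetForUpdate (no snapshot in read_options) locks the key and
    // reads the latest committed value.
    std::string value;
    s = txn->GetForUpdate(read_options, "40", &value);
    assert(s.ok() && value == "5100");

    delete iter;
    txn->Rollback();
    delete txn;
    delete db;
    return 0;
  }
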

psergei Sergei Petrunia added a comment

            Ok,

            • the iterator obtained from TransactionDB->NewIterator() has a non-trivial Refresh implementation, ArenaWrappedDBIter::Refresh().
• the iterator obtained from Transaction->GetIterator() doesn't support refresh. It's a BaseDeltaIterator, with base_iterator_ = ArenaWrappedDBIter and delta_iterator_ = WBWIIteratorImpl, so the remaining option is to destroy and re-create it (see the sketch below).
psergei Sergei Petrunia added a comment
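A minimal sketch of the destroy-and-re-create workaround (an assumption on my side, not code from the patch): since BaseDeltaIterator has no working Refresh(), drop the old iterator and ask the transaction for a new one after acquiring the next range lock. The helper name RefreshTxnIterator is made up; the RocksDB calls are the stock API.

  #include <memory>
  #include "rocksdb/utilities/transaction.h"

  // Re-create the transaction iterator so it sees everything committed before
  // the latest range lock was taken, instead of the snapshot pinned at creation.
  std::unique_ptr<rocksdb::Iterator> RefreshTxnIterator(
      rocksdb::Transaction* txn, std::unique_ptr<rocksdb::Iterator> old_iter) {
    old_iter.reset();                   // destroy the stale iterator first
    rocksdb::ReadOptions read_options;  // no snapshot: read the latest committed state
    return std::unique_ptr<rocksdb::Iterator>(txn->GetIterator(read_options));
  }

In the earlier pseudocode this would be called right after trx->get_range_lock([10; 15]), so the scan of the 10..15 range sees everything committed before that lock was taken.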