Details

    Description

      (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

      Notes about how to use PerconaFT:

      1. Data structures
      1.1 A Global Lock Tree Manager object
      1.2 A separate Lock Tree for each table
      1.3 Each transaction keeps track of the ranges it holds locks on
      2. Functions
      2.1 Initializing the Lock Manager
      2.2 Create Lock Tree for a table
      2.3 Getting a lock
      2.4 Releasing a lock.
      2.5 Releasing all of the transaction's locks

      1. Data structures

      1.1 A Global Lock Tree Manager object

      There needs to be a global locktree_manager.

      See PerconaFT/src/ydb-internal.h,

        struct __toku_db_env_internal {
          toku::locktree_manager ltm;
      

      1.2 A separate Lock Tree for each table

      TokuDB uses a separate Lock Tree for each table; it is stored in db->i->lt.

      1.3 Each transaction keeps track of the ranges it holds locks on

      Each transaction has a list of ranges that it is holding locks on. It is referred to like so

        db_txn_struct_i(txn)->lt_map
      

      and is stored in this structure, together with a mutex to protect it:

        struct __toku_db_txn_internal {
            // maps a locktree to a buffer of key ranges that are locked.
            // it is protected by the txn_mutex, so hot indexing and a client
            // thread can concurrently operate on this txn.
            toku::omt<txn_lt_key_ranges> lt_map;
            toku_mutex_t txn_mutex;
      

      The mutex is needed because the list may be modified by the lock escalation process (which may be invoked from a different thread).
      (See toku_txn_destroy for how to free this)
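
      For illustration, here is a minimal sketch of the pattern used when recording a newly acquired range in that list. The helper name is made up, and it assumes txn_lt_key_ranges exposes its range buffer as ->buffer; the real logic lives in ydb_row_lock.cc.

        // Hypothetical helper (sketch only): remember that txn now holds
        // [left_key, right_key] in the range buffer of one locktree entry.
        static void note_range_locked(DB_TXN *txn, txn_lt_key_ranges *ranges,
                                      const DBT *left_key, const DBT *right_key) {
            // txn_mutex guards lt_map and its range buffers against concurrent
            // modification, e.g. by lock escalation running on another thread.
            toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);
            ranges->buffer->append(left_key, right_key);
            toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
        }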

      2. Functions

      Most functions mentioned here are from storage/tokudb/PerconaFT/src/ (ydb_txn.cc, ydb_row_lock.cc); this is TokuDB's layer above the Lock Tree.

      2.1 Initializing the Lock Manager

      TODO
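
      (Until this is written up properly, a rough sketch: judging by locktree/locktree.h, initialization amounts to calling create() on the manager, optionally passing callbacks for locktree creation/destruction and escalation, and destroy() on shutdown. Treat the exact signature below as an assumption to verify against the header.)

        // Sketch only: the callbacks may be NULL when no per-locktree
        // setup or lock escalation handling is needed.
        toku::locktree_manager ltm;
        ltm.create(nullptr /* lt_create_cb */,
                   nullptr /* lt_destroy_cb */,
                   nullptr /* lt_escalate_cb */,
                   nullptr /* extra passed to the callbacks */);
        // ... use ltm.get_lt() / release_lt() while the environment is open ...
        ltm.destroy();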

      2.2 Create Lock Tree for a table

      TokuDB does this when it opens a table's table_share. It is done like so:

              db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                   toku_ft_get_comparator(db->i->ft_handle),
                                                   &on_create_extra);
      

      Then, one needs to release it:

      db->dbenv->i->ltm.release_lt(db->i->lt);
      

      After the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty by then).

      (TODO: this is easy to arrange if Toku locks are invoked from the MyRocks level. But if they are invoked from inside RocksDB, this is harder, as RocksDB doesn't have any concept of tables or indexes. For a start, we can pretend all keys are in one table.)
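
      A sketch of the "one table" workaround. Everything here apart from get_lt/release_lt is hypothetical: the fixed dictionary id and the comparator over RocksDB keys are assumptions, and the DICTIONARY_ID initialization needs to be checked against the real type.

        // Sketch: a single locktree shared by all keys, created once at startup.
        static toku::locktree *g_range_lt = nullptr;

        void open_single_locktree(toku::locktree_manager *ltm,
                                  const toku::comparator &key_cmp) {
            DICTIONARY_ID dict_id;
            dict_id.dictid = 1;   // arbitrary fixed id, since there is only one "table"
            g_range_lt = ltm->get_lt(dict_id, key_cmp, nullptr /* on_create_extra */);
        }

        void close_single_locktree(toku::locktree_manager *ltm) {
            ltm->release_lt(g_range_lt);  // deleted once its reference count drops to zero
            g_range_lt = nullptr;
        }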

      2.3 Getting a lock

      This function serves as an example:

      // Get a range lock.
      // Return when the range lock is acquired or the default lock tree timeout has expired.
      int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
              toku::lock_request::type lock_type) {
      

      It is also possible to start an asynchronous lock request and then wait for it (see toku_db_start_range_lock, toku_db_wait_range_lock). It seems we don't have a use for this.

      Point locks are obtained by passing the same key as left_key and right_key.
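
      For example, a point write lock on a single key can be taken like this (a sketch; the wrapper function is made up, while toku_db_get_range_lock and toku::lock_request::WRITE are from the code above and the backtrace below):

        // Lock exactly one key by passing it as both ends of the range.
        static int lock_single_key(DB *db, DB_TXN *txn, const DBT *key) {
            return toku_db_get_range_lock(db, txn,
                                          key /* left_key */,
                                          key /* right_key */,
                                          toku::lock_request::WRITE);
        }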

      2.4 Releasing a lock.

      TokuDB doesn't seem to release individual locks (all locks are held until the transaction either commits or aborts).

      The Lock Tree has a function to release locks on a specified set of ranges:

      locktree::release_locks(TXNID txnid, const range_buffer *ranges)
      

      Besides calling that, one will also need to:

      • Wake up all waiting lock requests; release_locks doesn't wake them up. There is a toku::lock_request::retry_all_lock_requests call which retries all pending requests (which doesn't seem to be efficient... but maybe it is ok?).
      • Remove the released lock from the list of locks the transaction is holding (which is in db_txn_struct_i(txn)->lt_map). This is actually not essential, because that list is only used for releasing the locks when the transaction finishes.
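
      Putting the pieces together, releasing a single range early might look roughly like this (a sketch: the function is made up, the range_buffer create/append/destroy calls and the one-argument retry_all_lock_requests mirror what the surrounding TokuDB code appears to use, and the lt_map cleanup is left out, as it is not essential):

        // Sketch: release one range held by txnid in lt, then retry waiters.
        static void release_one_range(toku::locktree *lt, TXNID txnid,
                                      const DBT *left_key, const DBT *right_key) {
            toku::range_buffer ranges;
            ranges.create();
            ranges.append(left_key, right_key);

            lt->release_locks(txnid, &ranges);
            ranges.destroy();

            // release_locks() does not wake up waiters, so retry all
            // pending lock requests on this locktree explicitly.
            toku::lock_request::retry_all_lock_requests(lt);
        }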

      2.5 Releasing all of the transaction's locks

      See PerconaFT/src/ydb_txn.cc:

      static void toku_txn_release_locks(DB_TXN *txn) {
          // Prevent access to the locktree map while releasing.
          // It is possible for lock escalation to attempt to
          // modify this data structure while the txn commits.
          toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);
       
          size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
          for (size_t i = 0; i < num_ranges; i++) {
              txn_lt_key_ranges ranges;
              int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
              invariant_zero(r);
              toku_db_release_lt_key_ranges(txn, &ranges);
          }
       
          toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
      }
      

      Attachments

        1. screenshot-1.png (51 kB)
        2. screenshot-2.png (36 kB)
        3. screenshot-3.png (22 kB)


          Activity

            psergei Sergei Petrunia created issue -

            TokuDB's lock tree is here: storage/tokudb/PerconaFT/locktree. It locks
            ranges.

            (gdb) wher
              #0  toku::locktree::sto_try_acquire (this=0x7fff700342c0, prepared_lkr=0x7fffd4b6c390, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:291
              #1  0x00007ffff4d6eaa1 in toku::locktree::acquire_lock (this=0x7fff700342c0, is_write_request=true, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:380
              #2  0x00007ffff4d6eb73 in toku::locktree::try_acquire_lock (this=0x7fff700342c0, is_write_request=true, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0, big_txn=false) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:399
              #3  0x00007ffff4d6ec1a in toku::locktree::acquire_write_lock (this=0x7fff700342c0, txnid=11, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, conflicts=0x7fffd4b6c4c0, big_txn=false) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/locktree.cc:412
              #4  0x00007ffff4d72dc4 in toku::lock_request::start (this=0x7fffd4b6c5b0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/locktree/lock_request.cc:165
              #5  0x00007ffff4d603aa in toku_db_start_range_lock (db=0x7fff700271e0, txn=0x7fff70060600, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, lock_type=toku::lock_request::WRITE, request=0x7fffd4b6c5b0) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_row_lock.cc:211
              #6  0x00007ffff4d6022e in toku_db_get_range_lock (db=0x7fff700271e0, txn=0x7fff70060600, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, lock_type=toku::lock_request::WRITE) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_row_lock.cc:182
              #7  0x00007ffff4e31643 in c_set_bounds (dbc=0x7fff7005f000, left_key=0x7fffd4b6c750, right_key=0x7fffd4b6c770, pre_acquire=true, out_of_range_error=-30989) at /home/psergey/dev-git/10.3-r2/storage/tokudb/PerconaFT/src/ydb_cursor.cc:714
              #8  0x00007ffff4d195df in ha_tokudb::prelock_range (this=0x7fff7002cdf8, start_key=0x7fff7002cee0, end_key=0x7fff7002cf00) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb.cc:5978
              #9  0x00007ffff4d19a31 in ha_tokudb::read_range_first (this=0x7fff7002cdf8, start_key=0x7fff7002cee0, end_key=0x7fff7002cf00, eq_range=false, sorted=true) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb.cc:6025
              #10 0x0000555555d761dc in handler::multi_range_read_next (this=0x7fff7002cdf8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:291
              #11 0x0000555555d763be in Mrr_simple_index_reader::get_next (this=0x7fff7002d3d8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:323
              #12 0x0000555555d7901a in DsMrr_impl::dsmrr_next (this=0x7fff7002d298, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/sql/multi_range_read.cc:1399
              #13 0x00007ffff4d30b56 in ha_tokudb::multi_range_read_next (this=0x7fff7002cdf8, range_info=0x7fffd4b6c950) at /home/psergey/dev-git/10.3-r2/storage/tokudb/ha_tokudb_mrr_maria.cc:42
              #14 0x000055555601f3a2 in QUICK_RANGE_SELECT::get_next (this=0x7fff7002f800) at /home/psergey/dev-git/10.3-r2/sql/opt_range.cc:11454
              #15 0x0000555556030e64 in rr_quick (info=0x7fff700162b0) at /home/psergey/dev-git/10.3-r2/sql/records.cc:366
              #16 0x0000555555b3b03b in READ_RECORD::read_record (this=0x7fff700162b0) at /home/psergey/dev-git/10.3-r2/sql/records.h:73
              #17 0x0000555555c3e4a4 in join_init_read_record (tab=0x7fff700161e8) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:20227
              #18 0x0000555555c3c256 in sub_select (join=0x7fff700145b0, join_tab=0x7fff700161e8, end_of_records=false) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:19301
              #19 0x0000555555c3b821 in do_select (join=0x7fff700145b0, procedure=0x0) at /home/psergey/dev-git/10.3-r2/sql/sql_select.cc:18844
            

            psergei Sergei Petrunia added a comment - Data collected so far: https://gist.github.com/spetrunia/c75b34d70aaea3b927e478557ff89ab5

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initialize the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            h3. 2.5 Releasing all locks.

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all of transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            psergei Sergei Petrunia made changes -
            Description (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See PerconaFT/src/ydb-internal.h,
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table {{db->i->lt}}.

            h3.1.3 Each transaction keeps a track of ranges it is holding locks

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is there, because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See toku_txn_destroy for how to free this)

            h2. 2. Functions

            Most functions that are mentioned here are from {{storage/tokudb/PerconaFT/src/}}, {{ydb_txn.cc}}, {{ydb_row_lock.cc}} - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO

            h3. 2.2 Create Lock Tree for a table

            TokuDB does it when it opens a table's table_share. It is done like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            after the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)

            h3. 2.3 Getting a lock

            This function has an example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). We don't have a use for this it seems (?)

            Point locks are obtained by passing the same key as left_key and right_key.

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            Transaction will also need to remove them from the list of locks it is holding (note: this is actually not essential because that list is only used for the purpose of releasing the locks when transaction is finished)

            h3. 2.5 Releasing all of transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}
            (The upstream task is: https://github.com/facebook/mysql-5.6/issues/800 )

            Notes about how to use PerconaFT:

            h2. 1. Data structures

            h3. 1.1 A Global Lock Tree Manager object

            There needs to be a global {{locktree_manager}}.

            See {{PerconaFT/src/ydb-internal.h}}:
            {noformat}
              struct __toku_db_env_internal {
                toku::locktree_manager ltm;
            {noformat}

            h3. 1.2 A separate Lock Tree for each table
            TokuDB uses a separate Lock Tree for each table, {{db->i->lt}}.

            h3. 1.3 Each transaction keeps track of the ranges it is holding locks on

            Each transaction has a list of ranges that it is holding locks on. It is referred to like so:
            {code:cpp}
              db_txn_struct_i(txn)->lt_map
            {code}

            and is stored in this structure, together with a mutex to protect it:
            {code:cpp}
              struct __toku_db_txn_internal {
                  // maps a locktree to a buffer of key ranges that are locked.
                  // it is protected by the txn_mutex, so hot indexing and a client
                  // thread can concurrently operate on this txn.
                  toku::omt<txn_lt_key_ranges> lt_map;
                  toku_mutex_t txn_mutex;
            {code}

            The mutex is needed because the list may be modified by the lock escalation process (which may be invoked from a different thread).
            (See {{toku_txn_destroy}} for how to free this structure.)
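
            Each element of {{lt_map}} pairs a Lock Tree with the buffer of key ranges locked in it. Roughly (recalled from {{ydb_row_lock.h}}, may not be verbatim):
            {code:cpp}
            // One lt_map entry: the locktree and the ranges this txn has locked in it.
            typedef struct txn_lt_key_ranges {
                toku::locktree *lt;
                toku::range_buffer *buffer;
            } txn_lt_key_ranges;
            {code}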

            h2. 2. Functions

            Most functions mentioned here are from {{storage/tokudb/PerconaFT/src/}} ({{ydb_txn.cc}}, {{ydb_row_lock.cc}}) - this is TokuDB's layer above the Lock Tree.

            h3. 2.1 Initializing the Lock Manager
            TODO
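
            As a starting point, a minimal sketch of the initialization on the MyRocks side. It assumes {{locktree_manager::create()}} / {{destroy()}} take a create/destroy/escalation callback triple plus an extra pointer, the way TokuDB's {{ydb.cc}} uses them; the exact signatures and whether null callbacks are accepted need to be verified against {{locktree/locktree.h}}:
            {code:cpp}
            // Sketch only (assumptions noted above): one process-wide lock tree manager,
            // created at storage-engine startup and destroyed at shutdown.
            static toku::locktree_manager rocksdb_ltm;

            void range_locking_init(void) {
                // No on-create / on-destroy / escalation callbacks for the first cut.
                rocksdb_ltm.create(nullptr, nullptr, nullptr, nullptr);
            }

            void range_locking_shutdown(void) {
                // All lock trees must have been released (release_lt) before this point.
                rocksdb_ltm.destroy();
            }
            {code}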

            h3. 2.2 Create Lock Tree for a table

            TokuDB does this when it opens a table's table_share, like so:
            {code:cpp}
                    db->i->lt = db->dbenv->i->ltm.get_lt(db->i->dict_id,
                                                         toku_ft_get_comparator(db->i->ft_handle),
                                                         &on_create_extra);
            {code}

            Then, one needs to release it:
            {code:cpp}
            db->dbenv->i->ltm.release_lt(db->i->lt);
            {code}
            After the last {{release_lt}} call, the Lock Tree will be deleted (it is guaranteed to be empty at that point).

            (TODO: this is easy to arrange if Toku locks are invoked from MyRocks level. But if they are invoked from RocksDB, this is harder as RocksDB doesn't have any concept of tables or indexes. For start, we can pretend all keys are in one table)
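
            To illustrate the "pretend all keys are in one table" idea from the TODO above, a rough sketch using the same {{get_lt}} / {{release_lt}} calls (the dictionary id, comparator argument and function names are placeholders, not existing code):
            {code:cpp}
            // Sketch: a single Lock Tree shared by the whole keyspace.
            static toku::locktree *shared_lt = nullptr;

            void open_shared_locktree(toku::locktree_manager *ltm, const toku::comparator &cmp) {
                DICTIONARY_ID dict_id;
                dict_id.dictid = 1;                      // fixed, made-up id for the single "table"
                shared_lt = ltm->get_lt(dict_id, cmp, nullptr /* on_create_extra */);
            }

            void close_shared_locktree(toku::locktree_manager *ltm) {
                ltm->release_lt(shared_lt);              // the empty tree is deleted on last release
                shared_lt = nullptr;
            }
            {code}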

            h3. 2.3 Getting a lock

            This function is a good example:

            {code:cpp}
            // Get a range lock.
            // Return when the range lock is acquired or the default lock tree timeout has expired.
            int toku_db_get_range_lock(DB *db, DB_TXN *txn, const DBT *left_key, const DBT *right_key,
                    toku::lock_request::type lock_type) {
            {code}

            It is also possible to start an asynchronous lock request and then wait for it (see {{toku_db_start_range_lock}}, {{toku_db_wait_range_lock}}). It seems we don't have a use for this.

            Point locks are obtained by passing the same key as {{left_key}} and {{right_key}}.
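
            For example, a point write lock on a single row key could be taken like this (a sketch; the wrapper function is hypothetical, only {{toku_db_get_range_lock}} and {{toku::lock_request::type}} come from the code above):
            {code:cpp}
            // Sketch: a point lock is just a range lock whose two endpoints are the same key.
            int lock_single_row(DB *db, DB_TXN *txn, const DBT *key) {
                int r = toku_db_get_range_lock(db, txn, key, key,
                                               toku::lock_request::type::WRITE);
                // r != 0 means the lock was not granted (e.g. lock wait timeout / deadlock).
                return r;
            }
            {code}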

            h3. 2.4 Releasing a lock.

            TokuDB doesn't seem to release individual locks (all locks are held until transaction either commits or is aborted).

            LockTree has a function to release locks from a specified range:
            {code:cpp}
            locktree::release_locks(TXNID txnid, const range_buffer *ranges)
            {code}

            The transaction will also need to remove them from its list of held locks (note: this is not strictly essential, because that list is only used to release the locks when the transaction finishes).
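
            A sketch of what releasing the locks on one range early could look like; the {{range_buffer}} calls ({{create}} / {{append}} / {{destroy}}) are recalled from memory and should be checked against {{locktree/range_buffer.h}}:
            {code:cpp}
            // Sketch: release the locks that txnid holds on [left_key, right_key] in this tree.
            // Removing the range from the transaction's own lt_map is omitted (see note above).
            void release_one_range(toku::locktree *lt, TXNID txnid,
                                   const DBT *left_key, const DBT *right_key) {
                toku::range_buffer ranges;
                ranges.create();
                ranges.append(left_key, right_key);
                lt->release_locks(txnid, &ranges);
                ranges.destroy();
            }
            {code}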

            h3. 2.5 Releasing all of the transaction's locks

            See {{PerconaFT/src/ydb_txn.cc}}:
            {code:cpp}
            static void toku_txn_release_locks(DB_TXN *txn) {
                // Prevent access to the locktree map while releasing.
                // It is possible for lock escalation to attempt to
                // modify this data structure while the txn commits.
                toku_mutex_lock(&db_txn_struct_i(txn)->txn_mutex);

                size_t num_ranges = db_txn_struct_i(txn)->lt_map.size();
                for (size_t i = 0; i < num_ranges; i++) {
                    txn_lt_key_ranges ranges;
                    int r = db_txn_struct_i(txn)->lt_map.fetch(i, &ranges);
                    invariant_zero(r);
                    toku_db_release_lt_key_ranges(txn, &ranges);
                }

                toku_mutex_unlock(&db_txn_struct_i(txn)->txn_mutex);
            }
            {code}

            The MDEV text now has a description of how to use the range locker from TokuDB.

            Other input: there is a big concern about regressions with respect to the current way of doing locking. Most likely, we will need to support both the current locking mode (where gap locking is not available for any transaction) and the range locking mode (where some transactions may take range locks in some circumstances, others take row locks, and both kinds of locks conflict with each other).

            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            Current locking code does "Snapshot Checking" (See PessimisticTransaction::ValidateSnapshot):

            When acquiring a point lock on $ROW_KEY, a transaction will check whether there were any changes made to $ROW_KEY after the transaction's snapshot was taken.

            This apparently cannot be efficiently done for range locks.

            But it seems to be also unnecessary. Here's why:

            Snapshot checking (ValidateSnapshot) is needed to prevent situations like this:

            trx1> start; allocate a snapshot 
             
            trx2> update value for $ROW_KEY_1; commit;
             
            trx1> update value for $ROW_KEY_1;   -- note that we are using a snapshot and
                                                 -- don't see trx2's changes
             
            trx1> commit; -- this overwrites changes by trx2.
            

            That is, this is an "optimistic-like" method to make sure that the transaction's snapshot has not been "made obsolete" by some other transaction.

            With Range Locking,

            • We can't have "ValidateSnapshot for ranges"
            • but we place locks on all records we read.

            Range locks would not prevent the above scenario between trx1 and trx2, as trx2 updates $ROW_KEY_1 before trx1 attempts to read it.

            However, when transactions use locking, we can assume that trx1 "happened after" trx2 has committed. (The only thing that would prevent this assumption would be that trx1 has read a value that trx2 is modifying. But in that case, trx1 would have held a read lock that would have prevented trx2 from making the modification).

            The only issue here is that trx1 must not use a snapshot that was created before trx2 has committed.

            To sum up: RangeLockingForReads

            • Does not need to use ValidateSnapshot
            • But must not use the snapshot from the beginning of the transaction. (That is, if we are reading data using snapshot S, then S must have been acquired
              after we have obtained a lock covering the rowkey we are reading. This is our guarantee that nobody has sneaked in an update).

            If we are holding all locks for the duration of the transaction, there is no problem with reading inconsistent data (the data will be the same as if we've used the snapshot made after the most-recently-modified row we've read)
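
            To make this ordering concrete, here is a minimal sketch using the RocksDB pessimistic-transaction API. GetRangeLock() is a hypothetical placeholder for the future range-locking entry point; SetSnapshot(), GetSnapshot() and Get() are existing Transaction calls.

            {code:cpp}
            #include <rocksdb/utilities/transaction_db.h>

            // Sketch only: take the range lock first, then the snapshot used for reading.
            // Any transaction that modified the locked range must have committed before
            // we obtained the lock, so the snapshot cannot miss its change and no
            // per-key ValidateSnapshot step is needed.
            void ReadWithRangeLock(rocksdb::Transaction* txn,
                                   rocksdb::ColumnFamilyHandle* cf,
                                   const rocksdb::Slice& start_key,
                                   const rocksdb::Slice& end_key,
                                   const rocksdb::Slice& key, std::string* value) {
              txn->GetRangeLock(cf, start_key, end_key);  // hypothetical range-lock call

              // Take (or re-take) the snapshot only after the lock is held.
              txn->SetSnapshot();

              rocksdb::ReadOptions opts;
              opts.snapshot = txn->GetSnapshot();
              rocksdb::Status s = txn->Get(opts, cf, key, value);  // read via the post-lock snapshot
              (void)s;  // error handling omitted in this sketch
            }
            {code}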


            1. If there is no snapshot taken before acquiring the lock, then even the existing code would not call ValidateSnapshot: https://github.com/facebook/rocksdb/blob/ea212e531696cab9cc8c2c3da49119b7888402ef/utilities/transactions/pessimistic_transaction.cc#L535
            2. MyRocks does allow transactions to explicitly take a snapshot at the very beginning, before any reads start. What happens in those cases?

            myabandeh Maysam Yabandeh added a comment

            If the transaction has already taken a snapshot at the beginning, perhaps we can get the implementation to guarantee that it would never call ::Get before RangeLockingForReads, and then upgrade the snapshot after the last call to RangeLockingForReads. This would be as if we had delayed the transaction's request to take the snapshot.

            The problem with this approach would be losing linearizability: if, for the two transactions, the client makes connections between their input/output outside the SQL engine, then it might get inconsistent results, as we did not actually take the snapshot at the wall-clock time at which we confirmed to the client that we did. For example, in this sequence of events running "from the same client session":

            K1=V1
            txn B starts
            txn B take snapshot
             
            txn A writes VA to K1
            txn A commits
             
            txn B reads K1
            

            The client expects Txn B to read V1 but we return VA. I think it should be fine since our supported isolation level is not linearizable anyway (it is not even serializable).

            myabandeh Maysam Yabandeh added a comment

            I've put up a tree here: https://github.com/spetrunia/mysql-5.6/tree/range-locking-fb-mysql-5.6.35

            Current status:

            • MyRocks has a read-only global variable @@rocksdb_use_range_locking which one can set in my.cnf
            • In addition to class TransactionLockMgr, RocksDB (a modified copy of it) includes another class which uses PerconaFT's locktree to provide locks (a rough sketch of such a wrapper follows this list).
            • Currently, it only does point, write-only locks.
            • The state is: it compiled, it worked for a basic example. Lots of details are still missing and in particular, the APIs are not final.
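
            For illustration, here is a rough skeleton of what such a wrapper might look like. The class name and structure are invented for this sketch (the actual class in the tree may differ); the lock_request create/set/start/wait calls mirror the ones TokuDB's ydb_row_lock.cc makes against the locktree.

            {code:cpp}
            #include <locktree/locktree.h>
            #include <locktree/lock_request.h>

            // Sketch only: a point-lock manager that maps each key to a single-point
            // range in a locktree. All locks are exclusive WRITE locks, matching the
            // "point, write-only locks" state described above.
            class RangeLockMgrSketch {
             public:
              explicit RangeLockMgrSketch(toku::locktree *lt) : lt_(lt) {}

              // Acquire an exclusive lock on one key for transaction 'txnid',
              // waiting at most wait_time_msec. Returns 0 on success.
              int TryLockPoint(TXNID txnid, const DBT *key, uint64_t wait_time_msec) {
                toku::lock_request request;
                request.create();
                // A point lock is a range lock whose endpoints are the same key.
                request.set(lt_, txnid, key, key, toku::lock_request::type::WRITE,
                            /* big_txn */ false);
                request.start();
                int r = request.wait(wait_time_msec);
                request.destroy();
                return r;
              }

             private:
              toku::locktree *lt_;
            };
            {code}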
            psergei Sergei Petrunia added a comment

            1. If there is no snapshot taken before acquiring the lock, then even the existing code would not call ValidateSnapshot: https://github.com/facebook/rocksdb/blob/ea212e531696cab9cc8c2c3da49119b7888402ef/utilities/transactions/pessimistic_transaction.cc#L535

            I am not sure when that happens (IIRC, in MyRocks a transaction would normally create/use a snapshot before it has written any data). Will check.

            psergei Sergei Petrunia added a comment

            I took the current patch (it uses locktree to do point locks, all locks are
            exclusive write locks under the hood, etc) and ran a benchmark.

            The benchmark compares the performance of the current locking system with the new locking system with varying number of client connections.

            sysbench ... --time=60 /usr/share/sysbench/oltp_write_only.lua  
            --table-size=1000000 --mysql_storage_engine=RocksDB --threads=$n run
            

            The results are:

            n_threads	current_locking_tps	new_locking_tps	new_to_current_ratio
            1	433.7	417.64	0.963
            2	585.28	553.67	0.946
            5	1358.33	1340.1	0.987
            10	2435.65	2423.49	0.995
            20	3968.21	3806.98	0.959
            40	5306.06	4975.17	0.938
            60	5913.78	5256.03	0.889
            80	6122.57	5607.66	0.916
            100	6280.9	5736.32	0.913
            120	6423.71	5631.45	0.877
            

            Plotting this (TPS vs. number of threads): see the attached screenshot-1.png.

            Plotting the slowdown ratio: see the attached screenshot-2.png.

            psergei Sergei Petrunia added a comment

            So

            • The difference is clearly visible
            • New locking is slower, and the difference grows as the number of threads grows.
            • Maybe it's because read locks are made write locks under the hood? (This can be checked by forcing the "old" locking to always use write locks.)
            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            pt-table-checksum works as follows:

            The table is broken into chunks. Then, for each chunk, the master computes the checksum like so:

            REPLACE INTO 
              percona.checksums(
                db, tbl, chunk, 
                chunk_index, lower_boundary, upper_boundary, 
                this_cnt, this_crc)
            SELECT 
              'test', 't10', '48', 
              'PRIMARY', '950358', '972636', -- boundaries
              COUNT(*) AS cnt,
              ... , --  here is a long expression to compute the row checksum
            FROM 
              test.t10 FORCE INDEX(PRIMARY)
            WHERE 
              ((pk >= '950358')) AND ((pk <= '972636')) /*checksum chunk*/
            

            This statement is replicated to the slave using SBR. That is, the slave will run it too, and compute the checksum of the data on the slave.

            Then, the master reads the checksum data:

            SELECT this_crc, this_cnt 
            FROM percona.checksums 
            WHERE db = 'test' AND tbl = 't10' AND chunk = '48';
            

            And saves it in master_crc column:

            UPDATE percona.checksums 
            SET 
              chunk_time = '0.455180', 
              master_crc = '691e28bc', 
              master_cnt = '22279' 
            WHERE 
              db = 'test' AND tbl = 't10' AND chunk = '48'
            

            This way, on the slave we will get

            • master_crc is the CRC value from the master
            • this_crc is the CRC value computed locally.

            The need for Gap Locking comes from Statement replication of REPLACE INTO ... SELECT. When executed on the slave, it should read the same data as it did on the master. For that, execution of REPLACE INTO ... SELECT FROM t1 on the master must prevent any concurrent transaction from modifying t1 and committing.
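
            As an illustration (not code from the patch), the chunk scanned by the checksum statement could be covered by a shared range lock via the toku_db_get_range_lock call described in this MDEV. The helper below and its arguments (the encoded PK endpoints and the DB/DB_TXN handles) are assumptions for the sketch, and the PerconaFT ydb headers are assumed to be available.

            {code:cpp}
            // Sketch only: lock the checksum chunk's primary-key range in shared (READ)
            // mode, so that concurrent writers to the chunk must wait until the
            // checksumming transaction commits.
            static int lock_checksum_chunk(DB *db, DB_TXN *txn,
                                           const void *lower_pk, uint32_t lower_len,
                                           const void *upper_pk, uint32_t upper_len) {
                DBT left_key, right_key;
                toku_fill_dbt(&left_key,  lower_pk, lower_len);   // e.g. encoded pk '950358'
                toku_fill_dbt(&right_key, upper_pk, upper_len);   // e.g. encoded pk '972636'
                // Conflicting WRITE lock requests on this range will block or time out.
                return toku_db_get_range_lock(db, txn, &left_key, &right_key,
                                              toku::lock_request::type::READ);
            }
            {code}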

            The pt-table-checksum code also contains a "LOCK IN SHARE MODE" query, but it does not seem to be used.


            InnoDB's equivalent of Snapshot Checking

            InnoDB also uses multi-versioning and locking for intended writes. It doesn't do SnapshotChecking, so it faces a similar problem with overwriting the changes that were made after the transaction's snapshot was taken but before the lock was acquired:

            1. trx1> start; allocate a snapshot 
             
            2. trx2> update value for $ROW_KEY_1; commit;
             
            3. trx1> update value for $ROW_KEY_1;
             
            4. trx1> commit;
            

            InnoDB solves this by having DML statements read the latest committed data, instead of the transaction's snapshot.

            This does look like a READ-COMMITTED isolation level:

            trx1> begin;
            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            trx2> update t1 set a=33 where pk=3; -- autocommit=1 here
            

            Transaction trx1 is reading from the snapshot:

            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            unless it's a FOR UPDATE (or DML) which will see the latest committed data:

            trx1> select * from t1 where pk=3 for update;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |   33 |
            +----+------+
            

            Regardless of that, further SELECTs will continue to read from the snapshot:

            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |    3 |
            +----+------+
            

            DML will operate on the latest committed data:

            trx1> update t1 set a=a+1 where pk=3;
            Query OK, 1 row affected (0.00 sec)
            Rows matched: 1  Changed: 1  Warnings: 0
             
            trx1> select * from t1 where pk=3;
            +----+------+
            | pk | a    |
            +----+------+
            |  3 |   34 |
            +----+------+
            

            This behavior "breaks" the promise of REPEATABLE-READ on the master, but in return, the statement will have the same effect when it is run on the slave.

            Use in Range Locking in MyRocks

            Range Locking mode in MyRocks can use this approach too (a code sketch follows the list below):

            • DML statements and SELECT FOR UPDATE/LOCK IN SHARE MODE should read the latest committed data (this includes the unique key checks they do)
            • No Snapshot Checking is necessary.
            • Regular SELECTs should still read from the snapshot (This should happen even if the transaction is already holding a lock on the row. Even in this case, regular SELECT may return an out-of-date version of the row).
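
            Here is a rough C++ sketch of the two read paths under this scheme, using existing RocksDB transaction calls (GetForUpdate, Get, GetSnapshot); the split into LockingRead/SnapshotRead helpers is an assumption made for illustration, not existing MyRocks code.

            {code:cpp}
            #include <rocksdb/utilities/transaction_db.h>

            // Sketch only. Locking read used by DML / SELECT FOR UPDATE / LOCK IN SHARE
            // MODE: take the row lock and read the latest committed data (no snapshot
            // in ReadOptions), mirroring the InnoDB behaviour described above.
            rocksdb::Status LockingRead(rocksdb::Transaction* txn,
                                        rocksdb::ColumnFamilyHandle* cf,
                                        const rocksdb::Slice& key, std::string* value) {
              rocksdb::ReadOptions opts;                       // opts.snapshot == nullptr
              return txn->GetForUpdate(opts, cf, key, value);  // sees the latest committed value
            }

            // Plain SELECT: read through the transaction's snapshot, even if this
            // transaction already holds a lock on the row (it may therefore return an
            // out-of-date version of the row).
            rocksdb::Status SnapshotRead(rocksdb::Transaction* txn,
                                         rocksdb::ColumnFamilyHandle* cf,
                                         const rocksdb::Slice& key, std::string* value) {
              rocksdb::ReadOptions opts;
              opts.snapshot = txn->GetSnapshot();  // snapshot taken earlier in the transaction
              return txn->Get(opts, cf, key, value);
            }
            {code}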
            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - - edited

            Currently failing tests:

            rocksdb.rqg_transactions
            rocksdb.rocksdb_deadlock_stress_rc
            rocksdb.rocksdb_deadlock_stress_rr
            rocksdb.deadlock_stats
            rocksdb.compact_deletes
            rocksdb.rocksdb_deadlock_detect_rc
            rocksdb.deadlock
            rocksdb.deadlock_tracking
            rocksdb.gap_lock_raise_error
            rocksdb.i_s_deadlock
            rocksdb.rocksdb_deadlock_detect_rr

            rocksdb.rqg_transactions 'range_locking'

            • Assertion failure in toku::treenode::remove

            rocksdb.compact_deletes 'range_locking'

            • Timed out; it was just hanging with no user activity (?)

            rocksdb.rocksdb_deadlock_detect_rc 'range_locking'

            • Lock wait timeout error

            rocksdb.rocksdb_deadlock_stress_rc 'range_locking'
            rocksdb.rocksdb_deadlock_stress_rr 'range_locking'

            • Lock wait timeout error

            rocksdb.deadlock 'range_locking'

            • 900 sec. timeout, several threads waiting for a lock

            rocksdb.deadlock_stats 'range_locking' - "mysqltest got signal 6" - a crash on the client (?)

            • still, the test seems to use deadlock detector.

            rocksdb.deadlock_tracking 'range_locking'

            • Lock wait timeout error.

            rocksdb.gap_lock_raise_error 'range_locking'

            • Lock wait timeout error.

            rocksdb.i_s_deadlock 'range_locking'

            • Lock wait timeout error.

            rocksdb.rocksdb_deadlock_detect_rr 'range_locking'

            • Lock wait timeout error.

            Currently, the tests pass.
            The rocksdb test suite now has three "combinations" - write_prepared, write_committed, and range_locking.
            Tests that assume point locking are disabled in 'range_locking' mode.
            There are also tests that specifically target range locking.

            psergei Sergei Petrunia added a comment

            Remaining issues:

• Reduce the transaction's list of acquired locks to reflect the actions of lock escalation.
            • Turn off snapshot validation.
psergei Sergei Petrunia added a comment

            Now the above is done and there are no known Gap-Lock-related test failures in the rocksdb test suite.

            • Also did some code cleanup in preparation for a pull request to RocksDB, but more cleanups will be needed.
psergei Sergei Petrunia added a comment
psergei Sergei Petrunia added a comment (edited)

            Also did a basic benchmark: ran sysbench oltp_read_write.lua for:

            • rocksdb_use_range_locking=1
            • rocksdb_use_range_locking=0
• the original tree that the range locking patch is currently based on.

SYSBENCH_BASE_ARGS=" --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=root \
  --time=60 \
  /usr/share/sysbench/oltp_read_write.lua --table-size=1000000"
SYSBENCH_CUR_ARGS="$SYSBENCH_BASE_ARGS --mysql_storage_engine=RocksDB"
sysbench $SYSBENCH_CUR_ARGS prepare

for threads in 1 10 20 40 ; do
  SYSBENCH_ALL_ARGS="$SYSBENCH_CUR_ARGS --threads=$threads"
  sysbench $SYSBENCH_ALL_ARGS run   # run step; presumably omitted from the pasted snippet
done
            

            Results:

            rangelocking=ON 
            1 307.74
            10 1576.26
            20 1819.30 
            40 1640.48 
            

            rangelocking=OFF
            1 307.58
            10 1579.74
            20 1838.34
            40 1620.53
            

            rangelocking-orig
            1 306.23
            10 1565.10
            20 1811.46
            40 1611.57
            


            In tabular form

threads	rangelocking=ON	rangelocking=OFF	rangelocking-orig
1	307.74	307.58	306.23
10	1576.26	1579.74	1565.1
20	1819.3	1838.34	1811.46
40	1640.48	1620.53	1611.57
            

psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - The pull request is at https://github.com/facebook/rocksdb/pull/5041

            Got a question about refreshing the iterator.

            Consider a query:

            update t1 set col1=col1+1000 where (pk between 3 and 7) or (pk between 10 and 15);
            

Suppose range locking is ON, the table has `PRIMARY KEY(pk)`, and the query is using the PK.

            It will do this:

  trx->get_range_lock([3; 7]);
  iter = trx->get_iterator(); // (1)
  // Use the iter to read the latest committed rows in the [3..7] range
  // (2)

  trx->get_range_lock([10; 15]);  // (3)
            

            Now, the iterator we created at point (1) is reading the snapshot of data taken at that moment.

We need to read the latest committed data (to be precise, we need to see everything that was committed into the 10..15 range before the get_range_lock call marked with (3) was run).

            We should call this:

              iter->Refresh();
            

But here the iterator is `rocksdb::BaseDeltaIterator`, which doesn't override Refresh(), so it falls back to rocksdb::Iterator::Refresh(), which is this:

              virtual Status Refresh() {
                return Status::NotSupported("Refresh() is not supported");
              }
            

            Does this mean

• The iterator I've got will return the latest data (and NOT the snapshot taken at the time the iterator was created, at (1)),
  or
• The iterator I've got doesn't support Refresh(), so I should destroy and re-create it?
psergei Sergei Petrunia added a comment

            An MTR testcase for iterator refresh:
            https://gist.github.com/spetrunia/7ead10923d40bf2d9baa960740733945

            Result of it:
            https://gist.github.com/spetrunia/915cdeeb033251a288ec88509bb04582#file-range-locking-iterator-refresh-result-sql-L22

It shows that the iterator sees the row that has been deleted. When it attempts to read the row, we get a "Got error 1 'NotFound:'" error.

            Now, let's remove the DELETE statement from the testcase:
            https://gist.github.com/spetrunia/ac3392e8279007eb15411872cbc43241
            the output: https://gist.github.com/spetrunia/33ce1b208109c8b0331fc54768de58ec

            30 5000

            The INSERT'ed row was not updated, so it was not visible to the iterator.

For the updated rows, the result looks as if the iterator saw the latest data?

            40 5100
            41 5100
            42 5100
            43 5100
            44 5100
            45 5100

            (or is this the result of extra GetForUpdate calls?)
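A sketch of that GetForUpdate theory (my own illustration, not the MyRocks code path; the /tmp path and the keys are made up): an iterator obtained from Transaction::GetIterator() keeps reading the state it pinned when it was created, while Transaction::GetForUpdate() with no snapshot in its ReadOptions locks the key and returns the latest committed value. If MyRocks issues such GetForUpdate calls for the rows it updates, that would explain why the updated rows look current.

  #include <cassert>
  #include <string>
  #include "rocksdb/utilities/transaction.h"
  #include "rocksdb/utilities/transaction_db.h"

  int main() {
    rocksdb::Options options;
    options.create_if_missing = true;
    rocksdb::TransactionDB* db = nullptr;
    rocksdb::Status s = rocksdb::TransactionDB::Open(
        options, rocksdb::TransactionDBOptions(), "/tmp/getforupdate_demo", &db);
    assert(s.ok());

    db->Put(rocksdb::WriteOptions(), "40", "5000");

    rocksdb::Transaction* txn = db->BeginTransaction(rocksdb::WriteOptions());
    rocksdb::ReadOptions read_options;                         // no snapshot set
    rocksdb::Iterator* iter = txn->GetIterator(read_options);  // pins the current state
    iter->Seek("40");
    assert(iter->Valid() && iter->value().ToString() == "5000");

    // Another "connection" commits a change after the iterator was created.
    db->Put(rocksdb::WriteOptions(), "40", "5100");

    // The iterator still returns the value it pinned at creation time...
    iter->Seek("40");
    assert(iter->value().ToString() == "5000");

    // ...but GetForUpdate (no snapshot in read_options) locks the key and
    // reads the latest committed value.
    std::string value;
    s = txn->GetForUpdate(read_options, "40", &value);
    assert(s.ok() && value == "5100");

    delete iter;
    txn->Rollback();
    delete txn;
    delete db;
    return 0;
  }
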

psergei Sergei Petrunia added a comment

            Ok,

            • the iterator obtained from TransactionDB->NewIterator() has a non-trivial Refresh implementation, ArenaWrappedDBIter::Refresh().
• the iterator obtained from Transaction->GetIterator() doesn't support refresh. It's a BaseDeltaIterator, with base_iterator_ = ArenaWrappedDBIter and delta_iterator_ = WBWIIteratorImpl, so the remaining option is to destroy and re-create it (see the sketch below).
psergei Sergei Petrunia added a comment
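A minimal sketch of the destroy-and-re-create workaround (an assumption on my side, not code from the patch): since BaseDeltaIterator has no working Refresh(), drop the old iterator and ask the transaction for a new one after acquiring the next range lock. The helper name RefreshTxnIterator is made up; the RocksDB calls are the stock API.

  #include <memory>
  #include "rocksdb/utilities/transaction.h"

  // Re-create the transaction iterator so it sees everything committed before
  // the latest range lock was taken, instead of the snapshot pinned at creation.
  std::unique_ptr<rocksdb::Iterator> RefreshTxnIterator(
      rocksdb::Transaction* txn, std::unique_ptr<rocksdb::Iterator> old_iter) {
    old_iter.reset();                   // destroy the stale iterator first
    rocksdb::ReadOptions read_options;  // no snapshot: read the latest committed state
    return std::unique_ptr<rocksdb::Iterator>(txn->GetIterator(read_options));
  }

In the earlier pseudocode this would be called right after trx->get_range_lock([10; 15]), so the scan of the 10..15 range sees everything committed before that lock was taken.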