1. Data structures
1.1 A Global Lock Tree Manager object
1.2 A separate Lock Tree for each table
1.3 Each transaction keeps track of the ranges it holds locks on
2. Functions
2.1 Initializing the Lock Manager
2.2 Create Lock Tree for a table
2.3 Getting a lock
2.4 Releasing a lock
2.5 Releasing all of the transaction's locks
1. Data structures
1.1 A Global Lock Tree Manager object
There needs to be a global locktree_manager.
See PerconaFT/src/ydb-internal.h:
struct __toku_db_env_internal {
toku::locktree_manager ltm;
1.2 A separate Lock Tree for each table
TokuDB uses a separate Lock Tree for each table. It is accessed as db->i->lt.
1.3 Each transaction keeps track of the ranges it holds locks on
Each transaction has a list of ranges that it is holding locks on. It is referred to like so:
db_txn_struct_i(txn)->lt_map
and is stored in this structure, together with a mutex to protect it:
struct __toku_db_txn_internal {
// maps a locktree to a buffer of key ranges that are locked.
// it is protected by the txn_mutex, so hot indexing and a client
// thread can concurrently operate on this txn.
toku::omt<txn_lt_key_ranges> lt_map;
toku_mutex_t txn_mutex;
The mutex is there because the list may be modified by the lock escalation process (which may be invoked from a different thread).
(See toku_txn_destroy for how to free this)
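To make the shape of this structure concrete, here is a minimal sketch of a per-transaction map from lock tree to locked ranges, guarded by a mutex. It is not the PerconaFT code: LockTree, KeyRange and TxnLockList are hypothetical stand-ins, and std::map and std::mutex take the place of toku::omt and toku_mutex_t. The replace() method only stands for "another thread rewrites the list"; what lock escalation actually does to the list is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the PerconaFT types.
struct LockTree {};                              // one per table
struct KeyRange { std::string left, right; };    // closed range [left, right]

class TxnLockList {
 public:
  // Record a range this transaction has locked in the given lock tree.
  void add(LockTree* lt, KeyRange r) {
    assert(lt != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    ranges_[lt].push_back(std::move(r));
  }

  // Overwrite the ranges recorded for one lock tree. This may be called
  // from a different thread than add(), which is why the mutex exists.
  void replace(LockTree* lt, std::vector<KeyRange> new_ranges) {
    assert(lt != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    ranges_[lt] = std::move(new_ranges);
  }

  // Number of ranges currently recorded for the given lock tree.
  std::size_t count(LockTree* lt) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = ranges_.find(lt);
    return it == ranges_.end() ? 0 : it->second.size();
  }

 private:
  std::mutex mutex_;                                   // role of txn_mutex
  std::map<LockTree*, std::vector<KeyRange>> ranges_;  // role of lt_map
};
```

Every access goes through the mutex, so a client thread calling add() and a background thread calling replace() cannot corrupt the map.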
2. Functions
Most functions mentioned here are from ydb_txn.cc and ydb_row_lock.cc in storage/tokudb/PerconaFT/src/. This is TokuDB's layer above the Lock Tree.
2.1 Initializing the Lock Manager
TODO
2.2 Create Lock Tree for a table
TokuDB does it when it opens a table's table_share. It is done like so:
After the last release_lt call, the Lock Tree will be deleted (it is guaranteed to be empty).
(TODO: this is easy to arrange if Toku locks are invoked from the MyRocks level. But if they are invoked from RocksDB, this is harder, as RocksDB doesn't have any concept of tables or indexes. For a start, we can pretend all keys are in one table.)
2.3 Getting a lock
This function has an example:
// Get a range lock.
// Return when the range lock is acquired or the default lock tree timeout has expired.
It is also possible to start an asynchronous lock request and then wait for it (see toku_db_start_range_lock, toku_db_wait_range_lock). We don't seem to have a use for this.
Point locks are obtained by passing the same key as left_key and right_key.
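The point-lock convention can be illustrated with a toy lock table. This is only a sketch under simplifying assumptions: ToyRangeLocks and HeldRange are hypothetical names, all locks are exclusive, keys are compared as plain strings, and a conflicting request fails immediately instead of waiting for a timeout as the real lock tree does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A granted lock on the closed range [left, right], held by `owner`.
struct HeldRange {
  int owner;
  std::string left, right;
};

class ToyRangeLocks {
 public:
  // Try to lock [left, right] for `owner`. Returns false if another
  // owner already holds a range that overlaps it.
  bool try_lock(int owner, const std::string& left, const std::string& right) {
    assert(!(right < left));  // the range must be ordered
    for (const HeldRange& h : held_) {
      bool overlaps = !(right < h.left || h.right < left);
      if (overlaps && h.owner != owner) return false;
    }
    held_.push_back({owner, left, right});
    return true;
  }

  // A point lock is the degenerate range [key, key]: the same key is
  // passed as both the left and the right endpoint.
  bool try_lock_point(int owner, const std::string& key) {
    return try_lock(owner, key, key);
  }

 private:
  std::vector<HeldRange> held_;
};
```

With this, transaction 1 locking ["b", "d"] makes a point lock on "c" by transaction 2 fail, while a point lock on "e" succeeds.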
2.4 Releasing a lock
TokuDB doesn't seem to release individual locks (all locks are held until the transaction either commits or is aborted).
LockTree has a function to release locks from a specified range:
Wake up all waiting lock requests. release_locks doesn't wake them up. There is a toku::lock_request::retry_all_lock_requests call which retries all pending requests (which doesn't seem to be efficient, but maybe it is OK?).
Remove the released lock from the list of locks the transaction is holding (which is in db_txn_struct_i(txn)->lt_map). This is not actually essential, because that list is only used for releasing the locks when the transaction is finished.
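The steps for releasing a single range can be sketched with a toy lock tree. Everything here is hypothetical (ToyLockTree, Range, release_one_lock), lock owners are ignored so any overlap counts as a conflict, and "waking up" a request simply means granting it once it no longer conflicts.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Range {
  std::string left, right;  // closed range [left, right]
  bool operator==(const Range& o) const {
    return left == o.left && right == o.right;
  }
};

struct ToyLockTree {
  std::vector<Range> held;     // granted ranges
  std::vector<Range> waiting;  // requests blocked by a conflict

  bool conflicts(const Range& r) const {
    return std::any_of(held.begin(), held.end(), [&](const Range& h) {
      return !(r.right < h.left || h.right < r.left);
    });
  }
  // Grant the request if possible, otherwise queue it.
  void request(const Range& r) {
    assert(!(r.right < r.left));
    (conflicts(r) ? waiting : held).push_back(r);
  }
  // Drop a granted range. Note that this alone wakes nobody up.
  void release(Range r) {
    held.erase(std::remove(held.begin(), held.end(), r), held.end());
  }
  // Retry every pending request, granting those that no longer conflict.
  void retry_all() {
    std::vector<Range> still_waiting;
    for (const Range& r : waiting)
      (conflicts(r) ? still_waiting : held).push_back(r);
    waiting.swap(still_waiting);
  }
};

// Release one range on behalf of a transaction.
void release_one_lock(ToyLockTree& lt, std::vector<Range>& txn_ranges, Range r) {
  lt.release(r);   // 1. release the range in the lock tree
  lt.retry_all();  // 2. retry pending requests explicitly
  // 3. Bookkeeping only: forget the range in the transaction's own list.
  txn_ranges.erase(std::remove(txn_ranges.begin(), txn_ranges.end(), r),
                   txn_ranges.end());
}
```

Retrying every pending request on each release is the simple approach. It is linear in the number of waiters, which is the efficiency concern raised above.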
2.5 Releasing all of the transaction's locks
See PerconaFT/src/ydb_txn.cc:
static void toku_txn_release_locks(DB_TXN *txn) {
// Prevent access to the locktree map while releasing.
// It is possible for lock escalation to attempt to
// modify this data structure while the txn commits.
Sergei Petrunia
added a comment - Got a question about refreshing the iterator.
Consider a query:
update t1 set col1=col1+1000 where (pk between 3 and 7) or (pk between 10 and 15);
Suppose the range locking is ON, the table has `PRIMARY KEY(pk)`, and the query is using the PK.
It will do this:
trx->get_range_lock([3; 7]);
iter = trx->get_iterator(); // (1)
// Use the iter to read the latest committed rows in the [3..7] range
// (2)
trx->get_range_lock([10; 15]); // (3)
Now, the iterator we created at point (1) is reading the snapshot of data taken at that moment.
We need to read the latest committed data (to be precise, we need to see everything that was committed into the 10..15 range before the get_range_lock call marked with (3) was run).
We should call this:
iter->Refresh();
But for me the iterator is `rocksdb::BaseDeltaIterator`, which doesn't override Refresh(), so it uses rocksdb::Iterator::Refresh, which is this:
virtual Status Refresh() {
return Status::NotSupported("Refresh() is not supported");
}
Does this mean that
the iterator I've got will return the latest data (and NOT the snapshot taken when the iterator was created at (1)),
or that
the iterator I've got doesn't support Refresh(), so I should destroy and re-create it?
Sergei Petrunia
added a comment - An MTR testcase for iterator refresh:
https://gist.github.com/spetrunia/7ead10923d40bf2d9baa960740733945
Result of it:
https://gist.github.com/spetrunia/915cdeeb033251a288ec88509bb04582#file-range-locking-iterator-refresh-result-sql-L22
It shows that the iterator sees the row that has been deleted. When it attempts to read the row, we get the "Got error 1 'NotFound:" error.
Now, let's remove the DELETE statement from the testcase:
https://gist.github.com/spetrunia/ac3392e8279007eb15411872cbc43241
The output: https://gist.github.com/spetrunia/33ce1b208109c8b0331fc54768de58ec
30 5000
The INSERT'ed row was not updated, so it was not visible to the iterator.
For the updated rows, the result looks as if the iterator saw the latest?
40 5100
41 5100
42 5100
43 5100
44 5100
45 5100
(or is this the result of extra GetForUpdate calls?)
Sergei Petrunia
added a comment - Ok,
The iterator obtained from TransactionDB->NewIterator() has a non-trivial Refresh implementation, ArenaWrappedDBIter::Refresh().
The iterator obtained from Transaction->GetIterator() doesn't support Refresh. It's a BaseDeltaIterator with base_iterator_ = ArenaWrappedDBIter and delta_iterator_ = WBWIIteratorImpl.
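Given the two findings above, calling code could try Refresh() first and fall back to destroying and re-creating the iterator when it is not supported. The sketch below only shows those mechanics. It uses hypothetical stand-in types (Status, Iter, RefreshableIter, refresh_or_recreate) rather than the RocksDB classes, and it does not settle whether re-creating the iterator gives the visibility needed after taking the second range lock.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-ins for the RocksDB types discussed above.
enum class Status { kOk, kNotSupported };

struct Iter {
  virtual ~Iter() = default;
  // Default behaviour, modelled on the base-class Refresh() quoted above.
  virtual Status Refresh() { return Status::kNotSupported; }
};

// Models an iterator that does implement Refresh().
struct RefreshableIter : Iter {
  int refreshes = 0;
  Status Refresh() override {
    ++refreshes;
    return Status::kOk;
  }
};

// Bring `it` up to date. Try Refresh() first; if that is not supported,
// destroy the iterator and create a new one with `make`.
// Returns true if the iterator was re-created. A newly created iterator
// is not positioned, so the caller has to seek again before reading.
bool refresh_or_recreate(
    std::unique_ptr<Iter>& it,
    const std::function<std::unique_ptr<Iter>()>& make) {
  assert(it != nullptr);
  if (it->Refresh() == Status::kOk) return false;
  it = make();  // the old iterator is destroyed by the assignment
  return true;
}
```

In the scenario above, this would be called between points (2) and (3)'s follow-up read, with `make` wrapping whatever call created the iterator at point (1).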
In tabular form:
      rangelocking=ON   rangelocking=OFF   rangelocking-orig
1     307.74            307.58             306.23
10    1576.26           1579.74            1565.1
20    1819.3            1838.34            1811.46
40    1640.48           1620.53            1611.57