Details
-
Task
-
Status: Stalled (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
Thoughts:
- inserts should work pretty much as before, little or no overhead (unless there are indexes and history size >> current data size)
- updates should have about the same (2x?) overhead in partitioned and non-partitioned case
- selects should have little or no overhead in partitioned case, and notable overhead in non-partitioned case (with history size >> current data size)
- deletes should have little overhead in non-partitioned case, and notable overhead (2x?) in partitioned case
- versioning by transaction id has more overhead as compared with versioning by timestamps
Attachments
- MDEV-15690-4.ods
- 88 kB
- MDEV-15690-5.ods
- 88 kB
Issue Links
- relates to
-
MDEV-15951 system versioning by trx id doesn't work with partitioning
-
- Closed
-
Activity
midenok you are right. The documentation is not clear on that. I created MDEV-16096.
midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there).
However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.
I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows:
use test; |
drop table if exists t1; |
drop table if exists t2; |
|
create table t1 ( |
id int unsigned, |
PRIMARY KEY (id) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
|
create table t2 ( |
id int unsigned, |
start_timestamp timestamp(6) GENERATED ALWAYS AS ROW START, |
end_timestamp timestamp(6) GENERATED ALWAYS AS ROW END, |
PRIMARY KEY (id), |
PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING; |
|
insert into t1 (id) select seq from seq_1_to_100; |
delete from t1; |
insert into t1 (id) select seq+100 from seq_1_to_100; |
|
insert into t2 (id) select seq from seq_1_to_100; |
delete from t2; |
insert into t2 (id) select seq+100 from seq_1_to_100; |
The first thing to notice is, that the primary key is silently changed:
show create table t2\G |
*************************** 1. row ***************************
|
Table: t2 |
Create Table: CREATE TABLE `t2` ( |
`id` int(10) unsigned NOT NULL, |
`start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START, |
`end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END, |
PRIMARY KEY (`id`,`end_timestamp`), |
PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING |
The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods:
point select outside current rows |
explain select * from t1 where id=50 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL |
type: NULL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: NULL |
Extra: Impossible WHERE noticed after reading const tables |
--------------
|
explain select * from t2 where id=50 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL |
type: NULL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: NULL |
Extra: Impossible WHERE noticed after reading const tables |
Good. Table is no touched at all.
range scan outside current rows |
explain select id from t1 where id between 10 and 20 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t1 |
type: range
|
possible_keys: PRIMARY |
key: PRIMARY |
key_len: 4
|
ref: NULL |
rows: 1 |
Extra: Using where; Using index |
--------------
|
explain select id from t2 where id between 10 and 20 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2 |
type: range
|
possible_keys: PRIMARY |
key: PRIMARY |
key_len: 11
|
ref: NULL |
rows: 10 |
Extra: Using where; Using index |
--------------
|
explain select id from t1 where id between 90 and 120 |
|
I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above).
point update outside current rows |
explain update t1 set id=50 where id=50 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t1 |
type: range
|
possible_keys: PRIMARY |
key: PRIMARY |
key_len: 4
|
ref: NULL |
rows: 1 |
Extra: Using where |
--------------
|
explain update t2 set id=50 where id=50 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2 |
type: range
|
possible_keys: PRIMARY |
key: PRIMARY |
key_len: 4
|
ref: NULL |
rows: 1 |
Extra: Using where; Using buffer |
Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none).
min/max of unique field in current rows |
explain select min(id), max(id) from t1 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL |
type: NULL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: NULL |
Extra: Select tables optimized away |
--------------
|
explain select min(id), max(id) from t2 |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2 |
type: index |
possible_keys: NULL |
key: PRIMARY |
key_len: 11
|
ref: NULL |
rows: 200 |
Extra: Using where; Using index |
This is rather a fail for versioned tables. It also takes considerably longer as expected.
It also can accept timestamp, like:
This might be the bug of documentation that this is not mentioned.