[MDEV-15690] benchmark the overhead caused by system versioning - Jira

Details

Type: Task
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: 10.4(EOL)
Component/s: Versioned Tables
Labels:
None

Description

Thoughts:

inserts should work pretty much as before, little or no overhead (unless there are indexes and history size >> current data size)
updates should have about the same (2x?) overhead in partitioned and non-partitioned case
selects should have little or no overhead in partitioned case, and notable overhead in non-partitioned case (with history size >> current data size)
deletes should have little overhead in non-partitioned case, and notable overhead (2x?) in partitioned case
versioning by transaction id has more overhead as compared with versioning by timestamps

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

MDEV-15690-4.ods
88 kB
2018-04-24 08:53
MDEV-15690-5.ods
88 kB
2018-05-03 11:20

Issue Links

relates to

MDEV-15951 system versioning by trx id doesn't work with partitioning

Closed

Activity

Ascending order - Click to sort in descending order

View 6 older comments

Aleksey Midenkov added a comment - 2018-05-04 12:05 - edited

It also can accept timestamp, like:

SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6);

This might be the bug of documentation that this is not mentioned.

Aleksey Midenkov added a comment - 2018-05-04 12:05 - edited It also can accept timestamp, like: SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6); This might be the bug of documentation that this is not mentioned.

Axel Schwenke added a comment - 2018-05-06 12:20

midenok you are right. The documentation is not clear on that. I created ~~MDEV-16096~~.

Axel Schwenke added a comment - 2018-05-06 12:20 midenok you are right. The documentation is not clear on that. I created MDEV-16096 .

Axel Schwenke added a comment - 2018-05-07 11:03

midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there).

However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.

Axel Schwenke added a comment - 2018-05-07 11:03 midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there). However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.

Axel Schwenke added a comment - 2018-05-07 15:25

I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows:

use test;

drop table if exists t1;

drop table if exists t2;

create table t1 (

  id int unsigned,

  PRIMARY KEY (id)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

create table t2 (

  id int unsigned,

  start_timestamp timestamp(6) GENERATED ALWAYS AS ROW START,

  end_timestamp timestamp(6) GENERATED ALWAYS AS ROW END,

  PRIMARY KEY (id),

  PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING;

insert into t1 (id) select seq from seq_1_to_100;

delete from t1;

insert into t1 (id) select seq+100 from seq_1_to_100;

insert into t2 (id) select seq from seq_1_to_100;

delete from t2;

insert into t2 (id) select seq+100 from seq_1_to_100;

The first thing to notice is, that the primary key is silently changed:

show create table t2\G

*************************** 1. row ***************************

       Table: t2

Create Table: CREATE TABLE `t2` (

  `id` int(10) unsigned NOT NULL,

  `start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START,

  `end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END,

  PRIMARY KEY (`id`,`end_timestamp`),

  PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING

The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods:

point select outside current rows
explain select * from t1 where id=50
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Impossible WHERE noticed after reading const tables
--------------
explain select * from t2 where id=50
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Impossible WHERE noticed after reading const tables

Good. Table is no touched at all.

range scan outside current rows
explain select id from t1 where id between 10 and 20
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: t1
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where; Using index
--------------
explain select id from t2 where id between 10 and 20
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: t2
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 11
ref: NULL
rows: 10
Extra: Using where; Using index
--------------
explain select id from t1 where id between 90 and 120

I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above).

point update outside current rows
explain update t1 set id=50 where id=50
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: t1
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where
--------------
explain update t2 set id=50 where id=50
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: t2
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where; Using buffer

Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none).

min/max of unique field in current rows
explain select min(id), max(id) from t1
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Select tables optimized away
--------------
explain select min(id), max(id) from t2
************************* 1. row *************************
id: 1
select_type: SIMPLE
table: t2
type: index
possible_keys: NULL
key: PRIMARY
key_len: 11
ref: NULL
rows: 200
Extra: Using where; Using index

This is rather a fail for versioned tables. It also takes considerably longer as expected.

Axel Schwenke added a comment - 2018-05-07 15:25 I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows: use test; drop table if exists t1; drop table if exists t2; create table t1 ( id int unsigned, PRIMARY KEY (id) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; create table t2 ( id int unsigned, start_timestamp timestamp (6) GENERATED ALWAYS AS ROW START, end_timestamp timestamp (6) GENERATED ALWAYS AS ROW END , PRIMARY KEY (id), PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING; insert into t1 (id) select seq from seq_1_to_100; delete from t1; insert into t1 (id) select seq+100 from seq_1_to_100; insert into t2 (id) select seq from seq_1_to_100; delete from t2; insert into t2 (id) select seq+100 from seq_1_to_100; The first thing to notice is, that the primary key is silently changed: show create table t2\G *************************** 1. row *************************** Table : t2 Create Table : CREATE TABLE `t2` ( `id` int (10) unsigned NOT NULL , `start_timestamp` timestamp (6) GENERATED ALWAYS AS ROW START, `end_timestamp` timestamp (6) GENERATED ALWAYS AS ROW END , PRIMARY KEY (`id`,`end_timestamp`), PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods: point select outside current rows explain select * from t1 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Impossible WHERE noticed after reading const tables -------------- explain select * from t2 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Impossible WHERE noticed after reading const tables Good. Table is no touched at all. range scan outside current rows explain select id from t1 where id between 10 and 20 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t1 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where ; Using index -------------- explain select id from t2 where id between 10 and 20 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: range possible_keys: PRIMARY key : PRIMARY key_len: 11 ref: NULL rows : 10 Extra: Using where ; Using index -------------- explain select id from t1 where id between 90 and 120 I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above). point update outside current rows explain update t1 set id=50 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t1 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where -------------- explain update t2 set id=50 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where ; Using buffer Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none). min/max of unique field in current rows explain select min (id), max (id) from t1 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Select tables optimized away -------------- explain select min (id), max (id) from t2 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: index possible_keys: NULL key : PRIMARY key_len: 11 ref: NULL rows : 200 Extra: Using where ; Using index This is rather a fail for versioned tables. It also takes considerably longer as expected.

Axel Schwenke added a comment - 2018-05-08 13:17

I'm done

Axel Schwenke added a comment - 2018-05-08 13:17 I'm done

MariaDB Server

benchmark the overhead caused by system versioning

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration