General Benchmark Setup
- MariaDB 10.3.6
- 32 InnoDB tables on SSD
- sysbench-mariadb with custom Lua scripts, uniform RNG
- all workloads except delete done with 32 threads (machine has 16 cores / 32 threads)
Variants
- plain InnoDB tables (reference)
- system versioned by timestamp, unpartitioned
- system versioned by trx id, unpartitioned
- system versioned by timestamp, partitioned
- (system versioned by trx id, partitioned) had to be skipped, see
MDEV-15951
Benchmark Workload
- create empty tables (OLTP layout)
- insert 8 mio rows (10 inserts per trx)
- OLTP point selects (10 per trx)
- OLTP range queries (4 kinds, 8 per trx)
- OLTP selects by secondary key (10 per trx)
- 4 mio updates of a nonindexed field (10 per trx)
- 4 mio updates of an indexed field (10 per trx)
- (again) OLTP point selects (10 per trx)
- (again) OLTP range queries (4 kinds, 8 per trx)
- (again) OLTP selects by secondary key (10 per trx)
- delete all (current) rows (a single delete statement)
- loop back to 2. for a total of 10 iterations
With system-versioned tables that workload will generate 8+4+4=16 mio historical rows per iteration. The first round of selects will see 8M current rows and an ever growing number of historical rows. The WHERE condition in the selects covers only ids in the current set. For the second round there will be 8M rows that are not current, but would match the WHERE condition otherwise. For the select by secondary the WHERE potentially matches all rows (current and historic).
The whole set was run twice. First time with a huge buffer pool of 64GB. The data set would never be bigger than ~40GB, so with this setting everything would run from memory.
For the second execution I reduced the buffer pool to 8GB. The current rows would still fit into memory (8 mio rows ~= 2GB table space) but the historical data would not. At least not after 3 or 4 iterations.
|
Results
Big Bufferpool
See attached file MDEV-15690-4.ods for details. The 10th iteration should be seen as an outlier. Especially for the partitioned table the data set seems to have grown beyond the buffer pool and become IO-bound.
Observations:
- INSERT performance is ok. There is only a slight regression with growing number of iterations. As expected, the partitioned table does best.
- point-SELECT performance is okeyish. An unexpected result is the poor performance of the partitioned table. I had expected it to be on par with the unversioned table.
- range SELECT. Here only the table versioned by trx id looks good. Tables versioned by timestamp deliver very bad performance. As expected, numbers go down for the second round of selects, except for the partitioned table.
- secondary key SELECT. Again okeyish, except for unpartitioned table versioned by timestamp. This one is especially hindered by the growing amount of historical rows. The second round of select is quite as fast as the first one.
- UPDATE indexed. Performance drops by 50 percent, compared to the unversioned table. This is probably the best to expect, because in a versioned table an UPDATE touches two rows. The partitioned table behaves a bit worse, probably because the old row has to be moved between partitions.
- UPDATE non-indexed. This one is even slower than the indexed one. The reason could be, that the non-index update touches a CHAR(120) field, where the indexed one uses an INT field.
- DELETE performance is again bad. 3 times longer (meaning 66% loss) even without any historic rows is too slow IMHO. I had also expected the partitioned table to take longer, but this wasn't so.
|
Results
Small Bufferpool
See attached file MDEV-15690-5.ods for details. The buffer pool was only 8GB while the table space grew to 2.4G, 46G, 47G, 77G (unversioned, by ts, ty trx, partitioned)
Observations:
Pretty much the same as for a small buffer pool. There is however one notable exception: range queries for the table versioned by trx-id were actually faster than on the unversioned table and - what's more - got faster with additional iterations. For UPDATE the unpartitioned versioned tables also got faster after several iterations. I have no idea how to explain this. DELETE Performance was heavily impacted by the IO bound operation. Time to delete current rows grew to over 35 times of that of the unversioned table.
|
|
Note to axel: There is one thing more to investigate: unique index search for rows where there are no hits in current data, but possible hits in historic data seemed slow.
|
|
SELECT by timestamp from TRX_ID-based is now very slow. Was it tested?
|
|
midenok can you elaborate? What do you mean with "SELECT by timestamp"? If a table is versioned by TRX_ID, the AS OF clause takes a transaction ID, not a timestamp. See https://mariadb.com/kb/en/library/system-versioned-tables/#transaction-precise-history-in-innodb
|
|
It also can accept timestamp, like:
SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6);
|
This might be the bug of documentation that this is not mentioned.
|
|
midenok you are right. The documentation is not clear on that. I created MDEV-16096.
|
|
midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there).
However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.
|
|
I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows:
use test;
|
drop table if exists t1;
|
drop table if exists t2;
|
|
create table t1 (
|
id int unsigned,
|
PRIMARY KEY (id)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
|
|
create table t2 (
|
id int unsigned,
|
start_timestamp timestamp(6) GENERATED ALWAYS AS ROW START,
|
end_timestamp timestamp(6) GENERATED ALWAYS AS ROW END,
|
PRIMARY KEY (id),
|
PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING;
|
|
insert into t1 (id) select seq from seq_1_to_100;
|
delete from t1;
|
insert into t1 (id) select seq+100 from seq_1_to_100;
|
|
insert into t2 (id) select seq from seq_1_to_100;
|
delete from t2;
|
insert into t2 (id) select seq+100 from seq_1_to_100;
|
The first thing to notice is, that the primary key is silently changed:
show create table t2\G
|
*************************** 1. row ***************************
|
Table: t2
|
Create Table: CREATE TABLE `t2` (
|
`id` int(10) unsigned NOT NULL,
|
`start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START,
|
`end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END,
|
PRIMARY KEY (`id`,`end_timestamp`),
|
PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
|
The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods:
|
point select outside current rows
|
explain select * from t1 where id=50
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL
|
type: NULL
|
possible_keys: NULL
|
key: NULL
|
key_len: NULL
|
ref: NULL
|
rows: NULL
|
Extra: Impossible WHERE noticed after reading const tables
|
--------------
|
explain select * from t2 where id=50
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL
|
type: NULL
|
possible_keys: NULL
|
key: NULL
|
key_len: NULL
|
ref: NULL
|
rows: NULL
|
Extra: Impossible WHERE noticed after reading const tables
|
Good. Table is no touched at all.
|
range scan outside current rows
|
explain select id from t1 where id between 10 and 20
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t1
|
type: range
|
possible_keys: PRIMARY
|
key: PRIMARY
|
key_len: 4
|
ref: NULL
|
rows: 1
|
Extra: Using where; Using index
|
--------------
|
explain select id from t2 where id between 10 and 20
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2
|
type: range
|
possible_keys: PRIMARY
|
key: PRIMARY
|
key_len: 11
|
ref: NULL
|
rows: 10
|
Extra: Using where; Using index
|
--------------
|
explain select id from t1 where id between 90 and 120
|
|
I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above).
|
point update outside current rows
|
explain update t1 set id=50 where id=50
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t1
|
type: range
|
possible_keys: PRIMARY
|
key: PRIMARY
|
key_len: 4
|
ref: NULL
|
rows: 1
|
Extra: Using where
|
--------------
|
explain update t2 set id=50 where id=50
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2
|
type: range
|
possible_keys: PRIMARY
|
key: PRIMARY
|
key_len: 4
|
ref: NULL
|
rows: 1
|
Extra: Using where; Using buffer
|
Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none).
|
min/max of unique field in current rows
|
explain select min(id), max(id) from t1
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: NULL
|
type: NULL
|
possible_keys: NULL
|
key: NULL
|
key_len: NULL
|
ref: NULL
|
rows: NULL
|
Extra: Select tables optimized away
|
--------------
|
explain select min(id), max(id) from t2
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2
|
type: index
|
possible_keys: NULL
|
key: PRIMARY
|
key_len: 11
|
ref: NULL
|
rows: 200
|
Extra: Using where; Using index
|
This is rather a fail for versioned tables. It also takes considerably longer as expected.
|
|
I'm done 
|