[MDEV-15690] benchmark the overhead caused by system versioning Created: 2018-03-27  Updated: 2023-04-27

Status: Stalled
Project: MariaDB Server
Component/s: Versioned Tables
Fix Version/s: 10.4

Type: Task Priority: Major
Reporter: Sergei Golubchik Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 2
Labels: None

Attachments: File MDEV-15690-4.ods     File MDEV-15690-5.ods    
Issue Links:
Relates
relates to MDEV-15951 system versioning by trx id doesn't w... Closed

 Description   

Thoughts:

  • inserts should work pretty much as before, little or no overhead (unless there are indexes and history size >> current data size)
  • updates should have about the same (2x?) overhead in partitioned and non-partitioned case
  • selects should have little or no overhead in partitioned case, and notable overhead in non-partitioned case (with history size >> current data size)
  • deletes should have little overhead in non-partitioned case, and notable overhead (2x?) in partitioned case
  • versioning by transaction id has more overhead than versioning by timestamps

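The expected 2x UPDATE overhead follows directly from versioning semantics: an UPDATE both rewrites the current row and inserts a superseded version. A minimal sketch (table and column names are illustrative, not from the benchmark):

```sql
-- Hypothetical table; any system-versioned table behaves the same way.
CREATE TABLE demo (
  id INT PRIMARY KEY,
  val INT
) ENGINE=InnoDB WITH SYSTEM VERSIONING;

INSERT INTO demo VALUES (1, 10);
UPDATE demo SET val = 20 WHERE id = 1;  -- writes the new row AND a history row

-- The current row plus the superseded version:
SELECT id, val FROM demo FOR SYSTEM_TIME ALL;
```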

 Comments   
Comment by Axel Schwenke [ 2018-04-23 ]

General Benchmark Setup

  • MariaDB 10.3.6
  • 32 InnoDB tables on SSD
  • sysbench-mariadb with custom Lua scripts, uniform RNG
  • all workloads except delete done with 32 threads (machine has 16 cores / 32 threads)

Variants

  • plain InnoDB tables (reference)
  • system versioned by timestamp, unpartitioned
  • system versioned by trx id, unpartitioned
  • system versioned by timestamp, partitioned
  • (system versioned by trx id, partitioned) had to be skipped, see MDEV-15951
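
For reference, the variants can be declared roughly as follows (table and column names are illustrative; the actual benchmark tables use the sysbench OLTP layout):

```sql
-- Timestamp-versioned, unpartitioned (implicit ROW START/END columns):
CREATE TABLE t_ts (id INT PRIMARY KEY) ENGINE=InnoDB WITH SYSTEM VERSIONING;

-- Versioned by transaction id (explicit BIGINT UNSIGNED period columns):
CREATE TABLE t_trx (
  id INT PRIMARY KEY,
  s BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
  e BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (s, e)
) ENGINE=InnoDB WITH SYSTEM VERSIONING;

-- Timestamp-versioned, split into a history and a current partition:
CREATE TABLE t_part (id INT PRIMARY KEY) ENGINE=InnoDB
  WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME (
    PARTITION p_hist HISTORY,
    PARTITION p_cur CURRENT
  );
```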

Benchmark Workload

  1. create empty tables (OLTP layout)
  2. insert 8 million rows (10 inserts per trx)
  3. OLTP point selects (10 per trx)
  4. OLTP range queries (4 kinds, 8 per trx)
  5. OLTP selects by secondary key (10 per trx)
  6. 4 million updates of a non-indexed field (10 per trx)
  7. 4 million updates of an indexed field (10 per trx)
  8. (again) OLTP point selects (10 per trx)
  9. (again) OLTP range queries (4 kinds, 8 per trx)
  10. (again) OLTP selects by secondary key (10 per trx)
  11. delete all (current) rows (a single delete statement)
  12. loop back to 2. for a total of 10 iterations
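
The custom Lua scripts are not attached; the statements presumably resemble the standard sysbench OLTP operations, e.g. (sbtest1, c and k are the usual sysbench names; the secondary-key select is a guess at the custom part):

```sql
-- Point select (step 3):
SELECT c FROM sbtest1 WHERE id = ?;

-- Range queries (step 4), the four standard sysbench kinds:
SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?;
SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN ? AND ?;
SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? ORDER BY c;
SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN ? AND ? ORDER BY c;

-- Select by secondary key (step 5):
SELECT id FROM sbtest1 WHERE k = ?;

-- Non-indexed and indexed updates (steps 6 and 7):
UPDATE sbtest1 SET c = ? WHERE id = ?;
UPDATE sbtest1 SET k = k + 1 WHERE id = ?;
```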

With system-versioned tables this workload generates 8+4+4 = 16 million historical rows per iteration (8M from the delete, 4M from each update phase). The first round of selects sees 8M current rows and an ever-growing number of historical rows; the WHERE condition in these selects covers only ids in the current set. For the second round there are 8M additional rows that are no longer current but would otherwise match the WHERE condition. For the select by secondary key the WHERE potentially matches all rows (current and historic).

The whole set was run twice. The first time with a huge buffer pool of 64GB: the data set never grows beyond ~40GB, so with this setting everything runs from memory.
For the second execution I reduced the buffer pool to 8GB. The current rows still fit into memory (8 million rows ≈ 2GB of table space) but the historical data does not, at least not after 3 or 4 iterations.
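
Assuming the pool size was set in the usual way, the two runs differ in a single setting (sketch; the actual configuration is not attached):

```sql
-- Run 1: everything fits in memory
SET GLOBAL innodb_buffer_pool_size = 64 * 1024 * 1024 * 1024;

-- Run 2: history eventually exceeds the pool
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
```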

Comment by Axel Schwenke [ 2018-04-24 ]

Results

Big Bufferpool

See attached file MDEV-15690-4.ods for details. The 10th iteration should be seen as an outlier. Especially for the partitioned table the data set seems to have grown beyond the buffer pool and become IO-bound.

Observations:

  • INSERT performance is ok. There is only a slight regression with growing number of iterations. As expected, the partitioned table does best.
  • point-SELECT performance is acceptable. An unexpected result is the poor performance of the partitioned table; I had expected it to be on par with the unversioned table.
  • range SELECT. Here only the table versioned by trx id looks good; tables versioned by timestamp deliver very bad performance. As expected, numbers go down for the second round of selects, except for the partitioned table.
  • secondary key SELECT. Again acceptable, except for the unpartitioned table versioned by timestamp, which is especially hindered by the growing amount of historical rows. The second round of selects is about as fast as the first one.
  • UPDATE indexed. Performance drops by 50 percent compared to the unversioned table. This is probably the best one can expect, because in a versioned table an UPDATE touches two rows. The partitioned table behaves slightly worse, probably because the old row has to be moved between partitions.
  • UPDATE non-indexed. This one is even slower than the indexed one. The reason could be that the non-indexed update touches a CHAR(120) field, while the indexed one uses an INT field.
  • DELETE performance is again bad. Taking 3 times longer (a 66% throughput loss) even without any historic rows is too slow IMHO. I had also expected the partitioned table to take longer, but it did not.

Comment by Axel Schwenke [ 2018-05-03 ]

Results

Small Bufferpool

See attached file MDEV-15690-5.ods for details. The buffer pool was only 8GB while the table space grew to 2.4G, 46G, 47G and 77G (unversioned, by timestamp, by trx id, partitioned).

Observations:

Pretty much the same as for the big buffer pool. There is however one notable exception: range queries on the table versioned by trx id were actually faster than on the unversioned table and, what's more, got faster with additional iterations. For UPDATE the unpartitioned versioned tables also got faster after several iterations; I have no idea how to explain this. DELETE performance was heavily impacted by the IO-bound operation: the time to delete the current rows grew to over 35 times that of the unversioned table.

Comment by Axel Schwenke [ 2018-05-03 ]

Note to axel: There is one more thing to investigate: unique index searches for rows with no hits in the current data, but possible hits in the historic data, seemed slow.

Comment by Aleksey Midenkov [ 2018-05-04 ]

SELECT by timestamp from a TRX_ID-based table is now very slow. Was it tested?

Comment by Axel Schwenke [ 2018-05-04 ]

midenok can you elaborate? What do you mean by "SELECT by timestamp"? If a table is versioned by TRX_ID, the AS OF clause takes a transaction ID, not a timestamp. See https://mariadb.com/kb/en/library/system-versioned-tables/#transaction-precise-history-in-innodb

Comment by Aleksey Midenkov [ 2018-05-04 ]

It can also accept a timestamp, like:

SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6);

It might be a documentation bug that this is not mentioned.

Comment by Axel Schwenke [ 2018-05-06 ]

midenok you are right. The documentation is not clear on that. I created MDEV-16096.

Comment by Axel Schwenke [ 2018-05-07 ]

midenok the performance of selecting historic rows by timestamp from a table versioned by trx id depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes that exist there).

However, this task was not so much about the performance of accessing historic rows, but about accessing current rows, because the latter should be the much more frequent operation on a versioned table.
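
For completeness, a timestamp-based AS OF on a trx-id-versioned table has to map the timestamp to a transaction id first. Roughly (a sketch of the lookup, not the actual server code; column names as in mysql.transaction_registry):

```sql
-- Roughly the mapping the server must perform for
--   SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP @ts
-- on a trx-id-versioned table:
SELECT transaction_id
FROM mysql.transaction_registry
WHERE commit_timestamp <= @ts
ORDER BY commit_timestamp DESC
LIMIT 1;
```

Without a suitable index on commit_timestamp this lookup dominates the cost of such a SELECT.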

Comment by Axel Schwenke [ 2018-05-07 ]

I checked some basic access methods, comparing a system-versioned table with an unversioned one. I set up two tables; after the script below each has 100 current rows, and the versioned one additionally keeps 100 historic rows:

use test;
drop table if exists t1;
drop table if exists t2;
 
create table t1 (
  id int unsigned,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
create table t2 (
  id int unsigned,
  start_timestamp timestamp(6) GENERATED ALWAYS AS ROW START,
  end_timestamp timestamp(6) GENERATED ALWAYS AS ROW END,
  PRIMARY KEY (id),
  PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING;
 
insert into t1 (id) select seq from seq_1_to_100;
delete from t1;
insert into t1 (id) select seq+100 from seq_1_to_100;
 
insert into t2 (id) select seq from seq_1_to_100;
delete from t2;
insert into t2 (id) select seq+100 from seq_1_to_100;

The first thing to notice is that the primary key is silently extended:

show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(10) unsigned NOT NULL,
  `start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START,
  `end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END,
  PRIMARY KEY (`id`,`end_timestamp`),
  PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING

The different indexes on versioned vs. unversioned tables will probably lead to different access patterns. Let's EXPLAIN some basic access methods:

point select outside current rows

explain select * from t1 where id=50
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables
--------------
explain select * from t2 where id=50
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables

Good. The table is not touched at all.

range scan outside current rows

explain select id from t1 where id between 10 and 20
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where; Using index
--------------
explain select id from t2 where id between 10 and 20
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 11
          ref: NULL
         rows: 10
        Extra: Using where; Using index

I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above).

point update outside current rows

explain update t1 set id=50 where id=50
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where
--------------
explain update t2 set id=50 where id=50
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where; Using buffer

Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to: the predicate matches at most a single row (actually none).

min/max of unique field in current rows

explain select min(id), max(id) from t1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
--------------
explain select min(id), max(id) from t2
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 11
          ref: NULL
         rows: 200
        Extra: Using where; Using index

This is rather a fail for versioned tables. It also takes considerably longer than expected.

Comment by Axel Schwenke [ 2018-05-08 ]

I'm done

Generated at Thu Feb 08 08:23:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.