Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15690

benchmark the overhead caused by system versioning

Details

    Description

      Thoughts:

      • inserts should work pretty much as before, little or no overhead (unless there are indexes and history size >> current data size)
      • updates should have about the same (2x?) overhead in partitioned and non-partitioned case
      • selects should have little or no overhead in partitioned case, and notable overhead in non-partitioned case (with history size >> current data size)
      • deletes should have little overhead in non-partitioned case, and notable overhead (2x?) in partitioned case
      • versioning by transaction id has more overhead as compared with versioning by timestamps

      Attachments

        Issue Links

          Activity

            midenok Aleksey Midenkov added a comment - - edited

            It also can accept timestamp, like:

            SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6);
            

            This might be the bug of documentation that this is not mentioned.

            midenok Aleksey Midenkov added a comment - - edited It also can accept timestamp, like: SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP NOW(6); This might be the bug of documentation that this is not mentioned.
            axel Axel Schwenke added a comment -

            midenok you are right. The documentation is not clear on that. I created MDEV-16096.

            axel Axel Schwenke added a comment - midenok you are right. The documentation is not clear on that. I created MDEV-16096 .
            axel Axel Schwenke added a comment -

            midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there).

            However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.

            axel Axel Schwenke added a comment - midenok the performance of selecting historic rows by timestamp from a table versioned by trx id, depends wholly on the implementation of the lookup in mysql.transaction_registry (especially the indexes existing there). However this task was not so much about performance for accessing historic rows, but about accessing current rows. Because the latter should be the much more frequent operation on a versioned table.
            axel Axel Schwenke added a comment -

            I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows:

            use test;
            drop table if exists t1;
            drop table if exists t2;
             
            create table t1 (
              id int unsigned,
              PRIMARY KEY (id)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
             
            create table t2 (
              id int unsigned,
              start_timestamp timestamp(6) GENERATED ALWAYS AS ROW START,
              end_timestamp timestamp(6) GENERATED ALWAYS AS ROW END,
              PRIMARY KEY (id),
              PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING;
             
            insert into t1 (id) select seq from seq_1_to_100;
            delete from t1;
            insert into t1 (id) select seq+100 from seq_1_to_100;
             
            insert into t2 (id) select seq from seq_1_to_100;
            delete from t2;
            insert into t2 (id) select seq+100 from seq_1_to_100;
            

            The first thing to notice is, that the primary key is silently changed:

            show create table t2\G
            *************************** 1. row ***************************
                   Table: t2
            Create Table: CREATE TABLE `t2` (
              `id` int(10) unsigned NOT NULL,
              `start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START,
              `end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END,
              PRIMARY KEY (`id`,`end_timestamp`),
              PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
            

            The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods:

            point select outside current rows

            explain select * from t1 where id=50
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: NULL
                     type: NULL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: NULL
                    Extra: Impossible WHERE noticed after reading const tables
            --------------
            explain select * from t2 where id=50
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: NULL
                     type: NULL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: NULL
                    Extra: Impossible WHERE noticed after reading const tables
            

            Good. Table is no touched at all.

            range scan outside current rows

            explain select id from t1 where id between 10 and 20
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: t1
                     type: range
            possible_keys: PRIMARY
                      key: PRIMARY
                  key_len: 4
                      ref: NULL
                     rows: 1
                    Extra: Using where; Using index
            --------------
            explain select id from t2 where id between 10 and 20
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: t2
                     type: range
            possible_keys: PRIMARY
                      key: PRIMARY
                  key_len: 11
                      ref: NULL
                     rows: 10
                    Extra: Using where; Using index
            --------------
            explain select id from t1 where id between 90 and 120
            
            

            I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above).

            point update outside current rows

            explain update t1 set id=50 where id=50
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: t1
                     type: range
            possible_keys: PRIMARY
                      key: PRIMARY
                  key_len: 4
                      ref: NULL
                     rows: 1
                    Extra: Using where
            --------------
            explain update t2 set id=50 where id=50
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: t2
                     type: range
            possible_keys: PRIMARY
                      key: PRIMARY
                  key_len: 4
                      ref: NULL
                     rows: 1
                    Extra: Using where; Using buffer
            

            Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none).

            min/max of unique field in current rows

            explain select min(id), max(id) from t1
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: NULL
                     type: NULL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: NULL
                    Extra: Select tables optimized away
            --------------
            explain select min(id), max(id) from t2
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: t2
                     type: index
            possible_keys: NULL
                      key: PRIMARY
                  key_len: 11
                      ref: NULL
                     rows: 200
                    Extra: Using where; Using index
            

            This is rather a fail for versioned tables. It also takes considerably longer as expected.

            axel Axel Schwenke added a comment - I checked some basic access methods, comparing a system-versioned table with an unversioned one. I setup two tables with 100 current and 100 historic rows: use test; drop table if exists t1; drop table if exists t2;   create table t1 ( id int unsigned, PRIMARY KEY (id) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;   create table t2 ( id int unsigned, start_timestamp timestamp (6) GENERATED ALWAYS AS ROW START, end_timestamp timestamp (6) GENERATED ALWAYS AS ROW END , PRIMARY KEY (id), PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING;   insert into t1 (id) select seq from seq_1_to_100; delete from t1; insert into t1 (id) select seq+100 from seq_1_to_100;   insert into t2 (id) select seq from seq_1_to_100; delete from t2; insert into t2 (id) select seq+100 from seq_1_to_100; The first thing to notice is, that the primary key is silently changed: show create table t2\G *************************** 1. row *************************** Table : t2 Create Table : CREATE TABLE `t2` ( `id` int (10) unsigned NOT NULL , `start_timestamp` timestamp (6) GENERATED ALWAYS AS ROW START, `end_timestamp` timestamp (6) GENERATED ALWAYS AS ROW END , PRIMARY KEY (`id`,`end_timestamp`), PERIOD FOR SYSTEM_TIME (`start_timestamp`, `end_timestamp`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING The different indexes on versioned/unversioned tables will probably lead to different access patterns. Lets EXPLAIN some basic access methods: point select outside current rows explain select * from t1 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Impossible WHERE noticed after reading const tables -------------- explain select * from t2 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Impossible WHERE noticed after reading const tables Good. Table is no touched at all. range scan outside current rows explain select id from t1 where id between 10 and 20 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t1 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where ; Using index -------------- explain select id from t2 where id between 10 and 20 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: range possible_keys: PRIMARY key : PRIMARY key_len: 11 ref: NULL rows : 10 Extra: Using where ; Using index -------------- explain select id from t1 where id between 90 and 120 I'm wondering why the unversioned table is touched at all. The optimizer should "see" that the range is completely outside the index min/max (just as in the point select above). point update outside current rows explain update t1 set id=50 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t1 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where -------------- explain update t2 set id=50 where id=50 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: range possible_keys: PRIMARY key : PRIMARY key_len: 4 ref: NULL rows : 1 Extra: Using where ; Using buffer Here the UPDATE on the versioned table uses the buffer algorithm, but I don't think it needs to. The predicate fits a single row at most (actually none). min/max of unique field in current rows explain select min (id), max (id) from t1 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : NULL type: NULL possible_keys: NULL key : NULL key_len: NULL ref: NULL rows : NULL Extra: Select tables optimized away -------------- explain select min (id), max (id) from t2 *************************** 1. row *************************** id: 1 select_type: SIMPLE table : t2 type: index possible_keys: NULL key : PRIMARY key_len: 11 ref: NULL rows : 200 Extra: Using where ; Using index This is rather a fail for versioned tables. It also takes considerably longer as expected.
            axel Axel Schwenke added a comment -

            I'm done

            axel Axel Schwenke added a comment - I'm done

            People

              serg Sergei Golubchik
              serg Sergei Golubchik
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.