[MDEV-16226] TRX_ID-based System Versioning refactoring - Jira

Aleksey Midenkov created issue - 2018-05-21 07:57

Aleksey Midenkov made changes - 2018-05-21 07:58

Description

Current performance of {{SELECT}} by timestamp from TRX_ID-based tables is very low due to linear scan of TRANSACTION_REGISTRY table when doing timestamp -> TRX_ID translation. Utilize join optimizer for querying TRANSACTION_REGISTRY:

1. For each timestamped {{Vers_history_point}} selector forge {{TRANSACTION_REGISTRY}} subquery and add it to {{SELECT}}.

h4. Example
Query
{code:sql}
select *, row_start, row_end from i1 for system_time as of timestamp @t1;
{code}
is transformed into
{code:sql}
select i1.x as x,
       i1.row_start as row_start,
       i1.row_end as row_end
from i1
for SYSTEM_TIME all
join (
   select transaction_id
   from mysql.transaction_registry
   where commit_timestamp <= @t1
   order by commit_timestamp desc
   limit 1
) __trt_0
where trt_trx_sees(i1.row_end, __trt_0.transaction_id)
  and trt_trx_sees_eq(__trt_0.transaction_id, i1.row_start)
{code}

2. subquery must be excluded from wildcard expansion, i.e.
{code:sql}
select transaction_id from i1 for system_time as of timestamp @t1;
{code}
must be resolved to {{i1.transaction_id}}.

Current performance of {{SELECT}} by timestamp from TRX_ID-based tables is very low due to linear scan of TRANSACTION_REGISTRY table when doing timestamp -> TRX_ID translation. Utilize join optimizer for querying TRANSACTION_REGISTRY:

h3. 1. For each timestamped {{Vers_history_point}} selector forge {{TRANSACTION_REGISTRY}} subquery and add it to {{SELECT}}.

h4. Example
Query
{code:sql}
select *, row_start, row_end from i1 for system_time as of timestamp @t1;
{code}
is transformed into
{code:sql}
select i1.x as x,
       i1.row_start as row_start,
       i1.row_end as row_end
from i1
for SYSTEM_TIME all
join (
   select transaction_id
   from mysql.transaction_registry
   where commit_timestamp <= @t1
   order by commit_timestamp desc
   limit 1
) __trt_0
where trt_trx_sees(i1.row_end, __trt_0.transaction_id)
  and trt_trx_sees_eq(__trt_0.transaction_id, i1.row_start)
{code}

h3. 2. subquery must be excluded from wildcard expansion, i.e.
{code:sql}
select transaction_id from i1 for system_time as of timestamp @t1;
{code}
must be resolved to {{i1.transaction_id}}.

[Detailed analysis and work progress|https://github.com/tempesta-tech/mariadb/issues/314]

Aleksey Midenkov made changes - 2018-05-21 07:58

Description

Current performance of {{SELECT}} by timestamp from TRX_ID-based tables is very low due to linear scan of TRANSACTION_REGISTRY table when doing timestamp -> TRX_ID translation. Utilize join optimizer for querying TRANSACTION_REGISTRY:

h3. 1. For each timestamped {{Vers_history_point}} selector forge {{TRANSACTION_REGISTRY}} subquery and add it to {{SELECT}}.

h4. Example
Query
{code:sql}
select *, row_start, row_end from i1 for system_time as of timestamp @t1;
{code}
is transformed into
{code:sql}
select i1.x as x,
       i1.row_start as row_start,
       i1.row_end as row_end
from i1
for SYSTEM_TIME all
join (
   select transaction_id
   from mysql.transaction_registry
   where commit_timestamp <= @t1
   order by commit_timestamp desc
   limit 1
) __trt_0
where trt_trx_sees(i1.row_end, __trt_0.transaction_id)
  and trt_trx_sees_eq(__trt_0.transaction_id, i1.row_start)
{code}

h3. 2. subquery must be excluded from wildcard expansion, i.e.
{code:sql}
select transaction_id from i1 for system_time as of timestamp @t1;
{code}
must be resolved to {{i1.transaction_id}}.

[Detailed analysis and work progress|https://github.com/tempesta-tech/mariadb/issues/314]

Current performance of {{SELECT}} by timestamp from TRX_ID-based tables is very low due to linear scan of TRANSACTION_REGISTRY table when doing timestamp -> TRX_ID translation. Utilize join optimizer for querying TRANSACTION_REGISTRY:

h3. 1. For each timestamped {{Vers_history_point}} selector forge {{TRANSACTION_REGISTRY}} subquery and add it to {{SELECT}}.

h4. Example
Query
{code:sql}
select *, row_start, row_end from i1 for system_time as of timestamp @t1;
{code}
is transformed into
{code:sql}
select i1.x as x,
       i1.row_start as row_start,
       i1.row_end as row_end
from i1
for SYSTEM_TIME all
join (
   select transaction_id
   from mysql.transaction_registry
   where commit_timestamp <= @t1
   order by commit_timestamp desc
   limit 1
) __trt_0
where trt_trx_sees(i1.row_end, __trt_0.transaction_id)
  and trt_trx_sees_eq(__trt_0.transaction_id, i1.row_start)
{code}

h3. 2. subquery must be excluded from wildcard expansion, i.e.
{code:sql}
select transaction_id from i1 for system_time as of timestamp @t1;
{code}
must be resolved to {{i1.transaction_id}}.

h2. [Detailed analysis and work progress|https://github.com/tempesta-tech/mariadb/issues/314]
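
The timestamp -> TRX_ID lookup done by the forged subquery can be tried outside the server. The following is only an illustrative sketch: it uses Python's {{sqlite3}} as a stand-in for {{mysql.transaction_registry}}, keeps just two of the table's columns, uses integer timestamps, and assumes an index on {{commit_timestamp}} ({{idx_commit_ts}} is a hypothetical name). It does not model {{trt_trx_sees}} or the optimizer rewrite itself.

```python
import sqlite3


def build_registry(rows):
    """Create an in-memory stand-in for mysql.transaction_registry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transaction_registry ("
        " transaction_id INTEGER PRIMARY KEY,"
        " commit_timestamp INTEGER NOT NULL)"
    )
    # Assumption: an index on commit_timestamp lets the lookup avoid a full scan.
    conn.execute(
        "CREATE INDEX idx_commit_ts ON transaction_registry (commit_timestamp)"
    )
    conn.executemany("INSERT INTO transaction_registry VALUES (?, ?)", rows)
    return conn


def trx_id_as_of(conn, ts):
    """Return the id of the latest transaction committed at or before ts."""
    row = conn.execute(
        "SELECT transaction_id FROM transaction_registry"
        " WHERE commit_timestamp <= ?"
        " ORDER BY commit_timestamp DESC LIMIT 1",
        (ts,),
    ).fetchone()
    return row[0] if row else None


conn = build_registry([(101, 1000), (105, 2000), (110, 3000)])
print(trx_id_as_of(conn, 2500))  # prints 105
```

With the ordering and {{limit 1}} in place, the lookup needs a single row from the registry, which is what allows a join optimizer to treat the subquery as a one-row derived table.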

Aleksey Midenkov made changes - 2018-05-21 07:59

Status

Open [ 1 ]

In Progress [ 3 ]

Aleksey Midenkov made changes - 2018-05-23 16:11

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Aleksey Midenkov made changes - 2018-05-23 16:12

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Aleksey Midenkov made changes - 2018-07-25 11:48

Link

This issue relates to ~~MDEV-16825~~ [ ~~MDEV-16825~~ ]

Aleksey Midenkov made changes - 2018-08-21 19:29

Comment

[ 4h ]

Aleksey Midenkov made changes - 2018-08-24 13:00

Link

This issue is blocked by ~~MDEV-16144~~ [ ~~MDEV-16144~~ ]

Aleksey Midenkov made changes - 2018-09-19 12:58

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Sergei Golubchik made changes - 2019-03-29 12:04

Fix Version/s

10.4 [ 22408 ]

Sergei Golubchik made changes - 2019-04-09 08:24

NRE Projects

RM_105_CANDIDATE

Aleksey Midenkov made changes - 2019-07-22 11:58

Priority

Major [ 3 ]

Minor [ 4 ]

Aleksey Midenkov made changes - 2019-12-04 07:34

Labels

trx-versioning

Ralf Gebhardt made changes - 2020-02-20 19:08

Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]

Ralf Gebhardt made changes - 2020-02-20 19:09

Fix Version/s

10.6 [ 24028 ]

Sergei Golubchik made changes - 2020-08-16 19:23

Rank

Ranked lower

Sergei Golubchik made changes - 2020-08-18 09:01

Priority

Minor [ 4 ]

Major [ 3 ]

Aleksey Midenkov made changes - 2020-12-09 16:18

Summary

TRX_ID-based versioned tables performance improvement

TRX_ID-based System Versioning refactoring

Aleksey Midenkov made changes - 2020-12-09 16:36

Description

Current performance of {{SELECT}} by timestamp from TRX_ID-based tables is very low due to linear scan of TRANSACTION_REGISTRY table when doing timestamp -> TRX_ID translation. Utilize join optimizer for querying TRANSACTION_REGISTRY:

h3. 1. For each timestamped {{Vers_history_point}} selector forge {{TRANSACTION_REGISTRY}} subquery and add it to {{SELECT}}.

h4. Example
Query
{code:sql}
select *, row_start, row_end from i1 for system_time as of timestamp @t1;
{code}
is transformed into
{code:sql}
select i1.x as x,
       i1.row_start as row_start,
       i1.row_end as row_end
from i1
for SYSTEM_TIME all
join (
   select transaction_id
   from mysql.transaction_registry
   where commit_timestamp <= @t1
   order by commit_timestamp desc
   limit 1
) __trt_0
where trt_trx_sees(i1.row_end, __trt_0.transaction_id)
  and trt_trx_sees_eq(__trt_0.transaction_id, i1.row_start)
{code}

h3. 2. subquery must be excluded from wildcard expansion, i.e.
{code:sql}
select transaction_id from i1 for system_time as of timestamp @t1;
{code}
must be resolved to {{i1.transaction_id}}.

h2. [Detailed analysis and work progress|https://github.com/tempesta-tech/mariadb/issues/314]

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Aleksey Midenkov made changes - 2020-12-09 16:37

Attachment

trx_id_versioning_talk.txt [ 55155 ]

Aleksey Midenkov made changes - 2020-12-09 16:38

Fix Version/s		10.7 [ 24805 ]
Fix Version/s	10.6 [ 24028 ]

Aleksey Midenkov made changes - 2020-12-09 16:39

Comment

[ Transaction registry subquery is added at an early stage, right after the query tables are opened. Whether the subquery is added depends on the following conditions:

1. query table is TRX_ID-based;
2. {{SYSTEM_TIME}} specifier is TRX_ID.

Condition 2 requires explicit TRX_ID resolution, because once the subquery is added it is impossible to resolve the expression to a timestamp.
]

Aleksey Midenkov made changes - 2020-12-25 07:30

Link

This issue relates to ~~MDEV-16825~~ [ ~~MDEV-16825~~ ]

Aleksey Midenkov made changes - 2020-12-25 07:31

Link

This issue relates to ~~MDEV-23446~~ [ ~~MDEV-23446~~ ]

Aleksey Midenkov made changes - 2020-12-25 07:32

Link

This issue relates to ~~MDEV-17089~~ [ ~~MDEV-17089~~ ]

Aleksey Midenkov made changes - 2020-12-25 07:34

Link

This issue relates to ~~MDEV-22540~~ [ ~~MDEV-22540~~ ]

Aleksey Midenkov made changes - 2021-04-21 10:11

Link

This issue blocks ~~MDEV-20842~~ [ ~~MDEV-20842~~ ]

Aleksey Midenkov made changes - 2021-04-24 05:20

Link

This issue relates to ~~MDEV-20842~~ [ ~~MDEV-20842~~ ]

Aleksey Midenkov made changes - 2021-04-24 05:34

Link

This issue blocks ~~MDEV-20842~~ [ ~~MDEV-20842~~ ]

Aleksey Midenkov made changes - 2021-08-24 15:31

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Aleksey Midenkov made changes - 2021-08-27 08:10

Description

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index recoreds contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Aleksey Midenkov made changes - 2021-08-27 08:42

Description

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index recoreds contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index recoreds contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: explicit PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Every open transaction has 2 undo logs attached per each changed table. Undo log cannot be attached to more than 1 transaction. One of undo logs is used for non-temporary tables, another one is for temporary ones. Non-temporary undo log must be scanned and corresponding clustered index rows selected by PK (one of 3 variants above).

Aleksey Midenkov made changes - 2021-08-30 20:39

Description

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index recoreds contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: explicit PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Every open transaction has 2 undo logs attached per each changed table. Undo log cannot be attached to more than 1 transaction. One of undo logs is used for non-temporary tables, another one is for temporary ones. Non-temporary undo log must be scanned and corresponding clustered index rows selected by PK (one of 3 variants above).

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index records contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: explicit PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Every open transaction has 2 undo logs attached per each changed table. Undo log cannot be attached to more than 1 transaction. One of undo logs is used for non-temporary tables, another one is for temporary ones. Non-temporary undo log must be scanned and corresponding clustered index rows selected by PK (one of 3 variants above).
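
The three ordering variants described above can be written as a small decision function. This is a sketch in Python, not InnoDB code; the function name and the shape of its inputs are hypothetical and chosen only for illustration.

```python
def clustered_index_key(primary_key, unique_keys):
    """Pick the clustered index ordering for a table.

    primary_key: tuple of column names, or None when no explicit PK exists.
    unique_keys: list of (columns, all_not_null) pairs in declaration order.
    """
    if primary_key:
        return ("explicit PK", tuple(primary_key))
    for columns, all_not_null in unique_keys:
        if all_not_null:
            # The first unique key without nullable columns acts as the PK.
            return ("non-null UK", tuple(columns))
    # Fallback: the auto-generated row id orders the clustered index.
    return ("DB_ROW_ID", ("DB_ROW_ID",))


print(clustered_index_key(None, [(("a",), False), (("b",), True)]))
```

In none of the three cases is DB_TRX_ID part of the ordering, which is why rows changed by a given transaction cannot be located through the clustered index alone.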

Aleksey Midenkov made changes - 2021-10-01 10:42

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Ralf Gebhardt made changes - 2021-10-25 12:39

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.7 [ 24805 ]

Aleksey Midenkov made changes - 2021-11-02 10:27

Description

As was discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here) transaction_registry table has many drawbacks: performance, hard to backup, not node-portable. The feature of transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ actually does not require transaction_registry translation: we store timestamps as usual, but update row_end of changed rows at commit time.

Clustered index records contain DB_TRX_ID, DB_ROLL_PTR fields with non-null values in case they was changed by some open transaction. Clustered index is ordered by PK, explicit or implicit. In case there is no explicit PK in table, first non-null UK is then used or if no such UK exists then auto-generated DB_ROW_ID is added to clustered index. So there are 3 variants of ordering: explicit PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Every open transaction has 2 undo logs attached per each changed table. Undo log cannot be attached to more than 1 transaction. One of undo logs is used for non-temporary tables, another one is for temporary ones. Non-temporary undo log must be scanned and corresponding clustered index rows selected by PK (one of 3 variants above).

As discussed in https://mariadb.slack.com/archives/CHTLSLQEP/p1575543442067300 (attached here), the transaction_registry table has many drawbacks: poor performance, difficulty of backup, and lack of portability between nodes. Transaction-precise versioning as described in https://mariadb.com/kb/en/library/temporal-data-tables/ does not actually require transaction_registry translation: *we store timestamps as usual*, but update row_end of changed rows at commit time.

Clustered index records contain DB_TRX_ID and DB_ROLL_PTR fields, which hold non-null values when the record was changed by some open transaction. The clustered index is ordered by PK, explicit or implicit. If the table has no explicit PK, the first non-null UK is used; if no such UK exists, an auto-generated DB_ROW_ID is added to the clustered index. So there are 3 variants of ordering: explicit PK, non-null UK, DB_ROW_ID. We cannot search quickly by DB_TRX_ID.

Every open transaction has 2 undo logs attached per each changed table. An undo log cannot be attached to more than 1 transaction. One of the undo logs is used for non-temporary tables, the other for temporary ones. The non-temporary undo log must be scanned and the corresponding clustered index rows selected by PK (one of the 3 variants above).
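
The commit-time idea above can be illustrated with a toy model. Everything here is an assumption made for the sketch: the class names, the in-memory layout, and the use of infinity as the "live row" marker are hypothetical, and only the row_end update mentioned in the description is modeled (row_start is left at statement time).

```python
import math

CURRENT = math.inf  # stand-in for the maximum timestamp that marks a live row


class ToyTable:
    def __init__(self):
        # clustered key -> list of versions, oldest first
        self.versions = {}


class ToyTransaction:
    def __init__(self, table):
        self.table = table
        # (clustered key, version index) of every version this transaction closed
        self.undo_log = []

    def write(self, key, value, now):
        history = self.table.versions.setdefault(key, [])
        if history:
            # Timestamps are stored as usual: close the old version provisionally.
            history[-1]["row_end"] = now
            self.undo_log.append((key, len(history) - 1))
        history.append({"value": value, "row_start": now, "row_end": CURRENT})

    def commit(self, commit_ts):
        # Scan the undo log and fetch each changed row by its clustered key,
        # then replace the provisional row_end with the commit timestamp.
        for key, idx in self.undo_log:
            self.table.versions[key][idx]["row_end"] = commit_ts
        self.undo_log.clear()


table = ToyTable()
t1 = ToyTransaction(table)
t1.write("a", 1, now=10)
t1.commit(12)
t2 = ToyTransaction(table)
t2.write("a", 2, now=20)
t2.commit(25)
print(table.versions["a"][0]["row_end"])  # prints 25
```

The point of the sketch is that the rows to fix up are found through the undo log and the clustered key, with no timestamp -> TRX_ID translation involved.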

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow

MariaDB v3 [ 87354 ]

MariaDB v4 [ 131688 ]

Sergei Golubchik made changes - 2022-02-01 11:32

Fix Version/s		10.9 [ 26905 ]
Fix Version/s	10.8 [ 26121 ]

Sergei Golubchik made changes - 2022-04-06 10:48

Fix Version/s		10.10 [ 27530 ]
Fix Version/s	10.9 [ 26905 ]

Ralf Gebhardt made changes - 2022-04-12 18:56

Fix Version/s

10.10 [ 27530 ]

Ralf Gebhardt made changes - 2022-04-12 18:56

Fix Version/s

10.11 [ 27614 ]

Sergei Golubchik made changes - 2022-08-02 19:57

Fix Version/s		10.12 [ 28320 ]
Fix Version/s	10.11 [ 27614 ]

Ralf Gebhardt made changes - 2023-05-04 18:00

Fix Version/s		11.3 [ 28565 ]
Fix Version/s	11.0 [ 28320 ]

Sergei Golubchik made changes - 2023-09-17 18:02

Fix Version/s		11.4 [ 29301 ]
Fix Version/s	11.3 [ 28565 ]

Aleksey Midenkov made changes - 2023-10-06 21:43

Priority

Major [ 3 ]

Critical [ 2 ]

Aleksey Midenkov made changes - 2023-10-06 21:47

Link

This issue relates to MDEV-19131 [ MDEV-19131 ]

Aleksey Midenkov made changes - 2023-10-06 21:47

Link

This issue relates to MDEV-30035 [ MDEV-30035 ]

Aleksey Midenkov made changes - 2023-10-06 21:48

Link

This issue relates to MDEV-23145 [ MDEV-23145 ]

Aleksey Midenkov made changes - 2023-10-06 21:48

Link

This issue relates to MDEV-29726 [ MDEV-29726 ]

Aleksey Midenkov made changes - 2023-10-06 21:48

Link

This issue relates to MDEV-17404 [ MDEV-17404 ]

Aleksey Midenkov made changes - 2023-10-06 21:49

Link

This issue relates to MDEV-21016 [ MDEV-21016 ]

Aleksey Midenkov made changes - 2023-10-06 21:49

Link

This issue relates to MDEV-23285 [ MDEV-23285 ]

Aleksey Midenkov made changes - 2023-10-06 21:49

Link

This issue relates to MDEV-30701 [ MDEV-30701 ]

Aleksey Midenkov made changes - 2023-10-06 21:49

Link

This issue relates to MDEV-27040 [ MDEV-27040 ]