[MDEV-16329] Engine-independent online ALTER TABLE - Jira

Marko Mäkelä created issue - 2018-05-30 09:12

Marko Mäkelä made changes - 2018-05-30 09:12

Field	Original Value	New Value
Link		This issue blocks MDEV-16291 [ MDEV-16291 ]

Marko Mäkelä made changes - 2018-05-30 09:12

Link

This issue blocks ~~MDEV-11424~~ [ ~~MDEV-11424~~ ]

Julien Fritsch made changes - 2018-07-05 14:24

Epic Link

PT-80 [ 68561 ]

Ralf Gebhardt made changes - 2018-07-24 10:00

Fix Version/s

10.5 [ 23123 ]

Ralf Gebhardt made changes - 2018-08-21 16:38

Priority

Major [ 3 ]

Critical [ 2 ]

Marko Mäkelä made changes - 2018-08-22 11:38

Assignee

Thirunarayanan B [ thiru ]

Marko Mäkelä [ marko ]

Marko Mäkelä made changes - 2018-08-27 02:49

Link

This issue blocks ~~MDEV-11424~~ [ ~~MDEV-11424~~ ]

Marko Mäkelä made changes - 2018-08-27 02:49

Fix Version/s

10.4 [ 22408 ]

Sergei Golubchik made changes - 2018-08-28 12:09

Fix Version/s

10.4 [ 22408 ]

Ralf Gebhardt made changes - 2018-09-13 10:52

Target end

12/Feb/19 [ 2019-02-12 ]

Marko Mäkelä made changes - 2018-09-20 14:18

Priority

Critical [ 2 ]

Major [ 3 ]

Marko Mäkelä made changes - 2018-09-20 14:18

Fix Version/s

10.4 [ 22408 ]

Ralf Gebhardt made changes - 2018-09-27 07:45

Rank

Ranked lower

Ralf Gebhardt made changes - 2018-10-23 15:50

Epic Link

PT-80 [ 68561 ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-515~~ [ ~~MDEV-515~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-11675~~ [ ~~MDEV-11675~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-13795~~ [ ~~MDEV-13795~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-14332~~ [ ~~MDEV-14332~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Component/s		Data Definition - Alter Table [ 10114 ]
Component/s	Storage Engine - InnoDB [ 10129 ]
Assignee	Marko Mäkelä [ marko ]	Alexey Botchkov [ holyfoot ]
Description
Original Value:
If an {{ALTER TABLE}} operation involves a column type change (such as changing {{INT}} to {{INT UNSIGNED}}), InnoDB will fall back to {{ALGORITHM=COPY}}, which prevents any concurrent modification to the table. If we support {{ALGORITHM=INPLACE}} for column type conversions ({{ALTER_STORED_COLUMN_TYPE}}) inside InnoDB, we would automatically support {{LOCK=NONE}} as well. Lifting this restriction (and invoking the column data conversions inside InnoDB) is a prerequisite for fixing MDEV-16291 (that is, supporting column type changes without changing the data format). Some column type changes (such as {{INT}} to {{BIGINT}}) could be performed instantly, because they cannot fail. This would be within the scope of ~~MDEV-11424~~.

New Value:
Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up ‘row event listeners’ for tracking changes from concurrent DDL.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the ‘row event listeners’.
# Exclusively lock the table.
# Apply any remaining changes from the ‘row event listeners’.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.
Summary	Allow online ALTER TABLE for column type changes	Cross-engine ALTER ONLINE TABLE

Marko Mäkelä made changes - 2018-11-01 06:36

Link

This issue blocks MDEV-16291 [ MDEV-16291 ]

Marko Mäkelä made changes - 2018-11-10 12:51

Attachment

Remove-InnoDB-online-table-rebuild.patch [ 46678 ]

Marko Mäkelä made changes - 2018-11-16 16:18

Link

This issue relates to MDEV-16354 [ MDEV-16354 ]

Marko Mäkelä made changes - 2019-01-17 09:05

Link

This issue relates to MDEV-18127 [ MDEV-18127 ]

Marko Mäkelä made changes - 2019-01-28 11:59

Link

This issue relates to ~~MDEV-15641~~ [ ~~MDEV-15641~~ ]

Marko Mäkelä made changes - 2019-03-07 06:14

Link

This issue relates to MDEV-18845 [ MDEV-18845 ]

Marko Mäkelä made changes - 2019-03-09 12:46

Link

This issue relates to MDEV-12512 [ MDEV-12512 ]

Marko Mäkelä made changes - 2019-03-22 08:49

Link

This issue relates to MDEV-15471 [ MDEV-15471 ]

Marko Mäkelä made changes - 2019-03-22 15:01

Link

This issue relates to MDEV-10453 [ MDEV-10453 ]

Marko Mäkelä made changes - 2019-03-22 15:10

Link

This issue relates to MDEV-9260 [ MDEV-9260 ]

Ralf Gebhardt made changes - 2019-07-11 11:01

Target end

12/Feb/19 [ 2019-02-12 ]

Ralf Gebhardt made changes - 2019-08-08 20:18

Fix Version/s

10.5 [ 23123 ]

Sergei Golubchik made changes - 2019-08-09 12:25

Assignee

Alexey Botchkov [ holyfoot ]

Sergei Golubchik made changes - 2019-08-19 16:51

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Golubchik made changes - 2019-08-20 10:38

Assignee

Nikita Malyavin [ nikitamalyavin ]

Andrei Elkin made changes - 2019-10-20 17:26

Labels

alter online-ddl performance

alter online-ddl performance replication

Ralf Gebhardt made changes - 2020-02-14 16:07

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.5 [ 23123 ]

Nikita Malyavin made changes - 2020-02-18 04:22

Status

Open [ 1 ]

In Progress [ 3 ]

Nikita Malyavin made changes - 2020-02-20 06:08

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Marko Mäkelä made changes - 2020-02-20 07:54

Link

This issue is blocked by MENT-651 [ MENT-651 ]

Ralf Gebhardt made changes - 2020-02-20 08:01

Priority

Critical [ 2 ]

Major [ 3 ]

Marko Mäkelä made changes - 2020-02-24 09:15

Remote Link

This issue links to "Bug #77097 InnoDB Online DDL should support change data type (Web Link)" [ 29417 ]

Marko Mäkelä made changes - 2020-02-24 09:20

Remote Link

This issue links to "Bug #98600 Optimize table fails with duplicate entry on UNIQUE KEY (Web Link)" [ 29418 ]

Sergei Golubchik made changes - 2020-08-18 14:53

Rank

Ranked lower

Nikita Malyavin made changes - 2020-09-18 12:04

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Ralf Gebhardt made changes - 2020-09-18 14:24

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.6 [ 24028 ]

Nikita Malyavin made changes - 2020-09-18 16:29

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Ralf Gebhardt made changes - 2021-08-17 21:05

Fix Version/s		10.7 [ 24805 ]
Fix Version/s	N/A [ 14700 ]

Ralf Gebhardt made changes - 2021-08-17 21:05

Summary

Cross-engine ALTER ONLINE TABLE

ALTER ONLINE TABLE

Ralf Gebhardt made changes - 2021-08-17 21:11

Priority

Major [ 3 ]

Critical [ 2 ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee

Sergei Golubchik [ serg ]

Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee	Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Ralf Gebhardt made changes - 2021-08-17 21:15

Link

This issue is blocked by MENT-651 [ MENT-651 ]

Sergei Golubchik made changes - 2021-08-19 14:07

Priority

Critical [ 2 ]

Major [ 3 ]

Ralf Gebhardt made changes - 2021-09-28 13:39

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.7 [ 24805 ]

Ralf Gebhardt made changes - 2021-10-22 12:56

Priority

Major [ 3 ]

Critical [ 2 ]

Marko Mäkelä made changes - 2021-11-12 12:56

Link

This issue relates to ~~MDEV-15250~~ [ ~~MDEV-15250~~ ]

Sergei Golubchik made changes - 2021-11-23 13:53

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Nikita Malyavin made changes - 2021-11-24 18:53

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-11-26 08:45

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Rob Schwyzer (Inactive) made changes - 2021-11-26 21:01

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 32618 ]

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow

MariaDB v3 [ 87561 ]

MariaDB v4 [ 131690 ]

Nikita Malyavin made changes - 2021-12-08 01:34

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-12-10 23:01

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Nikita Malyavin made changes - 2021-12-24 14:13

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-12-25 23:23

Fix Version/s		10.9 [ 26905 ]
Fix Version/s	10.8 [ 26121 ]

Sergei Golubchik made changes - 2022-02-27 15:44

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Jan Lindström (Inactive) made changes - 2022-03-03 05:39

Link

This issue is blocked by ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-09 15:54

Link

This issue includes ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-09 15:54

Link

This issue is blocked by ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-15 19:41

Fix Version/s		10.10 [ 27530 ]
Fix Version/s	10.9 [ 26905 ]

Sergei Golubchik made changes - 2022-05-19 18:21

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Sergei Golubchik made changes - 2022-06-07 13:57

Status

Stalled [ 10000 ]

In Testing [ 10301 ]

Sergei Golubchik made changes - 2022-06-07 13:57

Assignee

Sergei Golubchik [ serg ]

Lena Startseva [ JIRAUSER50478 ]

Roel Van de Paar made changes - 2022-06-08 10:00

Link

This issue causes ~~MDEV-28771~~ [ ~~MDEV-28771~~ ]

Ramesh Sivaraman made changes - 2022-06-08 11:35

Link

This issue relates to ~~MDEV-28774~~ [ ~~MDEV-28774~~ ]

Ramesh Sivaraman made changes - 2022-06-13 07:33

Link

This issue relates to ~~MDEV-28198~~ [ ~~MDEV-28198~~ ]

Ramesh Sivaraman made changes - 2022-06-13 08:48

Link

This issue relates to ~~MDEV-28198~~ [ ~~MDEV-28198~~ ]

Ramesh Sivaraman made changes - 2022-06-13 08:48

Link

This issue relates to ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Elena Stepanova made changes - 2022-06-13 23:41

Link

This issue causes ~~MDEV-28825~~ [ ~~MDEV-28825~~ ]

Elena Stepanova made changes - 2022-06-15 12:02

Assignee

Lena Startseva [ JIRAUSER50478 ]

Elena Stepanova [ elenst ]

Elena Stepanova made changes - 2022-06-24 16:54

Link

This issue relates to ~~MDEV-28942~~ [ ~~MDEV-28942~~ ]

Elena Stepanova made changes - 2022-06-24 18:26

Link

This issue relates to ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Elena Stepanova made changes - 2022-06-24 22:35

Link

This issue relates to ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Elena Stepanova made changes - 2022-06-26 22:07

Link

This issue causes ~~MDEV-28949~~ [ ~~MDEV-28949~~ ]

Nikita Malyavin made changes - 2022-06-27 10:35

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up ‘row event listeners’ for tracking changes from concurrent DDL.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the ‘row event listeners’.
# Exclusively lock the table.
# Apply any remaining changes from the ‘row event listeners’.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
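
The {{ROLLBACK}} requirement above can be illustrated with a toy model. This is only a sketch under stated assumptions: {{LoggedTable}} and {{Transaction}} are hypothetical names invented for illustration, not MariaDB or InnoDB interfaces, and only the name {{online_log}} is taken from the description. The point it shows is that each write records its inverse, and a rollback replays those inverses into the log as compensating events.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, int, Optional[dict]]  # (operation, primary key, row image)


class LoggedTable:
    """Hypothetical in-memory table with a row event log (illustration only)."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.online_log: List[Event] = []


class Transaction:
    """Hypothetical transaction that relays rollback actions to the log."""

    def __init__(self, table: LoggedTable) -> None:
        self.table = table
        self.undo: List[Event] = []  # inverse operations, newest last

    def insert(self, pk: int, row: dict) -> None:
        self.table.rows[pk] = dict(row)
        self.table.online_log.append(("INSERT", pk, dict(row)))
        self.undo.append(("DELETE", pk, None))  # inverse of the insert

    def delete(self, pk: int) -> None:
        old = self.table.rows.pop(pk)
        self.table.online_log.append(("DELETE", pk, None))
        self.undo.append(("INSERT", pk, old))  # inverse of the delete

    def rollback(self) -> None:
        # Undo in reverse order and log every compensating action, so a
        # consumer of the log ends up with the same rows as the table.
        while self.undo:
            op, pk, row = self.undo.pop()
            if op == "DELETE":
                self.table.rows.pop(pk, None)
            else:
                self.table.rows[pk] = dict(row)
            self.table.online_log.append((op, pk, row))


table = LoggedTable()
tx = Transaction(table)
tx.insert(1, {"a": 1})  # BEGIN; INSERT
tx.rollback()           # ROLLBACK also logs a DELETE event
print(table.online_log)
```

In this model, {{BEGIN; INSERT; ROLLBACK}} leaves the table empty and the log holding an INSERT followed by a DELETE for the same key, which is the behaviour the paragraph asks for.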

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2022-06-27 10:40

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).


h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-06-27 11:09

Link

This issue relates to ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Nikita Malyavin made changes - 2022-06-27 15:49

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-06-27 22:44

Link

This issue relates to ~~MDEV-28966~~ [ ~~MDEV-28966~~ ]

Elena Stepanova made changes - 2022-06-27 23:01

Link

This issue relates to ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-06-29 16:42

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-07-02 11:18

Link

This issue relates to ~~MDEV-29007~~ [ ~~MDEV-29007~~ ]

Elena Stepanova made changes - 2022-07-05 22:03

Link

This issue relates to ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Angelique Sklavounos (Inactive) made changes - 2022-07-07 11:56

Link

This issue relates to ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Elena Stepanova made changes - 2022-07-08 22:13

Link

This issue relates to ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Elena Stepanova made changes - 2022-07-08 22:28

Link

This issue relates to ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Elena Stepanova made changes - 2022-07-08 22:36

Link

This issue relates to ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Elena Stepanova made changes - 2022-07-08 22:58

Link

This issue relates to ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Elena Stepanova made changes - 2022-07-08 23:30

Link

This issue relates to ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-07-25 20:55

Fix Version/s		10.11 [ 27614 ]
Fix Version/s	10.10 [ 27530 ]

Elena Stepanova made changes - 2022-07-29 17:36

Status

In Testing [ 10301 ]

Stalled [ 10000 ]

AirFocus made changes - 2022-08-09 16:11

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to MDEV-11675, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future MDEV-515 code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Julien Fritsch made changes - 2022-08-10 08:22

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to MDEV-11675, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future MDEV-515 code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.
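
The eight steps above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the server implementation: {{Table}}, {{apply_events}}, {{online_alter}} and the {{concurrent_dml}} callback are hypothetical names, locking is indicated only in comments, and the "row-based replication" is reduced to a plain list of events.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Event = Tuple[str, int, Optional[Row]]  # (kind, primary key, new row image)


class Table:
    """Hypothetical in-memory table keyed by primary key."""

    def __init__(self, rows: Optional[Dict[int, Row]] = None) -> None:
        self.rows: Dict[int, Row] = dict(rows or {})
        self.log: Optional[List[Event]] = None  # active only during an ALTER

    def write(self, kind: str, pk: int, row: Optional[Row] = None) -> None:
        """Apply one DML and record it if an online ALTER is in progress."""
        if kind == "DELETE":
            self.rows.pop(pk, None)
        else:
            self.rows[pk] = dict(row or {})
        if self.log is not None:
            self.log.append((kind, pk, dict(row) if row else None))


def apply_events(target: Dict[int, Row], events: List[Event],
                 convert: Callable[[Row], Row]) -> None:
    """Replay logged changes onto the new copy, converting row images."""
    for kind, pk, row in events:
        if kind == "DELETE":
            target.pop(pk, None)
        else:
            target[pk] = convert(row or {})


def online_alter(table: Table, convert: Callable[[Row], Row],
                 concurrent_dml: Callable[[Table], None] = lambda t: None) -> Table:
    # Steps 1-3: (exclusive lock) start logging changes, (downgrade the lock).
    table.log = []
    # Step 4: copy from a snapshot while writers keep changing the old table.
    snapshot = {pk: dict(r) for pk, r in table.rows.items()}
    concurrent_dml(table)
    new_rows = {pk: convert(r) for pk, r in snapshot.items()}
    # Step 5: apply what has accumulated so far.
    applied = len(table.log)
    apply_events(new_rows, table.log[:applied], convert)
    # Steps 6-7: (lock again) apply whatever arrived in the meantime.
    apply_events(new_rows, table.log[applied:], convert)
    # Step 8: swap the copies and stop logging.
    table.rows, table.log = new_rows, None
    return table
```

In this model, a column type change is just the {{convert}} function applied both to copied rows and to logged row images, which is why no engine-specific conversion logic is needed.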

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
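
As a rough illustration of the {{ROLLBACK}} requirement, the sketch below shows a transaction that emits a compensating DELETE event when an insert is rolled back. It is a toy model under stated assumptions: {{LoggedTable}}, {{Transaction}} and {{event_log}} are hypothetical names, and the design that was eventually implemented may handle rollback differently.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Event = Tuple[str, int, Optional[Row]]  # (kind, primary key, row image)


class LoggedTable:
    """Hypothetical table whose changes feed a 'row event log'."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.event_log: List[Event] = []


class Transaction:
    """Tracks inserted keys so that ROLLBACK can undo them."""

    def __init__(self, table: LoggedTable) -> None:
        self.table = table
        self.inserted: List[int] = []

    def insert(self, pk: int, row: Row) -> None:
        self.table.rows[pk] = row
        self.table.event_log.append(("INSERT", pk, row))
        self.inserted.append(pk)

    def rollback(self) -> None:
        # Undo in reverse order and relay each undo as a DELETE event,
        # so that a consumer replaying the log ends up without the row.
        for pk in reversed(self.inserted):
            del self.table.rows[pk]
            self.table.event_log.append(("DELETE", pk, None))
        self.inserted.clear()
```

Replaying {{event_log}} after {{BEGIN; INSERT; ROLLBACK}} then yields an INSERT followed by a DELETE for the same key, leaving the new copy of the table unchanged.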

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-08-16 13:18

Assignee

Elena Stepanova [ elenst ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin made changes - 2022-08-24 11:01

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Roel Van de Paar made changes - 2022-08-26 10:23

Link

This issue causes ~~MDEV-29393~~ [ ~~MDEV-29393~~ ]

Roel Van de Paar made changes - 2022-08-26 10:45

Link

This issue causes ~~MDEV-29394~~ [ ~~MDEV-29394~~ ]

Roel Van de Paar made changes - 2022-09-10 03:43

Link

This issue causes ~~MDEV-29393~~ [ ~~MDEV-29393~~ ]

Roel Van de Paar made changes - 2022-09-10 03:43

Link

This issue causes ~~MDEV-29394~~ [ ~~MDEV-29394~~ ]

Roel Van de Paar made changes - 2022-09-10 03:45

Link

This issue causes ~~MDEV-29506~~ [ ~~MDEV-29506~~ ]

Sergei Golubchik made changes - 2022-09-28 16:23

Link

This issue causes ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Sergei Golubchik made changes - 2022-09-28 16:23

Link

This issue relates to ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Sergei Golubchik made changes - 2022-09-28 16:24

Link

This issue causes ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue relates to ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue causes ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue relates to ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Sergei Golubchik made changes - 2022-09-28 16:26

Link

This issue causes ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Sergei Golubchik made changes - 2022-09-28 16:26

Link

This issue relates to ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Sergei Golubchik made changes - 2022-09-28 16:27

Link

This issue causes ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-09-28 16:27

Link

This issue relates to ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-09-28 16:28

Link

This issue causes ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Sergei Golubchik made changes - 2022-09-28 16:28

Link

This issue relates to ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Sergei Golubchik made changes - 2022-09-28 16:29

Link

This issue causes ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Sergei Golubchik made changes - 2022-09-28 16:29

Link

This issue relates to ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue causes ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue relates to ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue causes ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue relates to ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-09-28 16:32

Link

This issue causes ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Sergei Golubchik made changes - 2022-09-28 16:32

Link

This issue relates to ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Sergei Golubchik made changes - 2022-09-28 16:33

Link

This issue causes ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Sergei Golubchik made changes - 2022-09-28 16:33

Link

This issue relates to ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Sergei Golubchik made changes - 2022-09-28 16:34

Link

This issue causes ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Sergei Golubchik made changes - 2022-09-28 16:34

Link

This issue relates to ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Sergei Golubchik made changes - 2022-10-30 15:21

Fix Version/s		10.12 [ 28320 ]
Fix Version/s	10.11 [ 27614 ]

Ralf Gebhardt made changes - 2022-12-27 14:03

Labels

alter online-ddl performance replication

Preview_removed_10.10 alter online-ddl performance replication

Julien Fritsch made changes - 2022-12-28 12:50

Fix Version/s		11.1 [ 28549 ]
Fix Version/s	11.0 [ 28320 ]

Sergei Golubchik made changes - 2022-12-28 12:53

Fix Version/s		11.0 [ 28320 ]
Fix Version/s	11.1 [ 28549 ]

Sergei Golubchik made changes - 2023-01-19 10:57

Fix Version/s		11.1 [ 28549 ]
Fix Version/s	11.0 [ 28320 ]

Nikita Malyavin made changes - 2023-02-17 12:53

Status

Stalled [ 10000 ]

In Testing [ 10301 ]

Nikita Malyavin made changes - 2023-02-17 12:57

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin made changes - 2023-02-20 14:39

Assignee

Nikita Malyavin [ nikitamalyavin ]

Elena Stepanova [ elenst ]

Ralf Gebhardt made changes - 2023-03-21 09:28

Labels

Preview_removed_10.10 alter online-ddl performance replication

Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Ralf Gebhardt made changes - 2023-03-21 09:33

Labels

Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Preview_10.10 Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Ralf Gebhardt made changes - 2023-03-21 09:35

Labels

Preview_10.10 Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Preview_10.10 Preview_11.1 alter online-ddl performance replication

Alice Sherepa made changes - 2023-03-21 11:13

Link

This issue relates to ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Elena Stepanova made changes - 2023-03-22 15:50

Link

This issue relates to ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Alice Sherepa made changes - 2023-03-24 12:36

Link

This issue causes ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Alice Sherepa made changes - 2023-03-24 12:36

Link

This issue relates to ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Elena Stepanova made changes - 2023-03-24 12:57

Link

This issue causes ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Elena Stepanova made changes - 2023-03-24 12:58

Link

This issue relates to ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Elena Stepanova made changes - 2023-03-24 15:23

Link

This issue causes ~~MDEV-30924~~ [ ~~MDEV-30924~~ ]

Elena Stepanova made changes - 2023-03-24 19:53

Link

This issue causes ~~MDEV-30925~~ [ ~~MDEV-30925~~ ]

Angelique Sklavounos (Inactive) made changes - 2023-03-28 12:56

Link

This issue causes ~~MDEV-30945~~ [ ~~MDEV-30945~~ ]

Elena Stepanova made changes - 2023-03-28 21:45

Link

This issue causes ~~MDEV-30949~~ [ ~~MDEV-30949~~ ]

Elena Stepanova made changes - 2023-04-01 13:02

Link

This issue causes ~~MDEV-30983~~ [ ~~MDEV-30983~~ ]

Elena Stepanova made changes - 2023-04-01 15:57

Link

This issue relates to ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Nikita Malyavin made changes - 2023-04-01 16:28

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-01 16:44

Summary

ALTER ONLINE TABLE

Engine-independent ALTER ONLINE TABLE

Nikita Malyavin made changes - 2023-04-01 16:45

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-01 16:49

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Sergei Golubchik made changes - 2023-04-01 19:45

Summary

Engine-independent ALTER ONLINE TABLE

Engine-independent online ALTER TABLE

Sergei Golubchik made changes - 2023-04-01 19:46

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Sergei Golubchik made changes - 2023-04-02 11:09

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level are used depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-04-02 17:57

Link

This issue causes ~~MDEV-30987~~ [ ~~MDEV-30987~~ ]

Elena Stepanova made changes - 2023-04-10 16:50

Link

This issue causes ~~MDEV-31033~~ [ ~~MDEV-31033~~ ]

Alice Sherepa made changes - 2023-04-12 11:31

Link

This issue causes ~~MDEV-31040~~ [ ~~MDEV-31040~~ ]

Elena Stepanova made changes - 2023-04-12 14:21

Link

This issue causes ~~MDEV-30983~~ [ ~~MDEV-30983~~ ]

Elena Stepanova made changes - 2023-04-12 22:52

Link

This issue causes ~~MDEV-31043~~ [ ~~MDEV-31043~~ ]

Nikita Malyavin made changes - 2023-04-14 15:08

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level are used depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
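
The control flow of the steps above can be sketched as a toy model. This is Python, not server code, and everything in it is hypothetical: the {{OnlineChangeLog}} class, the dict-based tables and the {{convert}} callback are illustration only, and all locking is reduced to comments.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Event = Tuple[str, int, Row]  # (kind, primary key, row image)


class OnlineChangeLog:
    """Hypothetical stand-in for the per-table buffer of online changes."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def record(self, kind: str, pk: int, row: Optional[Row] = None) -> None:
        self.events.append((kind, pk, dict(row or {})))

    def drain(self) -> List[Event]:
        pending, self.events = self.events, []
        return pending


def apply_events(new_table: Dict[int, Row], events: List[Event],
                 convert: Callable[[Row], Row]) -> None:
    for kind, pk, row in events:
        if kind == "DELETE":
            new_table.pop(pk, None)
        else:  # INSERT and UPDATE both carry the full row image here
            new_table[pk] = convert(row)


def online_copy_alter(old_table: Dict[int, Row],
                      convert: Callable[[Row], Row],
                      concurrent_dml: Callable[[Dict[int, Row], OnlineChangeLog], None]
                      ) -> Dict[int, Row]:
    # Steps 1-3: exclusive MDL, TL_READ, read the first record.
    if not old_table:
        return {}  # empty table: the online part is skipped
    # The snapshot models the view of the data that the copy stage reads.
    snapshot = {pk: dict(row) for pk, row in old_table.items()}
    log = OnlineChangeLog()            # step 4: start tracking online changes
    concurrent_dml(old_table, log)     # step 5: MDL downgraded, writers run
    new_table = {pk: convert(row) for pk, row in snapshot.items()}  # step 6
    apply_events(new_table, log.drain(), convert)                   # step 7
    # Steps 8-10: release the table lock, upgrade the MDL, apply the rest.
    apply_events(new_table, log.drain(), convert)
    return new_table                   # step 11: swap old and new table


def to_int(row: Row) -> Row:
    # Example alteration: change the type of column "qty".
    return {**row, "qty": int(row["qty"])}


def writers(table: Dict[int, Row], log: OnlineChangeLog) -> None:
    # DML that arrives while the copy is in progress.
    table[3] = {"qty": "30"}
    log.record("INSERT", 3, table[3])
    table[1] = {"qty": "11"}
    log.record("UPDATE", 1, table[1])
    del table[2]
    log.record("DELETE", 2)


old = {1: {"qty": "10"}, 2: {"qty": "20"}}
new = online_copy_alter(old, to_int, writers)
print(new)
```

In this model the new table ends up equal to the converted final state of the old table, which is the property the replayed online changes are meant to guarantee.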

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record out, and only then set up the online change buffer.
* MyISAM/Aria only allow inserts in parallel with reads: the last record offset of the table is remembered for the table handle, so the copy stage will read out only the changes that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
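
A minimal sketch of the {{ROLLBACK}} requirement above, again as a toy Python model rather than the actual storage engine interface. The {{LoggedTransaction}} class and its undo list are assumptions made for illustration; the point is only that a rollback has to emit compensating events so that whoever replays the log ends up in the same state as the table.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Event = Tuple[str, int, Row]  # (kind, primary key, row image)


class LoggedTransaction:
    """Hypothetical transaction that relays its changes to a row event log."""

    def __init__(self, table: Dict[int, Row], change_log: List[Event]) -> None:
        self.table = table
        self.change_log = change_log
        # Undo records: (primary key, row before the change, or None if new).
        self.undo: List[Tuple[int, Optional[Row]]] = []

    def insert(self, pk: int, row: Row) -> None:
        self.undo.append((pk, None))
        self.table[pk] = row
        self.change_log.append(("INSERT", pk, row))

    def delete(self, pk: int) -> None:
        before = self.table.pop(pk)
        self.undo.append((pk, before))
        self.change_log.append(("DELETE", pk, before))

    def rollback(self) -> None:
        # Undo in reverse order and log the compensating event each time.
        while self.undo:
            pk, before = self.undo.pop()
            if before is None:
                removed = self.table.pop(pk)
                self.change_log.append(("DELETE", pk, removed))
            else:
                self.table[pk] = before
                self.change_log.append(("INSERT", pk, before))


table: Dict[int, Row] = {}
log: List[Event] = []
txn = LoggedTransaction(table, log)
txn.insert(1, {"a": 1})   # BEGIN; INSERT
txn.rollback()            # ROLLBACK
print(log)
```

Here {{BEGIN; INSERT; ROLLBACK}} leaves an INSERT event followed by a DELETE event in the log, so replaying the log onto the new table yields an empty table as well.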

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-14 15:09

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. In table is empty, online is skipped (goto 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *Innodb* can do any DML (except TRUNCATE i presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that in theory some transaction can slip concurrently between TL_READ-locked table and first record is read. This is why we first read one record out, and then set up the online change buffer.
* Myisam/Aria only allow inserts in parallel with reads: The last table's record offset is remembered for the table handle, so copy stage will read out only the changes, that are already there. Other DMLs will be blocked until table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether is it possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* Embedded server doesn't support LOCK=NONE, Until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction could slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the table's last record offset is remembered for the table handle, so the copy stage will read out only the records that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-04-15 19:02

Link

This issue causes ~~MDEV-31058~~ [ ~~MDEV-31058~~ ]

Elena Stepanova made changes - 2023-04-16 11:40

Link

This issue causes ~~MDEV-31059~~ [ ~~MDEV-31059~~ ]

Ralf Gebhardt made changes - 2023-04-19 10:53

Link

This issue causes ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Ralf Gebhardt made changes - 2023-04-19 10:53

Link

This issue relates to ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Elena Stepanova made changes - 2023-04-25 19:04

Link

This issue causes ~~MDEV-31128~~ [ ~~MDEV-31128~~ ]

Elena Stepanova made changes - 2023-04-26 23:28

Link

This issue causes ~~MDEV-31136~~ [ ~~MDEV-31136~~ ]

Elena Stepanova made changes - 2023-05-03 00:22

Link

This issue causes ~~MDEV-31172~~ [ ~~MDEV-31172~~ ]

Ralf Gebhardt made changes - 2023-05-30 07:01

Link

This issue causes ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Ralf Gebhardt made changes - 2023-05-30 08:18

Fix Version/s		11.2 [ 28603 ]
Fix Version/s	11.1 [ 28549 ]

Nikita Malyavin made changes - 2023-05-31 11:27

Link

This issue relates to ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Nikita Malyavin made changes - 2023-05-31 11:27

Link

This issue causes ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Nikita Malyavin made changes - 2023-06-02 13:32

Link

This issue is blocked by ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Elena Stepanova made changes - 2023-06-27 16:10

Link

This issue causes ~~MDEV-31563~~ [ ~~MDEV-31563~~ ]

Elena Stepanova made changes - 2023-07-02 13:15

Link

This issue causes ~~MDEV-31601~~ [ ~~MDEV-31601~~ ]

Nikita Malyavin made changes - 2023-07-04 15:36

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction could slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the table's last record offset is remembered for the table handle, so the copy stage will read out only the records that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction could slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the table's last record offset is remembered for the table handle, so the copy stage will read out only the records that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO, and there were no unchanged UNIQUE NOT NULL keys.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-04 15:43

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction could slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the table's last record offset is remembered for the table handle, so the copy stage will read out only the records that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO, and there were no unchanged UNIQUE NOT NULL keys.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
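
The ROLLBACK requirement above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: {{RowEventLog}}, its methods and the before/after images are hypothetical names invented for illustration, not an existing MariaDB interface. The source only states that {{BEGIN; INSERT; ROLLBACK}} should produce a DELETE event; the compensation for UPDATE and DELETE shown here is an assumed extension of the same idea.

```python
class RowEventLog:
    """Hypothetical row event log; not MariaDB's actual interface."""

    def __init__(self):
        self.events = []      # (op, pk, row) events that the ALTER will replay
        self._pending = {}    # txn id -> [(op, pk, before_image)] not yet committed

    def log(self, txn, op, pk, before=None, after=None):
        # Record the change as soon as it happens, and remember how to undo it.
        self.events.append((op, pk, after))
        self._pending.setdefault(txn, []).append((op, pk, before))

    def commit(self, txn):
        # Committed changes need no compensation.
        self._pending.pop(txn, None)

    def rollback(self, txn):
        # Append compensating events in reverse order of the original changes.
        for op, pk, before in reversed(self._pending.pop(txn, [])):
            if op == "insert":
                self.events.append(("delete", pk, None))
            elif op == "delete":
                self.events.append(("insert", pk, before))
            else:  # "update": restore the before image (assumed behavior)
                self.events.append(("update", pk, before))


# BEGIN; INSERT; ROLLBACK
log = RowEventLog()
log.log("t1", "insert", 7, after={"v": "a"})
log.rollback("t1")
```

After the rollback, {{log.events}} holds the original INSERT followed by a compensating DELETE for the same key, so replaying the log leaves the new table without that row.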

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, online is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
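
The steps above can be sketched as a small Python simulation. This is a toy model, not the server's implementation: the class {{OnlineAlterSketch}}, its method names and the dict-based tables are assumptions made for illustration, and locking is only hinted at in comments.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OnlineAlterSketch:
    """Toy model of a COPY-style online rebuild (hypothetical names)."""
    old_rows: dict                 # primary key -> row; the live table
    convert: Callable              # row conversion performed by the ALTER
    new_rows: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)
    snapshot: dict = field(default_factory=dict)
    logging_enabled: bool = False

    def start(self):
        # Steps 1-5: while writers are blocked, start tracking changes and
        # fix the set of rows that the bulk copy will see.
        self.logging_enabled = True
        self.snapshot = dict(self.old_rows)

    def concurrent_dml(self, op, pk, row=None):
        # A concurrent writer changes the old table; the change is logged.
        if op == "delete":
            self.old_rows.pop(pk, None)
        else:
            self.old_rows[pk] = row
        if self.logging_enabled:
            self.change_log.append((op, pk, row))

    def copy(self):
        # Step 6: bulk copy of the snapshot into the new table.
        for pk, row in self.snapshot.items():
            self.new_rows[pk] = self.convert(row)

    def apply_log(self):
        # Steps 7 and 10: replay logged changes in order.
        while self.change_log:
            op, pk, row = self.change_log.pop(0)
            if op == "delete":
                self.new_rows.pop(pk, None)
            else:
                self.new_rows[pk] = self.convert(row)

    def finish(self):
        # Steps 9-11: writers are blocked again, the rest of the log is
        # applied, and the new table replaces the old one.
        self.logging_enabled = False
        self.apply_log()
        return self.new_rows


alter = OnlineAlterSketch(old_rows={1: {"n": "1"}, 2: {"n": "2"}},
                          convert=lambda r: {"n": int(r["n"])})
alter.start()
alter.concurrent_dml("insert", 3, {"n": "3"})   # arrives before the copy
alter.copy()
alter.concurrent_dml("update", 1, {"n": "10"})  # arrives during the rebuild
alter.apply_log()
alter.concurrent_dml("delete", 2)               # arrives before the final lock
result = alter.finish()
```

In this run {{result}} ends up equal to the converted contents of the old table at the moment of the swap, which is the property the two apply phases are meant to guarantee.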

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It lazily opens the _read view_ when the first record is read during the copy stage. In theory, this means that a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only the rows that are already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present, and there were no unchanged UNIQUE NOT NULL keys.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-04 15:49

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online part is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It lazily opens the _read view_ when the first record is read during the copy stage. In theory, this means that a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only the rows that are already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present, and there were no unchanged UNIQUE NOT NULL keys.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online part is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It lazily opens the _read view_ when the first record is read during the copy stage. In theory, this means that a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only the rows that are already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present, and there were no unchanged UNIQUE NOT NULL keys. A NULL column is always impossible to update to AUTOINC with Online COPY.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-07-04 19:16

Link

This issue causes MDEV-31624 [ MDEV-31624 ]

Elena Stepanova made changes - 2023-07-05 14:15

Link

This issue causes ~~MDEV-31631~~ [ ~~MDEV-31631~~ ]

Elena Stepanova made changes - 2023-07-07 23:47

Link

This issue causes ~~MDEV-31646~~ [ ~~MDEV-31646~~ ]

Elena Stepanova made changes - 2023-07-13 11:38

Link

This issue causes ~~MDEV-31677~~ [ ~~MDEV-31677~~ ]

Nikita Malyavin made changes - 2023-07-20 17:07

Link

This issue causes ~~MDEV-31755~~ [ ~~MDEV-31755~~ ]

Elena Stepanova made changes - 2023-07-25 19:00

Link

This issue causes ~~MDEV-31775~~ [ ~~MDEV-31775~~ ]

Elena Stepanova made changes - 2023-07-25 22:17

Link

This issue causes ~~MDEV-31776~~ [ ~~MDEV-31776~~ ]

Elena Stepanova made changes - 2023-07-25 23:15

Link

This issue causes ~~MDEV-31777~~ [ ~~MDEV-31777~~ ]

Elena Stepanova made changes - 2023-07-26 22:43

Link

This issue causes ~~MDEV-31781~~ [ ~~MDEV-31781~~ ]

Nikita Malyavin made changes - 2023-07-26 23:30

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online part is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It lazily opens the _read view_ when the first record is read during the copy stage. In theory, this means that a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only the rows that are already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present, and there were no unchanged UNIQUE NOT NULL keys. A NULL column is always impossible to update to AUTOINC with Online COPY.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online part is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
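
The steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the server code: tables are plain Python dicts keyed by primary key, locking is not modelled, and {{OnlineChangeLog}}, {{apply_changes}}, {{online_copy}}, {{to_int_qty}} and {{writer}} are hypothetical names invented for this sketch. A shallow copy of the old table stands in for the non-locking read.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineChangeLog:
    """Hypothetical stand-in for the per-table row-based change buffer."""
    events: list = field(default_factory=list)

    def record(self, op, key, row=None):
        self.events.append((op, key, row))

    def drain(self):
        """Return the pending events and leave the log empty."""
        pending, self.events = self.events, []
        return pending


def apply_changes(new_table, events, convert):
    """Replay logged events on the new table, converting row images."""
    for op, key, row in events:
        if op == "DELETE":
            new_table.pop(key, None)
        else:  # INSERT and UPDATE carry the full new row image
            new_table[key] = convert(row)


def online_copy(old_table, convert, concurrent_dml):
    log = OnlineChangeLog()            # step 4: start tracking changes
    snapshot = dict(old_table)         # stands in for the non-locking read
    concurrent_dml(old_table, log)     # writers run after the downgrade (step 5)
    new_table = {k: convert(r) for k, r in snapshot.items()}  # step 6
    apply_changes(new_table, log.drain(), convert)            # step 7
    # Steps 9-10: under the exclusive lock no new events can arrive,
    # so one final drain is enough.
    apply_changes(new_table, log.drain(), convert)
    return new_table                   # step 11: becomes the table


def to_int_qty(row):
    """Example column type change: qty goes from string to integer."""
    return {"id": row["id"], "qty": int(row["qty"])}


def writer(table, log):
    """Concurrent DML that happens while the copy is in progress."""
    table[3] = {"id": 3, "qty": "30"}
    log.record("INSERT", 3, table[3])
    table[1] = {"id": 1, "qty": "11"}
    log.record("UPDATE", 1, table[1])
    del table[2]
    log.record("DELETE", 2)


old = {1: {"id": 1, "qty": "10"}, 2: {"id": 2, "qty": "20"}}
new = online_copy(old, to_int_qty, writer)
print(new)
```

After the replay, the new table holds the converted form of whatever the old table contains at the end, which is the property the final swap relies on.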

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It lazily opens the _read view_ when the first record is read during the copy stage. In theory, this means that a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only the rows that are already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present, and there were no unchanged UNIQUE NOT NULL keys. A NULL column is always impossible to update to AUTOINC with Online COPY.
* Sequences are not supported

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
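
One way to read the {{ROLLBACK}} requirement is sketched below. It assumes, purely for illustration, that row events are logged as each statement executes rather than at commit, so a rollback has to append compensating events. {{RowEventLog}} and {{Transaction}} are hypothetical names, and the table is a plain dict keyed by primary key.

```python
class RowEventLog:
    """Hypothetical 'row event log' that collects (op, key, row) tuples."""

    def __init__(self):
        self.events = []

    def append(self, op, key, row=None):
        self.events.append((op, key, row))


class Transaction:
    """Toy transaction that keeps undo records for ROLLBACK."""

    def __init__(self, table, log):
        self.table = table
        self.log = log
        self.undo = []  # compensating operations, in execution order

    def insert(self, key, row):
        self.table[key] = row
        self.undo.append(("DELETE", key, None))
        self.log.append("INSERT", key, row)

    def delete(self, key):
        old_row = self.table.pop(key)
        self.undo.append(("INSERT", key, old_row))
        self.log.append("DELETE", key)

    def rollback(self):
        # Undo in reverse order and relay each compensating
        # operation to the row event log.
        for op, key, row in reversed(self.undo):
            if op == "DELETE":
                del self.table[key]
            else:
                self.table[key] = row
            self.log.append(op, key, row)
        self.undo.clear()


table = {}
log = RowEventLog()
txn = Transaction(table, log)
txn.insert(1, {"a": 1})   # BEGIN; INSERT
txn.rollback()            # ROLLBACK
print([event[0] for event in log.events])
```

With this design, replaying the log on the new table leaves no trace of the rolled-back row, because the INSERT event is followed by its compensating DELETE.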

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-27 11:42

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
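
The steps above can be sketched as a small simulation. This is only an illustration in Python, not the server code: {{OnlineChangeLog}}, {{apply_events}} and {{online_copy}} are hypothetical names, tables are modelled as plain dicts keyed by primary key, and all locking is left out.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineChangeLog:
    """Hypothetical stand-in for the per-table buffer of online changes."""
    events: list = field(default_factory=list)

    def record(self, kind, key, row=None):
        self.events.append((kind, key, row))

    def drain(self):
        # Hand over everything logged so far and start a fresh buffer.
        pending, self.events = self.events, []
        return pending


def apply_events(new_table, events, convert):
    """Replay logged row events onto the new copy of the table."""
    for kind, key, row in events:
        if kind == "DELETE":
            new_table.pop(key, None)
        else:  # INSERT and UPDATE both store the converted row
            new_table[key] = convert(row)


def online_copy(old_table, convert, concurrent_dml):
    log = OnlineChangeLog()            # step 4: start tracking changes
    snapshot = dict(old_table)         # step 6: read a consistent snapshot
    # Writers keep changing the old table while the copy is running;
    # each change is also recorded in the log.
    for kind, key, row in concurrent_dml:
        if kind == "DELETE":
            old_table.pop(key, None)
        else:
            old_table[key] = row
        log.record(kind, key, row)
    new_table = {k: convert(v) for k, v in snapshot.items()}
    apply_events(new_table, log.drain(), convert)   # step 7
    apply_events(new_table, log.drain(), convert)   # step 10: leftovers
    return new_table                                 # step 11: swap


old = {1: "10", 2: "20"}
dml = [("INSERT", 3, "30"), ("UPDATE", 1, "11"), ("DELETE", 2, None)]
new = online_copy(old, int, dml)
```

Here {{convert}} plays the role of the column type conversion that the plain {{COPY}} algorithm already performs, so the new table ends up equal to a converted copy of the old table's final state.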

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ when the first record is read during the copy stage. This means that, in theory, a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that were already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.
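
The MyISAM/Aria case can be pictured with a toy append-only table. This is a sketch under assumptions, not engine code: {{AppendOnlyTable}} and {{open_scan}} are hypothetical names, and the "remembered offset" is modelled as the row count at the time the scan is opened.

```python
class AppendOnlyTable:
    """Toy model of a table that only allows concurrent appends."""

    def __init__(self, rows):
        self.rows = list(rows)

    def open_scan(self):
        # Remember where the table ended when the scan was opened.
        end = len(self.rows)

        def scan():
            for i in range(end):
                yield self.rows[i]

        return scan()


table = AppendOnlyTable(["a", "b"])
scan = table.open_scan()

# A concurrent INSERT arrives after the scan was opened; it is
# recorded as an online change as well as stored in the table.
online_changes = []
table.rows.append("c")
online_changes.append("c")

copied = list(scan)                 # sees only the rows that existed
new_table = copied + online_changes
```

The scan never sees the row appended after it was opened, so that row reaches the new table exactly once, through the online changes.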

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and no UNIQUE NOT NULL key is left unchanged. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
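
A minimal sketch of the {{ROLLBACK}} requirement, in Python for brevity. {{RowEventLog}}, {{Transaction}} and the event tuple layout are hypothetical illustrations, not server interfaces; the point is only that undoing an already logged INSERT has to emit a compensating DELETE.

```python
class RowEventLog:
    """Hypothetical 'row event log' shared with the online ALTER."""

    def __init__(self):
        self.events = []


class Transaction:
    def __init__(self, table, log):
        self.table = table
        self.log = log
        self.inserted_keys = []     # undo information for rollback

    def insert(self, key, row):
        self.table[key] = row
        self.inserted_keys.append(key)
        self.log.events.append(("INSERT", key, row))

    def rollback(self):
        # Undo in reverse order and tell the log about each undo step.
        while self.inserted_keys:
            key = self.inserted_keys.pop()
            del self.table[key]
            self.log.events.append(("DELETE", key, None))


table, log = {}, RowEventLog()
txn = Transaction(table, log)
txn.insert(1, "x")
txn.rollback()

# Replaying the log onto the new copy leaves no trace of the row.
replayed = {}
for kind, key, row in log.events:
    if kind == "INSERT":
        replayed[key] = row
    else:
        replayed.pop(key, None)
```

Without the DELETE event, the replay would leave the rolled-back row in the new table.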

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ when the first record is read during the copy stage. This means that, in theory, a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that were already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and no UNIQUE NOT NULL key is left unchanged. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-07-29 22:35

Link

This issue causes ~~MDEV-31799~~ [ ~~MDEV-31799~~ ]

Elena Stepanova made changes - 2023-07-31 11:24

Link

This issue causes ~~MDEV-31804~~ [ ~~MDEV-31804~~ ]

Elena Stepanova made changes - 2023-07-31 14:27

Link

This issue causes ~~MDEV-31799~~ [ ~~MDEV-31799~~ ]

Elena Stepanova made changes - 2023-07-31 22:30

Link

This issue causes ~~MDEV-31812~~ [ ~~MDEV-31812~~ ]

Elena Stepanova made changes - 2023-08-03 16:10

Link

This issue causes ~~MDEV-31838~~ [ ~~MDEV-31838~~ ]

Nikita Malyavin made changes - 2023-08-07 13:02

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ when the first record is read during the copy stage. This means that, in theory, a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that were already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and no UNIQUE NOT NULL key is left unchanged. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ when the first record is read during the copy stage. This means that, in theory, a concurrent transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that were already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and no UNIQUE NOT NULL key is left unchanged. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable
* Sequences
* Engines S3 and CONNECT

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-08-12 18:12

Link

This issue causes MDEV-31906 [ MDEV-31906 ]

Elena Stepanova made changes - 2023-08-12 22:27

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on which operations can run concurrently while TL_READ is held.
* *InnoDB* allows any DML (presumably except TRUNCATE). It opens the _read view_ lazily, when the first record is read during the copy stage. In theory, a concurrent transaction could therefore slip in between acquiring TL_READ and reading the first record. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered in the table handle, so the copy stage reads only the records that were already there. Other DML is blocked until the table lock is released.
* Online operation is disabled for temporary tables.
* For other engines, it depends on whether a particular table lock can be acquired in parallel with TL_READ.
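
The MyISAM/Aria case can be pictured with a small Python sketch. It assumes, purely for illustration, that the table behaves like an append-only list and that a scan remembers the end of data when it is opened; {{AppendOnlyTable}} and its methods are hypothetical and do not correspond to any engine API.

```python
from typing import Iterator, List


class AppendOnlyTable:
    """Hypothetical stand-in for a table where rows are only appended."""

    def __init__(self, rows: List[str]) -> None:
        self.rows = list(rows)

    def scan(self) -> Iterator[str]:
        # The end of data is remembered when the scan is opened; rows
        # appended later lie beyond it and are not returned.
        end = len(self.rows)
        return (self.rows[i] for i in range(end))

    def insert(self, row: str) -> None:
        self.rows.append(row)


def demo_append_scan() -> List[str]:
    table = AppendOnlyTable(["a", "b"])
    online_changes: List[str] = []

    scan = table.scan()
    copied = [next(scan)]          # the copy stage has started
    table.insert("c")              # concurrent insert during the copy
    online_changes.append("c")     # ...which is tracked as an online change
    copied.extend(scan)            # the scan still stops at the remembered end

    assert copied == ["a", "b"]
    return copied + online_changes  # replaying the change completes the copy
```

The point of the sketch is only that the copy and the change log do not overlap: the concurrent insert is seen once, through the log.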

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.
* Adding AUTO_INCREMENT to an existing column is not supported when NO_AUTO_VALUE_ON_ZERO *is not* set and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported.
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..)) are not supported.
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)) is not supported if the column was initially NULLable.
* The S3 and CONNECT engines are not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, an operation that avoids copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.
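
A minimal Python sketch of why carrying the new {{PRIMARY KEY}} values in the log helps: when the key changes, the new table can be addressed directly by the logged key. This is not InnoDB's {{online_log}} format; the event layout, the helper names and the choice of a {{code}} column as the new key are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def new_pk(row: Row) -> object:
    # Hypothetical: the new table's PRIMARY KEY is the "code" column.
    return row["code"]


def log_event(log: List[dict], op: str,
              before: Optional[Row] = None, after: Optional[Row] = None) -> None:
    """Record an event together with the new-table PRIMARY KEY of the affected row."""
    log.append({"op": op,
                "pk": new_pk(before) if before is not None else None,
                "row": after})


def apply_events(log: List[dict], new_table: Dict[object, Row]) -> None:
    for ev in log:
        if ev["op"] in ("UPDATE", "DELETE"):
            del new_table[ev["pk"]]              # found directly by the logged key
        if ev["op"] in ("INSERT", "UPDATE"):
            new_table[new_pk(ev["row"])] = ev["row"]


def demo_pk_log() -> List[object]:
    # The new table is keyed by "code", while the old one was keyed by "id".
    new_table = {"A": {"id": 1, "code": "A"}, "B": {"id": 2, "code": "B"}}
    log: List[dict] = []
    log_event(log, "UPDATE", before={"id": 1, "code": "A"}, after={"id": 1, "code": "C"})
    log_event(log, "DELETE", before={"id": 2, "code": "B"})
    log_event(log, "INSERT", after={"id": 3, "code": "D"})
    apply_events(log, new_table)
    return sorted(new_table)
```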

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
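
The {{ROLLBACK}} requirement can be illustrated with a toy Python transaction that writes a compensating DELETE event when an insert is undone. {{ToyTransaction}} and {{replay}} are hypothetical names; the sketch only shows the intended effect on the row event log, not how a storage engine would expose such an interface.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]


class ToyTransaction:
    """Hypothetical transaction that reports its row changes to a row event log."""

    def __init__(self, table: Dict[int, str], row_log: List[Event]) -> None:
        self.table = table
        self.row_log = row_log
        self.inserted: List[int] = []

    def insert(self, pk: int, row: str) -> None:
        self.table[pk] = row
        self.inserted.append(pk)
        self.row_log.append(("INSERT", pk, row))

    def rollback(self) -> None:
        # Undo in reverse order and log each undo as a DELETE event.
        for pk in reversed(self.inserted):
            row = self.table.pop(pk)
            self.row_log.append(("DELETE", pk, row))
        self.inserted.clear()


def replay(row_log: List[Event], new_table: Dict[int, str]) -> Dict[int, str]:
    for op, pk, row in row_log:
        if op == "INSERT":
            new_table[pk] = row
        elif op == "DELETE":
            new_table.pop(pk, None)
    return new_table


def demo_rollback() -> Tuple[List[Event], Dict[int, str]]:
    table: Dict[int, str] = {}
    row_log: List[Event] = []
    trx = ToyTransaction(table, row_log)
    trx.insert(1, "a")      # BEGIN; INSERT
    trx.rollback()          # ROLLBACK
    return row_log, replay(row_log, {})
```

Without the DELETE event, replaying the log would leave the rolled-back row in the new table.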

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. Supporting it could be easier after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-08-12 22:29

Assignee	Elena Stepanova [ elenst ]	Sergei Golubchik [ serg ]
Status	In Testing [ 10301 ]	Stalled [ 10000 ]

Sergei Golubchik made changes - 2023-08-13 09:56

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik made changes - 2023-08-13 10:03

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Sergei Golubchik made changes - 2023-08-15 12:26

Priority

Critical [ 2 ]

Blocker [ 1 ]

Sergei Golubchik made changes - 2023-08-16 09:41

Fix Version/s		11.2.1 [ 29034 ]
Fix Version/s	11.2 [ 28603 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2023-08-16 09:42

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin made changes - 2023-08-17 12:59

Link

This issue split to MDEV-31942 [ MDEV-31942 ]

Elena Stepanova made changes - 2023-09-07 17:39

Link

This issue causes ~~MDEV-32126~~ [ ~~MDEV-32126~~ ]

Nikita Malyavin made changes - 2023-10-11 08:48

Link

This issue causes ~~MDEV-32444~~ [ ~~MDEV-32444~~ ]

Nikita Malyavin made changes - 2023-10-17 07:42

Link

This issue relates to TODO-4300 [ TODO-4300 ]

Nikita Malyavin made changes - 2023-10-18 12:27

Link

This issue causes MDEV-32510 [ MDEV-32510 ]

Nikita Malyavin made changes - 2023-11-07 16:03

Link

This issue causes MCOL-5603 [ MCOL-5603 ]

Nikita Malyavin made changes - 2023-12-20 13:15

Link

This issue causes MDEV-33094 [ MDEV-33094 ]

BJ Quinn made changes - 2024-01-11 21:44

Attachment

output.txt [ 72780 ]

Elena Stepanova made changes - 2024-01-30 13:57

Link

This issue causes ~~MDEV-33330~~ [ ~~MDEV-33330~~ ]

Rob Schwyzer (Inactive) made changes - 2024-04-04 22:00

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36750 ]

Rob Schwyzer (Inactive) made changes - 2024-04-05 18:34

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36750 ]

Rob Schwyzer (Inactive) made changes - 2024-04-11 19:02

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36796 ]

Jira Automation (IT) made changes - 2024-07-04 07:39

Zendesk Related Tickets

187921

Elena Stepanova made changes - 2024-08-18 20:27

Link

This issue relates to MDEV-34768 [ MDEV-34768 ]

Elena Stepanova made changes - 2024-11-17 22:12

Link

This issue relates to MDEV-34768 [ MDEV-34768 ]
