[MDEV-16329] Engine-independent online ALTER TABLE - Jira

Marko Mäkelä created issue - 2018-05-30 09:12

Marko Mäkelä made changes - 2018-05-30 09:12

Field	Original Value	New Value
Link		This issue blocks MDEV-16291 [ MDEV-16291 ]

Marko Mäkelä made changes - 2018-05-30 09:12

Link

This issue blocks ~~MDEV-11424~~ [ ~~MDEV-11424~~ ]

Julien Fritsch made changes - 2018-07-05 14:24

Epic Link

PT-80 [ 68561 ]

Ralf Gebhardt made changes - 2018-07-24 10:00

Fix Version/s

10.5 [ 23123 ]

Ralf Gebhardt made changes - 2018-08-21 16:38

Priority

Major [ 3 ]

Critical [ 2 ]

Marko Mäkelä made changes - 2018-08-22 11:38

Assignee

Thirunarayanan B [ thiru ]

Marko Mäkelä [ marko ]

Marko Mäkelä added a comment - 2018-08-27 02:49

In ALGORITHM=COPY, column type conversions are implemented in Copy_field::do_copy(), which is called by copy_data_between_tables(). This makes use of a function pointer, pointing to a conversion function, such as Field_long::store() or do_copy_not_null(). These conversion functions require that the data be available in Field::ptr.

InnoDB stores data in a different format internally. Integers are stored in big-endian format, and the sign bit is inverted, so that data can be compared with memcmp(). In order to use Copy_field, the ALGORITHM=INPLACE code in InnoDB would have to convert both the source data and the copied data. It seems that we would have to refactor Copy_field and Field::get_copy_func() so that the copied data would be in the storage engine format.
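
To make the storage format above concrete, here is a minimal Python sketch of a memcmp()-comparable encoding for a signed 32-bit integer. The names encode_int32 and decode_int32 are invented for this illustration and are not InnoDB functions; the real conversion lives in the C++ code of InnoDB.

```python
import struct

SIGN_BIT = 0x80000000


def encode_int32(value: int) -> bytes:
    """Return 4 big-endian bytes with the sign bit inverted."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a signed 32-bit integer")
    unsigned = value & 0xFFFFFFFF  # two's complement bit pattern
    return struct.pack(">I", unsigned ^ SIGN_BIT)


def decode_int32(data: bytes) -> int:
    """Invert encode_int32()."""
    (raw,) = struct.unpack(">I", data)
    unsigned = raw ^ SIGN_BIT
    return unsigned - 2**32 if unsigned & SIGN_BIT else unsigned


values = [7, -1, 0, -2**31, 2**31 - 1, -5, 1]

# Python compares bytes objects lexicographically as unsigned bytes,
# which is the same ordering that memcmp() produces.
by_encoded_bytes = sorted(values, key=encode_int32)

# For contrast: little-endian two's complement bytes do not sort numerically.
by_little_endian_bytes = sorted(values, key=lambda v: struct.pack("<i", v))
```

Sorting by the encoded bytes yields the numeric order, while sorting by the little-endian two's complement bytes does not. This is why a conversion function that expects the data in Field::ptr cannot be pointed directly at the InnoDB representation.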

Instant (failure-free) type conversions can be implemented in ~~MDEV-11424~~ as a special case, without depending on this task.

Marko Mäkelä made changes - 2018-08-27 02:49

Link

This issue blocks ~~MDEV-11424~~ [ ~~MDEV-11424~~ ]

Marko Mäkelä made changes - 2018-08-27 02:49

Fix Version/s

10.4 [ 22408 ]

Sergei Golubchik made changes - 2018-08-28 12:09

Fix Version/s

10.4 [ 22408 ]

Ralf Gebhardt made changes - 2018-09-13 10:52

Target end

12/Feb/19 [ 2019-02-12 ]

Marko Mäkelä made changes - 2018-09-20 14:18

Priority

Critical [ 2 ]

Major [ 3 ]

Marko Mäkelä made changes - 2018-09-20 14:18

Fix Version/s

10.4 [ 22408 ]

Ralf Gebhardt made changes - 2018-09-27 07:45

Rank

Ranked lower

Ralf Gebhardt made changes - 2018-10-23 15:50

Epic Link

PT-80 [ 68561 ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-515~~ [ ~~MDEV-515~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-11675~~ [ ~~MDEV-11675~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-13795~~ [ ~~MDEV-13795~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Link

This issue relates to ~~MDEV-14332~~ [ ~~MDEV-14332~~ ]

Marko Mäkelä made changes - 2018-11-01 06:32

Component/s		Data Definition - Alter Table [ 10114 ]
Component/s	Storage Engine - InnoDB [ 10129 ]
Assignee	Marko Mäkelä [ marko ]	Alexey Botchkov [ holyfoot ]
Description	If an {{ALTER TABLE}} operation involves a column type change (such as changing {{INT}} to {{INT UNSIGNED}}), InnoDB will fall back to {{ALGORITHM=COPY}}, which prevents any concurrent modification to the table. If we supported {{ALGORITHM=INPLACE}} for column type conversions ({{ALTER_STORED_COLUMN_TYPE}}) inside InnoDB, we would automatically support {{LOCK=NONE}} as well. Lifting this restriction (and invoking the column data conversions inside InnoDB) is a prerequisite for fixing MDEV-16291 (that is, supporting column type changes without changing the data format). Some column type changes (such as {{INT}} to {{BIGINT}}) could be performed instantly, because they cannot fail. This would be within the scope of ~~MDEV-11424~~.	Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB has done since MariaDB 10.0:
# Exclusively lock the table.
# Set up ‘row event listeners’ for tracking changes from concurrent DML.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the ‘row event listeners’.
# Exclusively lock the table.
# Apply any remaining changes from the ‘row event listeners’.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table. Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.
Summary	Allow online ALTER TABLE for column type changes	Cross-engine ALTER ONLINE TABLE

Marko Mäkelä made changes - 2018-11-01 06:36

Link

This issue blocks MDEV-16291 [ MDEV-16291 ]

Marko Mäkelä made changes - 2018-11-10 12:51

Attachment

Remove-InnoDB-online-table-rebuild.patch [ 46678 ]

Marko Mäkelä added a comment - 2018-11-10 12:59 - edited

Remove-InnoDB-online-table-rebuild.patch is a patch against mariadb-10.4.0 that removes most of the InnoDB code related to online table rebuild. It compiles and links, but I did not test it. More changes would be needed in the file handler0alter.cc. The files row0uins.cc and row0umod.cc implement rollback. In InnoDB, the online table rebuild code would log and possibly apply any DML to the copy of the table before COMMIT, and therefore it must log the affected row operations for ROLLBACK as well.

The cross-engine online table rebuild might be easiest to implement by deferring log apply to the COMMIT of each DML transaction. In that way, there is no issue with ROLLBACK. But then, in order to fix ~~MDEV-11675~~ (or not to reintroduce it) the log should not be replicated, and instead some logic should be added to replication slaves to generate and apply ‘row event log’ locally for the being-rebuilt table.

Inside InnoDB, online ADD INDEX would remain supported with ALGORITHM=INPLACE. In many cases, it is a lighter operation than a full table rebuild. Other supported operations with ALGORITHM=INPLACE would be DROP INDEX and any ALGORITHM=INSTANT operations.

Marko Mäkelä made changes - 2018-11-16 16:18

Link

This issue relates to MDEV-16354 [ MDEV-16354 ]

Marko Mäkelä made changes - 2019-01-17 09:05

Link

This issue relates to MDEV-18127 [ MDEV-18127 ]

Marko Mäkelä added a comment - 2019-01-17 09:08

MDEV-18127 gives a strong reason to defer the logging until the COMMIT of each DML transaction. The current InnoDB online table rebuild logs every row operation immediately, so if any DML transaction is aborted due to a duplicate key error, the online ALTER TABLE could also be aborted when it applies the log.

Marko Mäkelä made changes - 2019-01-28 11:59

Link

This issue relates to ~~MDEV-15641~~ [ ~~MDEV-15641~~ ]

Marko Mäkelä added a comment - 2019-03-05 07:06

I believe that some operations will remain impossible to do online (while allowing concurrent modifications). Here are a few examples:

Adding an AUTO_INCREMENT column to a table. (Which values to assign for concurrent DML? This would very likely be nondeterministic and very challenging in a replication environment.)
Dropping the PRIMARY KEY of a table without adding one. (This might be doable, but applying the log of concurrent changes could be very slow.)
ALTER IGNORE TABLE with ADD UNIQUE INDEX or ADD PRIMARY KEY would produce nondeterministic results if concurrent changes were allowed. Hence, we’d better lock the table. Other operations within ALTER IGNORE TABLE (such as replacing NULL values when adding NOT NULL) should be fine.
ALTER IGNORE TABLE with lossy data conversions on columns that are part of a UNIQUE KEY or PRIMARY KEY could lead to nondeterministic results with concurrent modifications.

Marko Mäkelä made changes - 2019-03-07 06:14

Link

This issue relates to MDEV-18845 [ MDEV-18845 ]

Marko Mäkelä added a comment - 2019-03-09 12:46

We should extend the progress reporting for ALTER TABLE. Between copying the data and applying the log, we should invoke thd_progress_next_stage(), and then keep invoking thd_progress_report() also when applying the log of changes.

Marko Mäkelä made changes - 2019-03-09 12:46

Link

This issue relates to MDEV-12512 [ MDEV-12512 ]

Marko Mäkelä made changes - 2019-03-22 08:49

Link

This issue relates to MDEV-15471 [ MDEV-15471 ]

Marko Mäkelä made changes - 2019-03-22 15:01

Link

This issue relates to MDEV-10453 [ MDEV-10453 ]

Marko Mäkelä made changes - 2019-03-22 15:10

Link

This issue relates to MDEV-9260 [ MDEV-9260 ]

Marko Mäkelä added a comment - 2019-04-15 11:10

thiru pointed out that it might not be a good idea to try to apply Remove-InnoDB-online-table-rebuild.patch after all. A native table-rebuilding ALTER TABLE inside InnoDB could be faster because it would avoid conversions between InnoDB and TABLE::record formats. For some operations, such as character set conversions or adding stored generated columns, I believe that it is better to pay the overhead of these conversions than to try to duplicate logic inside InnoDB.

Ralf Gebhardt made changes - 2019-07-11 11:01

Target end

12/Feb/19 [ 2019-02-12 ]

Ralf Gebhardt made changes - 2019-08-08 20:18

Fix Version/s

10.5 [ 23123 ]

Sergei Golubchik made changes - 2019-08-09 12:25

Assignee

Alexey Botchkov [ holyfoot ]

Sergei Golubchik made changes - 2019-08-19 16:51

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Golubchik made changes - 2019-08-20 10:38

Assignee

Nikita Malyavin [ nikitamalyavin ]

Andrei Elkin made changes - 2019-10-20 17:26

Labels

alter online-ddl performance

alter online-ddl performance replication

Ralf Gebhardt made changes - 2020-02-14 16:07

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.5 [ 23123 ]

Nikita Malyavin made changes - 2020-02-18 04:22

Status

Open [ 1 ]

In Progress [ 3 ]

Nikita Malyavin made changes - 2020-02-20 06:08

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Marko Mäkelä made changes - 2020-02-20 07:54

Link

This issue is blocked by MENT-651 [ MENT-651 ]

Ralf Gebhardt made changes - 2020-02-20 08:01

Priority

Critical [ 2 ]

Major [ 3 ]

Marko Mäkelä made changes - 2020-02-24 09:15

Remote Link

This issue links to "Bug #77097 InnoDB Online DDL should support change data type (Web Link)" [ 29417 ]

Marko Mäkelä added a comment - 2020-02-24 09:15

I originally filed this bug to address MySQL Bug #77097 by making InnoDB online table rebuild support data type changes. Later, after evaluating the changes needed, it seemed to be more useful to support engine-independent online table rebuild.

Marko Mäkelä added a comment - 2020-02-24 09:20

In MySQL Bug #98600, a user complains about the ‘fake duplicate’ problem, which occurs because the InnoDB table rebuild writes to online_log before a row operation has been successfully applied to all indexes. For rolling back the operation, another online_log record would be written, but the intermittent duplicate key error will make the table rebuild fail. The problem has existed since MySQL 5.6, and it should also affect the non-rebuilding creation of UNIQUE INDEX.

An elegant way to prevent such ‘fake duplicates’ is to buffer row events and write them only after the row operation has been successfully performed, or the entire transaction has been committed.
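
The difference between the two logging strategies can be sketched with a toy model in Python. Everything here (the Table class, insert(), apply_log(), the log_early flag) is a hypothetical stand-in for illustration and not InnoDB code. The table has a PRIMARY KEY and one UNIQUE column; an insert touches the primary index first and the UNIQUE index second.

```python
class DuplicateKey(Exception):
    pass


class Table:
    """Toy table: PRIMARY KEY pk plus one UNIQUE column u."""

    def __init__(self):
        self.by_pk = {}  # pk -> u (stands in for the clustered index)
        self.by_u = {}   # u -> pk (stands in for the UNIQUE index)

    def copy(self):
        clone = Table()
        clone.by_pk, clone.by_u = dict(self.by_pk), dict(self.by_u)
        return clone


def insert(table, pk, u, log, log_early):
    """Insert into both indexes; log before or after the UNIQUE check."""
    if pk in table.by_pk:
        raise DuplicateKey(f"PRIMARY KEY {pk}")
    table.by_pk[pk] = u
    if log_early:
        log.append(("INSERT", pk, u))
    if u in table.by_u:
        del table.by_pk[pk]                # roll back the first index
        if log_early:
            log.append(("DELETE", pk, u))  # compensating log record
        raise DuplicateKey(f"UNIQUE {u!r}")
    table.by_u[u] = pk
    if not log_early:
        log.append(("INSERT", pk, u))      # logged only after full success


def apply_log(new_table, log):
    """Replay logged events on the rebuilt table, as the ALTER would."""
    for op, pk, u in log:
        if op == "INSERT":
            insert(new_table, pk, u, log=[], log_early=False)
        else:
            del new_table.by_pk[pk]
            del new_table.by_u[u]


def rebuild_survives(log_early):
    old, log = Table(), []
    insert(old, 1, "x", log=[], log_early=False)  # row present before the ALTER
    new = old.copy()                               # the bulk copy
    try:
        insert(old, 2, "x", log, log_early)        # concurrent DML that fails
    except DuplicateKey:
        pass
    try:
        apply_log(new, log)
        return True
    except DuplicateKey:
        return False
```

With log_early=True the log holds an INSERT followed by a DELETE for a row that was never committed, and replaying the INSERT fails on the UNIQUE column, so the rebuild is aborted. With log_early=False the log stays empty and the rebuild succeeds.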

Marko Mäkelä made changes - 2020-02-24 09:20

Remote Link

This issue links to "Bug #98600 Optimize table fails with duplicate entry on UNIQUE KEY (Web Link)" [ 29418 ]

Nikita Malyavin added a comment - 2020-03-03 10:07

marko, that's strange, because for transactional engines the data is not written to the binlog immediately during a row operation anyway. Instead, it is written to a thread-local transaction cache, which is flushed to the binlog during the commit() call of binlog_hton.

Sergei Golubchik made changes - 2020-08-18 14:53

Rank

Ranked lower

Nikita Malyavin made changes - 2020-09-18 12:04

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Ralf Gebhardt made changes - 2020-09-18 14:24

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.6 [ 24028 ]

Nikita Malyavin made changes - 2020-09-18 16:29

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Ralf Gebhardt made changes - 2021-08-17 21:05

Fix Version/s		10.7 [ 24805 ]
Fix Version/s	N/A [ 14700 ]

Ralf Gebhardt made changes - 2021-08-17 21:05

Summary

Cross-engine ALTER ONLINE TABLE

ALTER ONLINE TABLE

Ralf Gebhardt made changes - 2021-08-17 21:11

Priority

Major [ 3 ]

Critical [ 2 ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee

Sergei Golubchik [ serg ]

Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]

Ralf Gebhardt made changes - 2021-08-17 21:12

Assignee	Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Ralf Gebhardt made changes - 2021-08-17 21:15

Link

This issue is blocked by MENT-651 [ MENT-651 ]

Sergei Golubchik made changes - 2021-08-19 14:07

Priority

Critical [ 2 ]

Major [ 3 ]

Ralf Gebhardt made changes - 2021-09-28 13:39

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.7 [ 24805 ]

Ralf Gebhardt made changes - 2021-10-22 12:56

Priority

Major [ 3 ]

Critical [ 2 ]

Marko Mäkelä made changes - 2021-11-12 12:56

Link

This issue relates to ~~MDEV-15250~~ [ ~~MDEV-15250~~ ]

Marko Mäkelä added a comment - 2021-11-12 13:07

~~MDEV-15250~~ covers the ‘fake duplicate’ problem in InnoDB native online table rebuild (or online CREATE UNIQUE INDEX).

Sergei Golubchik made changes - 2021-11-23 13:53

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Nikita Malyavin made changes - 2021-11-24 18:53

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-11-26 08:45

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Rob Schwyzer (Inactive) made changes - 2021-11-26 21:01

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 32618 ]

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow

MariaDB v3 [ 87561 ]

MariaDB v4 [ 131690 ]

Nikita Malyavin made changes - 2021-12-08 01:34

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-12-10 23:01

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Nikita Malyavin made changes - 2021-12-24 14:13

Assignee	Nikita Malyavin [ nikitamalyavin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-12-25 23:23

Fix Version/s		10.9 [ 26905 ]
Fix Version/s	10.8 [ 26121 ]

Sergei Golubchik made changes - 2022-02-27 15:44

Assignee	Sergei Golubchik [ serg ]	Nikita Malyavin [ nikitamalyavin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Marko Mäkelä added a comment - 2022-02-28 11:31

I’m glad to see that the tests include a case where a concurrent UPDATE during ADD PRIMARY KEY causes a failure of the ALTER TABLE operation. However, I do not see such a test for INSERT, or for an UPDATE that would be executed via a foreign key constraint (ON UPDATE CASCADE).

Jan Lindström (Inactive) made changes - 2022-03-03 05:39

Link

This issue is blocked by ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-09 15:54

Link

This issue includes ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-09 15:54

Link

This issue is blocked by ~~MDEV-27986~~ [ ~~MDEV-27986~~ ]

Sergei Golubchik made changes - 2022-03-15 19:41

Fix Version/s		10.10 [ 27530 ]
Fix Version/s	10.9 [ 26905 ]

Sergei Golubchik made changes - 2022-05-19 18:21

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Rick James added a comment - 2022-06-03 17:09

ALTER syntax allows multiple changes. Is it always possible to perform all of them in a single command? Are there some tricky cases that need special care? Perhaps:

Swapping the names of two columns;
DROPping and ADDing a different PRIMARY KEY;
Multiple PARTITION actions.
What if the old schema would raise an FK or uniqueness exception but the new schema would not? (Or vice versa.)

Sergei Golubchik added a comment - 2022-06-06 09:42

Generally, all of that is allowed and should work. Not all of it is tested yet, though; that is on the to-do list.

Marko Mäkelä added a comment - 2022-06-07 06:17

serg, I think that rjasdfiii must be aware of the following error. Do we have a plan to remove it?

ER_ALTER_OPERATION_NOT_SUPPORTED_REASON_PARTITION

        chi "分区特定操作尚不支持锁定/算法"

        eng "Partition specific operations do not yet support LOCK/ALGORITHM"

        spa "Las operaciones específicas de partición aún no soportan LOCK/ALGORITHM"

I don’t think it was ever allowed to combine partitioning-related operations (such as ALTER TABLE…DROP PARTITIONING) with others, such as ADD INDEX or DROP INDEX. Some combinations might even be a syntax error. In the development branch, I do not see any added tests that would attempt to trigger such errors.

I believe that there could be other restrictions as well, at least around versioned tables.

My implementation of online table rebuild (WL#6255) in MySQL 5.6.8 does support online ADD PRIMARY KEY. Starting with MySQL 5.7 and MariaDB 10.2, some sorting will be skipped when the ordering of the PRIMARY KEY does not change (say, when changing from PRIMARY KEY(a,b,c) to PRIMARY KEY(a,b) or PRIMARY KEY(a,c)). I see that some tests with alter table t1 drop primary key, add primary key(b) are included in the development branch.

I’d expect that swapping the names of two columns has been possible ever since MySQL 5.6 or MariaDB 10.0. ~~MDEV-16290~~ introduced easier syntax for it. It should be something like ALTER TABLE t RENAME COLUMN a TO b, RENAME COLUMN b TO a; Any DROP as well as the first part of RENAME always refers to the "old" column names, so there should be no confusion.
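
The name-resolution rule can be modelled in a few lines of Python. This is only a model of the rule stated above, not the server's parser; the functions rename_columns and rename_one_at_a_time are made up for the illustration.

```python
def rename_columns(columns, renames):
    """Apply all renames at once; each source name refers to the old columns."""
    missing = [old for old, _ in renames if old not in columns]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    mapping = dict(renames)
    renamed = [mapping.get(name, name) for name in columns]
    if len(set(renamed)) != len(renamed):
        raise ValueError("duplicate column name after rename")
    return renamed


def rename_one_at_a_time(columns, renames):
    """Contrast: resolve each rename against the result of the previous one."""
    for old, new in renames:
        if old not in columns:
            raise KeyError(f"unknown column: {old!r}")
        if new in columns:
            raise ValueError(f"column {new!r} already exists")
        columns = [new if name == old else name for name in columns]
    return columns


# Models: ALTER TABLE t RENAME COLUMN a TO b, RENAME COLUMN b TO a;
swap = [("a", "b"), ("b", "a")]
swapped = rename_columns(["id", "a", "b"], swap)
```

Because both source names are looked up in the old column list, the swap is unambiguous. Resolving the renames one at a time would reject the first rename, since a column named b still exists at that point.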

Sergei Golubchik made changes - 2022-06-07 13:57

Status

Stalled [ 10000 ]

In Testing [ 10301 ]

Sergei Golubchik made changes - 2022-06-07 13:57

Assignee

Sergei Golubchik [ serg ]

Lena Startseva [ JIRAUSER50478 ]

Rick James added a comment - 2022-06-07 16:18

Is it safe to say the following?

"When multiple changes are allowed in a single ALTER, that will 'always' be as fast or faster than doing the individual Alters separately."

Roel Van de Paar made changes - 2022-06-08 10:00

Link

This issue causes ~~MDEV-28771~~ [ ~~MDEV-28771~~ ]

Ramesh Sivaraman made changes - 2022-06-08 11:35

Link

This issue relates to ~~MDEV-28774~~ [ ~~MDEV-28774~~ ]

Ramesh Sivaraman made changes - 2022-06-13 07:33

Link

This issue relates to ~~MDEV-28198~~ [ ~~MDEV-28198~~ ]

Ramesh Sivaraman made changes - 2022-06-13 08:48

Link

This issue relates to ~~MDEV-28198~~ [ ~~MDEV-28198~~ ]

Ramesh Sivaraman made changes - 2022-06-13 08:48

Link

This issue relates to ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Elena Stepanova made changes - 2022-06-13 23:41

Link

This issue causes ~~MDEV-28825~~ [ ~~MDEV-28825~~ ]

Elena Stepanova made changes - 2022-06-15 12:02

Assignee

Lena Startseva [ JIRAUSER50478 ]

Elena Stepanova [ elenst ]

Sergei Golubchik added a comment - 2022-06-18 19:08 - edited

In the branch preview-10.10-ddl.
And in bb-10.10-MDEV-16329.

Elena Stepanova made changes - 2022-06-24 16:54

Link

This issue relates to ~~MDEV-28942~~ [ ~~MDEV-28942~~ ]

Elena Stepanova made changes - 2022-06-24 18:26

Link

This issue relates to ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Elena Stepanova made changes - 2022-06-24 22:35

Link

This issue relates to ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Elena Stepanova made changes - 2022-06-26 22:07

Link

This issue causes ~~MDEV-28949~~ [ ~~MDEV-28949~~ ]

Nikita Malyavin made changes - 2022-06-27 10:35

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB has done since MariaDB 10.0:
# Exclusively lock the table.
# Set up ‘row event listeners’ for tracking changes from concurrent DML.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the ‘row event listeners’.
# Exclusively lock the table.
# Apply any remaining changes from the ‘row event listeners’.
# Swap the old and new table, unlock, drop the old table.
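
The steps above can be sketched as a single-threaded Python simulation. All names (Table, apply_events, online_rebuild, and the two callbacks that stand in for concurrent sessions) are hypothetical, and locking is only indicated in comments; this is a sketch of the control flow under those assumptions, not of the server implementation.

```python
class Table:
    """Toy table: a dict of primary key -> row, plus row event listeners."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.listeners = []

    def write(self, pk, row):
        """INSERT or UPDATE a row and notify the listeners."""
        self.rows[pk] = row
        self._notify(("WRITE", pk, row))

    def delete(self, pk):
        del self.rows[pk]
        self._notify(("DELETE", pk, None))

    def _notify(self, event):
        for listener in self.listeners:
            listener(event)


def apply_events(new_rows, events, convert):
    """Replay buffered row events on the new copy, then empty the buffer."""
    for op, pk, row in events:
        if op == "DELETE":
            new_rows.pop(pk, None)
        else:
            new_rows[pk] = convert(row)
    events.clear()


def online_rebuild(old, convert, dml_during_copy, dml_during_catch_up):
    events = []
    listener = events.append
    # Steps 1-3: under an exclusive lock (not modelled), start listening,
    # then downgrade the lock.
    old.listeners.append(listener)
    # Step 4: copy from a consistent snapshot while other sessions write.
    snapshot = dict(old.rows)
    dml_during_copy(old)
    new_rows = {pk: convert(row) for pk, row in snapshot.items()}
    # Step 5: apply what accumulated during the copy; more DML may arrive.
    apply_events(new_rows, events, convert)
    dml_during_catch_up(old)
    # Steps 6-7: under an exclusive lock again (not modelled), apply the rest.
    apply_events(new_rows, events, convert)
    old.listeners.remove(listener)
    # Step 8: the caller swaps the tables and drops the old one.
    return Table(new_rows)


def to_int(row):
    """Column conversion applied to every row, e.g. a string-to-INT change."""
    return {"n": int(row["n"])}


def during_copy(table):
    table.write(3, {"n": "30"})
    table.write(1, {"n": "11"})


def during_catch_up(table):
    table.delete(2)


old = Table({1: {"n": "10"}, 2: {"n": "20"}})
new = online_rebuild(old, to_int, during_copy, during_catch_up)
```

After the run, the new table holds exactly the converted contents of the old table, including the writes that arrived during the copy and during the first log apply.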

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.
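
As a rough illustration of the eight steps listed above, here is a minimal Python sketch that models a table as a dictionary keyed by primary key. Every name in it (OnlineAlterSketch, change_log, convert, run_demo) is hypothetical and is not a MariaDB identifier. Locking is only implied by the order of the calls, and the in-memory change log merely stands in for the per-table row-based replication of step 2.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineAlterSketch:
    """Toy model of an online table rebuild (hypothetical, not MariaDB code)."""
    old_rows: dict
    new_rows: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)  # stand-in for row events
    logging: bool = False

    # Concurrent DML goes through these methods. While logging is on,
    # every change to the old table is also recorded as a row event.
    def insert(self, pk, row):
        self.old_rows[pk] = row
        if self.logging:
            self.change_log.append(("INSERT", pk, row))

    def update(self, pk, row):
        self.old_rows[pk] = row
        if self.logging:
            self.change_log.append(("UPDATE", pk, row))

    def delete(self, pk):
        del self.old_rows[pk]
        if self.logging:
            self.change_log.append(("DELETE", pk, None))

    def start(self):
        # Steps 1-3: under the exclusive lock, start change tracking and
        # fix the snapshot to copy; the lock is then downgraded.
        self.logging = True
        return dict(self.old_rows)

    def copy(self, snapshot, convert):
        # Step 4: bulk copy, converting each row to the new definition.
        for pk, row in snapshot.items():
            self.new_rows[pk] = convert(row)

    def apply_log(self, convert):
        # Steps 5 and 7: replay the logged changes on the new table.
        for op, pk, row in self.change_log:
            if op == "DELETE":
                self.new_rows.pop(pk, None)
            else:
                self.new_rows[pk] = convert(row)
        self.change_log.clear()

    def finish(self, convert):
        # Steps 6-8: lock again, apply the remainder, swap the tables.
        self.apply_log(convert)
        self.logging = False
        self.old_rows, self.new_rows = self.new_rows, {}


def run_demo():
    t = OnlineAlterSketch(old_rows={1: "10", 2: "20"})
    snapshot = t.start()
    t.insert(3, "30")      # concurrent DML while the copy is pending
    t.copy(snapshot, int)  # column type change: string to integer
    t.update(1, "11")
    t.delete(2)
    t.apply_log(int)
    t.insert(4, "40")      # arrives just before the final lock
    t.finish(int)
    return t.old_rows
```

In this sketch, run_demo() returns {1: 11, 3: 30, 4: 40}: the type conversion is written once (convert) and reused for both the bulk copy and the replayed changes, which is the point of the first example in the list above.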

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
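
The rollback requirement in the paragraph above can be illustrated with a small Python sketch. It is only a toy model of the proposal, assuming that every row change is logged immediately; the names RowEventLog and Transaction are hypothetical, and the sketch does not claim to show how the feature was eventually implemented.

```python
class RowEventLog:
    """Hypothetical stand-in for the 'row event log' of the description."""

    def __init__(self):
        self.events = []


class Transaction:
    """Toy transaction that logs each change as soon as it is made."""

    def __init__(self, table, log):
        self.table = table
        self.log = log
        self.undo = []  # primary keys inserted by this transaction

    def insert(self, pk, row):
        self.undo.append(pk)
        self.table[pk] = row
        self.log.events.append(("INSERT", pk, row))

    def rollback(self):
        # Undo in reverse order and relay each undo step to the log,
        # so that whoever replays the log also removes the row.
        for pk in reversed(self.undo):
            del self.table[pk]
            self.log.events.append(("DELETE", pk, None))
        self.undo.clear()


def rollback_demo():
    table, log = {}, RowEventLog()
    txn = Transaction(table, log)
    txn.insert(1, "a")
    txn.rollback()
    return table, log.events
```

Here rollback_demo() leaves the table empty and the log holding an INSERT followed by a compensating DELETE for the same key. Without that second event, a consumer replaying the log would keep a row that never committed.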

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2022-06-27 10:40

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-06-27 11:09

Link

This issue relates to ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Nikita Malyavin made changes - 2022-06-27 15:49

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-06-27 22:44

Link

This issue relates to ~~MDEV-28966~~ [ ~~MDEV-28966~~ ]

Elena Stepanova made changes - 2022-06-27 23:01

Link

This issue relates to ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-06-29 16:42

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-07-02 11:18

Link

This issue relates to ~~MDEV-29007~~ [ ~~MDEV-29007~~ ]

Elena Stepanova made changes - 2022-07-05 22:03

Link

This issue relates to ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Angelique Sklavounos (Inactive) made changes - 2022-07-07 11:56

Link

This issue relates to ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Elena Stepanova made changes - 2022-07-08 22:13

Link

This issue relates to ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Elena Stepanova made changes - 2022-07-08 22:28

Link

This issue relates to ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Elena Stepanova made changes - 2022-07-08 22:36

Link

This issue relates to ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Elena Stepanova made changes - 2022-07-08 22:58

Link

This issue relates to ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Elena Stepanova made changes - 2022-07-08 23:30

Link

This issue relates to ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-07-25 20:55

Fix Version/s		10.11 [ 27614 ]
Fix Version/s	10.10 [ 27530 ]

Elena Stepanova made changes - 2022-07-29 17:36

Status

In Testing [ 10301 ]

Stalled [ 10000 ]

Michael Widenius added a comment - 2022-08-09 08:11

To Rick James:
Yes, many alter table operations in a single operation should always be faster than doing individual alter statements.


Federico Razzoli added a comment - 2022-08-09 10:37

Does this feature work in cases where the table:

* Has foreign keys and/or triggers?
* Is written to by other tables' foreign keys or triggers?

Do you plan, at some point, to implement a way to pause the data copying if the server slows down too much, and to resume it later? As far as I understand, as long as the RBR is in place, this shouldn't break anything.


Sergei Golubchik added a comment - 2022-08-09 10:47

Has foreign keys and/or triggers: yes, it should work, as far as I understand.
Is written to by other tables' foreign keys or triggers: triggers aren't a problem and should work fine. Cascading foreign keys are a problem; that is ~~MDEV-29068~~, one of the main reasons why this preview feature didn't make it into 10.10.1.


AirFocus made changes - 2022-08-09 16:11

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
# Exclusively lock the table.
# Set up row-based replication (a separate, per-table instance) for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.
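
The eight steps above can be modelled with a small toy in Python. This is only a sketch under stated assumptions: {{ToyTable}}, {{change_log}}, {{apply_log}} and {{online_alter}} are hypothetical names, plain dicts stand in for tables, and locking is reduced to comments. It is not the server's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: primary key -> column value."""
    rows: dict = field(default_factory=dict)
    change_log: Optional[list] = None  # None means changes are not tracked

    def write(self, op: str, key, value=None) -> None:
        """Apply one DML and record it while tracking is enabled."""
        if op == "DELETE":
            self.rows.pop(key, None)
        else:  # INSERT or UPDATE
            self.rows[key] = value
        if self.change_log is not None:
            self.change_log.append((op, key, value))


def apply_log(new_rows: dict, log: list, convert: Callable) -> None:
    """Replay logged row events onto the new copy, then empty the log."""
    for op, key, value in log:
        if op == "DELETE":
            new_rows.pop(key, None)
        else:
            new_rows[key] = convert(value)
    log.clear()


def online_alter(table: ToyTable, convert: Callable, concurrent_dml=()) -> None:
    # Steps 1-3: (exclusive lock) start tracking changes, (downgrade lock).
    table.change_log = []
    # Step 4: copy the contents as they were when tracking started.
    new_rows = {key: convert(value) for key, value in table.rows.items()}
    # Writers are not blocked, so DML may arrive while the copy runs.
    for dml in concurrent_dml:
        table.write(*dml)
    # Step 5: apply what has been logged so far.
    apply_log(new_rows, table.change_log, convert)
    # Steps 6-7: (block writers) apply whatever was logged in the meantime.
    apply_log(new_rows, table.change_log, convert)
    # Step 8: swap in the new copy and stop tracking.
    table.rows = new_rows
    table.change_log = None


# Change the column type from string to integer while three DMLs arrive.
t = ToyTable({1: "10", 2: "20"})
online_alter(t, int, [("INSERT", 3, "30"), ("UPDATE", 1, "11"), ("DELETE", 2, None)])
print(t.rows)
```

The type conversion is passed in as {{convert}} to mirror the idea that the copy and the log application would share the same conversion logic.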

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:
# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server does not support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported

h1. [Old part] Challenges
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.
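
As a rough illustration of why the log carries the new table's {{PRIMARY KEY}} values, the following Python sketch assumes the key changes from a hypothetical column {{id}} to a hypothetical column {{code}}. The event layout and the names {{new_pk}}, {{log_event}} and {{apply_events}} are invented for this example and do not describe InnoDB's actual log format.

```python
def new_pk(row: dict):
    """Key of a row in the new table (assumed to be the 'code' column)."""
    return row["code"]


def log_event(log: list, op: str, old_row=None, new_row=None) -> None:
    """Record a row event together with the new-table key of the affected row."""
    log.append({
        "op": op,
        "old_key": new_pk(old_row) if old_row is not None else None,
        "row": new_row,
    })


def apply_events(new_table: dict, log: list) -> None:
    """Replay events; rows are located directly by their new-table key."""
    for event in log:
        if event["op"] in ("UPDATE", "DELETE"):
            new_table.pop(event["old_key"], None)
        if event["op"] in ("INSERT", "UPDATE"):
            new_table[new_pk(event["row"])] = event["row"]


# The new table is keyed by 'code' although the old one was keyed by 'id'.
new_table = {"a": {"id": 1, "code": "a"}, "b": {"id": 2, "code": "b"}}
log = []
log_event(log, "UPDATE", old_row={"id": 1, "code": "a"}, new_row={"id": 1, "code": "z"})
log_event(log, "DELETE", old_row={"id": 2, "code": "b"})
log_event(log, "INSERT", new_row={"id": 3, "code": "c"})
apply_events(new_table, log)
```

Because each event already names the row by its new key, applying the log needs no scan of the new table.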

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
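
The {{ROLLBACK}} requirement can be sketched in Python as follows. {{ToyEngine}}, its undo list and {{replay}} are hypothetical and cover only inserts; the point is that a rolled-back INSERT must leave a compensating DELETE in the log, otherwise replaying the log would resurrect the row.

```python
class ToyEngine:
    """Hypothetical engine that reports row events, including rollbacks."""

    def __init__(self):
        self.rows = {}
        self.undo = []        # keys inserted by the open transaction
        self.event_log = []   # stand-in for the 'row event log'

    def insert(self, key, value):
        self.rows[key] = value
        self.undo.append(key)
        self.event_log.append(("INSERT", key, value))

    def commit(self):
        self.undo.clear()

    def rollback(self):
        # Undo each insert and log a compensating DELETE event for it.
        while self.undo:
            key = self.undo.pop()
            del self.rows[key]
            self.event_log.append(("DELETE", key, None))


def replay(events):
    """Rebuild a table copy purely from the logged events."""
    copy = {}
    for op, key, value in events:
        if op == "INSERT":
            copy[key] = value
        else:  # DELETE
            copy.pop(key, None)
    return copy


engine = ToyEngine()
engine.insert(1, "kept")
engine.commit()
engine.insert(2, "rolled back")   # BEGIN; INSERT; ...
engine.rollback()                 # ... ROLLBACK
rebuilt = replay(engine.event_log)
```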

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.


Julien Fritsch made changes - 2022-08-10 08:22

Description


Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up row-based replication (a separate, per-table instance) for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server does not support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported

h1. [Old part] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2022-08-16 13:18

Assignee

Elena Stepanova [ elenst ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin made changes - 2022-08-24 11:01

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Roel Van de Paar made changes - 2022-08-26 10:23

Link

This issue causes ~~MDEV-29393~~ [ ~~MDEV-29393~~ ]

Roel Van de Paar made changes - 2022-08-26 10:45

Link

This issue causes ~~MDEV-29394~~ [ ~~MDEV-29394~~ ]

Roel Van de Paar made changes - 2022-09-10 03:43

Link

This issue causes ~~MDEV-29393~~ [ ~~MDEV-29393~~ ]

Roel Van de Paar made changes - 2022-09-10 03:43

Link

This issue causes ~~MDEV-29394~~ [ ~~MDEV-29394~~ ]

Roel Van de Paar made changes - 2022-09-10 03:45

Link

This issue causes ~~MDEV-29506~~ [ ~~MDEV-29506~~ ]

Sergei Golubchik made changes - 2022-09-28 16:23

Link

This issue causes ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Sergei Golubchik made changes - 2022-09-28 16:23

Link

This issue relates to ~~MDEV-28816~~ [ ~~MDEV-28816~~ ]

Sergei Golubchik made changes - 2022-09-28 16:24

Link

This issue causes ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue relates to ~~MDEV-28943~~ [ ~~MDEV-28943~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue causes ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Sergei Golubchik made changes - 2022-09-28 16:25

Link

This issue relates to ~~MDEV-28944~~ [ ~~MDEV-28944~~ ]

Sergei Golubchik made changes - 2022-09-28 16:26

Link

This issue causes ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Sergei Golubchik made changes - 2022-09-28 16:26

Link

This issue relates to ~~MDEV-28959~~ [ ~~MDEV-28959~~ ]

Sergei Golubchik made changes - 2022-09-28 16:27

Link

This issue causes ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-09-28 16:27

Link

This issue relates to ~~MDEV-28967~~ [ ~~MDEV-28967~~ ]

Sergei Golubchik made changes - 2022-09-28 16:28

Link

This issue causes ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Sergei Golubchik made changes - 2022-09-28 16:28

Link

This issue relates to ~~MDEV-29038~~ [ ~~MDEV-29038~~ ]

Sergei Golubchik made changes - 2022-09-28 16:29

Link

This issue causes ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Sergei Golubchik made changes - 2022-09-28 16:29

Link

This issue relates to ~~MDEV-29067~~ [ ~~MDEV-29067~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue causes ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue relates to ~~MDEV-29068~~ [ ~~MDEV-29068~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue causes ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-09-28 16:30

Link

This issue relates to ~~MDEV-29069~~ [ ~~MDEV-29069~~ ]

Sergei Golubchik made changes - 2022-09-28 16:32

Link

This issue causes ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Sergei Golubchik made changes - 2022-09-28 16:32

Link

This issue relates to ~~MDEV-28930~~ [ ~~MDEV-28930~~ ]

Sergei Golubchik made changes - 2022-09-28 16:33

Link

This issue causes ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Sergei Golubchik made changes - 2022-09-28 16:33

Link

This issue relates to ~~MDEV-29013~~ [ ~~MDEV-29013~~ ]

Sergei Golubchik made changes - 2022-09-28 16:34

Link

This issue causes ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Sergei Golubchik made changes - 2022-09-28 16:34

Link

This issue relates to ~~MDEV-29056~~ [ ~~MDEV-29056~~ ]

Sergei Golubchik made changes - 2022-10-30 15:21

Fix Version/s		10.12 [ 28320 ]
Fix Version/s	10.11 [ 27614 ]

Ralf Gebhardt made changes - 2022-12-27 14:03

Labels

alter online-ddl performance replication

Preview_removed_10.10 alter online-ddl performance replication

Julien Fritsch made changes - 2022-12-28 12:50

Fix Version/s		11.1 [ 28549 ]
Fix Version/s	11.0 [ 28320 ]

Sergei Golubchik made changes - 2022-12-28 12:53

Fix Version/s		11.0 [ 28320 ]
Fix Version/s	11.1 [ 28549 ]

Sergei Golubchik made changes - 2023-01-19 10:57

Fix Version/s		11.1 [ 28549 ]
Fix Version/s	11.0 [ 28320 ]

Nikita Malyavin made changes - 2023-02-17 12:53

Status

Stalled [ 10000 ]

In Testing [ 10301 ]

Nikita Malyavin made changes - 2023-02-17 12:57

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin added a comment - 2023-02-19 18:43 - edited

elenst the feature is ready for your assessment. The code rebased on top of 11.0 can be found on the following branch:

bb-11.0-MDEV-16329-online-alter

link for the current head: https://github.com/MariaDB/server/commit/57e3333904a6d45077b47bf27808573a723def30


Sergei Golubchik added a comment - 2023-02-19 19:18

What branch did you rebase? bb-10.11-oalter has a bunch of commits that are not in bb-11.0-MDEV-16329-online-alter.


Nikita Malyavin added a comment - 2023-02-19 21:18

I rebased bb-10.11-ddl-nikita. Thanks for the reminder; I will cherry-pick your updates.


Nikita Malyavin added a comment - 2023-02-20 14:39

elenst, the new branch for testing is bb-11.0-oalter.
Head: https://github.com/MariaDB/server/commit/f9b33ac570337be320f718d52fd88d301a2bc1e7


Nikita Malyavin made changes - 2023-02-20 14:39

Assignee

Nikita Malyavin [ nikitamalyavin ]

Elena Stepanova [ elenst ]

Ralf Gebhardt made changes - 2023-03-21 09:28

Labels

Preview_removed_10.10 alter online-ddl performance replication

Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Ralf Gebhardt made changes - 2023-03-21 09:33

Labels

Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Preview_10.10 Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Ralf Gebhardt made changes - 2023-03-21 09:35

Labels

Preview_10.10 Preview_11.1 Preview_removed_10.10 alter online-ddl performance replication

Preview_10.10 Preview_11.1 alter online-ddl performance replication

Alice Sherepa made changes - 2023-03-21 11:13

Link

This issue relates to ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Elena Stepanova made changes - 2023-03-22 15:50

Link

This issue relates to ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Alice Sherepa made changes - 2023-03-24 12:36

Link

This issue causes ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Alice Sherepa made changes - 2023-03-24 12:36

Link

This issue relates to ~~MDEV-30891~~ [ ~~MDEV-30891~~ ]

Elena Stepanova made changes - 2023-03-24 12:57

Link

This issue causes ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Elena Stepanova made changes - 2023-03-24 12:58

Link

This issue relates to ~~MDEV-30902~~ [ ~~MDEV-30902~~ ]

Elena Stepanova made changes - 2023-03-24 15:23

Link

This issue causes ~~MDEV-30924~~ [ ~~MDEV-30924~~ ]

Elena Stepanova made changes - 2023-03-24 19:53

Link

This issue causes ~~MDEV-30925~~ [ ~~MDEV-30925~~ ]

Nuno added a comment - 2023-03-27 07:53

Hey!
Good day.

Just curious – will this allow us to do "ALTER ONLINE TABLE" when there is a virtual generated column in the table?

Thanks!


Angelique Sklavounos (Inactive) made changes - 2023-03-28 12:56

Link

This issue causes ~~MDEV-30945~~ [ ~~MDEV-30945~~ ]

Elena Stepanova made changes - 2023-03-28 21:45

Link

This issue causes ~~MDEV-30949~~ [ ~~MDEV-30949~~ ]

Elena Stepanova made changes - 2023-04-01 13:02

Link

This issue causes ~~MDEV-30983~~ [ ~~MDEV-30983~~ ]

Elena Stepanova made changes - 2023-04-01 15:57

Link

This issue relates to ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Nikita Malyavin made changes - 2023-04-01 16:28

Description


Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up row-based replication (a separate, per-table instance) for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

[Not implemented here] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server does not support LOCK=NONE until HAVE_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. [Old part] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-01 16:44

Summary

ALTER ONLINE TABLE

Engine-independent ALTER ONLINE TABLE

Nikita Malyavin made changes - 2023-04-01 16:45

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:

# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-01 16:49

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’; the bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin added a comment - 2023-04-01 17:01

Hello @nuno! Yes, virtual generated columns are supported


Sergei Golubchik made changes - 2023-04-01 19:45

Summary

Engine-independent ALTER ONLINE TABLE

Engine-independent online ALTER TABLE

Sergei Golubchik made changes - 2023-04-01 19:46

Description

Implement {{ALTER ONLINE TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Sergei Golubchik made changes - 2023-04-02 11:09

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables that are referenced by FOREIGN KEYs with CASCADE operations are not supported, see ~~MDEV-29068~~.
* ALTER IGNORE TABLE is not supported.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and on the new copy of the table from that point on.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform a wide variety of table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it is being altered. Which algorithm and lock level are used depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

Certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using the InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}). However, the most universal {{ALTER TABLE}} algorithm, the one that supports arbitrary alterations in arbitrary combinations, is the {{COPY}} algorithm, and it locks the table, allowing only read access for the whole duration of the {{ALTER TABLE}}. When the server has to resort to the {{COPY}} algorithm (because no other algorithm can perform the requested set of alterations), it often means long periods of the application being essentially down, because the table cannot be written to.

The goal of this task is to allow the {{COPY}} algorithm to work without restricting the table to read-only access. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.
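
The steps above can be illustrated with a small in-memory simulation. This is only a sketch under loud assumptions: the names {{Table}}, {{apply_log}} and {{online_copy_alter}} are hypothetical and do not exist in the server, the metadata locks are represented by comments only, and the "concurrent" writers are simulated by a callback that runs between the copy and the first log application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# One change event: (kind, primary key, row image or None for DELETE).
Event = Tuple[str, int, Optional[Row]]


@dataclass
class Table:
    """Hypothetical stand-in for a table: primary key -> row."""
    rows: Dict[int, Row] = field(default_factory=dict)
    change_log: Optional[List[Event]] = None  # None means "not tracking"

    def write(self, kind: str, pk: int, row: Optional[Row] = None) -> None:
        """Apply one DML operation and, if tracking is on, log it."""
        if kind == "DELETE":
            self.rows.pop(pk, None)
        else:  # INSERT or UPDATE carry a full row image
            self.rows[pk] = dict(row)
        if self.change_log is not None:
            image = dict(row) if row is not None else None
            self.change_log.append((kind, pk, image))


def apply_log(new_rows: Dict[int, Row], log: List[Event],
              convert: Callable[[Row], Row]) -> None:
    """Replay logged events onto the new copy, then empty the log."""
    for kind, pk, row in log:
        if kind == "DELETE":
            new_rows.pop(pk, None)
        else:
            new_rows[pk] = convert(row)
    log.clear()


def online_copy_alter(table: Table, convert: Callable[[Row], Row],
                      concurrent_dml: Callable[[Table], None]) -> None:
    # Steps 1-3: under a brief exclusive lock, start change tracking,
    # then downgrade the lock so that writers can proceed.
    table.change_log = []
    # Step 4: copy a snapshot of the contents into the new table.
    snapshot = {pk: dict(row) for pk, row in table.rows.items()}
    new_rows = {pk: convert(row) for pk, row in snapshot.items()}
    # Writers modify the old table meanwhile (simulated by a callback).
    concurrent_dml(table)
    # Step 5: apply the changes collected so far.
    apply_log(new_rows, table.change_log, convert)
    # Steps 6-7: block writers again and apply whatever is left.
    apply_log(new_rows, table.change_log, convert)
    # Step 8: swap the old and new table and stop tracking.
    table.rows = new_rows
    table.change_log = None
```

For example, converting a string column to an integer while a writer inserts, updates and deletes rows leaves the new table consistent with the writer's final state, because every logged row image goes through the same {{convert}} function as the bulk copy.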

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-04-02 17:57

Link

This issue causes ~~MDEV-30987~~ [ ~~MDEV-30987~~ ]

Elena Stepanova made changes - 2023-04-10 16:50

Link

This issue causes ~~MDEV-31033~~ [ ~~MDEV-31033~~ ]

Alice Sherepa made changes - 2023-04-12 11:31

Link

This issue causes ~~MDEV-31040~~ [ ~~MDEV-31040~~ ]

Elena Stepanova made changes - 2023-04-12 14:21

Link

This issue causes ~~MDEV-30983~~ [ ~~MDEV-30983~~ ]

Elena Stepanova made changes - 2023-04-12 22:52

Link

This issue causes ~~MDEV-31043~~ [ ~~MDEV-31043~~ ]

Nikita Malyavin made changes - 2023-04-14 15:08

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively lock the table.
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
# Downgrade the lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply changes from the replicated contents.
# Exclusively lock the table (MDL_SHARED_WRITE).
# Apply any remaining replicated changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication to track changes from concurrent DMLs ("online changes").
# Downgrade the metadata lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
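
The data flow of steps 4 to 11 can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not the server implementation: tables are plain dictionaries keyed by primary key, locking is not modelled, and the names {{apply_change}} and {{online_copy}} are hypothetical.

```python
def apply_change(rows, op, key, value, convert=lambda v: v):
    """Apply one logged row change to a dict that maps primary key -> row."""
    if op == "delete":
        rows.pop(key, None)
    else:  # "insert" and "update" both store the (converted) row image
        rows[key] = convert(value)


def online_copy(source, convert, concurrent_dml):
    """Toy model of steps 4-11; locks are not modelled at all."""
    change_log = []             # step 4: per-table log of online changes
    snapshot = dict(source)     # step 6: stand-in for a non-locking read
    new_rows = {}

    # Writers keep modifying the old table while the copy is running;
    # every change is also recorded in the log.
    for op, key, value in concurrent_dml:
        apply_change(source, op, key, value)
        change_log.append((op, key, value))

    for key, value in snapshot.items():   # step 6: bulk copy with conversion
        new_rows[key] = convert(value)

    for op, key, value in change_log:     # steps 7 and 10: replay the log
        apply_change(new_rows, op, key, value, convert)

    return new_rows                       # step 11: this becomes the table


old = {1: "10", 2: "20"}
dml = [("insert", 3, "30"), ("update", 1, "11"), ("delete", 2, None)]
new = online_copy(old, int, dml)
print(new)  # {1: 11, 3: 30}
```

The same conversion function is used for the bulk copy and for the replayed changes, which is the reason no conversion logic has to be duplicated.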

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on which operations can run concurrently while TL_READ is held.
* *InnoDB* allows any DML (except TRUNCATE, I presume). It opens the _read view_ lazily, when the first record is read during the copy stage. In theory, a concurrent transaction could therefore slip in between acquiring TL_READ and reading the first record. This is why we read one record first and only then set up the online change buffer.
* MyISAM/Aria allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only what is already there. Other DMLs are blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether a particular table lock can be acquired in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
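
As a rough sketch of the {{ROLLBACK}} requirement, the toy model below logs a compensating DELETE event for every INSERT that is rolled back, so that replaying the log leaves no trace of the aborted transaction. The {{Transaction}} class and the {{replay}} helper are hypothetical names used only for illustration; a real storage engine interface would look different.

```python
class Transaction:
    """Toy transaction that mirrors its row changes into a shared event log."""

    def __init__(self, rows, event_log):
        self.rows = rows            # old table: primary key -> row
        self.event_log = event_log  # hypothetical stand-in for the row event log
        self.inserted = []          # undo information for ROLLBACK

    def insert(self, key, value):
        self.rows[key] = value
        self.inserted.append(key)
        self.event_log.append(("INSERT", key, value))

    def rollback(self):
        # Undo in reverse order and log a compensating DELETE for each INSERT.
        for key in reversed(self.inserted):
            del self.rows[key]
            self.event_log.append(("DELETE", key, None))
        self.inserted.clear()


def replay(event_log):
    """Rebuild the table contents from the logged events alone."""
    rows = {}
    for op, key, value in event_log:
        if op == "DELETE":
            rows.pop(key, None)
        else:
            rows[key] = value
    return rows


table, log = {}, []
trx = Transaction(table, log)
trx.insert(1, "a")
trx.rollback()
print(log)          # [('INSERT', 1, 'a'), ('DELETE', 1, None)]
print(replay(log))  # {}
```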

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-04-14 15:09

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication to track changes from concurrent DMLs ("online changes").
# Downgrade the metadata lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on which operations can run concurrently while TL_READ is held.
* *InnoDB* allows any DML (except TRUNCATE, I presume). It opens the _read view_ lazily, when the first record is read during the copy stage. In theory, a concurrent transaction could therefore slip in between acquiring TL_READ and reading the first record. This is why we read one record first and only then set up the online change buffer.
* MyISAM/Aria allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only what is already there. Other DMLs are blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether a particular table lock can be acquired in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication to track changes from concurrent DMLs ("online changes").
# Downgrade the metadata lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on which operations can run concurrently while TL_READ is held.
* *InnoDB* allows any DML (except TRUNCATE, I presume). It opens the _read view_ lazily, when the first record is read during the copy stage. In theory, a concurrent transaction could therefore slip in between acquiring TL_READ and reading the first record. This is why we read one record first and only then set up the online change buffer.
* *MyISAM/Aria* allow only inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage reads only what is already there. Other DMLs are blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether a particular table lock can be acquired in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-04-15 19:02

Link

This issue causes ~~MDEV-31058~~ [ ~~MDEV-31058~~ ]

Elena Stepanova made changes - 2023-04-16 11:40

Link

This issue causes ~~MDEV-31059~~ [ ~~MDEV-31059~~ ]

Ralf Gebhardt made changes - 2023-04-19 10:53

Link

This issue causes ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Ralf Gebhardt made changes - 2023-04-19 10:53

Link

This issue relates to ~~MDEV-30984~~ [ ~~MDEV-30984~~ ]

Elena Stepanova made changes - 2023-04-25 19:04

Link

This issue causes ~~MDEV-31128~~ [ ~~MDEV-31128~~ ]

Elena Stepanova made changes - 2023-04-26 23:28

Link

This issue causes ~~MDEV-31136~~ [ ~~MDEV-31136~~ ]

Elena Stepanova made changes - 2023-05-03 00:22

Link

This issue causes ~~MDEV-31172~~ [ ~~MDEV-31172~~ ]

Ralf Gebhardt made changes - 2023-05-30 07:01

Link

This issue causes ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Ralf Gebhardt made changes - 2023-05-30 08:18

Fix Version/s		11.2 [ 28603 ]
Fix Version/s	11.1 [ 28549 ]

Nikita Malyavin made changes - 2023-05-31 11:27

Link

This issue relates to ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Nikita Malyavin made changes - 2023-05-31 11:27

Link

This issue causes ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Nikita Malyavin made changes - 2023-06-02 13:32

Link

This issue is blocked by ~~MDEV-30985~~ [ ~~MDEV-30985~~ ]

Elena Stepanova made changes - 2023-06-27 16:10

Link

This issue causes ~~MDEV-31563~~ [ ~~MDEV-31563~~ ]

Elena Stepanova made changes - 2023-07-02 13:15

Link

This issue causes ~~MDEV-31601~~ [ ~~MDEV-31601~~ ]

Nikita Malyavin made changes - 2023-07-04 15:36

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many various table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. What algorithm and lock level to use depends on the storage engine, requested alterations and explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication to track changes from concurrent DMLs ("online changes").
# Downgrade the metadata lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
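
The copy and apply steps above can be illustrated with a toy model. This is only a sketch under loud assumptions: tables are plain dictionaries keyed by primary key, the "read view" is a dictionary copy, and the names {{online_copy}}, {{convert}} and {{concurrent_dml}} are hypothetical and do not exist in the server.

```python
def online_copy(source, convert, concurrent_dml):
    """Toy model of the COPY algorithm with a change log.

    source:         dict mapping primary key -> row (the old table)
    convert:        function turning an old-format row into a new-format row
    concurrent_dml: list of (op, pk, row) applied by other sessions while
                    the bulk copy is running; op is "insert", "update" or "delete"
    """
    change_log = []            # step 4: per-table log of online changes
    snapshot = dict(source)    # consistent view used by the bulk copy

    # Concurrent writers modify the old table and are recorded in the log.
    for op, pk, row in concurrent_dml:
        if op == "delete":
            source.pop(pk, None)
        else:
            source[pk] = row
        change_log.append((op, pk, row))

    # Step 6: bulk copy of the snapshot into the new table.
    target = {pk: convert(row) for pk, row in snapshot.items()}

    # Steps 7 and 10: replay the logged changes on top of the copy.
    for op, pk, row in change_log:
        if op == "delete":
            target.pop(pk, None)
        else:
            target[pk] = convert(row)
    return target
```

After the replay, the new table should equal the converted contents of the old table, which is the property the final exclusive phase relies on before the tables are swapped.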

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the last record offset of the table is remembered for the table handle, so the copy stage will read out only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.
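
Why the new {{PRIMARY KEY}} values matter when replaying the log can be shown with a small sketch. It assumes, purely for illustration, that every event carries full row images and that the new table is a dictionary keyed by the new primary key; {{new_pk}} and {{apply_events}} are hypothetical names, not InnoDB functions.

```python
def new_pk(row, pk_columns):
    """Compute the primary key of a row under the NEW table definition."""
    return tuple(row[col] for col in pk_columns)


def apply_events(target, events, pk_columns):
    """Replay (op, old_row, new_row) events on the rebuilt table.

    target:     dict mapping new primary key -> row
    pk_columns: column names forming the PRIMARY KEY of the new table
    """
    for op, old_row, new_row in events:
        if op in ("delete", "update"):
            # The old image is located through its key in the new table.
            target.pop(new_pk(old_row, pk_columns), None)
        if op in ("insert", "update"):
            target[new_pk(new_row, pk_columns)] = new_row
    return target
```

Because each event can be mapped to a key of the new table, an UPDATE or DELETE finds its record directly even when the old table was keyed differently.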

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
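
The {{ROLLBACK}} requirement can be sketched as follows. This is a minimal model, not the storage engine interface: {{RowEventLog}} and {{Transaction}} are hypothetical classes, the table is a dictionary, and only INSERT is modeled.

```python
class RowEventLog:
    """Append-only list of (op, pk, row) events."""

    def __init__(self):
        self.events = []

    def append(self, op, pk, row=None):
        self.events.append((op, pk, row))


class Transaction:
    """Toy transaction that logs every change immediately, before commit."""

    def __init__(self, table, log):
        self.table = table
        self.log = log
        self.undo = []  # compensating actions, newest last

    def insert(self, pk, row):
        self.table[pk] = row
        self.undo.append(("delete", pk))
        self.log.append("insert", pk, row)

    def rollback(self):
        # Undo in reverse order and relay each compensation to the log,
        # so that a replay of the log ends up with the same contents.
        for op, pk in reversed(self.undo):
            if op == "delete":
                del self.table[pk]
                self.log.append("delete", pk)
        self.undo.clear()
```

With this model, {{BEGIN; INSERT; ROLLBACK}} leaves an INSERT event followed by a DELETE event in the log, so replaying the log produces no row.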

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (several alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the last record offset of the table is remembered for the table handle, so the copy stage will read out only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding autoinc to the existing column, when NO_AUTO_VALUE_ON_ZERO, and there were no unchanged UNIQUE NOT NULL keys.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-04 15:43

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (several alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the last record offset of the table is remembered for the table handle, so the copy stage will read out only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring is done).
* DROP SYSTEM VERSIONING is not currently supported, but support can be added on demand.
* ALTER TABLE ... ORDER BY is not and cannot be supported.
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* present in the SQL mode and no UNIQUE NOT NULL key is left unchanged.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-04 15:49

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (several alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-07-04 19:16

Link

This issue causes MDEV-31624 [ MDEV-31624 ]

Elena Stepanova made changes - 2023-07-05 14:15

Link

This issue causes ~~MDEV-31631~~ [ ~~MDEV-31631~~ ]

Elena Stepanova made changes - 2023-07-07 23:47

Link

This issue causes ~~MDEV-31646~~ [ ~~MDEV-31646~~ ]

Elena Stepanova made changes - 2023-07-13 11:38

Link

This issue causes ~~MDEV-31677~~ [ ~~MDEV-31677~~ ]

Nikita Malyavin made changes - 2023-07-20 17:07

Link

This issue causes ~~MDEV-31755~~ [ ~~MDEV-31755~~ ]

Elena Stepanova made changes - 2023-07-25 19:00

Link

This issue causes ~~MDEV-31775~~ [ ~~MDEV-31775~~ ]

Elena Stepanova made changes - 2023-07-25 22:17

Link

This issue causes ~~MDEV-31776~~ [ ~~MDEV-31776~~ ]

Elena Stepanova made changes - 2023-07-25 23:15

Link

This issue causes ~~MDEV-31777~~ [ ~~MDEV-31777~~ ]

Elena Stepanova made changes - 2023-07-26 22:43

Link

This issue causes ~~MDEV-31781~~ [ ~~MDEV-31781~~ ]

Nikita Malyavin made changes - 2023-07-26 23:30

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform a wide variety of table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

Certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}). However, the most universal {{ALTER TABLE}} algorithm, the one that supports arbitrary alterations in arbitrary combinations, is the {{COPY}} algorithm, and it locks the table, allowing only read access for the whole duration of the {{ALTER TABLE}}. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations), it often means long periods of the application being essentially down, because the table cannot be written to.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column, when NO_AUTO_VALUE_ON_ZERO *is not* present and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there are {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Nikita Malyavin made changes - 2023-07-27 11:42

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform a wide variety of table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it's being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

Certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}). However, the most universal {{ALTER TABLE}} algorithm, the one that supports arbitrary alterations in arbitrary combinations, is the {{COPY}} algorithm, and it locks the table, allowing only read access for the whole duration of the {{ALTER TABLE}}. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations), it often means long periods of the application being essentially down, because the table cannot be written to.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up separate, per-table row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
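
The steps above can be sketched as a toy model in Python. This is only an illustration under loud assumptions: tables are plain dicts keyed by primary key, locking is not modeled at all, and {{ChangeLog}}, {{online_copy}} and the writer callbacks are hypothetical names, not server code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple
Table = Dict[int, Row]  # primary key -> row (assumption: integer keys)


@dataclass
class ChangeLog:
    """Hypothetical stand-in for the per-table buffer of online changes."""
    events: List[Tuple[str, int, Optional[Row]]] = field(default_factory=list)
    applied: int = 0  # number of events already replayed

    def record(self, op: str, key: int, row: Optional[Row] = None) -> None:
        self.events.append((op, key, row))

    def apply_pending(self, target: Table, convert: Callable[[Row], Row]) -> int:
        """Replay events that were not applied yet; return how many there were."""
        pending = self.events[self.applied:]
        for op, key, row in pending:
            if op == "DELETE":
                target.pop(key, None)
            else:  # INSERT and UPDATE carry the full new row image
                target[key] = convert(row)
        self.applied = len(self.events)
        return len(pending)


def online_copy(old: Table, convert, writers_during_copy, writers_before_swap) -> Table:
    if not old:                       # step 3: empty table, nothing to track
        return {}
    log = ChangeLog()                 # step 4: start tracking online changes
    snapshot = dict(old)              # what a non-locking read would see
    writers_during_copy(old, log)     # concurrent DML while step 6 runs
    new = {k: convert(r) for k, r in snapshot.items()}  # step 6: bulk copy
    log.apply_pending(new, convert)   # step 7: catch up
    writers_before_swap(old, log)     # DML that arrives before the final lock
    log.apply_pending(new, convert)   # step 10: remaining changes
    return new                        # step 11: this table replaces the old one


def dml_a(table: Table, log: ChangeLog) -> None:
    table[3] = ("c", 30)
    log.record("INSERT", 3, table[3])
    table[1] = ("a", 11)
    log.record("UPDATE", 1, table[1])


def dml_b(table: Table, log: ChangeLog) -> None:
    del table[2]
    log.record("DELETE", 2)


old_table = {1: ("a", 10), 2: ("b", 20)}
new_table = online_copy(old_table, lambda r: (r[0].upper(), str(r[1])), dml_a, dml_b)
```

In this model the column conversion runs through the same {{convert}} callable for both the bulk copy and the replayed changes, which mirrors the point that no conversion logic needs to be duplicated.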

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that, in theory, some transaction can slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-07-29 22:35

Link

This issue causes ~~MDEV-31799~~ [ ~~MDEV-31799~~ ]

Elena Stepanova made changes - 2023-07-31 11:24

Link

This issue causes ~~MDEV-31804~~ [ ~~MDEV-31804~~ ]

Elena Stepanova made changes - 2023-07-31 14:27

Link

This issue causes ~~MDEV-31799~~ [ ~~MDEV-31799~~ ]

Elena Stepanova made changes - 2023-07-31 22:30

Link

This issue causes ~~MDEV-31812~~ [ ~~MDEV-31812~~ ]

Elena Stepanova made changes - 2023-08-03 16:10

Link

This issue causes ~~MDEV-31838~~ [ ~~MDEV-31838~~ ]

Nikita Malyavin made changes - 2023-08-07 13:02

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it is being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm that supports arbitrary alterations in arbitrary combinations is the {{COPY}} algorithm and it locks the table, allowing only read access during the whole {{ALTER TABLE}} duration. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations) it often means long periods of the application being essentially down, because the table cannot be written into.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ)
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Unlock the table lock
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
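
The copy-then-replay core of the steps above (6, 7 and 10) can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the server's code: tables are plain dictionaries keyed by primary key, locking is not modelled, and the names {{OnlineCopy}}, {{dml}}, {{apply_log}} and {{widen}} are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineCopy:
    """Toy model of copy-then-replay; all names here are hypothetical."""
    old: dict                                  # old table: primary key -> row
    new: dict = field(default_factory=dict)    # table being built
    log: list = field(default_factory=list)    # buffered "online changes"

    def dml(self, op, key, row=None):
        """Concurrent DML: changes the old table and is recorded in the log."""
        if op == "delete":
            self.old.pop(key, None)
        else:                                  # "insert" or "update"
            self.old[key] = row
        self.log.append((op, key, row))

    def copy(self, snapshot, convert):
        """Step 6: bulk-copy a consistent snapshot, converting each row."""
        for key, row in snapshot.items():
            self.new[key] = convert(row)

    def apply_log(self, convert):
        """Steps 7 and 10: replay buffered changes in order, then clear them."""
        for op, key, row in self.log:
            if op == "delete":
                self.new.pop(key, None)
            else:
                self.new[key] = convert(row)
        self.log.clear()


def widen(row):
    # Hypothetical column type change: integer to string.
    return str(row)


alter = OnlineCopy(old={1: 10, 2: 20, 3: 30})
snapshot = dict(alter.old)     # what the non-locking read will see
alter.dml("update", 2, 21)     # concurrent writes while the copy runs
alter.dml("delete", 3)
alter.dml("insert", 4, 40)
alter.copy(snapshot, widen)    # step 6
alter.apply_log(widen)         # step 7
alter.dml("update", 1, 11)     # a late write before the final lock
alter.apply_log(widen)         # step 10
result = alter.new
```

In this model the conversion function is shared by the bulk copy and the replay, which mirrors the point that no conversion logic has to be duplicated.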

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should ‘just work’ (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ when the first record is read during the copy stage. This means that, in theory, a transaction could slip in between the moment the table is locked with TL_READ and the moment the first record is read. This is why we first read one record, and only then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered in the table handle, so the copy stage reads only the rows that were already there. Other DML is blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.
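
The MyISAM/Aria case can be pictured with a small sketch. It assumes an append-only list as the table and a hypothetical helper {{copy_with_concurrent_inserts}}; it only illustrates why rows inserted after the lock are not applied twice, and says nothing about the real handler code.

```python
def copy_with_concurrent_inserts(rows, inserts_during_copy):
    """Toy model of an append-only table copied while inserts continue."""
    end_at_lock = len(rows)            # offset remembered when the read lock is taken
    change_log = []
    for row in inserts_during_copy:    # concurrent inserts land past that offset
        rows.append(row)
        change_log.append(row)
    copied = rows[:end_at_lock]        # the copy stage stops at the remembered offset
    return copied + change_log         # later rows arrive only through the log


final = copy_with_concurrent_inserts(["a", "b", "c"], ["d", "e"])
```

Because the copy never reads past the remembered offset, each concurrently inserted row reaches the new table exactly once, through the change log.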

h1. Limitations

* The embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable
* Engines S3 and CONNECT

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The ‘row event log’ {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the ‘row event log’, so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the ‘row event log’.
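
As a rough sketch of the {{ROLLBACK}} requirement, the model below logs a compensating DELETE for every INSERT that is rolled back. The classes {{RowEventLog}} and {{Transaction}} and the function {{replay}} are hypothetical and stand in for whatever interface the server would actually expose.

```python
class RowEventLog:
    """Hypothetical stand-in for the 'row event log'."""

    def __init__(self):
        self.events = []

    def record(self, kind, key, row=None):
        self.events.append((kind, key, row))


class Transaction:
    """Toy transaction that only supports INSERT and ROLLBACK."""

    def __init__(self, table, log):
        self.table = table
        self.log = log
        self.inserted = []             # keys to undo on rollback

    def insert(self, key, row):
        self.table[key] = row
        self.inserted.append(key)
        self.log.record("INSERT", key, row)

    def rollback(self):
        # Undo in reverse order and log a compensating DELETE per INSERT.
        for key in reversed(self.inserted):
            del self.table[key]
            self.log.record("DELETE", key)
        self.inserted.clear()


def replay(events):
    """Apply logged events to an initially empty copy of the table."""
    copy = {}
    for kind, key, row in events:
        if kind == "DELETE":
            copy.pop(key, None)
        else:
            copy[key] = row
    return copy


table, log = {}, RowEventLog()
trx = Transaction(table, log)
trx.insert(1, "x")
trx.rollback()
new_copy = replay(log.events)
```

Without the compensating event, the replay would leave the rolled-back row in the new table even though it no longer exists in the old one.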

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If {{FOREIGN KEY}} constraints exist on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, when we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE…ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-08-12 18:12

Link

This issue causes MDEV-31906 [ MDEV-31906 ]

Elena Stepanova added a comment - 2023-08-12 22:27 - edited

Tests performed on bb-11.2-oalter c29ff60b didn't reveal any serious issues. In my opinion, the feature as of this revision can be pushed into 11.2 and released with 11.2.1 RC.

The simplest scenario which demonstrates benefits of the new development is:

* one connection performs an ALTER TABLE that requires the COPY algorithm on a sufficiently large table;
* another connection wants to keep running DML on the table.

Here is a comparison of this scenario run on a 10M InnoDB table with the new feature versus the baseline. One connection starts the ALTER; another starts running point DML (single-row updates by PK) and continues doing so until the ALTER ends. After that, the number of successfully updated rows is counted.
There are no timeouts in this scenario because the duration of the ALTER is below the timeout limits.
Disclaimer: this is not an official benchmark; the results are only for relative comparison, and the absolute numbers have no value.

||variant||alter_duration (sec)||rows_updated||
|baseline|24.47|0|
|online alter|28.18|711035|

That is, without online alter the table is blocked for the whole duration of the ALTER and no updates are executed. With online alter, at the cost of a small increase in ALTER duration, updates continue to be executed for most of the ALTER.
The results scale to bigger tables and longer durations.
Beyond the performance numbers, there is also a qualitative effect: with a longer locking ALTER, DML would start timing out, and the online ALTER largely prevents that.

Users need to be aware, though, that while the non-locking copy ALTER should be beneficial in the vast majority of realistic use cases, there are scenarios in which it can significantly hurt performance. One such scenario known to us is the notorious problem of RBR on tables without a primary key. When a non-locking ALTER is performed on such a table while DML affecting a large number of records runs in parallel, the ALTER can become extremely slow, and further DML can also be affected.
If there is a chance of such scenarios (and the schema cannot be improved immediately by adding primary keys to the tables), the ALTER should be performed with an explicit LOCK=SHARED clause. If this is also impossible, then the LOCK_ALTER_TABLE_COPY flag should be added to the old_mode variable until the schema can be improved.
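
The cost difference can be illustrated with a toy model. It assumes that, without a primary key, each logged row change has to be matched against the table by scanning for its before-image, while with a key it is a single lookup; how the server actually locates rows may differ, and both function names are hypothetical.

```python
def apply_with_pk(table, events):
    """table: dict keyed by primary key; one keyed lookup per event."""
    lookups = 0
    for key, new_row in events:
        lookups += 1
        table[key] = new_row
    return lookups


def apply_without_pk(rows, events):
    """rows: list of rows; each event carries the full before-image."""
    comparisons = 0
    for before, after in events:
        for i, row in enumerate(rows):   # linear scan for the matching row
            comparisons += 1
            if row == before:
                rows[i] = after
                break
    return comparisons


n = 1000
keyed = {i: (i, 0) for i in range(n)}
unkeyed = [(i, 0) for i in range(n)]

# Update the last 100 rows in both representations.
pk_events = [(i, (i, 1)) for i in range(900, n)]
scan_events = [((i, 0), (i, 1)) for i in range(900, n)]

with_pk = apply_with_pk(keyed, pk_events)
without_pk = apply_without_pk(unkeyed, scan_events)
```

In this model the work without a key grows with the number of changed rows multiplied by the table size, which is consistent with the slowdown described above.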

Here is a comparison of such a scenario run on a 5M InnoDB table without a PK. One connection starts the ALTER; another first runs a big UPDATE (10K rows updated at once) and then a series of 500 small updates of 100 rows each.
The same disclaimer as above applies.

||variant||alter_duration (sec)||first_dml_duration (sec)||dml_loop_duration (sec)||rows_updated||
|baseline|13.08|13.09|5.99|60000|
|online alter|516.15|0.02|12.32|60000|

That is, on the baseline the first big update waits for the ALTER to end, but the ALTER itself is fast enough, and when it ends, the following DML is performed without obstacles.
With online alter, the first DML is indeed executed in parallel with ALTER, but the following DML becomes affected by concurrency with ALTER, while ALTER itself becomes very slow. Besides, the slow ALTER holds a metadata lock which can cause problems with seeing table definitions etc.
This scenario is also highly scalable and in unfortunate circumstances ALTER can become practically endless.


Elena Stepanova made changes - 2023-08-12 22:27

Description

Implement online {{ALTER TABLE}} above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0.

h1. Intro

{{ALTER TABLE}} can perform many different table metadata alterations, individually or batched (many alterations at once). It supports different algorithms for applying those alterations and different lock levels restricting access to the table while it is being altered. Which algorithm and lock level to use depends on the storage engine, the requested alterations, and the explicitly specified algorithm and lock, if any. If no algorithm or lock level is explicitly specified, the server is supposed to select the best algorithm/lock combination automatically.

While certain alterations (like adding a column) can be done by certain storage engines (like InnoDB) internally (using InnoDB-specific {{ALGORITHM=INSTANT}}) and without locking the table ({{LOCK=NONE}}), the most universal {{ALTER TABLE}} algorithm, the one that supports arbitrary alterations in arbitrary combinations, is the {{COPY}} algorithm. It locks the table, allowing only read access for the whole duration of the {{ALTER TABLE}}. When the server has to resort to the {{COPY}} algorithm (because no other one can perform the requested set of alterations), it often means long periods of the application being essentially down, because the table cannot be written to.

The goal of this task is to allow the {{COPY}} algorithm to work without read-locking the table. In other words, this should make the combination {{ALGORITHM=COPY, LOCK=NONE}} possible.

h1. Implementation

The {{COPY}} algorithm for {{ALTER ONLINE TABLE}} is supposed to do the following:
# Exclusively acquire the table Metadata Lock (MDL).
# Acquire the table lock for read (TL_READ).
# Read the first record. If the table is empty, the online phase is skipped (go to step 11).
# Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs ("online changes").
# Downgrade the MDL lock.
# Copy the table contents (using a non\-locking read if supported by the storage engine).
# Apply the online changes from the replicated contents.
# Release the table lock.
# Exclusively lock the table MDL (upgrade to MDL_SHARED_WRITE).
# Apply any remaining online changes.
# Swap the old and new table, unlock, drop the old table.
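
As a rough illustration of steps 4 to 10, the following Python sketch models a table as a dictionary keyed by primary key and the online changes as a list of row events. It is a toy model under stated assumptions, not the server implementation: {{RowEvent}}, {{apply_events}}, {{online_copy}} and {{varchar_to_int}} are hypothetical names, locking and MDL handling are left out, and the concurrent DML is passed in as a plain list instead of arriving from other connections.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> row


@dataclass
class RowEvent:
    """One logged change (hypothetical structure, for illustration only)."""
    kind: str                   # "INSERT", "UPDATE" or "DELETE"
    key: int                    # primary key of the affected row
    row: Optional[Row] = None   # new row image; None for DELETE


def apply_events(table: Table, events: List[RowEvent],
                 convert: Callable[[Row], Row]) -> None:
    """Replay logged changes on a table, converting rows to the new format."""
    for ev in events:
        if ev.kind == "DELETE":
            table.pop(ev.key, None)
        else:  # INSERT and UPDATE both carry the full new row image
            table[ev.key] = convert(ev.row)


def online_copy(old: Table, convert: Callable[[Row], Row],
                concurrent_dml: List[RowEvent]) -> Table:
    change_log: List[RowEvent] = []      # step 4: start tracking changes
    snapshot = dict(old)                 # step 6: read a stable snapshot

    # Writers keep modifying the old table while the copy runs;
    # every change is also appended to the change log.
    for ev in concurrent_dml:
        apply_events(old, [ev], lambda r: r)
        change_log.append(ev)

    new = {key: convert(row) for key, row in snapshot.items()}  # step 6
    apply_events(new, change_log, convert)                      # steps 7 and 10
    return new


def varchar_to_int(row: Row) -> Row:
    """Example conversion: change column v from text to integer."""
    return {**row, "v": int(row["v"])}


old_table: Table = {1: {"id": 1, "v": "10"}, 2: {"id": 2, "v": "20"}}
dml = [
    RowEvent("INSERT", 3, {"id": 3, "v": "30"}),
    RowEvent("UPDATE", 1, {"id": 1, "v": "11"}),
    RowEvent("DELETE", 2),
]
new_table = online_copy(old_table, varchar_to_int, dml)
```

After the call, {{new_table}} holds the same rows as the modified {{old_table}}, converted to the new column type, which is the property the final table swap relies on.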

This would remove some limitations that currently exist with the InnoDB\-only online table rebuild. Basically, anything that is supported by {{ALGORITHM=COPY}} should 'just work' (however see the limitations section). The bulk copying could still happen in {{copy_data_between_tables()}}. A few examples:

# Arbitrary changes of column type will be possible, without duplicating any conversion logic.
# It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (~~MDEV-13795~~, ~~MDEV-14332~~).
# The {{ENGINE}} or the partitioning of a table can be changed, just like any other attribute.

\[Not implemented here\] We should remove the online table rebuild code from InnoDB ({{row_log_table_apply()}} and friends), and just let InnoDB fall back to this. The only {{ALTER ONLINE TABLE}} that could better be implemented inside storage engines would be {{ADD INDEX}}. Then, {{ALGORITHM=INPLACE}} would no longer be misleading, because it would mean exactly the same as the {{ALGORITHM=NOCOPY}} that was introduced in ~~MDEV-13134~~. Before this, we must implement ~~MDEV-515~~ (bulk load into an empty InnoDB table) to avoid a performance regression.

h2. Behavior of different engines
The per-engine behavior depends on what operations can happen concurrently while TL_READ is held.
* *InnoDB* can do any DML (except TRUNCATE, I presume). It lazily opens the _read view_ once the first record is read during the copy stage. This means that in theory some transaction can slip in between the moment the table is TL_READ-locked and the moment the first record is read. This is why we first read one record out, and then set up the online change buffer.
* *MyISAM/Aria* only allow inserts in parallel with reads: the offset of the table's last record is remembered for the table handle, so the copy stage will read only the rows that are already there. Other DMLs will be blocked until the table lock is released.
* Online is disabled for temporary tables.
* For other engines, it depends on whether it is possible to acquire a particular table lock in parallel with TL_READ.

h1. Limitations

* Embedded server doesn't support LOCK=NONE until HAVE\_REPLICATION is enabled there (or until some finer refactoring).
* DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
* ALTER TABLE ... ORDER BY is not and cannot be supported
* Tables which are referenced by FOREIGN KEYs with CASCADE operations, see ~~MDEV-29068~~
* ALTER IGNORE TABLE
* Adding AUTO_INCREMENT to an existing column when NO_AUTO_VALUE_ON_ZERO *is not* set and there are no unchanged UNIQUE NOT NULL keys. A NULLable column can never be changed to AUTO_INCREMENT with online COPY.
* Sequences are not supported
* ADD COLUMN ... AUTO_INCREMENT and ADD COLUMN ... DEFAULT(NEXTVAL(..))
* MODIFY ... NOT NULL DEFAULT(NEXTVAL(..)), if the column initially was NULLable
* Sequences
* Engines S3 and CONNECT

h1. \[Old part\] Challenges

We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to ~~MDEV-11675~~, which would still be needed for native online {{ADD INDEX}}, which would avoid copying the table.

In InnoDB, there is some logic for logging the changes when the {{PRIMARY KEY}} columns are changed, or a {{PRIMARY KEY}} is being added. The 'row event log' {{online_log}} will additionally contain the {{PRIMARY KEY}} values in the new table, so that the records can easily be found. The {{online_log}} will contain INSERT, UPDATE, and DELETE events.

We will need some interface from {{ROLLBACK}} inside the storage engine to the 'row event log', so that {{BEGIN; INSERT; ROLLBACK}} will also create a DELETE event. Similarly, we will need an interface that allows {{CASCADE}} or {{SET NULL}} operations from {{FOREIGN KEY}} constraints to be relayed to the 'row event log'.
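
One way to picture the {{ROLLBACK}} requirement is as a list of compensating events appended to the log. The Python sketch below is only an illustration under assumptions: the event tuple layout and the names {{compensating_events}} and {{replay}} are hypothetical and do not describe an existing server or storage engine interface.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# (kind, primary key, row image before the change, row image after the change)
Event = Tuple[str, int, Optional[Row], Optional[Row]]


def compensating_events(events: List[Event]) -> List[Event]:
    """Return the events that undo `events`, newest change first."""
    undo: List[Event] = []
    for kind, key, before, after in reversed(events):
        if kind == "INSERT":
            undo.append(("DELETE", key, after, None))
        elif kind == "DELETE":
            undo.append(("INSERT", key, None, before))
        else:  # UPDATE: swap the before and after images
            undo.append(("UPDATE", key, after, before))
    return undo


def replay(table: Dict[int, Row], events: List[Event]) -> None:
    """Apply events to a table keyed by primary key."""
    for kind, key, _before, after in events:
        if kind == "DELETE":
            table.pop(key, None)
        else:
            table[key] = after


# BEGIN; INSERT; ROLLBACK as seen by the row event log.
txn: List[Event] = [("INSERT", 7, None, {"id": 7})]
log = txn + compensating_events(txn)
```

Replaying {{log}} on the new table leaves it unchanged, so the rolled-back INSERT never becomes visible after the swap.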

Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by {{PRIMARY KEY}} when the sorting does not change. Search for {{skip_pk_sort}}. It would be nice if the future ~~MDEV-515~~ code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by {{PRIMARY KEY}}.

If there exist {{FOREIGN KEY}} constraints on the table being rebuilt, then this approach should work just as well as the current online table rebuild in InnoDB: the constraints would be enforced on the old copy of the table until the very end, where we switch the tables, and from that point on, on the new copy of the table.

Initially, we could disable {{ONLINE...ADD FOREIGN KEY}}. That could be easier to implement after moving the {{FOREIGN KEY}} processing from InnoDB to the SQL layer.

Elena Stepanova made changes - 2023-08-12 22:29

Assignee	Elena Stepanova [ elenst ]	Sergei Golubchik [ serg ]
Status	In Testing [ 10301 ]	Stalled [ 10000 ]

Sergei Golubchik made changes - 2023-08-13 09:56

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik made changes - 2023-08-13 10:03

Assignee

Nikita Malyavin [ nikitamalyavin ]

Sergei Golubchik [ serg ]

Sergei Golubchik made changes - 2023-08-15 12:26

Priority

Critical [ 2 ]

Blocker [ 1 ]

Sergei Golubchik made changes - 2023-08-16 09:41

Fix Version/s		11.2.1 [ 29034 ]
Fix Version/s	11.2 [ 28603 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2023-08-16 09:42

Assignee

Sergei Golubchik [ serg ]

Nikita Malyavin [ nikitamalyavin ]

Nikita Malyavin made changes - 2023-08-17 12:59

Link

This issue split to MDEV-31942 [ MDEV-31942 ]

Elena Stepanova made changes - 2023-09-07 17:39

Link

This issue causes ~~MDEV-32126~~ [ ~~MDEV-32126~~ ]

Nikita Malyavin made changes - 2023-10-11 08:48

Link

This issue causes ~~MDEV-32444~~ [ ~~MDEV-32444~~ ]

Nikita Malyavin made changes - 2023-10-17 07:42

Link

This issue relates to TODO-4300 [ TODO-4300 ]

Nikita Malyavin made changes - 2023-10-18 12:27

Link

This issue causes MDEV-32510 [ MDEV-32510 ]

Nikita Malyavin made changes - 2023-11-07 16:03

Link

This issue causes MCOL-5603 [ MCOL-5603 ]

BJ Quinn added a comment - 2023-12-19 01:24

I'm trying to take advantage of this new feature on a very large table that is otherwise very difficult to change (~3TB, ~700 million rows). The feature seems to work as advertised, which is fantastic! But it is insanely slow – it looks like it might take 45 days or so to complete changing a single column from VARCHAR to TEXT. I even tried setting LOCK=EXCLUSIVE, which seems like it would be a bit faster, but not by a whole lot.

But the strange thing is that my hardware is not at all being stressed. This is a high performance test server that has no other activity on it other than my ALTER. Neither this table nor the other tables on the server are being written to or read from. CPU usage is low, ~10% on the core that's running the ALTER. And disk activity is even lower. Is there something that can be done to force the ALTER to be more aggressive about using available resources so that it completes more quickly?

Sergei Golubchik added a comment - 2023-12-19 11:49

LOCK=EXCLUSIVE (or LOCK=SHARED) would pretty much mean the old ALTER TABLE implementation, where concurrent writes are not allowed.

What does information_schema.PROCESSLIST show in the ALTER TABLE row?
Particularly interesting are STAGE, MAX_STAGE, PROGRESS columns.

BJ Quinn added a comment - 2023-12-19 16:02 - edited

Sorry, I should have clarified, I knew that LOCK=EXCLUSIVE would short circuit the online alter functionality, I was just surprised that it also was not pushing the hardware very hard.

I left it running for about 5 days and it got to State = 'copy to tmp table' and Progress was about 11.

I killed it and restarted it and it basically immediately gets to 'copy to tmp table' and starts slowly increasing the Progress number. Watching it more closely this time, I notice periods of activity (a single core at 100% and disk activity at about 10-15%) for several seconds and then no CPU or disk activity for several seconds. The length of the active and inactive periods both vary, sometimes 10 seconds, sometimes 60+ seconds. Not sure what would be causing the inactive periods, but I'm assuming no progress is being made during those periods (EDIT: I confirmed that the Progress field continuously increments when the CPU/disk are active and does not increase while CPU/disk are inactive). I'm not sure what else could be bottlenecking the process if both CPU and disk are inactive, unless there's some intentional throttling mechanism intended to not overload the system. I have plenty of CPU and disk I/O, so I'd be happy to push my hardware harder and not have these inactive periods to shorten the time to alter the table.

I'm just on a test server now, but on the production server I will need the online alter functionality, especially if it takes several hours to days to alter the table, but I was hoping it would be less than ~45 days.

Thanks for your help!

Marko Mäkelä added a comment - 2023-12-19 16:39

bjquinn, I think that the problem that you are highlighting is that ALTER TABLE is single threaded. For the native InnoDB ALTER TABLE, MDEV-16281 has been filed for implementing multi-threaded data loading or index creation. Our version of InnoDB does not natively support any data type conversions (such as INT to BIGINT or CHAR to VARCHAR). Theoretically a VARCHAR to TEXT conversion can be executed as a metadata-only change (no copying needed).

I do not have any idea how hard it would be to reimplement this cross-engine ALTER TABLE to make use of multiple threads. The current storage engine APIs that it invokes are row oriented, expected to update all indexes of the table for each row.

BJ Quinn added a comment - 2023-12-19 16:45

Using multiple threads would certainly dramatically reduce the time it takes to complete the ALTER, but what I'm seeing here is that it doesn't even consistently use the single thread that it can use. It uses 100% of that thread for a few seconds, and then goes idle for a period, and keeps cycling between active and idle. The ability to use multiple threads would help, but if it would at least stay constantly active on a single thread, it seems like it would complete much faster than what I'm seeing now. I only have a little data, but I measured the rate at which the progress counter increases while the CPU is active vs how much progress it made in 5 days, and it was a 10:1 ratio.

VAROQUI Stephane added a comment - 2023-12-19 17:22 - edited

I guess there is nothing special about your primary key and no extra load; as Elena commented when benchmarking, a table with no PK could take practically infinite time.

This 3TB table has to be read first, and I guess you do not have 3TB of memory. How many random read I/Os per second can your disk deliver from a single thread? Is the table fragmented? Are these SAS disks? I hope you have a RAID of multiple NVMe drives or a SAN capable of 100K read I/Os per second. Which file system?

Suppose you get 500 read I/Os per second on a fragmented table, with no I/Os merged by the file system (the ZFS case). With 16K pages that is about 8 MB/s, or a bit over 4 days just to read the full table; if the effective rate drops to about 1 MB/s, it becomes 1000*1000*3/3600/24 = 34.7 days.

For writing, InnoDB can benefit from multiple I/O writers (innodb_write_io_threads) and is rate-limited by innodb_io_capacity. But depending on the size of the indexes and how much of them fits in memory, InnoDB also has to read pages back in again to maintain the indexes that cannot fit in memory. So the only way to maintain such big tables without holding the full index size in memory is partitioning. Is the table partitioned?
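
For reference, the full-scan time follows directly from the table size and the effective read rate. A minimal sketch, assuming 1 TB = 10^12 bytes and 16 KiB pages (scan_days is a hypothetical helper name, not a MariaDB function; the 3 TB, 500 reads/s and 16K figures are the ones from this thread):

```python
SECONDS_PER_DAY = 86_400


def scan_days(table_bytes: float, bytes_per_second: float) -> float:
    """Days needed to read table_bytes once at a steady read rate."""
    return table_bytes / bytes_per_second / SECONDS_PER_DAY


TABLE_BYTES = 3e12          # ~3 TB table (assumes 1 TB = 10**12 bytes)
PAGE_BYTES = 16 * 1024      # 16 KiB InnoDB page
READS_PER_SECOND = 500      # unmerged random reads per second

# Throughput if every read fetches exactly one page.
random_read_rate = READS_PER_SECOND * PAGE_BYTES

days_at_500_iops = scan_days(TABLE_BYTES, random_read_rate)
days_at_1_mb_per_s = scan_days(TABLE_BYTES, 1e6)

print(f"{random_read_rate / 1e6:.1f} MB/s -> {days_at_500_iops:.1f} days")
print(f"1.0 MB/s -> {days_at_1_mb_per_s:.1f} days")
```

With 500 unmerged 16 KiB reads per second this gives a bit over 4 days, while 34.7 days corresponds to an effective rate of about 1 MB/s.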

BJ Quinn added a comment - 2023-12-19 18:08

Thanks for the feedback!

PK is simple, it's just a single int field.

We have 512GB RAM with 360GB allocated to the innodb buffer pool. So yes that is smaller than the table.

However, we have 10x Solidigm NVMe SSDs in RAID10, so we have lots and lots of available disk I/O. Filesystem is zfs (so it's not really RAID10, it's striped mirrored vdevs). I do not think disk I/O is the bottleneck. Even in the "active" periods, the disks are only 10-15% active.

A single core gets 100% busy, which seems to be the bottleneck. But this is only during the active periods, which are a fraction of the overall time. The system (disk, CPU, etc.) is usually inactive, 0% active, while the ALTER is running. No progress shown in the progress column during these inactive periods.

The table is not currently partitioned. I am open to partitioning the table, though I'd imagine I would have to go through the same long ALTER process to get it partitioned in the first place, so it would still be useful to figure out what this whole active/inactive period thing is.

VAROQUI Stephane added a comment - 2023-12-19 19:00

binlog_cache_size and binlog_stmt_cache_size are used by this MDEV; it is worth checking their impact.

BJ Quinn added a comment - 2023-12-19 22:14

Thanks, I tried binlog_cache_size=10485760 and binlog_stmt_cache_size=10485760 (10MB) but that did not seem to have an effect.

Elena Stepanova added a comment - 2023-12-19 22:28

Since the previous comments suggest that the observed slowness is not specific to the online alter (non-online ALTER is similarly slow), I suppose online alter tuning with binlog variables is unlikely to help here.

Marko Mäkelä added a comment - 2023-12-20 12:24

I filed ~~MDEV-33087~~ for the bug that the copy_data_between_tables() phase for InnoDB is not making use of the ~~MDEV-24621~~ optimization.

Nikita Malyavin made changes - 2023-12-20 13:15

Link

This issue causes MDEV-33094 [ MDEV-33094 ]

BJ Quinn added a comment - 2023-12-20 21:54

Thanks! Do you think that's what's causing the alternating active/inactive cycles, or is it something that's affecting the overall efficiency of the ALTER?

Marko Mäkelä added a comment - 2023-12-21 07:47

MDEV-33094 was filed for further optimizing the online log application. It is currently writing undo log records inside InnoDB, for no good reason.

bjquinn, I do not have any idea what could be causing the active/inactive cycles. Would it be possible to collect stack traces of all threads (attach a debugger to the running process) while the system is inactive? (Or just something like http://poormansprofiler.org once per second?) Also, a system profiler like perf or offcputime could be helpful, but the latter is tricky because you’d typically need all code to be compiled with -fno-omit-frame-pointer in order to get meaningful stack traces (because the stack unwinder in the Linux kernel requires frame pointers; see 1234). Back in September, I successfully used offcputime in ~~MDEV-32050~~ to identify one bottleneck that I was completely unaware of.

BJ Quinn added a comment - 2023-12-21 17:01 - edited

Thanks Marko. I should be able to set it up to capture the stack traces, this system is not yet in production so I should be able to do whatever is necessary. I'll try to get that to you soon.

Stéphane also had a good suggestion to test mysql -e 'select * from bigtable' > /dev/null and see if I get a similar active/inactive cycle. I did not, but it does settle on about 65% CPU usage over time after starting at 100% CPU usage. Disks are 10% to 25% busy, so I don't think that's the bottleneck here.

EDIT: I'm going to be out of town the next couple of weeks so some of this data might be delayed.

BJ Quinn made changes - 2024-01-11 21:44

Attachment

output.txt [ 72780 ]

BJ Quinn added a comment - 2024-01-11 21:44

Marko, attached is the result of poormansprofiler.org, run once a second while the CPU was idle. Please let me know if this is helpful or you need me to collect this data any differently.

One thing I noticed that Stephane pointed out was that it got better, at least early on, if innodb_log_file_size was set larger, but as the table I'm testing with is much larger than I can reasonably set innodb_log_file_size to, the issue recurs after a while anyway. But it may be related.

Thanks!! output.txt

BJ Quinn added a comment - 2024-01-17 20:44

Marko, please disregard. This may have ended up being a hardware problem that was affecting multiple identical servers. A firmware bug in our SSDs. In case anyone is interested, here was the problem and apparent solution – https://forum-proxmox-com.translate.goog/threads/nvme-qid-timeout.51579/?_x_tr_sl=de&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=sc

Elena Stepanova made changes - 2024-01-30 13:57

Link

This issue causes ~~MDEV-33330~~ [ ~~MDEV-33330~~ ]

Rob Schwyzer (Inactive) made changes - 2024-04-04 22:00

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36750 ]

Rob Schwyzer (Inactive) made changes - 2024-04-05 18:34

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36750 ]

Rob Schwyzer (Inactive) made changes - 2024-04-11 19:02

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36796 ]

Jira Automation (IT) made changes - 2024-07-04 07:39

Zendesk Related Tickets

187921

Elena Stepanova made changes - 2024-08-18 20:27

Link

This issue relates to MDEV-34768 [ MDEV-34768 ]

Elena Stepanova made changes - 2024-11-17 22:12

Link

This issue relates to MDEV-34768 [ MDEV-34768 ]
