Implement ALTER ONLINE TABLE above the storage engine layer by mimicking what InnoDB does since MariaDB 10.0:
- Exclusively lock the table.
- Set up (a separate, per-table one) row-based replication for tracking changes from concurrent DMLs.
- Downgrade the lock.
- Copy the table contents (using a non-locking read if supported by the storage engine).
- Apply changes from the replicated contents.
- Exclusively lock the table (MDL_SHARED_WRITE).
- Apply any remaining replicated changes.
- Swap the old and new table, unlock, drop the old table.
This would remove some limitations that currently exist with the InnoDB-only online table rebuild. Basically, anything that is supported by ALGORITHM=COPY should ‘just work’; the bulk copying could still happen in copy_data_between_tables(). A few examples:
- Arbitrary changes of column type will be possible, without duplicating any conversion logic.
- It will be possible to add virtual columns (materialized or not) together with adding indexes, while allowing concurrent writes (
- The ENGINE or the partitioning of a table can be changed, just like any other attribute.
[Not implemented here] We should remove the online table rebuild code from InnoDB (row_log_table_apply() and friends), and just let InnoDB fall back to this. The only ALTER ONLINE TABLE that could better be implemented inside storage engines would be ADD INDEX. Then, ALGORITHM=INPLACE would no longer be misleading, because it would mean exactly the same as the ALGORITHM=NOCOPY that was introduced in
MDEV-13134. Before this, we must implement MDEV-515 (bulk load into an empty InnoDB table) to avoid a performance regression.
- Embedded server doesn't support LOCK=NONE, Until HAVE_REPLICATION is enabled there (or until some finer refactoring).
- DROP SYSTEM VERSIONING is not currently supported, but the support can be added on demand
- ALTER TABLE ... ORDER BY is not and cannot be supported
We should replicate the online rebuild on slaves in parallel, so that the master and slaves will be able to commit at roughly the same time. This would be something similar to
MDEV-11675, which would still be needed for native online ADD INDEX, which would avoid copying the table.
In InnoDB, there is some logic for logging the changes when the PRIMARY KEY columns are changed, or a PRIMARY KEY is being added. The ‘row event log’ online_log will additionally contain the PRIMARY KEY values in the new table, so that the records can easily be found. The online_log will contain INSERT, UPDATE, and DELETE events.
We will need some interface from ROLLBACK inside the storage engine to the ‘row event log’, so that BEGIN; INSERT; ROLLBACK will also create a DELETE event. Similarly, we will need an interface that allows CASCADE or SET NULL operations from FOREIGN KEY constraints to be relayed to the ‘row event log’.
Starting with MariaDB 10.2, there is an optimization that avoids unnecessarily sorting the data by PRIMARY KEY when the sorting does not change. Search for skip_pk_sort. It would be nice if the future
MDEV-515 code inside InnoDB could be informed of this, so that it can assume that the data is already sorted by PRIMARY KEY.
If there exist FOREIGN KEY constraints on the being-rebuilt table, then this approach should work just as fine as the current online table rebuild in InnoDB: The constraints would be enforced on the old copy of the table until the very end where we switch the tables, and from that point on, on the new copy of the table.
Initially, we could disable ONLINE…ADD FOREIGN KEY. That could be easier to implement after moving the FOREIGN KEY processing from InnoDB to the SQL layer.