[MDEV-22168] Supporting multiple engines with table partitioning - Jira

Details

Type: New Feature
Status: Stalled (View Workflow)
Priority: Critical
Resolution: Unresolved
Fix Version/s: 12.1
Component/s: Partitioning
Labels:
None

Description

Allow different engines in a partitioned table.
What is mostly needed is allow one to combine InnoDB with S3 tables but other combinations should be possible . InnoDB couldb be a OVERFLOW partition that handles all inserts while the other partitions are only used for read.

From optimizer point of view, we should be AND the flags from all engines
that is part of a query. If S3 and InnoDB is in the same query, we would
loose the clustered key cost optimization, however I think we can survive
without that.

To support multiple engines, we probably also need to do ~~MDEV-5271~~

Attachments

Issue Links

blocks

MDEV-28863 Galera testing of Support multiple engines in partitioning

Open

MDEV-29415 status and variables history via system-versioned tables parttion

Open

includes

MDEV-5271 Support engine-defined attributes per partition

Closed

relates to

MDEV-15471 Assertion `new_clustered == ctx->need_rebuild()' failed in ha_innobase::commit_inplace_alter_table

Stalled

MDEV-27180 Fully atomic partitioning DDL operations

In Review

MDEV-22088 S3 partitioning support

Closed

MDEV-27545 migrate server attributes applicable to engines to the storage engine(s)

Open

MDEV-27618 Atomic DDL is not very atomic on partitioned tables

Confirmed

mentioned in: Page Loading...

(3 relates to, 1 mentioned in)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2022-01-19 08:51

Since MySQL 5.6.10 and MariaDB 10.0, the native ALTER TABLE implementation in InnoDB uses a trick that will commit the changes to all partitions within the same transaction, using ha_alter_info->group_commit_ctx.

I think that in order to be able to support multiple storage engines in a partitioned table, we must remove the ha_alter_info->group_commit_ctx and refactor the handler::commit_inplace_alter_table() interface so that there will be an explicit transaction commit. Even after the code cleanup in ~~MDEV-25180~~, InnoDB still uses an internal transaction for modifying the data dictionary. Because the lock wait logic was already cleaned up in ~~MDEV-25919~~, it should not be difficult to switch to use the "user" transaction that is attached to thd_get_ha_data(thd, innodb_hton_ptr).

Due to the ha_alter_info->group_commit_ctx hack, InnoDB also assumes that the ALTER TABLE operation is executed in the same way on each partition; see ~~MDEV-21832~~. I believe that this assumption or restriction can be lifted if we switch to use a more DML-like transaction commit.

If multiple storage engines participate in an ALTER TABLE operation, then we should probably employ a 2-phase commit mechanism, similar to binlog-driven DML transactions that use the internal MySQLXid identifier. The recovery code would have to be extended as well, to instruct each participating storage engine to either commit or roll back the operation. Currently, it is InnoDB that decides whether an ALTER TABLE had been committed or not; the logic around ddl-recovery.log will then decide which of the two .frm files is valid.

Marko Mäkelä added a comment - 2022-01-19 08:51 Since MySQL 5.6.10 and MariaDB 10.0, the native ALTER TABLE implementation in InnoDB uses a trick that will commit the changes to all partitions within the same transaction, using ha_alter_info->group_commit_ctx . I think that in order to be able to support multiple storage engines in a partitioned table, we must remove the ha_alter_info->group_commit_ctx and refactor the handler::commit_inplace_alter_table() interface so that there will be an explicit transaction commit. Even after the code cleanup in MDEV-25180 , InnoDB still uses an internal transaction for modifying the data dictionary. Because the lock wait logic was already cleaned up in MDEV-25919 , it should not be difficult to switch to use the "user" transaction that is attached to thd_get_ha_data(thd, innodb_hton_ptr) . Due to the ha_alter_info->group_commit_ctx hack, InnoDB also assumes that the ALTER TABLE operation is executed in the same way on each partition; see MDEV-21832 . I believe that this assumption or restriction can be lifted if we switch to use a more DML-like transaction commit. If multiple storage engines participate in an ALTER TABLE operation, then we should probably employ a 2-phase commit mechanism, similar to binlog-driven DML transactions that use the internal MySQLXid identifier. The recovery code would have to be extended as well, to instruct each participating storage engine to either commit or roll back the operation. Currently, it is InnoDB that decides whether an ALTER TABLE had been committed or not; the logic around ddl-recovery.log will then decide which of the two .frm files is valid.

Marko Mäkelä added a comment - 2022-09-19 13:58

MDEV-27180 aims to implement crash-safe partitioning ALTER TABLE operations. I think that the goal should be that everything is crash-safe.

Marko Mäkelä added a comment - 2022-09-19 13:58 MDEV-27180 aims to implement crash-safe partitioning ALTER TABLE operations. I think that the goal should be that everything is crash-safe.

Marko Mäkelä added a comment - 2022-12-16 07:33

How would this work if different storage engines support different sets of capability flags? For example, HA_DUPLICATE_POS is specific to MyISAM and Aria.

Marko Mäkelä added a comment - 2022-12-16 07:33 How would this work if different storage engines support different sets of capability flags? For example, HA_DUPLICATE_POS is specific to MyISAM and Aria.

VAROQUI Stephane added a comment - 2023-11-02 09:08

https://hackernoon.com/fr/transitions-musicales-tencent-de-clickhouse-%C3%A0-apache-doris

The takeaway moving from Clickhouse to Doris, one critical point was Doris enable cold and hot data splitting. Today in MariaDB possible via Spider but could put workload at risk via possible queries regression, this task is very important for the all MariaDB ecosystem

VAROQUI Stephane added a comment - 2023-11-02 09:08 https://hackernoon.com/fr/transitions-musicales-tencent-de-clickhouse-%C3%A0-apache-doris The takeaway moving from Clickhouse to Doris, one critical point was Doris enable cold and hot data splitting. Today in MariaDB possible via Spider but could put workload at risk via possible queries regression, this task is very important for the all MariaDB ecosystem

Marko Mäkelä added a comment - 2024-09-24 04:45

MDEV-15471 is another example where the InnoDB storage engine is assuming that all partitions of a table have the same layout, not only the same storage engine.

Marko Mäkelä added a comment - 2024-09-24 04:45 MDEV-15471 is another example where the InnoDB storage engine is assuming that all partitions of a table have the same layout, not only the same storage engine.

MariaDB Server

Supporting multiple engines with table partitioning

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration