[MDEV-22168] Supporting multiple engines with table partitioning Created: 2020-04-06  Updated: 2024-02-04

Status: Stalled
Project: MariaDB Server
Component/s: Partitioning
Fix Version/s: 11.5

Type: New Feature Priority: Critical
Reporter: Michael Widenius Assignee: Alexey Botchkov
Resolution: Unresolved Votes: 3
Labels: None

Issue Links:
Blocks
blocks MDEV-29415 status and variables history via syst... Open
PartOf
includes MDEV-5271 Support engine-defined attributes per... Closed
Relates
relates to MDEV-27180 Fully atomic partitioning DDL operations Stalled
relates to MDEV-22088 S3 partitioning support Closed
relates to MDEV-27545 Feature request: migrate server attri... Open
relates to MDEV-27618 Atomic DDL is not very atomic on part... Open
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MDEV-28863 Galera testing of Support multiple en... Technical task Open Ramesh Sivaraman  

 Description   

Allow different engines in a partitioned table.
What is mostly needed is allow one to combine InnoDB with S3 tables but other combinations should be possible . InnoDB couldb be a OVERFLOW partition that handles all inserts while the other partitions are only used for read.

From optimizer point of view, we should be AND the flags from all engines
that is part of a query. If S3 and InnoDB is in the same query, we would
loose the clustered key cost optimization, however I think we can survive
without that.

To support multiple engines, we probably also need to do MDEV-5271



 Comments   
Comment by Marko Mäkelä [ 2022-01-19 ]

Since MySQL 5.6.10 and MariaDB 10.0, the native ALTER TABLE implementation in InnoDB uses a trick that will commit the changes to all partitions within the same transaction, using ha_alter_info->group_commit_ctx.

I think that in order to be able to support multiple storage engines in a partitioned table, we must remove the ha_alter_info->group_commit_ctx and refactor the handler::commit_inplace_alter_table() interface so that there will be an explicit transaction commit. Even after the code cleanup in MDEV-25180, InnoDB still uses an internal transaction for modifying the data dictionary. Because the lock wait logic was already cleaned up in MDEV-25919, it should not be difficult to switch to use the "user" transaction that is attached to thd_get_ha_data(thd, innodb_hton_ptr).

Due to the ha_alter_info->group_commit_ctx hack, InnoDB also assumes that the ALTER TABLE operation is executed in the same way on each partition; see MDEV-21832. I believe that this assumption or restriction can be lifted if we switch to use a more DML-like transaction commit.

If multiple storage engines participate in an ALTER TABLE operation, then we should probably employ a 2-phase commit mechanism, similar to binlog-driven DML transactions that use the internal MySQLXid identifier. The recovery code would have to be extended as well, to instruct each participating storage engine to either commit or roll back the operation. Currently, it is InnoDB that decides whether an ALTER TABLE had been committed or not; the logic around ddl-recovery.log will then decide which of the two .frm files is valid.

Comment by Marko Mäkelä [ 2022-09-19 ]

MDEV-27180 aims to implement crash-safe partitioning ALTER TABLE operations. I think that the goal should be that everything is crash-safe.

Comment by Marko Mäkelä [ 2022-12-16 ]

How would this work if different storage engines support different sets of capability flags? For example, HA_DUPLICATE_POS is specific to MyISAM and Aria.

Comment by VAROQUI Stephane [ 2023-11-02 ]

https://hackernoon.com/fr/transitions-musicales-tencent-de-clickhouse-%C3%A0-apache-doris

The takeaway moving from Clickhouse to Doris, one critical point was Doris enable cold and hot data splitting. Today in MariaDB possible via Spider but could put workload at risk via possible queries regression, this task is very important for the all MariaDB ecosystem

Generated at Thu Feb 08 09:12:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.