Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22168

Supporting multiple engines with table partitioning

Details

    • New Feature
    • Status: Stalled (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 12.1
    • Partitioning
    • None

    Description

      Allow different engines in a partitioned table.
      What is mostly needed is allow one to combine InnoDB with S3 tables but other combinations should be possible . InnoDB couldb be a OVERFLOW partition that handles all inserts while the other partitions are only used for read.

      From optimizer point of view, we should be AND the flags from all engines
      that is part of a query. If S3 and InnoDB is in the same query, we would
      loose the clustered key cost optimization, however I think we can survive
      without that.

      To support multiple engines, we probably also need to do MDEV-5271

      Attachments

        Issue Links

          Activity

            Since MySQL 5.6.10 and MariaDB 10.0, the native ALTER TABLE implementation in InnoDB uses a trick that will commit the changes to all partitions within the same transaction, using ha_alter_info->group_commit_ctx.

            I think that in order to be able to support multiple storage engines in a partitioned table, we must remove the ha_alter_info->group_commit_ctx and refactor the handler::commit_inplace_alter_table() interface so that there will be an explicit transaction commit. Even after the code cleanup in MDEV-25180, InnoDB still uses an internal transaction for modifying the data dictionary. Because the lock wait logic was already cleaned up in MDEV-25919, it should not be difficult to switch to use the "user" transaction that is attached to thd_get_ha_data(thd, innodb_hton_ptr).

            Due to the ha_alter_info->group_commit_ctx hack, InnoDB also assumes that the ALTER TABLE operation is executed in the same way on each partition; see MDEV-21832. I believe that this assumption or restriction can be lifted if we switch to use a more DML-like transaction commit.

            If multiple storage engines participate in an ALTER TABLE operation, then we should probably employ a 2-phase commit mechanism, similar to binlog-driven DML transactions that use the internal MySQLXid identifier. The recovery code would have to be extended as well, to instruct each participating storage engine to either commit or roll back the operation. Currently, it is InnoDB that decides whether an ALTER TABLE had been committed or not; the logic around ddl-recovery.log will then decide which of the two .frm files is valid.

            marko Marko Mäkelä added a comment - Since MySQL 5.6.10 and MariaDB 10.0, the native ALTER TABLE implementation in InnoDB uses a trick that will commit the changes to all partitions within the same transaction, using ha_alter_info->group_commit_ctx . I think that in order to be able to support multiple storage engines in a partitioned table, we must remove the ha_alter_info->group_commit_ctx and refactor the handler::commit_inplace_alter_table() interface so that there will be an explicit transaction commit. Even after the code cleanup in MDEV-25180 , InnoDB still uses an internal transaction for modifying the data dictionary. Because the lock wait logic was already cleaned up in MDEV-25919 , it should not be difficult to switch to use the "user" transaction that is attached to thd_get_ha_data(thd, innodb_hton_ptr) . Due to the ha_alter_info->group_commit_ctx hack, InnoDB also assumes that the ALTER TABLE operation is executed in the same way on each partition; see MDEV-21832 . I believe that this assumption or restriction can be lifted if we switch to use a more DML-like transaction commit. If multiple storage engines participate in an ALTER TABLE operation, then we should probably employ a 2-phase commit mechanism, similar to binlog-driven DML transactions that use the internal MySQLXid identifier. The recovery code would have to be extended as well, to instruct each participating storage engine to either commit or roll back the operation. Currently, it is InnoDB that decides whether an ALTER TABLE had been committed or not; the logic around ddl-recovery.log will then decide which of the two .frm files is valid.

            MDEV-27180 aims to implement crash-safe partitioning ALTER TABLE operations. I think that the goal should be that everything is crash-safe.

            marko Marko Mäkelä added a comment - MDEV-27180 aims to implement crash-safe partitioning ALTER TABLE operations. I think that the goal should be that everything is crash-safe.

            How would this work if different storage engines support different sets of capability flags? For example, HA_DUPLICATE_POS is specific to MyISAM and Aria.

            marko Marko Mäkelä added a comment - How would this work if different storage engines support different sets of capability flags? For example, HA_DUPLICATE_POS is specific to MyISAM and Aria.

            https://hackernoon.com/fr/transitions-musicales-tencent-de-clickhouse-%C3%A0-apache-doris

            The takeaway moving from Clickhouse to Doris, one critical point was Doris enable cold and hot data splitting. Today in MariaDB possible via Spider but could put workload at risk via possible queries regression, this task is very important for the all MariaDB ecosystem

            stephane@skysql.com VAROQUI Stephane added a comment - https://hackernoon.com/fr/transitions-musicales-tencent-de-clickhouse-%C3%A0-apache-doris The takeaway moving from Clickhouse to Doris, one critical point was Doris enable cold and hot data splitting. Today in MariaDB possible via Spider but could put workload at risk via possible queries regression, this task is very important for the all MariaDB ecosystem

            MDEV-15471 is another example where the InnoDB storage engine is assuming that all partitions of a table have the same layout, not only the same storage engine.

            marko Marko Mäkelä added a comment - MDEV-15471 is another example where the InnoDB storage engine is assuming that all partitions of a table have the same layout, not only the same storage engine.

            People

              holyfoot Alexey Botchkov
              monty Michael Widenius
              Votes:
              4 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.