MariaDB Server / MDEV-30649

ALTER TABLE ENGINE doesn't copy data from SPIDER tables

Details

    Description

      If I use the CONNECT engine to federate data from another table, and then issue something like the following on the CONNECT table:

      ALTER TABLE tbl1 ENGINE=InnoDB
      

      When converting from CONNECT to InnoDB, both the schema and table contents will be copied.

      If I instead use the SPIDER engine to federate the access, the previous SQL statement modifies only the schema definition: no data is copied into the local table, and the table remains empty.

      My particular intent is to archive legacy data, moving it from one physical server to another after it has aged out.

      I would simply use the CONNECT engine itself, but it fails to support SETs, ENUMs, and NULLable indexes.
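      A minimal sketch of the two behaviors described above (server name, host, credentials, and column types are illustrative assumptions, not taken from the report):

      ```sql
      -- On the remote server: the source data.
      CREATE TABLE tbl1 (id INT PRIMARY KEY, v VARCHAR(32));
      INSERT INTO tbl1 VALUES (1, 'a'), (2, 'b');

      -- On the local server: federate the table via SPIDER.
      CREATE SERVER rmt FOREIGN DATA WRAPPER mysql
        OPTIONS (HOST '192.0.2.10', PORT 3306, USER 'u', PASSWORD 'p', DATABASE 'db');
      CREATE TABLE tbl1 (id INT PRIMARY KEY, v VARCHAR(32))
        ENGINE=SPIDER COMMENT='table "tbl1", srv "rmt"';

      SELECT COUNT(*) FROM tbl1;       -- 2: the query is pushed down to the remote
      ALTER TABLE tbl1 ENGINE=InnoDB;  -- succeeds, but ...
      SELECT COUNT(*) FROM tbl1;       -- 0: the local InnoDB table is left empty
      ```

      The same sequence with ENGINE=CONNECT in place of SPIDER is reported to carry the rows over.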

      Activity

            darkain Vincent Milum Jr created issue -

            darkain Vincent Milum Jr added a comment -

            Going from SPIDER to S3, neither of these works:

            ALTER TABLE spdr1 ENGINE=S3;

            CREATE TABLE aws1 ENGINE=S3 (SELECT * FROM spdr1);
            ycp Yuchen Pei made changes -
            Assignee Yuchen Pei [ JIRAUSER52627 ]
            ycp Yuchen Pei added a comment -

            mtr testcase

            src/storage/spider/mysql-test/spider/bugfix/t/mdev_30649.test

            --echo #
            --echo # MDEV-30649 ALTER TABLE ENGINE doesn't copy data from SPIDER tables
            --echo #
             
            --disable_query_log
            --disable_result_log
            --source ../../t/test_init.inc
            --source include/have_innodb.inc
            --enable_result_log
            --enable_query_log
             
            --connection child2_1
            CREATE DATABASE auto_test_remote;
            USE auto_test_remote;
            eval CREATE TABLE tbl_a (
                a INT
            ) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
             
            INSERT INTO tbl_a VALUES(42);
             
            --connection master_1
            CREATE DATABASE auto_test_local;
            USE auto_test_local;
            eval CREATE TABLE tbl_a (
                a INT
            ) $MASTER_1_ENGINE COMMENT='table "tbl_a", srv "s_2_1"';
             
            SELECT * FROM tbl_a;
             
            ALTER TABLE tbl_a ENGINE=InnoDB;
             
            SELECT * FROM tbl_a;
             
            --connection master_1
            DROP DATABASE IF EXISTS auto_test_local;
             
            --connection child2_1
            DROP DATABASE IF EXISTS auto_test_remote;
             
            --disable_query_log
            --disable_result_log
            --source ../../t/test_deinit.inc
            --enable_query_log
            --enable_result_log

            src/storage/spider/mysql-test/spider/bugfix/t/mdev_30649.cnf

            !include include/default_mysqld.cnf
            !include ../my_1_1.cnf
            !include ../my_2_1.cnf

            elenst Elena Stepanova made changes -
            Fix Version/s 10.6 [ 24028 ]
            ycp Yuchen Pei added a comment - - edited

            Right ralf.gebhardt, this does remind me of a comment in MDEV-16967
            at
            https://jira.mariadb.org/browse/MDEV-16967?focusedCommentId=224301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-224301
            though here the intention is to copy data from the data node, rather
            than pushing the DDL operation down to it.

            But yeah, the effect of operations on spider tables should be
            documented - which ones operate on the data node, and which ones
            don't.

            ycp Yuchen Pei added a comment - - edited

            It's intentionally skipped in the spider code

            int ha_spider::rnd_next_internal(
              uchar *buf
            ) {
            //  [... 11 lines elided]
              /* do not copy table data at alter table */
              if (wide_handler->sql_command == SQLCOM_ALTER_TABLE)
                DBUG_RETURN(HA_ERR_END_OF_FILE);
            //  [... 301 lines elided]
            }

            Trace during the alter:

            ha_spider::rnd_next_internal > ha_spider::rnd_next > rr_sequential > READ_RECORD::read_record > mysql_alter_table > mysql_execute_command > mysql_parse > do_command > do_handle_one_connection > pfs_spawn_thread

            Blame points to this 10-year-old commit:

            0b116de7c58db3e483964d00e6a3803947bfaf4f
            Author:     Sergey Vojtovich <svoj@mariadb.org>
            AuthorDate: Thu Jun 27 15:18:48 2013 +0400
            Commit:     Sergey Vojtovich <svoj@mariadb.org>
            CommitDate: Thu Jun 27 15:18:48 2013 +0400
             
            MDEV-4438 - Spider storage engine

            With this check commented out, ALTER TABLE does copy the data in
            the testcase from my previous comment, and the only regression in
            the existing tests is mdev_26539, which is caused by uninitialised
            result_list.internal_offset and result_list.internal_limit - most
            likely a separate issue.


            serg Sergei Golubchik added a comment -

            It's just illogical. If you do an ALTER TABLE from Spider to, say, InnoDB and end up with all the data locally, then all the data must have been deleted from the remote servers. Spider doesn't create or drop remote tables; that's by design. So it cannot copy all the data into the local table with ALTER.

            I think it would make a lot of sense to have some kind of special mode where Spider would push down DDLs: a local CREATE TABLE ... ENGINE=SPIDER would create a table on the remote server, and a local DROP would drop the remote table too. In that mode, ALTER TABLE should indeed copy the data.

            darkain Vincent Milum Jr added a comment -

            It's only illogical until it isn't. By that argument, you shouldn't even be allowed to ALTER TABLE ... ENGINE on a SPIDER table.

            And going back to the original example, the data copy works exactly as expected for the CONNECT engine.

            So why one federated engine type, but not another? I would use CONNECT if not for all the other ways it's broken (ENUM/SET, etc.) and its lack of clustering/sharding support.

            The idea is to archive data, going from a federated engine like SPIDER directly into the S3 engine, bypassing InnoDB entirely. Going to InnoDB or Aria locally first requires a second ALTER, plus double the temporary storage while the S3 data is generated locally before uploading.

            The entire point is to have a worker MariaDB node act as a data warehouse archival node, pulling data from a larger cluster, then dumping it into S3 when it hits a legacy "read-only" state.
            ycp Yuchen Pei added a comment - - edited

            > I think it would make a lot of sense to have some kind of special mode where Spider would push down DDLs: a local CREATE TABLE ... ENGINE=SPIDER would create a table on the remote server, and a local DROP would drop the remote table too. In that mode, ALTER TABLE should indeed copy the data.

            Makes sense. DML operations are unambiguous when operating on a spider
            table, as they have no meaning unless pushed down. DDL operations by
            contrast could be interpreted either way, as a change to the local or
            the remote table, and spider takes a conservative approach and only
            operates locally.

            So dropping a spider table has two possible meanings: drop the local
            table only; or drop both the local and the remote table.

            But an ALTER TABLE changing the engine, say to InnoDB, is even more
            ambiguous, as there are three possible meanings:
            1. alter the local table only, changing its engine from spider to
            innodb
            2. alter the remote table only, changing its engine from whatever it
            is to innodb
            3. alter the local table, changing its engine from spider to innodb,
            move the data from remote to local, and drop the remote table

            And this ticket is asking for yet a 4th meaning: alter the local
            table, changing its engine from spider to innodb, and copy the data
            from remote to local.

            In any case the first step would be confirming this in the
            documentation, as ralf.gebhardt suggested. For example I'm seeing
            some interesting spider system variables like local_lock_table with
            the comment "Remote server transmission when lock tables is
            executed at local".


            serg Sergei Golubchik added a comment -

            darkain, I understand your use case; it's a perfectly valid one. I only said that in the current Spider logic ALTER TABLE is the wrong command for it. COPY TABLE would have been a better one, if MariaDB had it. Or this new Spider mode where DDLs would apply to remote tables, which we don't have either. But anyway, of course, there must be some solution for your use case, one way or another.
            ycp Yuchen Pei added a comment - - edited

            BTW spider has a UDF called spider_copy_tables that copies data
            from a src table to a set of dest tables:

            https://mariadb.com/kb/en/spider_copy_tables/

            All the src and dest tables are on the data nodes.

            For example, say ta_l is a spider table with two remotes, then

            SELECT spider_copy_tables('ta_l', '0', '1')

            copies data from the 0th link to the 1st link.

            This UDF should satisfy the requirement in the ticket, where the
            dest link is an empty local table. However it uses the deprecated
            Spider HA feature (MDEV-28479), and it implements its own logic,
            selecting from the src table in chunks of bulk_insert_rows rows
            and constructing queries to manually insert them into the
            destination tables. If we just want to implement what is requested
            in this ticket, i.e. copying data from a remote table to a local
            table, then using the existing alter table logic (amounting to
            removing the check on ALTER TABLE quoted in a previous comment) is
            simpler.

            If we want to keep the functionality of copying from one possibly
            remote table to another possibly remote table, then it could be
            done without HA too, with a new implementation of
            spider_copy_tables that does

            SELECT spider_copy_tables(t1, t2)

            where t1 and t2 could be an arbitrary combination of spider and
            local tables. If t1 is a spider table and t2 is a local table, then
            it copies from the data node of t1 to t2. If both t1 and t2 are
            spider tables, then it copies from the data node of t1 to the data
            node of t2. And so on.
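            For illustration, usage of such a reimplementation might look like
            the following (the two-argument form of spider_copy_tables is the
            proposal above, not an existing function; table names are made up):

            ```sql
            -- Hypothetical two-argument form; none of these calls exist today.
            SELECT spider_copy_tables('spider_tbl', 'local_tbl');  -- data node of t1 -> local t2
            SELECT spider_copy_tables('local_tbl', 'spider_tbl');  -- local t1 -> data node of t2
            SELECT spider_copy_tables('spider_t1', 'spider_t2');   -- data node of t1 -> data node of t2
            ```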

            A special push-down DDL mode would suffer from the ambiguity
            mentioned in my previous comment: say t1 is a spider table, then in
            the special mode

            alter table t1 engine=innodb

            could mean alter the spider table t1 into an innodb table and move
            the data from the data node table into t1, or it could mean simply
            alter the data node table into an innodb table.

            ycp Yuchen Pei added a comment -

            ralf.gebhardt, sure - here's how to do it.

            Suppose we have a table src on a remote server, and we want to copy its content to an innodb table dst on the local server using spider.

            Let's say the src table has the following schema: (c INT, d DATE, PRIMARY KEY(c)).

            First, create the dst table with the same schema if it does not already exist:

            CREATE TABLE dst (c INT, d DATE, PRIMARY KEY(c)) ENGINE=INNODB;

            Then, create a spider table t with both src and dst as HA data nodes. Also set spider_same_server_link, because t and dst are on the same server (assuming src is on SERVER s1 and the local server is defined as SERVER s2):

            SET spider_same_server_link=1;
            CREATE TABLE t (c INT, d DATE, PRIMARY KEY(c)) ENGINE=SPIDER COMMENT='table "src dst", srv "s1 s2"';

            Finally, copy data from src to dst using spider_copy_tables:

            SELECT spider_copy_tables('t', '0', '1');

            I've also added a testcase which I'll push as a "fix" for this issue once it is reviewed.
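            For reference, the steps above combined into one script (the
            CREATE SERVER statements are assumptions filled in for
            completeness; hosts and credentials are placeholders):

            ```sql
            -- Assumed server definitions; s2 points back at the local server itself.
            CREATE SERVER s1 FOREIGN DATA WRAPPER mysql
              OPTIONS (HOST 'remote.example.com', PORT 3306, USER 'u', PASSWORD 'p', DATABASE 'db');
            CREATE SERVER s2 FOREIGN DATA WRAPPER mysql
              OPTIONS (HOST '127.0.0.1', PORT 3306, USER 'u', PASSWORD 'p', DATABASE 'db');

            -- Destination table with the same schema as the remote src table.
            CREATE TABLE dst (c INT, d DATE, PRIMARY KEY(c)) ENGINE=INNODB;

            -- Spider table with src and dst as the two HA links (links 0 and 1).
            SET spider_same_server_link=1;
            CREATE TABLE t (c INT, d DATE, PRIMARY KEY(c))
              ENGINE=SPIDER COMMENT='table "src dst", srv "s1 s2"';

            -- Copy from link 0 (src on s1) to link 1 (dst on s2).
            SELECT spider_copy_tables('t', '0', '1');
            ```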

            ycp Yuchen Pei made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            ycp Yuchen Pei added a comment -

            Hi holyfoot, ptal at the new testcase, thanks:

            3b8bfbb8319 bb-10.5-mdev-30649 MDEV-30649 Adding a spider testcase showing copying from a remote to a local table
            

            ycp Yuchen Pei made changes -
            Assignee Yuchen Pei [ JIRAUSER52627 ] Alexey Botchkov [ holyfoot ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]

            holyfoot Alexey Botchkov added a comment -

            ok to push.
            holyfoot Alexey Botchkov made changes -
            Assignee Alexey Botchkov [ holyfoot ] Yuchen Pei [ JIRAUSER52627 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            ycp Yuchen Pei added a comment -

            thanks for the review - pushed 5be859d52ccefcd8d63ff5b00167383e4d9837df to 10.5

            ycp Yuchen Pei made changes -
            Fix Version/s 10.5.28 [ 29952 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            JIraAutomate JiraAutomate made changes -
            Fix Version/s 10.6.21 [ 29953 ]
            Fix Version/s 10.11.11 [ 29954 ]
            Fix Version/s 11.4.5 [ 29956 ]
            Fix Version/s 11.7.2 [ 29914 ]
            ycp Yuchen Pei made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            ycp Yuchen Pei added a comment -

            As previously mentioned, pushed a testcase showcasing the data copy.

            ycp Yuchen Pei made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 11.7.2 [ 29914 ]
            Fix Version/s 10.5.28 [ 29952 ]
            Fix Version/s 10.6.21 [ 29953 ]
            Fix Version/s 10.11.11 [ 29954 ]
            Fix Version/s 11.4.5 [ 29956 ]
            Resolution Not a Bug [ 6 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            People

              Assignee: ycp Yuchen Pei
              Reporter: darkain Vincent Milum Jr
              Votes: 0
              Watchers: 6
