MariaDB Server
MDEV-16610

Torn reads from spider, with XA and isolation level REPEATABLE READ

Details

    Description

      Writes through spider on external data nodes are guaranteed to be consistent through XA, but reads are not consistent and can be torn. This means spider cannot be considered an ACID engine, even with the SERIALIZABLE isolation level and XA: a simple transaction, such as atomically inserting or deleting two rows, can still be observed by a concurrent read as if only one of the rows exists.
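      As a minimal two-session sketch of that anomaly (the full reproduction with stored procedures follows below), assuming ids 0 and 1 hash to different data nodes:

      # session 1 (writer), on the spider head node
      START TRANSACTION;
      INSERT INTO spider_replication_test.t2 (id, filler) VALUES (0, 'a'), (1, 'b');
      COMMIT;  # spider issues XA PREPARE / XA COMMIT per data node

      # session 2 (reader), on the spider head node, running concurrently
      SELECT COUNT(*) FROM spider_replication_test.t2 WHERE id IN (0, 1);
      # expected 0 or 2, but can occasionally return 1: a torn read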

      On the spider head node:

      create database spider_replication_test;
      create table t2 (
        id int unsigned PRIMARY KEY,
        filler varchar(64) DEFAULT 'Default Filler non-sense!'
      ) ENGINE=SPIDER COMMENT='wrapper = "mysql", table = "t2", database = "spider_replication_test"'
      PARTITION BY HASH (`id`)
      (PARTITION `p0` COMMENT = 'srv "datanode1"' ENGINE = SPIDER,
       PARTITION `p1` COMMENT = 'srv "datanode2"' ENGINE = SPIDER,
       PARTITION `p2` COMMENT = 'srv "datanode3"' ENGINE = SPIDER,
       PARTITION `p3` COMMENT = 'srv "datanode4"' ENGINE = SPIDER);
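      The partition comments above refer to server definitions datanode1 through datanode4, which are assumed to exist already. For completeness, they would be created with something along these lines (hostname, port and credentials here are placeholders, not part of the original report):

      CREATE SERVER datanode1 FOREIGN DATA WRAPPER mysql OPTIONS
        (HOST 'datanode1.example.com', PORT 3306,
         DATABASE 'spider_replication_test', USER 'spider_test', PASSWORD '...');
      # likewise for datanode2, datanode3 and datanode4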
       
       
      DELIMITER //
       
      DROP PROCEDURE IF EXISTS insert_and_delete//
      CREATE PROCEDURE insert_and_delete(IN loops INT)
        BEGIN
          SET @x = 1;
          SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
          # To make it easier to find in the query.log:
          SELECT * FROM spider_replication_test.t2 WHERE filler NOT LIKE 'START OF insert_and_delete';
          REPEAT
              SET @x = @x + 1;
              start transaction;
              INSERT INTO spider_replication_test.t2 (id, filler) VALUES (0, 'p0 consistent view check'), (1, 'p1 consistent view check'), (2, 'p2 consistent view check'), (3, 'p3 consistent view check');
              commit;
              start transaction;
              DELETE FROM spider_replication_test.t2 WHERE id IN (0,1,2,3);
              commit;
          UNTIL @x > loops END REPEAT;
          # To make it easier to find in the query.log:
          SELECT * FROM spider_replication_test.t2 WHERE filler NOT LIKE 'END OF insert_and_delete';
        END
      //
      DROP PROCEDURE IF EXISTS check_select_consistency//
      CREATE PROCEDURE check_select_consistency(IN loops INT, OUT outparam INT)
        BEGIN
          DECLARE var_sum INT;
          SET @x = 1;
          SET outparam = 0;
          SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
          # To make it easier to find in the query.log:
          SELECT * FROM spider_replication_test.t2 WHERE filler NOT LIKE 'START OF check_select_consistency';
          REPEAT
              SET @x = @x + 1;
              SELECT sum(id) INTO var_sum FROM spider_replication_test.t2 WHERE id in (0,1,2,3);
              IF (var_sum IS NOT NULL AND var_sum != 6 ) THEN
                  SET outparam = outparam + 1;
              END IF;
       
          UNTIL @x > loops END REPEAT;
          # To make it easier to find in the query.log:
          SELECT * FROM spider_replication_test.t2 WHERE filler NOT LIKE 'END OF check_select_consistency';
        END
      //
      DELIMITER ;
      

      Insert some minimal data to avoid 0/1 row optimisations:

      INSERT INTO t2 (id, filler) VALUES (4, 'p0 dummy data'), (5, 'p1 dummy data'), (6, 'p2 dummy data'), (7, 'p3 dummy data'), (8, 'P0 dummy data 2'), (9, 'P1 dummy data 2'), (10, 'P2 dummy data 2'), (11, 'P3 dummy data 2');
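      To double-check that ids 0 through 3 really spread across all four partitions (and hence all four data nodes), something like the following could be run on the head node; with PARTITION BY HASH (`id`) and 4 partitions, each id should map to partition id MOD 4:

      EXPLAIN PARTITIONS
      SELECT id FROM spider_replication_test.t2 WHERE id IN (0, 1, 2, 3);
      # the partitions column should list p0,p1,p2,p3, i.e. one row per data node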
      

      On the data nodes:

      use spider_replication_test;
      create table t2 (
        id int unsigned PRIMARY KEY,
        filler varchar(64) DEFAULT 'Default Filler non-sense!'
      ) ENGINE=INNODB;
      

      Spider head variables:

      spider-head [spider_replication_test]> show variables WHERE Variable_name like '%spider%' AND value not like '-1';
      +---------------------------------------+--------+
      | Variable_name                         | Value  |
      +---------------------------------------+--------+
      | spider_bka_engine                     |        |
      | spider_block_size                     | 16384  |
      | spider_conn_recycle_mode              | 0      |
      | spider_conn_recycle_strict            | 0      |
      | spider_conn_wait_timeout              | 10     |
      | spider_connect_error_interval         | 1      |
      | spider_connect_mutex                  | OFF    |
      | spider_connect_retry_count            | 1000   |
      | spider_connect_retry_interval         | 1000   |
      | spider_direct_dup_insert              | 1      |
      | spider_dry_access                     | OFF    |
      | spider_force_commit                   | 1      |
      | spider_general_log                    | ON     |
      | spider_index_hint_pushdown            | OFF    |
      | spider_internal_unlock                | OFF    |
      | spider_internal_xa                    | ON     |
      | spider_internal_xa_id_type            | 0      |
      | spider_internal_xa_snapshot           | 0      |
      | spider_local_lock_table               | OFF    |
      | spider_lock_exchange                  | OFF    |
      | spider_log_result_error_with_sql      | 3      |
      | spider_log_result_errors              | 3      |
      | spider_max_connections                | 0      |
      | spider_net_read_timeout               | 86400  |
      | spider_net_write_timeout              | 86400  |
      | spider_ping_interval_at_trx_start     | 3600   |
      | spider_quick_mode                     | 3      |
      | spider_remote_access_charset          |        |
      | spider_remote_default_database        |        |
      | spider_remote_time_zone               | SYSTEM |
      | spider_same_server_link               | OFF    |
      | spider_semi_table_lock                | 1      |
      | spider_semi_trx                       | ON     |
      | spider_support_xa                     | ON     |
      | spider_sync_autocommit                | ON     |
      | spider_sync_time_zone                 | ON     |
      | spider_sync_trx_isolation             | ON     |
      | spider_table_crd_thread_count         | 10     |
      | spider_table_init_error_interval      | 1      |
      | spider_table_sts_thread_count         | 10     |
      | spider_udf_table_lock_mutex_count     | 20     |
      | spider_udf_table_mon_mutex_count      | 20     |
      | spider_use_all_conns_snapshot         | OFF    |
      | spider_use_consistent_snapshot        | OFF    |
      | spider_use_default_database           | ON     |
      | spider_use_flash_logs                 | OFF    |
      | spider_use_snapshot_with_flush_tables | 0      |
      | spider_version                        | 3.3.13 |
      | spider_xa_register_mode               | 1      |
      +---------------------------------------+--------+
      49 rows in set (0.00 sec)
      

      Spider head session 1

      spider-head [spider_replication_test]> call insert_and_delete(1000);
      Empty set (0.01 sec)
       
      Empty set (20.42 sec)
       
      Query OK, 8000 rows affected (20.42 sec)
      

      Spider head session 2, run concurrently with session 1, showing >15% torn reads (385 of 2000 iterations)!

      spider-head [spider_replication_test]> call check_select_consistency(2000, @o); select @o;
      Empty set (0.01 sec)
       
      Empty set (15.89 sec)
       
      Query OK, 2000 rows affected (15.89 sec)
       
      +------+
      | @o   |
      +------+
      |  385 |
      +------+
      1 row in set (0.00 sec)
      

      From the query.log on a data node, showing concurrent reads and writes. See '5785670 Query select a.id,sum(b.`id`)...': a read (connection 5785670) executing while another transaction (connection 5785669) is in the in-doubt XA state, between XA PREPARE and XA COMMIT:

                      5785669 Connect spider_test@spider-head.example.com as anonymous on 
                      5785669 Query   set session transaction isolation level read committed;set session autocommit = 1;xa start 0x3339613139,0x63656634376134,1
                      5785669 Query   SET NAMES utf8mb4
                      5785669 Init DB spider_replication_test
                      5785669 Query   drop temporary table if exists spider_replication_test.tmp_spider_bka_0x7f79d40a9870;create temporary table spider_replication_test.tmp_spider_bka_0x7f79d40a9870(id bigint,c0 int(10) unsigned)engine=memory default charset=utf8mb4 collate utf8mb4_general_ci;insert into spider_replication_test.tmp_spider_bka_0x7f79d40a9870(id,c0)values(0,0),(1,1),(2,2),(3,3)
                      5785669 Query   select a.id,b.`id` from spider_replication_test.tmp_spider_bka_0x7f79d40a9870 a,`spider_replication_test`.`t2` b where a.c0 <=> b.`id` and (b.`id` in( 0 , 1 , 2 , 3)) for update
                      5785669 Query   delete from `spider_replication_test`.`t2` where `id` = 0 limit 1
                      5785669 Query   drop temporary table if exists spider_replication_test.tmp_spider_bka_0x7f79d40a9870
                      5785669 Query   xa end 0x3339613139,0x63656634376134,1
                      5785669 Query   xa prepare 0x3339613139,0x63656634376134,1
                      5785670 Connect spider_test@spider-head.example.com as anonymous on 
                      5785670 Query   set session transaction isolation level read committed;set session autocommit = 1;xa start 0x3339633438,0x63656634376134,1
                      5785670 Query   SET NAMES utf8mb4
                      5785670 Init DB spider_replication_test
                      5785670 Query   drop temporary table if exists spider_replication_test.tmp_spider_bka_0x7f79d40a9870;create temporary table spider_replication_test.tmp_spider_bka_0x7f79d40a9870(id bigint,c0 int(10) unsigned)engine=memory default charset=utf8mb4 collate utf8mb4_general_ci;insert into spider_replication_test.tmp_spider_bka_0x7f79d40a9870(id,c0)values(0,0),(1,1),(2,2),(3,3)
                      5785670 Query   select a.id,sum(b.`id`),b.`id` from spider_replication_test.tmp_spider_bka_0x7f79d40a9870 a,`spider_replication_test`.`t2` b where a.c0 <=> b.`id` and (b.`id` in( 0 , 1 , 2 , 3))
                      5785669 Query   xa commit 0x3339613139,0x63656634376134,1
                      5785669 Quit    
                      5785671 Connect spider_test@spider-head.example.com as anonymous on 
                      5785671 Query   set session transaction isolation level read committed;set session autocommit = 1;xa start 0x3339613139,0x63656634376134,1
                      5785671 Query   SET NAMES utf8mb4
                      5785671 Init DB spider_replication_test
                      5785671 Query   insert high_priority into `spider_replication_test`.`t2`(`id`,`filler`)values(0,'p0 consistent view check')
                      5785670 Query   drop temporary table if exists spider_replication_test.tmp_spider_bka_0x7f79d40a9870
                      5785670 Query   xa end 0x3339633438,0x63656634376134,1
                      5785670 Query   xa prepare 0x3339633438,0x63656634376134,1
                      5785670 Query   xa commit 0x3339633438,0x63656634376134,1
                      5785671 Query   xa end 0x3339613139,0x63656634376134,1
                      5785670 Quit    
      

      I think this can be fixed by not allowing queries in spider to cross an XA commit, i.e. the XA commit needs to wait for currently running queries before being sent to any node, and new queries need to be blocked until the XA commit has completed on all nodes. But this would limit parallelism and performance.
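      As a rough sketch of that idea (a hypothetical ordering, not what Spider currently does; 'trx1' is a placeholder XID, Spider actually generates hex XIDs as in the log above), the statement stream sent to each data node would gain a barrier around the two-phase commit:

      XA START 'trx1';
      INSERT INTO spider_replication_test.t2 (id, filler) VALUES (0, 'p0 consistent view check');
      XA END 'trx1';
      # barrier begins: wait for queries already running against the data nodes to
      # finish, and hold back new queries from other head-node sessions
      XA PREPARE 'trx1';
      XA COMMIT 'trx1';
      # barrier ends: only once the commit has completed on every data node are the
      # queued queries dispatched again, so none can observe the in-doubt window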

          Activity

            serg Sergei Golubchik added a comment - - edited

            I don't think it is. "start with consistent snapshot" will simply ask all engines to start a transaction now. It'll still be sequential, so race conditions are possible. And from the spider point of view it'll be one call, something like spider_hton->start_consistent_snapshot(), where spider will need to iterate all nodes and start a consistent snapshot on each, which, again, is not atomic and is prone to race conditions.

            Generally, I think the DTP XA standard only provides proper isolation at the SERIALIZABLE isolation level.

            On the other hand, if you serialize the "starting a transaction on all nodes" code, for example with a mutex, then you'll have consistent reads, but at the cost of reduced concurrency.

            Anyway, as far as I can see, MDEV-14090 will not be of any use to you here; it looks quite unrelated.


            Kentoku Kentoku Shiba (Inactive) added a comment

            serg
            What I would like to suggest is something like "start with consistent snapshot at '2019-05-16 01:21:00.000000000'". Similarly, commit would be like "commit at '2019-05-16 01:21:00.000000000'".
            I assume that using the same time on all nodes makes it possible to access data with consistent views without any locks.
            Should I create a new Jira task instead of MDEV-14090?

            Kentoku Kentoku Shiba (Inactive) added a comment - - edited

            Spider can be considered to be an ACID engine with the SERIALIZABLE isolation level and XA, because all SELECTs take shared locks and wait for commit/rollback if the rows being read are inserted/updated/deleted.

            marko Marko Mäkelä added a comment - - edited

            monty suggested a possible solution to this problem:

            • Spider head node(s) will send information about transaction commits to a monitor server, which also assigns InnoDB trx_t::id to all Spider data nodes.
            • In a special mode of InnoDB, trx_t::id for write transactions will be assigned by the monitor server.
            • The spider head node and the monitor server will control read view creation and delay the purge, by extending the BEGIN or START TRANSACTION syntax with the InnoDB local transaction identifiers, something like this:

            BEGIN 100, (103,104);
            

            This tells InnoDB two things:

            • The purge view is not allowed to advance beyond transaction identifier 99 until another such extended BEGIN statement arrives.
            • No matter what transactions were already committed, InnoDB will pretend that transactions 103, 104 (which must have started earlier) were not committed.

            It is an error to request a BEGIN with a smaller number for the purge limit, such as:

            BEGIN 99, (100,103,104); -- error: previously we got BEGIN 100, …
            BEGIN 100, (101,102,105); -- OK: the previous limit was BEGIN 100, …
            BEGIN 101, (101,102,105); -- OK, returning the same read view (allowing purge view to advance)
            BEGIN 100, (101,102,105); -- error: previously we got BEGIN 101, …
            

            Note: Not all listed transaction identifiers are necessarily known by all Spider data nodes, because not all transactions will always modify tables on all data nodes.


            marko Marko Mäkelä added a comment

            I feel that some clarifications are necessary:

            • This extended functionality should be enabled only if a special parameter is set. Only then can we avoid a scenario where a data node is restarted and then suddenly starts purging history before the first extended BEGIN statement is issued.
            • All Spider data nodes that are serving a Spider head node would use the same 48-bit InnoDB transaction identifiers.
            • InnoDB would allocate those transaction identifiers from a new monitor server, which does not have any persistent state.
            • On startup, the monitor server would query all data nodes for the maximum transaction identifier, and initialize the sequence from that.
            • We seem to need XA 2-phase-commit syntax for distributed transaction commit between Spider data nodes.
            • But, XIDs will be lost after XA COMMIT, and we must be able to refer to committed not-yet-purged transactions.
            • Perhaps we could introduce an identity mapping, such as sscanf(xid, "xid" TRX_ID_FMT, &trx_id)?
            • If a Spider system is being replicated by Galera, all nodes should use the same transaction identifiers, assigned by the monitor server.

            I have not thought out how to handle distributed transactions between multiple Spider clusters. Above, we would be internally using the XA mechanism for transactions that are not distributed from the point of view of a client that connects to a Spider head node.


            People

              ycp Yuchen Pei
              mattiasjonsson Mattias Jonsson