MariaDB MaxScale / MXS-2624

Transaction Replay fails with checksum error as soon as Clustrix cluster is formed.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: readwritesplit, xpandmon
    • Labels:
      None
    • Environment:
      MaxScale server karma197:
      OS: CentOS 7
      Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
      Clustrix nodes:
      OS: CentOS 7
      Version: clustrix-fredonyer-16045
    • Sprint:
      MXS-SPRINT-88, MXS-SPRINT-89, MXS-SPRINT-90, MXS-SPRINT-91

      Description

      [root@karma108 ~]# clx s
      Cluster Name:    cld8d568c4e1f3e0db
      Cluster Version: clustrix-fredonyer-16045
      Cluster Status:   OK
      Cluster Size:    4 nodes - 16 CPUs per Node
      Current Node:    karma108 - nid 1
       
      nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
      ----+-----------+--------+--------------+------+-----+---------------+--------
        1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
        2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
        3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
        4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
      ----+-----------+--------+--------------+------+-----+---------------+--------
                                                         0 |  5.5G (0.18%) |    3.0T
      Configuration file excerpt:
      [RCR]
      type=service
      router=readwritesplit
      user=maxscale
      password=maxscale_pw
      cluster=Clustrix
      transaction_replay=true
      slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
      delayed_retry_timeout=360s
      transaction_replay_attempts=500
       
      MaxScale was started as:
      [root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
       
      [root@karma197 log]# maxctrl list servers
      ┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State           │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      Connect to MaxScale from a client machine and run the following:

      [root@jones ~]# mysql -h karma197 -u maxscale -pmaxscale_pw -P 6008
      mysql> set session autocommit=false;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> begin;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> use test;
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Database changed
      mysql>  insert into t2 values (9);
      Query OK, 1 row affected (0.03 sec)
      

      At this point, restart Clustrix by running:
      [root@karma108 ~]# clx dbrestart
      Success: stop_job krobix run on all nodes.
      Success: start_job krobix run on all nodes.

      Wait for the cluster to re-form.

      Now run:

      mysql>  insert into t2 values (9);
      ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.
      

      Expected:
      Transaction replay should be delayed, and the transaction should be replayed on the next attempt to run a statement in it.

      Actual:
      The transaction replay fails with a checksum mismatch error.
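
As a rough mental model of the check that fails here: when a transaction is replayed, the results of the replayed statements are checksummed and compared with the checksum of the results the client originally received, and the session is closed if they differ. The sketch below illustrates only that comparison. It is not MaxScale's implementation; the function names, the choice of SHA-1, and the reply byte strings are assumptions made for illustration.

```python
import hashlib


def reply_checksum(replies):
    """Return one SHA-1 digest over the raw reply bytes of a transaction.

    `replies` is an iterable of bytes objects, one per statement result.
    (Hypothetical helper; the hash choice is an assumption.)
    """
    digest = hashlib.sha1()
    for reply in replies:
        digest.update(reply)
    return digest.hexdigest()


def replay_matches(original_replies, replayed_replies):
    """True if the replayed transaction produced byte-identical results."""
    return reply_checksum(original_replies) == reply_checksum(replayed_replies)


if __name__ == "__main__":
    # Placeholder reply payloads, not real protocol packets.
    original = [b"reply-to-begin", b"reply-to-insert"]
    identical_replay = [b"reply-to-begin", b"reply-to-insert"]
    differing_replay = [b"reply-to-begin", b"different-reply-to-insert"]

    print(replay_matches(original, identical_replay))
    print(replay_matches(original, differing_replay))
```

Under this model, any byte-level difference between the original and replayed results is enough to abort the replay, even if the statement itself succeeded both times.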

      Relevant logs:

      2019-08-07 21:58:57   info   : (1) [readwritesplit] Replaying: insert into t2 values (9)
      2019-08-07 21:58:57   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 30, type: QUERY_TYPE_WRITE, stmt: insert into t2 values (9)
      2019-08-07 21:58:57   info   : [readwritesplit] Route query to master: @@Clustrix:node-1        [10.2.13.91]:3306 <
      2019-08-07 21:58:57   info   : (1) [readwritesplit] Adding to trx: insert into t2 values (9)
      2019-08-07 21:58:57   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
      2019-08-07 21:58:57   info   : (1) [readwritesplit] Checksum mismatch, transaction replay failed. Closing connection.
      2019-08-07 21:58:57   info   : Stopped RCR client session [1]
      2019-08-07 21:58:59   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]
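
When going through the full attached log, it can help to pull out only the replay failures and the sessions they belong to. The following is a minimal sketch that assumes every line follows the layout of the excerpt above (timestamp, level, colon, optional session id in parentheses, message). The regular expression and function names are illustrative and not part of any MaxScale tooling.

```python
import re

# Layout assumed from the excerpt: "<date> <time>   <level>   : (<session>) <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+)\s*:\s+"
    r"(?:\((?P<session>\d+)\)\s+)?"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def checksum_failures(lines):
    """Return the parsed entries that report a transaction replay checksum mismatch."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p and "Checksum mismatch" in p["message"]]


# Two lines copied from the excerpt above.
SAMPLE = [
    "2019-08-07 21:58:57   info   : (1) [readwritesplit] Replaying: insert into t2 values (9)",
    "2019-08-07 21:58:57   info   : (1) [readwritesplit] Checksum mismatch, "
    "transaction replay failed. Closing connection.",
]

if __name__ == "__main__":
    for entry in checksum_failures(SAMPLE):
        print(entry["timestamp"], "session", entry["session"], entry["message"])
```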
      

      Full logs and the configuration file are attached.

              People

              Assignee:
              johan.wikman Johan Wikman
              Reporter:
              rahul.joshi@mariadb.com Rahul Joshi