Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-2625

Transaction Replay: Trying to execute statement before Clustrix cluster is up, gets the statement stuck

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 2.4.1
    • xpandmon
    • None
    • MaxScale server karma197:
      OS: CentOS 7
      Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
      Clustrix nodes:
      OS: CentOS 7
      Version: clustrix-fredonyer-16045
    • MXS-SPRINT-88, MXS-SPRINT-89

    Description

      [root@karma108 ~]# clx s
      Cluster Name:    cld8d568c4e1f3e0db
      Cluster Version: clustrix-fredonyer-16045
      Cluster Status:   OK
      Cluster Size:    4 nodes - 16 CPUs per Node
      Current Node:    karma108 - nid 1
       
      nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
      ----+-----------+--------+--------------+------+-----+---------------+--------
        1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
        2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
        3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
        4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
      ----+-----------+--------+--------------+------+-----+---------------+--------
                                                         0 |  5.5G (0.18%) |    3.0T
      Conf file snip:
      [RCR]
      type=service
      router=readwritesplit
      user=maxscale
      password=maxscale_pw
      cluster=Clustrix
      transaction_replay=true
      slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
      delayed_retry_timeout=360s
      transaction_replay_attempts=500
       
      MaxScale run as:
      [root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
       
      [root@karma197 log]# maxctrl list servers
      ┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State           │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      Connected to MaxScale from client machine and run following:

      [root@jones ~]# mysql -h karma197 -u maxscale -pmaxscale_pw -P 6008
      mysql> set session autocommit=false;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> begin;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> use test;
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Database changed
      mysql>  insert into t2 values (9);
      Query OK, 1 row affected (0.03 sec)
      

      Restart clustrix here by running:
      [root@karma108 ~]# clx dbrestart
      Success: stop_job krobix run on all nodes.
      Success: start_job krobix run on all nodes.

      Before the cluster forms, run following.

      mysql>  insert into t2 values (9);
      

      Expected:
      The trx should be replayed including the last statement that was attempted while the cluster was forming, once all nodes come up and the cluster forms. Last statement should complete successfully.

      Actual:
      Transaction does not get retried. Last statement execution never completed, but just got stuck. Waited multiple minutes. Same statement completed instantly before Clustrix service restart.

      Relevant logs:

      2019-08-07 22:33:05   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 10, type: QUERY_TYPE_BEGIN_TRX, stmt: begin
      2019-08-07 22:33:05   info   : [readwritesplit] Connected to '@@Clustrix:node-1'
      2019-08-07 22:33:05   info   : [readwritesplit] Queuing query until '@@Clustrix:node-1' completes session command
      2019-08-07 22:33:05   info   : (1) Connected to '@@Clustrix:node-1' with thread id 3073
      2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
      2019-08-07 22:33:05   info   : (1) [readwritesplit] 1 session commands left on '@@Clustrix:node-1'
      2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
      2019-08-07 22:33:05   info   : (1) [readwritesplit] >>> Routing stored queries
      2019-08-07 22:33:05   info   : (1) [readwritesplit] New query received while transaction replay is active: insert into t2 values (9)
      2019-08-07 22:33:05   info   : (1) [readwritesplit] <<< Stored queries routed
      2019-08-07 22:33:07   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]
      

      Full logs and config file attached.

      Clustrix log for service shutwon and cluster formation times:

      2019-08-07 22:32:36.343713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/shutdown.c:61 shutdown_signal(): Shutting down node pfec37993505e4
      d91
      2019-08-07 22:32:36.365713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/coordinate_dbstart.c:193 write_coordinate_dbstart_file_done(): Dbs
      tart coordination info ('4') written to /etc/clustrix/expected_nodes
      2019-08-07 22:32:48.073414 UTC karma108.colo.sproutsys.com clxnode: INFO test/clxnode.c:410 clxnode_post_args(): Starting Clustrix, clustrix-fredonyer-1
      6045, pid=25572, build=release
      ...
      2019-08-07 22:33:04.143217 UTC nid 1 karma108.colo.sproutsys.com clxnode: ALERT NEW_GROUP INFO Node 1 has new group 2cfffe: { 1-4 } for cluster_id d8d568c4e1f3e0db
      

      Attachments

        1. clustrix_karma108_trxreplay.cnf
          0.7 kB
        2. interruption.log
          4 kB
        3. maxscale.log
          54 kB
        4. maxscale.stmtAfterGC.log
          80 kB
        5. maxscale.stmtDuringGC.log
          79 kB
        6. successful-trx-replay.log
          48 kB
        7. unsuccessful-trx-replay.log
          44 kB

        Activity

          People

            markus makela markus makela
            rahul.joshi@mariadb.com Rahul Joshi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.