[MXS-2625] Transaction Replay: Trying to execute statement before Clustrix cluster is up, gets the statement stuck Created: 2019-08-07  Updated: 2019-09-13  Resolved: 2019-09-06

Status: Closed
Project: MariaDB MaxScale
Component/s: xpandmon
Affects Version/s: None
Fix Version/s: 2.4.1

Type: Bug Priority: Major
Reporter: Rahul Joshi (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

MaxScale server karma197:
OS: CentOS 7
Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
Clustrix nodes:
OS: CentOS 7
Version: clustrix-fredonyer-16045


Attachments: File clustrix_karma108_trxreplay.cnf     File interruption.log     Text File maxscale.log     Text File maxscale.stmtAfterGC.log     Text File maxscale.stmtDuringGC.log     File successful-trx-replay.log     File unsuccessful-trx-replay.log    
Sprint: MXS-SPRINT-88, MXS-SPRINT-89

 Description   

[root@karma108 ~]# clx s
Cluster Name:    cld8d568c4e1f3e0db
Cluster Version: clustrix-fredonyer-16045
Cluster Status:   OK
Cluster Size:    4 nodes - 16 CPUs per Node
Current Node:    karma108 - nid 1
 
nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
----+-----------+--------+--------------+------+-----+---------------+--------
  1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
  2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
  3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
  4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
----+-----------+--------+--------------+------+-----+---------------+--------
                                                   0 |  5.5G (0.18%) |    3.0T
Conf file snip:
[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
 
MaxScale run as:
[root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
 
[root@karma197 log]# maxctrl list servers
┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server              │ Address     │ Port │ Connections │ State           │ GTID │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
└─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘

Connected to MaxScale from client machine and run following:

[root@jones ~]# mysql -h karma197 -u maxscale -pmaxscale_pw -P 6008
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.00 sec)
 
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql>  insert into t2 values (9);
Query OK, 1 row affected (0.03 sec)

Restart clustrix here by running:
[root@karma108 ~]# clx dbrestart
Success: stop_job krobix run on all nodes.
Success: start_job krobix run on all nodes.

Before the cluster forms, run following.

mysql>  insert into t2 values (9);

Expected:
The trx should be replayed including the last statement that was attempted while the cluster was forming, once all nodes come up and the cluster forms. Last statement should complete successfully.

Actual:
Transaction does not get retried. Last statement execution never completed, but just got stuck. Waited multiple minutes. Same statement completed instantly before Clustrix service restart.

Relevant logs:

2019-08-07 22:33:05   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 10, type: QUERY_TYPE_BEGIN_TRX, stmt: begin
2019-08-07 22:33:05   info   : [readwritesplit] Connected to '@@Clustrix:node-1'
2019-08-07 22:33:05   info   : [readwritesplit] Queuing query until '@@Clustrix:node-1' completes session command
2019-08-07 22:33:05   info   : (1) Connected to '@@Clustrix:node-1' with thread id 3073
2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-07 22:33:05   info   : (1) [readwritesplit] 1 session commands left on '@@Clustrix:node-1'
2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-07 22:33:05   info   : (1) [readwritesplit] >>> Routing stored queries
2019-08-07 22:33:05   info   : (1) [readwritesplit] New query received while transaction replay is active: insert into t2 values (9)
2019-08-07 22:33:05   info   : (1) [readwritesplit] <<< Stored queries routed
2019-08-07 22:33:07   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]

Full logs and config file attached.

Clustrix log for service shutwon and cluster formation times:

2019-08-07 22:32:36.343713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/shutdown.c:61 shutdown_signal(): Shutting down node pfec37993505e4
d91
2019-08-07 22:32:36.365713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/coordinate_dbstart.c:193 write_coordinate_dbstart_file_done(): Dbs
tart coordination info ('4') written to /etc/clustrix/expected_nodes
2019-08-07 22:32:48.073414 UTC karma108.colo.sproutsys.com clxnode: INFO test/clxnode.c:410 clxnode_post_args(): Starting Clustrix, clustrix-fredonyer-1
6045, pid=25572, build=release
...
2019-08-07 22:33:04.143217 UTC nid 1 karma108.colo.sproutsys.com clxnode: ALERT NEW_GROUP INFO Node 1 has new group 2cfffe: { 1-4 } for cluster_id d8d568c4e1f3e0db



 Comments   
Comment by Johan Wikman [ 2019-08-21 ]

Could repeat the behaviour using 658aae6b6c77774c34d0f52c03f63edc2d44969e

Comment by Johan Wikman [ 2019-08-21 ]

rahul.joshi@mariadb.com With 2.4.1, MaxScale does not hang but the connection is closed. I think the closing is caused by session commands (use test;) being executed within the transaction.

If perform the same test like:

mysql> use test;
mysql> set session autocommit=false;
mysql> begin;
mysql> insert into t2 values (9);

Restart cluster

mysql> insert into t2 values (9);

then the transaction is replayed successfully.

Could you retry after

  • having updated to 2.4.1 and
  • having moved the use test outside the transaction.
Comment by Rahul Joshi (Inactive) [ 2019-08-22 ]

Hi johan.wikman,
Tried with latest 2.4.1 :
[root@karma197 log]# maxscale -V
MaxScale 2.4.1 - 7e0b2d88969e20618fc8df1f476fde9b79737cf2

Moved the use test; session command out of the trx.
Now seeing the checksum error instead of a hang.

[root@vqc007c ~]# mysql -h karma197 -P 6008 -u maxscale -pmaxscale_pw
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.0.45-clustrix-fredonyer-16045
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.01 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1 values (19);
Query OK, 1 row affected (0.00 sec)

Restarted the cluster and run following before it can form group

mysql> insert into t1 values (20);
ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.
mysql>

2019-08-22 22:22:27   info   : (10) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-22 22:22:27   info   : (10) [readwritesplit] Replaying: insert into t1 values (19)
2019-08-22 22:22:27   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 31, type: QUERY_TYPE_WRITE, stmt: insert into t1 values (19)
2019-08-22 22:22:27   info   : [readwritesplit] Route query to master: @@Clustrix:node-1        [10.2.13.91]:3306 <
2019-08-22 22:22:27   info   : (10) [readwritesplit] Adding to trx: insert into t1 values (19)
2019-08-22 22:22:27   info   : (10) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-22 22:22:27   info   : (10) [readwritesplit] Checksum mismatch, transaction replay failed. Closing connection.
2019-08-22 22:22:27   info   : Stopped RCR client session [10]
2019-08-22 22:22:28   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]

conf file is the same:

[root@karma197 log]# cat /etc/clustrix_karma108_trxreplay.cnf
[maxscale]
log_info=1
threads=auto
logdir=/data/clustrix/log
 
[Bootstrap1-karma108]
type=server
address=10.2.13.91
port=3306
protocol=mariadbbackend
#karma108
 
[Clustrix]
type=monitor
module=clustrixmon
servers=Bootstrap1-karma108
user=maxscale
password=maxscale_pw
cluster_monitor_interval=10000
 
[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
 
[RCR-Listener]
type=listener
service=RCR
protocol=MariaDBClient
port=6008
 
[MaxAdmin-Service]
type=service
router=cli
 
[MaxAdmin-Unix-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default

Comment by Johan Wikman [ 2019-09-02 ]

I have now tried this numerous times:

  • If I, after having executed clx dbrestart, wait until the cluster has come up again before executing the second INSERT, then the transaction replying works every time.
  • If I, after having executed clx dbrestart issue the second INSERT before the cluster has come up again, then the session will hang every time.
    In my earlier testing I actually did not use clx dbrestart but service clustrix restart on the connected to node.

I think this is related to a new command being received while a transaction replay (that will fail due to no nodes being up) is in progress.

The following works:

First

mysql> use test;
mysql> set session autocommit=false;
mysql> insert into t2 values (9);

Then on one Clustrix node:

$ sudo clx dbrestart

Wait until the restart has been performed (it's possible to sudo mysql) on the node.

mysql> insert into t2 values (9);
mysql> commit;

That results in successful-trx-replay.log.

However, when done like:

First

mysql> use test;
mysql> set session autocommit=false;
mysql> insert into t2 values (9);

Then on one Clustrix node:

$ sudo clx dbrestart

and immediately when the command has returned

mysql> insert into t2 values (9);

the mysql-client will hang and we get the log unsuccessful-trx-replay.log.

If the hanging of the client is interrupted by 3 ctrl-C:s

^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- exit!
Aborted

we further get into the log interruption.log.

It appears that if a transaction replay fails, the internal book-keeping of RWS gets confused so that the transaction is not replayed correctly and a reply is not sent to the client. There is some additional logging in the logs.

Comment by markus makela [ 2019-09-05 ]

rahul.joshi@mariadb.com Can you try reproducing this with a build from the latest 2.4 branch? I've tested this numerous times and I suspect this might've been fixed by this commit that's in 2.4.1.

commit 8bc4e42f2df62deed6b34d40299112690cf20753
Author: Markus Mäkelä <markus.makela@mariadb.com>
Date:   Tue Aug 6 11:17:15 2019 +0300
 
    Fix query queuing on session command execution
    
    If session command execution during server reconnection caused a query to
    be queued, the query would be put on the tail end of the queue. This would
    cause queries to be reordered if the queue wasn't empty. The correct thing
    to do would be to put the next pending query back at the front of the
    queue.
 
diff --git a/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc b/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc
index 28b4d11b8..7919d5a0a 100644
--- a/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc
+++ b/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc
@@ -336,7 +336,7 @@ bool RWSplitSession::route_single_stmt(GWBUF* querybuf)
             else if (target->has_session_commands())
             {
                 // We need to wait until the session commands are executed
-                m_query_queue.emplace_back(gwbuf_clone(querybuf));
+                m_query_queue.emplace_front(gwbuf_clone(querybuf));
                 MXS_INFO("Queuing query until '%s' completes session command", target->name());
             }
             else

The problem is easily reproducible when that commit is reverted.

Comment by Johan Wikman [ 2019-09-05 ]

rahul.joshi@mariadb.com I just verified that with commit 658aae6b6c77774c34d0f52c03f63edc2d44969e the behaviour repeats itselfs consistently, but using tag maxscale-2.4.2 it does not.

Comment by Rahul Joshi (Inactive) [ 2019-09-06 ]

Hi,
I tried the scenario with new 2.4.2 build:
[root@karma182 ~]# maxscale -V
MaxScale 2.4.2 - 52e3b80d95ed8a5623bdd3ca5aafe4a446b1543b
with Clustrix 9.1.4 and 9.2 build
I still see that I get checksum error in both cases.

Case 1:
[root@vqc007c ~]# mysql -h karma182 -P 6008 -u maxscale -pmaxscale_pw
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.5-10.2.12 2.4.2-maxscale
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.01 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1 values (2);
Query OK, 1 row affected (0.01 sec)
 
Restarted the cluster and run following before it can form group
 
mysql> insert into t1 values (3);
ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.
 
 
Case 2:
[root@vqc007c ~]# mysql -h karma182 -P 6008 -u maxscale -pmaxscale_pw
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.5-10.2.12 2.4.2-maxscale
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.01 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1 values (4);
Query OK, 1 row affected (0.00 sec)
 
Restarted the cluster and run following after group is formed.
 
mysql> insert into t1 values (5);
ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.
mysql>

Logs show checksum error.
Attached logs as:
maxscale.stmtDuringGC.log
maxscale.stmtAfterGC.log

Generated at Thu Feb 08 04:15:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.