[MXS-2624] Transaction Replay fails with checksum error as soon as Clustrix cluster is formed. Created: 2019-08-07  Updated: 2019-10-07  Resolved: 2019-10-07

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit, xpandmon
Affects Version/s: None
Fix Version/s: 2.5.0

Type: Bug
Priority: Major
Reporter: Rahul Joshi (Inactive)
Assignee: Johan Wikman
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

MaxScale server karma197:
OS: CentOS 7
Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
Clustrix nodes:
OS: CentOS 7
Version: clustrix-fredonyer-16045


Attachments: clustrix_karma108_trxreplay.cnf (File), maxscale.log (Text File)
Issue Links:
Relates
relates to MXS-2655 Transaction replay fails when session... Closed
Sprint: MXS-SPRINT-88, MXS-SPRINT-89, MXS-SPRINT-90, MXS-SPRINT-91

 Description   

[root@karma108 ~]# clx s
Cluster Name:    cld8d568c4e1f3e0db
Cluster Version: clustrix-fredonyer-16045
Cluster Status:   OK
Cluster Size:    4 nodes - 16 CPUs per Node
Current Node:    karma108 - nid 1
 
nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
----+-----------+--------+--------------+------+-----+---------------+--------
  1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
  2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
  3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
  4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
----+-----------+--------+--------------+------+-----+---------------+--------
                                                   0 |  5.5G (0.18%) |    3.0T
Configuration file snippet:
[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
 
MaxScale was run as:
[root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
 
[root@karma197 log]# maxctrl list servers
┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server              │ Address     │ Port │ Connections │ State           │ GTID │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
└─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘

Connected to MaxScale from a client machine and ran the following:

[root@jones ~]# mysql -h karma197 -u maxscale -pmaxscale_pw -P 6008
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.00 sec)
 
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql>  insert into t2 values (9);
Query OK, 1 row affected (0.03 sec)

Restart Clustrix here by running:
[root@karma108 ~]# clx dbrestart
Success: stop_job krobix run on all nodes.
Success: start_job krobix run on all nodes.

Let the cluster form.

Now run:

mysql>  insert into t2 values (9);
ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.

Expected:
Transaction replay should be delayed, and on the next attempt to run a statement in the transaction, the transaction should be replayed.

Actual:
The transaction replay fails with a checksum mismatch error.
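
For context, the error refers to a comparison between the results of the original transaction and the results of its replay. The sketch below illustrates that general idea only; it is not MaxScale's implementation. The function names, the fake backends and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib


def checksum(results):
    """Hash an ordered list of statement results into a single digest."""
    digest = hashlib.sha1()  # hash choice is an assumption for this sketch
    for result in results:
        digest.update(repr(result).encode("utf-8"))
    return digest.hexdigest()


def replay(statements, execute, original_checksum):
    """Re-run the statements and compare the new result checksum.

    Returns True when the replayed results match the original ones.
    """
    replayed = [execute(stmt) for stmt in statements]
    return checksum(replayed) == original_checksum


# Hypothetical backend that always answers "OK, 1 row affected".
def stable_backend(stmt):
    return ("OK", 1)


# Hypothetical backend whose reply differs after a restart.
def changed_backend(stmt):
    return ("OK", 1, "extra-status")


trx = ["begin", "insert into t2 values (9)"]
original = checksum([stable_backend(stmt) for stmt in trx])

replay_ok = replay(trx, stable_backend, original)
replay_mismatch = replay(trx, changed_backend, original)
```

In this model, any difference in what the backend returns during the replay, even for the same statements, is enough to make the comparison fail.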

Relevant logs:

2019-08-07 21:58:57   info   : (1) [readwritesplit] Replaying: insert into t2 values (9)
2019-08-07 21:58:57   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 30, type: QUERY_TYPE_WRITE, stmt: insert into t2 values (9)
2019-08-07 21:58:57   info   : [readwritesplit] Route query to master: @@Clustrix:node-1        [10.2.13.91]:3306 <
2019-08-07 21:58:57   info   : (1) [readwritesplit] Adding to trx: insert into t2 values (9)
2019-08-07 21:58:57   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-07 21:58:57   info   : (1) [readwritesplit] Checksum mismatch, transaction replay failed. Closing connection.
2019-08-07 21:58:57   info   : Stopped RCR client session [1]
2019-08-07 21:58:59   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]
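
To locate the failing sessions in a longer maxscale.log, the lines above can be filtered mechanically. This is a minimal sketch that assumes the line layout seen in this excerpt (timestamp, level, optional session id in parentheses, message); the regular expression and the function name are illustrative and not part of MaxScale.

```python
import re

# Layout assumed from the excerpt: "<date> <time>   <level>   : (<session>) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+)\s*:\s+"
    r"(?:\((?P<session>\d+)\)\s+)?"
    r"(?P<msg>.*)$"
)

MARKER = "Checksum mismatch, transaction replay failed"


def failed_replays(lines):
    """Return (timestamp, session id) for every failed-replay log line."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and MARKER in match.group("msg"):
            session = match.group("session")
            found.append((match.group("ts"), int(session) if session else None))
    return found


sample = [
    "2019-08-07 21:58:57   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1",
    "2019-08-07 21:58:57   info   : (1) [readwritesplit] Checksum mismatch, transaction replay failed. Closing connection.",
    "2019-08-07 21:58:59   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]",
]
failures = failed_replays(sample)
```

For a real log file, passing the open file object (for example `failed_replays(open("maxscale.log"))`) should work the same way, since the function only iterates over lines.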

Full logs and config file attached.



 Comments   
Comment by Johan Wikman [ 2019-08-21 ]

rahul.joshi@mariadb.com I suspect this is caused by session commands being executed within the transaction.

Could you please upgrade to 2.4.1 and retry with the use test; statement executed before the transaction is started?

Comment by Rahul Joshi (Inactive) [ 2019-08-22 ]

Hi johan.wikman,
Tried with the latest 2.4.1:
[root@karma197 log]# maxscale -V
MaxScale 2.4.1 - 7e0b2d88969e20618fc8df1f476fde9b79737cf2

Moved the use test; session command out of the trx. Still seeing the same issue.

[root@vqc007c ~]# mysql -h karma197 -P 6008 -u maxscale -pmaxscale_pw
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.0.45-clustrix-fredonyer-16045
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.00 sec)
 
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1 values (15);
Query OK, 1 row affected (0.00 sec)

Restarted the cluster and let it form a group.

mysql> insert into t1 values (16);
ERROR 1927 (08S01): Transaction checksum mismatch encountered when replaying transaction.
mysql>

2019-08-22 20:38:09   info   : (8) [readwritesplit] Adding to trx: begin
2019-08-22 20:38:09   info   : (8) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-22 20:38:09   info   : (8) [readwritesplit] Replaying: insert into t1 values (15)
2019-08-22 20:38:09   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 31, type: QUERY_TYPE_WRITE, stmt: insert into t1 values (15)
2019-08-22 20:38:09   info   : [readwritesplit] Route query to master: @@Clustrix:node-1        [10.2.13.91]:3306 <
2019-08-22 20:38:09   info   : (8) [readwritesplit] Adding to trx: insert into t1 values (15)
2019-08-22 20:38:09   info   : (8) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-22 20:38:09   info   : (8) [readwritesplit] Checksum mismatch, transaction replay failed. Closing connection.
2019-08-22 20:38:09   info   : Stopped RCR client session [8]
2019-08-22 20:38:10   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]

The conf file is the same:

[root@karma197 log]# cat /etc/clustrix_karma108_trxreplay.cnf
[maxscale]
log_info=1
threads=auto
logdir=/data/clustrix/log
 
[Bootstrap1-karma108]
type=server
address=10.2.13.91
port=3306
protocol=mariadbbackend
#karma108
 
[Clustrix]
type=monitor
module=clustrixmon
servers=Bootstrap1-karma108
user=maxscale
password=maxscale_pw
cluster_monitor_interval=10000
 
[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
 
[RCR-Listener]
type=listener
service=RCR
protocol=MariaDBClient
port=6008
 
[MaxAdmin-Service]
type=service
router=cli
 
[MaxAdmin-Unix-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default
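
When comparing runs, it can help to confirm which replay-related settings a service section actually carries. The sketch below reads the [RCR] values shown above with Python's standard configparser. It assumes the file parses as plain INI, which holds for this excerpt but is not guaranteed for every MaxScale configuration; the function name is hypothetical.

```python
import configparser


def replay_settings(text, service):
    """Extract the replay-related options of one service section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[service]
    return {
        "transaction_replay": section.getboolean("transaction_replay", fallback=False),
        # Kept as a string because the value carries a unit suffix ("360s").
        "delayed_retry_timeout": section.get("delayed_retry_timeout", fallback=None),
        "transaction_replay_attempts": section.getint(
            "transaction_replay_attempts", fallback=None
        ),
    }


# Values copied from the [RCR] section in this ticket.
SAMPLE = """
[RCR]
type=service
router=readwritesplit
transaction_replay=true
delayed_retry_timeout=360s
transaction_replay_attempts=500
"""

settings = replay_settings(SAMPLE, "RCR")
```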

Comment by Johan Wikman [ 2019-09-02 ]

Moved my comments to MXS-2625 as that's where they belong.

Comment by markus makela [ 2019-10-07 ]

Fixed by MXS-2655.
