[MXS-2625] Transaction Replay: Trying to execute statement before Clustrix cluster is up, gets the statement stuck - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.4.1
Component/s: xpandmon
Labels: None
Environment:
MaxScale server karma197:
OS: CentOS 7
Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
Clustrix nodes:
OS: CentOS 7
Version: clustrix-fredonyer-16045

Sprint: MXS-SPRINT-88, MXS-SPRINT-89

Description

[root@karma108 ~]# clx s
Cluster Name:    cld8d568c4e1f3e0db
Cluster Version: clustrix-fredonyer-16045
Cluster Status:   OK
Cluster Size:    4 nodes - 16 CPUs per Node
Current Node:    karma108 - nid 1

nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
----+-----------+--------+--------------+------+-----+---------------+--------
  1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
  2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
  3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
  4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
----+-----------+--------+--------------+------+-----+---------------+--------
                                                   0 |  5.5G (0.18%) |    3.0T

Config file snippet:

[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
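
For comparing the configured retry window with the observed wait, the replay-related values can be pulled out of the snippet programmatically. The following is a minimal sketch, not part of the original report: the helper name `replay_settings` is hypothetical, and it assumes the snippet parses as plain INI and that the timeout always carries an `s` suffix as shown above.

```python
import configparser

# Service section copied from the snippet above (blank lines removed).
SNIPPET = """
[RCR]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Clustrix
transaction_replay=true
slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
delayed_retry_timeout=360s
transaction_replay_attempts=500
"""


def replay_settings(text, section="RCR"):
    """Return the replay-related settings of one service section.

    Hypothetical helper: assumes the timeout is written with an 's'
    suffix, as it is in this report's configuration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    svc = parser[section]
    return {
        "transaction_replay": svc.getboolean("transaction_replay"),
        "delayed_retry_timeout_s": int(svc["delayed_retry_timeout"].rstrip("s")),
        "transaction_replay_attempts": svc.getint("transaction_replay_attempts"),
    }


settings = replay_settings(SNIPPET)
print(settings)
print("retry window in minutes:", settings["delayed_retry_timeout_s"] / 60)
```

The configured window works out to 6 minutes, which is worth keeping in mind when judging how long a client should wait before calling a statement stuck.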

MaxScale was started as follows:

[root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root

[root@karma197 log]# maxctrl list servers
┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server              │ Address     │ Port │ Connections │ State           │ GTID │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
└─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘

Connect to MaxScale from a client machine and run the following:

[root@jones ~]# mysql -h karma197 -u maxscale -pmaxscale_pw -P 6008
mysql> set session autocommit=false;
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> insert into t2 values (9);
Query OK, 1 row affected (0.03 sec)

At this point, restart Clustrix by running:
[root@karma108 ~]# clx dbrestart
Success: stop_job krobix run on all nodes.
Success: start_job krobix run on all nodes.

Before the cluster forms, run the following in the same client session:

mysql> insert into t2 values (9);

Expected:
Once all nodes come up and the cluster forms, the transaction should be replayed, including the last statement that was attempted while the cluster was forming. The last statement should complete successfully.

Actual:
The transaction is not retried. The last statement never completes; it hangs. Waited several minutes with no result. The same statement completed instantly before the Clustrix service restart.

Relevant logs:

2019-08-07 22:33:05   info   : > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 10, type: QUERY_TYPE_BEGIN_TRX, stmt: begin
2019-08-07 22:33:05   info   : [readwritesplit] Connected to '@@Clustrix:node-1'
2019-08-07 22:33:05   info   : [readwritesplit] Queuing query until '@@Clustrix:node-1' completes session command
2019-08-07 22:33:05   info   : (1) Connected to '@@Clustrix:node-1' with thread id 3073
2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-07 22:33:05   info   : (1) [readwritesplit] 1 session commands left on '@@Clustrix:node-1'
2019-08-07 22:33:05   info   : (1) [readwritesplit] Reply complete, last reply from @@Clustrix:node-1
2019-08-07 22:33:05   info   : (1) [readwritesplit] >>> Routing stored queries
2019-08-07 22:33:05   info   : (1) [readwritesplit] New query received while transaction replay is active: insert into t2 values (9)
2019-08-07 22:33:05   info   : (1) [readwritesplit] <<< Stored queries routed
2019-08-07 22:33:07   notice : Server changed state: Bootstrap1-karma108[10.2.13.91:3306]: master_up. [Down] -> [Master, Running]

Full logs and config file attached.

Clustrix log showing the service shutdown and cluster formation times:

2019-08-07 22:32:36.343713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/shutdown.c:61 shutdown_signal(): Shutting down node pfec37993505e4d91
2019-08-07 22:32:36.365713 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/coordinate_dbstart.c:193 write_coordinate_dbstart_file_done(): Dbstart coordination info ('4') written to /etc/clustrix/expected_nodes
2019-08-07 22:32:48.073414 UTC karma108.colo.sproutsys.com clxnode: INFO test/clxnode.c:410 clxnode_post_args(): Starting Clustrix, clustrix-fredonyer-16045, pid=25572, build=release
...
2019-08-07 22:33:04.143217 UTC nid 1 karma108.colo.sproutsys.com clxnode: ALERT NEW_GROUP INFO Node 1 has new group 2cfffe: { 1-4 } for cluster_id d8d568c4e1f3e0db
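
To see how the MaxScale and Clustrix events interleave, the timestamps from the two log excerpts above can be placed on one timeline. This is a rough sketch with hypothetical names (`EVENTS`, `parse`, `offsets`). It assumes the MaxScale log, which carries no timezone marker, uses the same clock and timezone as the Clustrix log (UTC); that assumption is not confirmed by the report.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above.
# ASSUMPTION: the MaxScale host logs in UTC and its clock is in sync with
# the Clustrix nodes; the report does not state this.
EVENTS = [
    ("clustrix: shutdown signal on nid 1", "2019-08-07 22:32:36.343713"),
    ("clustrix: process starting", "2019-08-07 22:32:48.073414"),
    ("clustrix: new group { 1-4 } formed", "2019-08-07 22:33:04.143217"),
    ("maxscale: 'begin' routed, insert queued", "2019-08-07 22:33:05"),
    ("maxscale: Bootstrap1-karma108 master_up", "2019-08-07 22:33:07"),
]


def parse(ts):
    """Parse a timestamp with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def offsets(events):
    """Return (name, seconds since the first event) for each event."""
    t0 = parse(events[0][1])
    return [(name, (parse(ts) - t0).total_seconds()) for name, ts in events]


for name, secs in offsets(EVENTS):
    print(f"+{secs:7.3f}s  {name}")
```

Under that assumption, the replayed `begin` and the queued insert are logged roughly one second after the new group forms and about two seconds before MaxScale reports the bootstrap server as up again.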

Attachments


unsuccessful-trx-replay.log        44 kB    2019-09-02 07:35
successful-trx-replay.log          48 kB    2019-09-02 07:35
maxscale.stmtDuringGC.log          79 kB    2019-09-06 06:35
maxscale.stmtAfterGC.log           80 kB    2019-09-06 06:35
maxscale.log                       54 kB    2019-08-07 22:43
interruption.log                   4 kB     2019-09-02 07:35
clustrix_karma108_trxreplay.cnf    0.7 kB   2019-08-07 22:43
