Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
10.6.10
-
None
-
Red Hat 7 64-bit on VMware
Description
Hi all,
The DB is a 3-node Galera production cluster with 2 data nodes and 1 arbitrator.
We encountered an incident in which many SQL statements hung at one point. The statements were stuck in the "Sending data" / "Updating" state.
We could not figure out the reason. In the end we had to restart the DB node to recover.
During the review, we found that the binlog contains transactions that are not in time order.
Xid = 200421161 has timestamp 1:50:07, but it appears earlier in the binlog than
Xid = 200421179, which has timestamp 1:49:42.
The DB has 4 Galera slave threads. Is this related?
Is this behavior a problem?
In case we need to recover the DB to 1:50:00, will the Xid = 200421179 transaction be applied? We worry that recovery will stop at Xid = 200421161, since its timestamp is 1:50:07.
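To make the concern concrete, here is a minimal sketch in Python. It assumes that a time-based cutoff stops reading at the first event whose timestamp is equal to or later than the stop time, and it is only a model of that rule, not a test against the real binlog tool on 10.6.10. The function name `applied_before_stop` and the calendar date are hypothetical placeholders; only the two Xids and their times come from the binlog.

```python
from datetime import datetime

# (xid, timestamp) in binlog order, as seen in the extracted binlog.
# The calendar date is a placeholder; only the time of day is from the report.
EVENTS = [
    (200421161, datetime(2000, 1, 1, 1, 50, 7)),
    (200421179, datetime(2000, 1, 1, 1, 49, 42)),
]


def applied_before_stop(events, stop):
    """Return the Xids replayed if reading stops at the first event
    whose timestamp is equal to or later than `stop` (assumed rule)."""
    applied = []
    for xid, ts in events:
        if ts >= stop:
            break  # reading stops here; later events are never examined
        applied.append(xid)
    return applied


stop = datetime(2000, 1, 1, 1, 50, 0)
replayed = applied_before_stop(EVENTS, stop)
print(replayed)
```

Under that assumption, Xid = 200421179 is not replayed, even though its own timestamp is before 1:50:00, because the scan already stopped at Xid = 200421161.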
We captured the processlist every 1 minute. In the first capture, taken at 01:50:01, the oldest pending SQL statement had been waiting for 19s. Thus, the oldest statement started at 1:49:42. This matches the timestamp of Xid = 200421179.
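The start time is simply the capture time minus the wait time. A small sketch of that arithmetic, with a placeholder calendar date (only the time of day and the 19s wait come from the processlist capture):

```python
from datetime import datetime, timedelta

# Placeholder date; the time of day is the first processlist capture.
capture_time = datetime(2000, 1, 1, 1, 50, 1)
# Wait time reported for the oldest pending statement.
waited = timedelta(seconds=19)

start_time = capture_time - waited
print(start_time.time())  # 01:49:42
```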
Kindly help.
Attached files:
binlog-mask.txt - binlog content extracted
processlist.txt - process list of node 1
mariadb.node1.cnf - db node 1 my.cnf
mariadb.node2.cnf - db node 2 my.cnf
Regards,
William Wong