[MDEV-32827] grastate.dat is not updated on node shutdown. Created: 2023-11-17  Updated: 2023-12-15  Resolved: 2023-12-15

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: MikaH Assignee: Julius Goryavsky
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

CentOS Linux release 7.7.1908. 3 nodes. Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz with 64 CPUs. Bare metal. 512GB RAM, galera-enterprise-4-26.4.14-1.el7_9.src.rpm


Attachments: File mariadb.err_joining_node6    

 Description   

During a normal shutdown of a MariaDB 10.5.22 cluster node, the grastate.dat file is not updated, so the node always needs an SST on the next start. IST is never available.

[root@node6 ~]# date;ls -la /var/lib/mysql/datadir/grastate.dat ;cat /var/lib/mysql/datadir/grastate.dat; ps -auxww|grep mariadb |grep -v grep ; systemctl stop mariadb && ls -la /var/lib/mysql/datadir/grastate.dat ;cat /var/lib/mysql/datadir/grastate.dat; ps -auxww|grep mariadb |grep -v grep;date
Fri Nov 17 09:50:34 +06 2023
-rw-rw----. 1 mysql mysql 119 Nov 16 17:40 /var/lib/mysql/datadir/grastate.dat
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
      mysql    20759  751 71.6 467978704 378103816 ? Ssl  Nov16 10190:33 /usr/sbin/mariadbd --wsrep_start_position=5256dcba-7f0c-11ee-aabd-4fc1491ca19b:79256066
-rw-rw----. 1 mysql mysql 119 Nov 16 17:40 /var/lib/mysql/datadir/grastate.dat
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
      Fri Nov 17 09:51:08 +06 2023
[root@node6 ~]#



 Comments   
Comment by Rick Pizzi [ 2023-11-17 ]

The cluster has not been bootstrapped correctly and has a null cluster UUID (all zeros).
This is not a bug. Please re-bootstrap the cluster correctly and the file will be updated as expected.
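
To illustrate the check described above, here is a minimal sketch that parses the grastate.dat contents shown in the description and tests for the all-zero cluster UUID. The function names `parse_grastate` and `has_known_cluster_uuid` are hypothetical helpers written for this example, not part of MariaDB or Galera, and the parsing is a simplification that may not match Galera's own reader.

```python
# Sketch only: helper names are made up for this example.
NULL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# GALERA saved state' comment header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def has_known_cluster_uuid(state):
    """Return False when the uuid field is missing or all zeros."""
    return state.get("uuid", NULL_UUID) != NULL_UUID


# File contents as printed in the description of this ticket.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

state = parse_grastate(SAMPLE)
print(state["uuid"], state["seqno"])
print("known cluster UUID:", has_known_cluster_uuid(state))
```

For the file in this ticket the check reports no known cluster UUID, which is the condition called out in the comment above.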

Comment by MikaH [ 2023-11-17 ]

Okay. Someone bootstrapped the cluster before I got involved in this. What would be the best way to bootstrap the cluster while avoiding SSTs?

Would this work:
1) Stop all nodes
2) Bootstrap one node
3) Wait until the node is running fine
4) Stop the node
5) Copy grastate.dat to the other two nodes
6) Bootstrap the same node (node2) again
7) Join the other nodes one by one, with systemctl start mariadb
8) Verify that ISTs happen and the nodes are in sync

Comment by Rick Pizzi [ 2023-11-17 ]

You should never copy the state file to other nodes.
The file is maintained and updated by the local cluster node.
So the correct process, after stopping all nodes, is to bootstrap the cluster from the most up-to-date (or selected) node using the galera_new_cluster script (or an empty node list in the config). Once it is up, just start the other nodes one at a time, waiting for each node to join and get synced before proceeding to the next node.
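
The ordering described above can be sketched as a small planning helper. This is an illustration under assumptions: the node names and seqno values are invented, `plan_restart` and `restart_commands` are hypothetical helpers, and the seqno of each node is assumed to have been obtained already by some other means. The code only builds the list of steps and does not run anything.

```python
# Sketch only: builds an ordered restart plan, does not execute commands.

def plan_restart(seqnos):
    """Given {node_name: last_known_seqno}, pick the most advanced node
    as the bootstrap node and return (bootstrap_node, joiners)."""
    if not seqnos:
        raise ValueError("no nodes given")
    bootstrap = max(seqnos, key=lambda name: seqnos[name])
    joiners = [name for name in seqnos if name != bootstrap]
    return bootstrap, joiners


def restart_commands(bootstrap, joiners):
    """Return (node, command) pairs in the order they should be run.

    Between steps an operator would wait for the started node to report
    wsrep_local_state_comment = 'Synced' before moving on.
    """
    steps = [(bootstrap, "galera_new_cluster")]
    steps += [(node, "systemctl start mariadb") for node in joiners]
    return steps


# Hypothetical nodes and seqnos, for illustration only.
seqnos = {"nodeA": 10, "nodeB": 42, "nodeC": 17}
bootstrap, joiners = plan_restart(seqnos)
for node, command in restart_commands(bootstrap, joiners):
    print(f"{node}: {command}")
```

With the invented values above, nodeB is bootstrapped first and the remaining nodes are started one at a time afterwards.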

Comment by MikaH [ 2023-11-17 ]

Well, that kinda sucks. SST took 8h18min to complete. With a little trick it is possible to shrink that to a few minutes.

Comment by Rick Pizzi [ 2023-11-17 ]

Also you cannot avoid SST if you don't have a valid state for a node.

Comment by Rick Pizzi [ 2023-11-17 ]

The "trick" requires a valid state on a node, and you do not have one.

Comment by MikaH [ 2023-11-17 ]

Yeah, automatic SST is the easiest way, but with a dataset size like 7TB it is not a good one. You have documented manual SST here: https://mariadb.com/kb/en/manual-sst-of-galera-cluster-node-with-mariabackup/

The tricky part with manual SST is the grastate.dat file: a single typo or an extra empty line leads to SST instead of IST.
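
One way to reduce the risk of a hand-typed mistake is to generate the file contents from a `uuid:seqno` position string instead of editing them manually. The sketch below assumes the five-line layout shown in the description of this ticket and takes the position from the `--wsrep_start_position` value visible there. `render_grastate` is a hypothetical helper, and whether Galera accepts exactly this output on a given version is an assumption to verify before use.

```python
import uuid


def render_grastate(position, safe_to_bootstrap=0):
    """Render grastate.dat text from a 'uuid:seqno' position string.

    Raises ValueError if the UUID or the seqno is malformed, so a typo
    is caught before anything is written to disk.
    """
    cluster_uuid, sep, seqno = position.strip().rpartition(":")
    if not sep:
        raise ValueError("expected a position of the form uuid:seqno")
    cluster_uuid = str(uuid.UUID(cluster_uuid))  # validates and normalizes
    seqno = int(seqno)                           # validates the number
    return (
        "# GALERA saved state\n"
        "version: 2.1\n"
        f"uuid:    {cluster_uuid}\n"
        f"seqno:   {seqno}\n"
        f"safe_to_bootstrap: {safe_to_bootstrap}\n"
    )


# Position taken from the --wsrep_start_position shown in the description.
print(render_grastate("5256dcba-7f0c-11ee-aabd-4fc1491ca19b:79256066"), end="")
```

The function only returns a string. Writing it to the data directory with the right owner and permissions is left to the operator.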

Let's close this ticket. If the issue reappears after the cluster has been bootstrapped again, I'll return here. If nothing is heard, all is OK.

Comment by Julius Goryavsky [ 2023-12-15 ]

I am closing this ticket because, according to the available information, it is not a bug. If the problem appears after a correct bootstrap, we will reopen this ticket.

Generated at Thu Feb 08 10:34:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.