[MDEV-29246] WSREP_CLUSTER_SIZE at 0 after rolling update a node from 10.3 to 10.4
Created: 2022-08-04  Updated: 2023-01-03  Resolved: 2022-08-25

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.35, 10.4.25
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Claudio Nanni Assignee: Ramesh Sivaraman
Resolution: Not a Bug Votes: 4
Labels: None

Attachments: Text File workaround_test.txt    
Issue Links:
Relates
relates to MDEV-19983 Galera: Rolling upgrade: Upgraded nod... Closed
relates to MDEV-20439 WSREP_CLUSTER_SIZE at 0 after rolling... Closed
relates to MDEV-22723 Data loss when performing rolling upg... Closed
relates to MDEV-22745 node crash on upgrade from 10.3 to 10... Closed

 Description   

A problem that was supposedly fixed (MDEV-22745, MDEV-20439, MDEV-22723, MDEV-19983) still seems to exist.
In this case, the upgraded 10.4 node replicates changes but still reports:

wsrep_cluster_size as 0
wsrep_cluster_conf_id 18446744073709551615
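The wsrep_cluster_conf_id value above is exactly 2^64 - 1, which is how -1 appears when it is stored in an unsigned 64-bit integer. The short check below confirms the arithmetic. Reading -1 as a "no valid configuration/view" sentinel is an assumption on our part, not something stated in this issue.

```python
import struct

# Value reported by the upgraded 10.4 node
CONF_ID = 18446744073709551615

# 2**64 - 1 is the largest unsigned 64-bit integer
assert CONF_ID == 2**64 - 1

# Reinterpret the same 8 bytes as a signed 64-bit integer.
# Assumption: the server stores a signed -1 sentinel in an unsigned field.
signed = struct.unpack("<q", struct.pack("<Q", CONF_ID))[0]
print(signed)  # -1
```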

The versions are:

On the two existing nodes:

galera-25.3.35-1.el7.centos.x86_64
MariaDB-common-10.3.35-1.el7.centos.x86_64
MariaDB-server-10.3.35-1.el7.centos.x86_64
MariaDB-compat-10.3.35-1.el7.centos.x86_64
MariaDB-client-10.3.35-1.el7.centos.x86_64
MariaDB-backup-10.3.35-1.el7.centos.x86_64
MariaDB-shared-10.3.35-1.el7.centos.x86_64

On the upgraded node:

galera-4-26.4.11-1.el7.centos.x86_64
MariaDB-compat-10.4.25-1.el7.centos.x86_64
MariaDB-server-10.4.25-1.el7.centos.x86_64
MariaDB-common-10.4.25-1.el7.centos.x86_64
MariaDB-backup-10.4.25-1.el7.centos.x86_64
MariaDB-client-10.4.25-1.el7.centos.x86_64
MariaDB-shared-10.4.25-1.el7.centos.x86_64

I also tried a workaround, starting the node with wsrep_on=OFF and turning it on afterwards, with no luck (see the attached workaround_test.txt).



 Comments   
Comment by Ramesh Sivaraman [ 2022-08-05 ]

jplindst wsrep_cluster_size becomes zero only when SST is triggered after an upgrade. Restarting the upgraded node will resolve this issue.

[vagrant@node3 ~]$ sudo mysql -e"show status like '%wsrep_cluster_size%'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 0     |
+--------------------+-------+
[vagrant@node3 ~]$ 
[vagrant@node3 ~]$ sudo grep -i 'SST succeeded' /var/lib/mysql/mysql-error.log
2022-08-05 11:00:14 3 [Note] WSREP: SST succeeded for position b3bf9ad6-14a7-11ed-9106-5e6aa65a6a88:4
[vagrant@node3 ~]$ 
[vagrant@node3 ~]$ sudo systemctl restart mariadb.service 
[vagrant@node3 ~]$ sudo mysql -e"show status like '%wsrep_cluster_size%'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
[vagrant@node3 ~]$ 

If the upgraded node doesn't start SST, wsrep_cluster_size is reported correctly after the upgrade.

[vagrant@node3 ~]$ sudo mysql -e"show status like '%wsrep_cluster_size%'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
[vagrant@node3 ~]$ 
[vagrant@node3 ~]$ sudo grep -i sst /var/lib/mysql/mysql-error.log
[vagrant@node3 ~]$ 
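The decision rule in the comment above (a cluster size of 0 after the upgraded node joined via SST, fixed by a restart) can be sketched as a small Python check. The function names are hypothetical. The sketch assumes the status value is captured with the mysql client's -N (skip column names) and -B (batch, tab-separated) options, and that the error log text is read from wherever log_error points (/var/lib/mysql/mysql-error.log in the transcript). It only reports whether a restart looks warranted; the restart itself (systemctl restart mariadb.service in the transcript) is left to the operator.

```python
import re

# Mirrors the `grep -i 'SST succeeded'` check used in the transcript.
SST_SUCCEEDED = re.compile(r"WSREP: SST succeeded for position", re.IGNORECASE)


def parse_cluster_size(batch_output: str) -> int:
    """Parse the output of:
        mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
    With -N/-B the client prints one row: wsrep_cluster_size<TAB><value>.
    """
    name, value = batch_output.strip().split("\t")
    if name != "wsrep_cluster_size":
        raise ValueError(f"unexpected status variable: {name!r}")
    return int(value)


def needs_restart(cluster_size: int, error_log: str) -> bool:
    """True for the case described in the comment: the upgraded node
    joined through SST and now reports wsrep_cluster_size = 0."""
    return cluster_size == 0 and SST_SUCCEEDED.search(error_log) is not None


if __name__ == "__main__":
    log = ("2022-08-05 11:00:14 3 [Note] WSREP: SST succeeded for position "
           "b3bf9ad6-14a7-11ed-9106-5e6aa65a6a88:4\n")
    print(needs_restart(parse_cluster_size("wsrep_cluster_size\t0\n"), log))  # True
    print(needs_restart(parse_cluster_size("wsrep_cluster_size\t3\n"), log))  # False
```

A node that joined through IST, or that has already been restarted, will not match both conditions, which corresponds to the second transcript above.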

Comment by Jan Lindström (Inactive) [ 2022-08-25 ]

ramesh Can you re-test with branch bb-10.4-MDEV-29246-galera?

Comment by Jan Lindström (Inactive) [ 2022-08-25 ]

As noted in the error log:

[Warning] WSREP: View recovered from stable storage was empty. If the server is doing rolling upgrade from previous version which does not support storing view info into stable storage, this is ok. Otherwise this may be a sign of malfunction.

this is not a bug. 10.3 does not store the view in stable storage, so to see the correct wsrep_cluster_size value, all nodes need to be upgraded to 10.4. As noted above, the workaround is to restart the already-upgraded 10.4 node.

Generated at Thu Feb 08 10:07:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.