[MDEV-29281] Add details about node eviction status to the JSON file with Galera node status Created: 2022-08-09  Updated: 2023-03-22  Resolved: 2023-02-10

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Fix Version/s: 11.0.1

Type: Task Priority: Major
Reporter: Valerii Kravchuk Assignee: Denis Protivensky
Resolution: Fixed Votes: 0
Labels: Galera, Preview_11.0, json, wsrep

Issue Links:
Relates
relates to MDEV-26971 JSON file interface to wsrep node sta... Closed

 Description   

It turned out that the Galera node status details stored in the JSON file, created within the scope of MDEV-21901, do NOT include node eviction status, which is represented by messages in the error log and by status variables like wsrep_evs_state, wsrep_evs_delayed and wsrep_evs_evict_list.

Please, add this information.



 Comments   
Comment by Valerii Kravchuk [ 2022-08-09 ]

My original task for this, MDEV-21901, was closed as "Won't Do" with the idea of using this JSON status file instead. Unfortunately, this has not happened.

Comment by Jan Lindström (Inactive) [ 2022-12-09 ]
  • branch : preview-10.11-MDEV-29281-galea-node-eviction-status
  • Galera library version : 26.4.14 from branch : mariadb-4.x-test
Comment by Ramesh Sivaraman [ 2022-12-09 ]

Node eviction status is now written to the JSON file.

{
	"date": "2022-12-09 08:13:27.573",
	"timestamp": 1670573607.57340074,
	"errors": [
		{
			"timestamp": 1670573607.00000000,
			"msg": "WSREP: exception from gcomm, backend must be restarted: this node has been evicted out of the cluster, gcomm backend restart is required (FATAL)\n\t at \/test\/mtest\/10.11_galera\/gcomm\/src\/gmcast_proto.cpp:handle_failed():295"
		}
	],
	"warnings": [
		{
			"timestamp": 1670573188.00000000,
			"msg": "'user' entry 'root@node1' ignored in --skip-name-resolve mode."
		},
		{
			"timestamp": 1670573188.00000000,
			"msg": "'user' entry '@node1' ignored in --skip-name-resolve mode."
		},
		{
			"timestamp": 1670573188.00000000,
			"msg": "'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode."
		},
		{
			"timestamp": 1670573607.00000000,
			"msg": "WSREP: handshake with 514a168d-a05d tcp:\/\/192.168.100.10:4567 failed: 'evicted'"
		}
	],
	"events": [
		{
			"timestamp": 1670573607.57092166,
			"event": {"status": "evicted", "message": "This node was evicted permanently from cluster, restart is required"}
		}
	],
	"status": {
		"state": "DISCONNECTED",
		"comment": "Disconnected",
		"progress": { "from": -1, "to": -1, "total": -1, "done": -1, "indefinite": -1 }
	}
}
 
 
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep%stat%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | d42e04ae-7796-11ed-9641-164ea6a4b8d0 |
| wsrep_local_state         | 0                                    |
| wsrep_local_state_comment | Initialized                          |
| wsrep_cluster_state_uuid  | 00000000-0000-0000-0000-000000000000 |
| wsrep_cluster_status      | Disconnected                         |
+---------------------------+--------------------------------------+
5 rows in set (0.002 sec)
 
MariaDB [(none)]> 
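
For monitoring scripts, the eviction event in the status file above can be picked up by scanning the "events" array for an entry whose "status" is "evicted". The following is only a minimal Python sketch, assuming the file layout shown in this comment; the function name find_eviction_events and the trimmed inline sample are illustrative and not part of MariaDB or Galera.

```python
import json


def find_eviction_events(status_doc):
    """Return the entries of "events" whose event status is "evicted".

    Assumes the layout shown in this ticket: each entry has a
    "timestamp" and an "event" object with "status" and "message".
    """
    return [
        entry
        for entry in status_doc.get("events", [])
        if entry.get("event", {}).get("status") == "evicted"
    ]


# Trimmed copy of the status file from this comment (illustrative only).
sample = json.loads(r'''
{
  "events": [
    {
      "timestamp": 1670573607.57092166,
      "event": {"status": "evicted", "message": "This node was evicted permanently from cluster, restart is required"}
    }
  ],
  "status": {"state": "DISCONNECTED", "comment": "Disconnected"}
}
''')

evicted = find_eviction_events(sample)
for entry in evicted:
    print(entry["timestamp"], entry["event"]["message"])
```

In practice the document would be loaded from the node's wsrep_status.json rather than from an inline string.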

Comment by Ramesh Sivaraman [ 2023-01-13 ]

denis.protivensky 11.0 does not print the node eviction status in the events section.

11.0.0 1cb0835be98985f20cccd1724ac78de3649eb2e6

 
test case
 
node3:root@localhost> show global status like 'wsrep_gcomm_uuid';
+------------------+--------------------------------------+
| Variable_name    | Value                                |
+------------------+--------------------------------------+
| wsrep_gcomm_uuid | 74ae01e0-9316-11ed-a9d5-7208a7fc2a19 |
+------------------+--------------------------------------+
1 row in set (0.002 sec)
 
node3:root@localhost> set global wsrep_provider_evs_evict='74ae01e0-9316-11ed-a9d5-7208a7fc2a19';
Query OK, 0 rows affected (0.002 sec)
 
node3:root@localhost> show global status like 'wsrep%stat%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | 1e476211-930e-11ed-89f9-da74090fa0cb |
| wsrep_local_state         | 0                                    |
| wsrep_local_state_comment | Initialized                          |
| wsrep_cluster_state_uuid  | 00000000-0000-0000-0000-000000000000 |
| wsrep_cluster_status      | Disconnected                         |
+---------------------------+--------------------------------------+
5 rows in set (0.001 sec)
 
node3:root@localhost> 
 
 
 
Status file.
 
Every 3.0s: cat node3/wsrep_status.json                                                                                                                                             galapq: Fri Jan 13 09:51:53 2023
 
{
        "date": "2023-01-13 09:48:00.000",
        "timestamp": 1673596080.00000000,
        "errors": [
                {
                        "timestamp": 1673596080.00000000,
                        "msg": "WSREP: exception from gcomm, backend must be restarted: this node has been evicted out of the cluster, gcomm backend restart is required (FATAL)\n\t at \/test\/mtest\/galera\/gcomm\/src\/gmcast_proto.cpp:handle_failed():283"
                }
        ],
        "warnings": [
                {
                        "timestamp": 1673596080.00000000,
                        "msg": "WSREP: handshake with e2482adf-ac83 tcp:\/\/127.0.0.1:11391 failed: 'evicted'"
                },
                {
                        "timestamp": 1673596080.00000000,
                        "msg": "Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)"
                },
                {
                        "timestamp": 1673596080.00000000,
                        "msg": "Aborted connection 6 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)"
                }
        ],
        "events": [
        ],
        "status": {
                "state": "DISCONNECTED",
                "comment": "Disconnected",
                "progress": { "from": -1, "to": -1, "total": -1, "done": -1, "indefinite": -1 }
        }
}

Comment by Denis Protivensky [ 2023-01-13 ]

Ramesh Sivaraman I checked out the commit SHA and performed the steps you described to evict the node, and the event is generated for me. Can you check that you're using the appropriate Galera library that contains the fix to emit node eviction events?

Comment by Ramesh Sivaraman [ 2023-01-13 ]

denis.protivensky Sorry, you are right, I was using the Galera 4.x base branch with version 11.0. It works fine when using the Galera branch mariadb-4.x-test.

        ],
        "events": [
                {
                        "timestamp": 1673624273.64578247,
                        "event": {"status": "evicted", "message": "This node was evicted permanently from cluster, restart is required"}
                }
        ],
        "status": {
                "state": "DISCONNECTED",
                "comment": "Disconnected",
                "progress": { "from": -1, "to": -1, "total": -1, "done": -1, "indefinite": -1 }
        }
}
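
The two runs in this ticket differ only in whether "events" carries the eviction entry: with the base Galera branch the eviction shows up in "errors" and "warnings" alone, while with mariadb-4.x-test it also appears as an event. A hedged Python sketch of a check that tells these cases apart is given below. The function name classify_eviction, the return labels, and the substring match on "evicted" are assumptions made for illustration, based only on the messages quoted in this ticket.

```python
def classify_eviction(status_doc):
    """Classify a wsrep status document with respect to node eviction.

    Returns (labels are hypothetical, chosen for this sketch):
      "event"    - an "evicted" entry is present in "events"
      "log-only" - eviction is mentioned only in errors/warnings text,
                   which in this ticket meant the Galera library
                   lacked the eviction event fix
      "none"     - no sign of eviction
    """
    events = status_doc.get("events", [])
    if any(e.get("event", {}).get("status") == "evicted" for e in events):
        return "event"

    logged = status_doc.get("errors", []) + status_doc.get("warnings", [])
    # Heuristic: match on the wording seen in the messages quoted above.
    if any("evicted" in entry.get("msg", "") for entry in logged):
        return "log-only"
    return "none"


# Minimal stand-ins for the two status files shown in this ticket.
without_event = {
    "warnings": [{"timestamp": 1673596080.0,
                  "msg": "WSREP: handshake with e2482adf-ac83 failed: 'evicted'"}],
    "events": [],
}
with_event = {
    "events": [{"timestamp": 1673624273.64578247,
                "event": {"status": "evicted",
                          "message": "This node was evicted permanently from cluster, restart is required"}}],
}

print(classify_eviction(without_event))
print(classify_eviction(with_event))
```

A "log-only" result is a hint to verify which Galera library build is loaded, not proof of a server-side problem.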

Comment by Ramesh Sivaraman [ 2023-02-08 ]

ok to push
