[MDEV-26971] JSON file interface to wsrep node state / SST progress logging Created: 2021-09-07  Updated: 2023-08-10  Resolved: 2022-03-18

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Fix Version/s: 10.9.0

Type: Task Priority: Critical
Reporter: Alexey Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: Preview_10.9

Issue Links:
PartOf
includes MDEV-21901 Write details into a separate .dat fi... Closed
is part of MDEV-28112 prepare 10.9.0 preview releases Closed
Problem/Incident
causes MDEV-28423 IST is failing on Joiner node when ac... Closed
causes MDEV-28656 Inability to roll upgrade without sto... Closed
causes MDEV-31738 Unable to install MariaDB Community v... Stalled
Relates
relates to MDEV-29281 Add details about node eviction statu... Closed

 Description   

https://github.com/MariaDB/server/pull/1982

Codership is planning to add a new feature to cluster nodes: reporting some wsrep status variables in a dedicated JSON file, that then can be read by an external monitoring tool. Or a human for that matter.

Rationale: until the server is fully initialized it is inaccessible by client and the only source of information is an error log which is not machine-friendly. Since wsrep node can spend a very long time in initialization phase (state transfer), it may be a very long time that automatic tools can't easily monitor its liveness and progression.

Rationale behind using a file as opposed to some sort of a socket: it is simpler and safer and the file stays in case of the process abort, so it is easy to get the last error that caused the abort.

For now the file contents will look as follows:

$ cat /tmp/galera/0/mysql/var/wsrep_status.json 
{
	"date": "2021-09-04 15:35:02.000",
	"timestamp": 1630758902.00000000,
	"errors": [
		{
			"timestamp": 1630758901.00000000,
			"msg": "mysqld: Can't open shared library '/tmp/galera/0/mysql/lib64/mysql/plugin/audit_log.so' (errno: 0, cannot open shared object file: No such file or directory)"
		},
		{
			"timestamp": 1630758901.00000000,
			"msg": "Couldn't load plugins from 'audit_log.so'."
		}
	],
	"warnings": [
		{
			"timestamp": 1630758902.00000000,
			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown option '--loose-skip_mysqlx'"
		},
		{
			"timestamp": 1630758902.00000000,
			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-log_error_verbosity=3'"
		},
		{
			"timestamp": 1630758902.00000000,
			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-audit_log_file=/tmp/galera/0/mysql/var/audit.log'"
		},
		{
			"timestamp": 1630758902.00000000,
			"msg": "'proxies_priv' entry '@% root@void' ignored in --skip-name-resolve mode."
		}
	],
	"status": {
		"state": "DISCONNECTED",
		"comment": "Disconnected",
		"progress": -1.00000
	}
}

So there are a few most recent errors and warnings form the error log, wsrep state and a progress indicator (in case of SST/IST).

I have an ready patch for MariaDB 10.4. It introduces a new variable: `wsrep_status_file`. If that variable is unset, no file is created and no reporting is done. The patch does not support SST/IST progress reporting yet, only discrete state changes. We plan to add progress reporting in the followup patches.

This task contains also progress reporting for mariabackup SST

  • Progress reporting requires tool pv
  • Progress/rate-limiting can be disabled by configuration (progress = NONE)
  • Progress is reported now in server error log for example :

    2022-03-16 14:39:27 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 392923645, "indefinite": -1 }'
    2022-03-16 14:39:28 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 896353227, "indefinite": -1 }'
    2022-03-16 14:39:29 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1386740995, "indefinite": -1 }'
    2022-03-16 14:39:30 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1914292021, "indefinite": -1 }'
    2022-03-16 14:39:31 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2429366550, "indefinite": -1 }'
    2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2731243266, "indefinite": -1 }'
    2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2734803150, "done": 2734803150, "indefinite": -1 }'
    



 Comments   
Comment by Jan Lindström (Inactive) [ 2021-12-17 ]

ralf.gebhardt@mariadb.com julien.fritsch This needs to be moved on next release.

Comment by Jan Lindström (Inactive) [ 2022-02-11 ]

branch: preview-10.9-MDEV-26971-JSON-status

Contains progress reporting for mariabackup. Testing should include testing without pv tool and with pv tool.

Comment by Ramesh Sivaraman [ 2022-02-16 ]

Yurchenko As discussed, changes to the node state are not reflected in the wsrep_status_file when SST/IST is triggered or the node state is forcibly changed. The node wsrep state is always shown as disconnected in wsrep_status_file.

 	"status": {
		"state": "DISCONNECTED",
		"comment": "Disconnected",
		"progress": { "from": -1, "to": -1, "total": -1, "done": -1, "indefinite": -1 }
	}
 
MariaDB [(none)]> show status like '%wsrep_local_state%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | e4b724da-8efa-11ec-af60-76a360145ddc |
| wsrep_local_state         | 4                                    |
| wsrep_local_state_comment | Synced                               |
+---------------------------+--------------------------------------+
3 rows in set (0.001 sec)
 
MariaDB [(none)]> SET GLOBAL wsrep_provider_options = 'gmcast.isolate=1';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> 
 
MariaDB [(none)]> show status like '%wsrep_local_state%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | e4b724da-8efa-11ec-af60-76a360145ddc |
| wsrep_local_state         | 0                                    |
| wsrep_local_state_comment | Initialized                          |
+---------------------------+--------------------------------------+
3 rows in set (0.001 sec)
 
MariaDB [(none)]> 

Comment by Ramesh Sivaraman [ 2022-03-15 ]

ok to push

Comment by Julius Goryavsky [ 2022-03-22 ]

Added post-fix for SST scripts: https://github.com/MariaDB/server/commit/ab1a7925714727743e5dd7bb1b83014044fc5660

1) Fixed a bug with incorrect calculation of the payload length when using the --innodb-undo-directory option on the command line (not in the .cnf file);
2) Fixed a bug with incorrect processing of progress=none in the presence of *.qp-files;
3) Progress processing for *.qp-files has been made more logically unified with the semantics of the adjust_progress function;
4) Added forgotten option "--maxdepth 1" when searching for *.qp-files;
5) Eliminated "cosmetic" differences with the SST scripts in previous versions, which otherwise will cause problems when automatically migrating future changes;
6) Fixed a bug caused by automatic migration of the MDEV-27524.

Comment by Jan Lindström (Inactive) [ 2022-03-23 ]

For the SST script you can use following in configuration file:

[sst]
progress=none

Comment by Ramesh Sivaraman [ 2023-07-20 ]

ralf.gebhardt As discussed, these are my observations regarding this feature.

Test case 1
****************************************
wsrep_debug=0
wsrep_status_file=wsrep_status.json
 
[sst]
progress=none
 
in this configuration wsrep_status_file will be generated irrespective of any progress options (none/1/<filename>) and wsrep-debug=on/off
 
SST log
-------
{
        "date": "2023-07-20 10:06:49.241",
        "timestamp": 1689836809.24185443,
        "errors": [
        ],
        "warnings": [
                {
                        "timestamp": 1689836683.00000000,
                        "msg": "WSREP: Could not open state file for reading: '\/test\/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg\/node2\/\/grastate.dat'"
                }
        ],
        "events": [
        ],
        "status": {
                "state": "SYNCED",
                "comment": "Operational",
                "progress": { "from": -1, "to": -1, "total": 0, "done": 0, "indefinite": -1 }
        }
}
 
IST log
-------
 
{
        "date": "2023-07-20 10:26:03.868",
        "timestamp": 1689837963.86892033,
        "errors": [
        ],
        "warnings": [
        ],
        "events": [
        ],
        "status": {
                "state": "JOINING",
                "comment": "Receiving IST",
                "progress": { "from": 1, "to": 3, "total": 44812, "done": 0, "undefined": -1 }
        }
}
 
Test case 2
****************************************
wsrep_debug=1
wsrep_status_file=wsrep_status.json
 
[sst]
progress=none
 
In this configuration, even after wsrep-debug=1 is enabled, SST progress is not written to the error log as  "progress=none" in the SST section 
 
Test case 3
****************************************
wsrep_debug=1
wsrep_status_file=wsrep_status.json
 
[sst]
#progress=none
 
SST progress is written to the error log as progress=none commented out in the SST section and wsrep-debug is enabled. 
 
 
SREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:12603,reuseaddr stdio | { pv -f -i 1 -n -b 2>&3 | '/test/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg//bin/mbstream' -x; } 3>&1 | stdbuf -oL tr '\r' '\n' | xargs 
-n1 echo complete; RC=( ${PIPESTATUS[@]} ) (20230720 10:32:13.916)
WSREP_SST: [INFO] Proceeding with SST (20230720 10:32:13.916)
2023-07-20 10:32:13 0 [Note] WSREP: (89a27fca-931d, 'tcp://127.0.0.1:12273') turning message relay requesting off
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20230720 10:32:13.939)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20230720 10:32:14.047)
2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44352189, "done": 44352189, "indefinite": -1 }'
2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44644937, "done": 44644937, "indefinite": -1 }'
2023-07-20 10:32:26 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44937938, "done": 44937938, "indefinite": -1 }'
2023-07-20 10:32:27 0 [Note] WSREP: 0.0 (galapq): State transfer to 1.0 (galapq) complete.
 
 
Test case 4
****************************************
wsrep_debug=1
wsrep_status_file=wsrep_status.json
 
[sst]
progress=sst_progress.txt
 
SST progress is written to the file which is mentioned in progress option and wsrep-debug is enabled. 
 
$ cat node2/sst_progress.txt 
   joiner: => Rate:[3.36MiB/s] Avg:[3.36MiB/s] Elapsed:0:00:13  Bytes: 45.6MiB 
$ 
 
 
This confirms that wsrep-debug must be enabled to get SST progress in the error log or in a separate file. 
Also wsrep status JSON will be generated if we enable wsrep_status_file in configuration. It is not depened on wsrep-debug and SST progress option. This file is mainly used to get wsrep node state

Generated at Thu Feb 08 09:49:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.