Details
- Type: Task
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
https://github.com/MariaDB/server/pull/1982
Codership is planning to add a new feature to cluster nodes: reporting some wsrep status variables in a dedicated JSON file that can then be read by an external monitoring tool, or by a human for that matter.
Rationale: until the server is fully initialized, it is inaccessible to clients, and the only source of information is the error log, which is not machine-friendly. Since a wsrep node can spend a very long time in the initialization phase (state transfer), automatic tools may be unable to easily monitor its liveness and progress for a long time.
Rationale for using a file rather than some sort of socket: it is simpler and safer, and the file remains if the process aborts, so it is easy to get the last error that caused the abort.
For now the file contents will look as follows:
$ cat /tmp/galera/0/mysql/var/wsrep_status.json
{
  "date": "2021-09-04 15:35:02.000",
  "timestamp": 1630758902.00000000,
  "errors": [
    {
      "timestamp": 1630758901.00000000,
      "msg": "mysqld: Can't open shared library '/tmp/galera/0/mysql/lib64/mysql/plugin/audit_log.so' (errno: 0, cannot open shared object file: No such file or directory)"
    },
    {
      "timestamp": 1630758901.00000000,
      "msg": "Couldn't load plugins from 'audit_log.so'."
    }
  ],
  "warnings": [
    {
      "timestamp": 1630758902.00000000,
      "msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown option '--loose-skip_mysqlx'"
    },
    {
      "timestamp": 1630758902.00000000,
      "msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-log_error_verbosity=3'"
    },
    {
      "timestamp": 1630758902.00000000,
      "msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-audit_log_file=/tmp/galera/0/mysql/var/audit.log'"
    },
    {
      "timestamp": 1630758902.00000000,
      "msg": "'proxies_priv' entry '@% root@void' ignored in --skip-name-resolve mode."
    }
  ],
  "status": {
    "state": "DISCONNECTED",
    "comment": "Disconnected",
    "progress": -1.00000
  }
}
So the file contains the few most recent errors and warnings from the error log, the wsrep state, and a progress indicator (in case of SST/IST).
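As an illustration of how a monitoring tool might consume this file, here is a minimal Python sketch. The function name `summarize_status` is hypothetical, the embedded sample is a trimmed copy of the contents shown above, and treating a negative `progress` as "no progress available" is an assumption based only on that sample.

```python
import json

def summarize_status(text):
    """Return a compact summary of a wsrep status document (hypothetical helper)."""
    doc = json.loads(text)
    status = doc.get("status", {})
    progress = status.get("progress", -1)
    return {
        "state": status.get("state"),
        "comment": status.get("comment"),
        # Assumption: a negative value means no SST/IST progress is available.
        "progress": progress if progress >= 0 else None,
        "errors": [e["msg"] for e in doc.get("errors", [])],
        "warning_count": len(doc.get("warnings", [])),
    }

# Trimmed copy of the sample file contents.
SAMPLE = """
{
  "date": "2021-09-04 15:35:02.000",
  "timestamp": 1630758902.00000000,
  "errors": [
    {"timestamp": 1630758901.00000000,
     "msg": "Couldn't load plugins from 'audit_log.so'."}
  ],
  "warnings": [],
  "status": {"state": "DISCONNECTED", "comment": "Disconnected", "progress": -1.00000}
}
"""

if __name__ == "__main__":
    print(summarize_status(SAMPLE))
```

In practice the text would come from reading the path configured for the status file rather than from an embedded string.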
I have a patch ready for MariaDB 10.4. It introduces a new variable, `wsrep_status_file`. If that variable is unset, no file is created and no reporting is done. The patch does not support SST/IST progress reporting yet, only discrete state changes. We plan to add progress reporting in follow-up patches.
This task also includes progress reporting for mariabackup SST:
- Progress reporting requires the pv tool
- Progress reporting and rate limiting can be disabled in the configuration (progress = NONE)
- Progress is now reported in the server error log, for example:
2022-03-16 14:39:27 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 392923645, "indefinite": -1 }'
2022-03-16 14:39:28 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 896353227, "indefinite": -1 }'
2022-03-16 14:39:29 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1386740995, "indefinite": -1 }'
2022-03-16 14:39:30 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1914292021, "indefinite": -1 }'
2022-03-16 14:39:31 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2429366550, "indefinite": -1 }'
2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2731243266, "indefinite": -1 }'
2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2734803150, "done": 2734803150, "indefinite": -1 }'
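For tools that scrape the error log, the progress lines above could be turned into a percentage roughly as follows. This is a sketch only: the regular expression is derived from the sample lines, the helper name `parse_progress` is hypothetical, and the meaning of the `from`, `to` and `indefinite` fields is not stated in this task, so they are passed through uninterpreted.

```python
import json
import re

# Pattern derived from the sample log lines; the exact format is an assumption.
PROGRESS_RE = re.compile(r"WSREP: REPORTING SST PROGRESS: '(\{.*\})'")

def parse_progress(line):
    """Extract the JSON payload from one log line and compute percent done.

    Returns None when the line is not a progress report.
    """
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    payload = json.loads(match.group(1))
    total = payload["total"]
    # Guard against a zero or negative total to avoid a division error.
    payload["percent"] = 100.0 * payload["done"] / total if total > 0 else None
    return payload

FIRST = ("2022-03-16 14:39:27 0 [Note] WSREP: REPORTING SST PROGRESS: "
         "'{ \"from\": 1, \"to\": 3, \"total\": 2731303065, "
         "\"done\": 392923645, \"indefinite\": -1 }'")

if __name__ == "__main__":
    report = parse_progress(FIRST)
    print(f"{report['done']} of {report['total']} bytes, {report['percent']:.1f}%")
```

Note that `total` changes between the last two sample lines, so a consumer should recompute the percentage from each line rather than caching the first total it sees.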
Issue Links
- causes
  - MDEV-28423 IST is failing on Joiner node when active data load on donor node (Closed)
  - MDEV-28656 Inability to roll upgrade without stopping the Galera cluster (Closed)
  - MDEV-31738 Unable to install MariaDB Community version 10.9+ on RHEL based OS due to unresolved dependency pv (Stalled)
- includes
  - MDEV-21901 Write details into a separate .dat file in case of Galera node auto-eviction (Closed)
- is part of
  - MDEV-28112 prepare 10.9.0 preview releases (Closed)
- relates to
  - MDEV-29281 Add details about node eviction status to the JSON file with Galera node status (Closed)