Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26971

JSON file interface to wsrep node state / SST progress logging

Details

    Description

      https://github.com/MariaDB/server/pull/1982

      Codership is planning to add a new feature to cluster nodes: reporting some wsrep status variables in a dedicated JSON file, that then can be read by an external monitoring tool. Or a human for that matter.

      Rationale: until the server is fully initialized it is inaccessible by client and the only source of information is an error log which is not machine-friendly. Since wsrep node can spend a very long time in initialization phase (state transfer), it may be a very long time that automatic tools can't easily monitor its liveness and progression.

      Rationale behind using a file as opposed to some sort of a socket: it is simpler and safer and the file stays in case of the process abort, so it is easy to get the last error that caused the abort.

      For now the file contents will look as follows:

      $ cat /tmp/galera/0/mysql/var/wsrep_status.json 
      {
      	"date": "2021-09-04 15:35:02.000",
      	"timestamp": 1630758902.00000000,
      	"errors": [
      		{
      			"timestamp": 1630758901.00000000,
      			"msg": "mysqld: Can't open shared library '/tmp/galera/0/mysql/lib64/mysql/plugin/audit_log.so' (errno: 0, cannot open shared object file: No such file or directory)"
      		},
      		{
      			"timestamp": 1630758901.00000000,
      			"msg": "Couldn't load plugins from 'audit_log.so'."
      		}
      	],
      	"warnings": [
      		{
      			"timestamp": 1630758902.00000000,
      			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown option '--loose-skip_mysqlx'"
      		},
      		{
      			"timestamp": 1630758902.00000000,
      			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-log_error_verbosity=3'"
      		},
      		{
      			"timestamp": 1630758902.00000000,
      			"msg": "/tmp/galera/0/mysql/sbin/mysqld: unknown variable 'loose-audit_log_file=/tmp/galera/0/mysql/var/audit.log'"
      		},
      		{
      			"timestamp": 1630758902.00000000,
      			"msg": "'proxies_priv' entry '@% root@void' ignored in --skip-name-resolve mode."
      		}
      	],
      	"status": {
      		"state": "DISCONNECTED",
      		"comment": "Disconnected",
      		"progress": -1.00000
      	}
      }
      

      So there are a few most recent errors and warnings form the error log, wsrep state and a progress indicator (in case of SST/IST).

      I have an ready patch for MariaDB 10.4. It introduces a new variable: `wsrep_status_file`. If that variable is unset, no file is created and no reporting is done. The patch does not support SST/IST progress reporting yet, only discrete state changes. We plan to add progress reporting in the followup patches.

      This task contains also progress reporting for mariabackup SST

      • Progress reporting requires tool pv
      • Progress/rate-limiting can be disabled by configuration (progress = NONE)
      • Progress is reported now in server error log for example :

        2022-03-16 14:39:27 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 392923645, "indefinite": -1 }'
        2022-03-16 14:39:28 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 896353227, "indefinite": -1 }'
        2022-03-16 14:39:29 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1386740995, "indefinite": -1 }'
        2022-03-16 14:39:30 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 1914292021, "indefinite": -1 }'
        2022-03-16 14:39:31 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2429366550, "indefinite": -1 }'
        2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2731303065, "done": 2731243266, "indefinite": -1 }'
        2022-03-16 14:39:32 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 2734803150, "done": 2734803150, "indefinite": -1 }'
        

      Attachments

        Issue Links

          Activity

            ralf.gebhardt@mariadb.com julien.fritsch This needs to be moved on next release.

            jplindst Jan Lindström (Inactive) added a comment - ralf.gebhardt@mariadb.com julien.fritsch This needs to be moved on next release.

            branch: preview-10.9-MDEV-26971-JSON-status

            Contains progress reporting for mariabackup. Testing should include testing without pv tool and with pv tool.

            jplindst Jan Lindström (Inactive) added a comment - branch: preview-10.9- MDEV-26971 -JSON-status Contains progress reporting for mariabackup. Testing should include testing without pv tool and with pv tool.

            Yurchenko As discussed, changes to the node state are not reflected in the wsrep_status_file when SST/IST is triggered or the node state is forcibly changed. The node wsrep state is always shown as disconnected in wsrep_status_file.

             	"status": {
            		"state": "DISCONNECTED",
            		"comment": "Disconnected",
            		"progress": { "from": -1, "to": -1, "total": -1, "done": -1, "indefinite": -1 }
            	}
             
            MariaDB [(none)]> show status like '%wsrep_local_state%';
            +---------------------------+--------------------------------------+
            | Variable_name             | Value                                |
            +---------------------------+--------------------------------------+
            | wsrep_local_state_uuid    | e4b724da-8efa-11ec-af60-76a360145ddc |
            | wsrep_local_state         | 4                                    |
            | wsrep_local_state_comment | Synced                               |
            +---------------------------+--------------------------------------+
            3 rows in set (0.001 sec)
             
            MariaDB [(none)]> SET GLOBAL wsrep_provider_options = 'gmcast.isolate=1';
            Query OK, 0 rows affected (0.000 sec)
             
            MariaDB [(none)]> 
             
            MariaDB [(none)]> show status like '%wsrep_local_state%';
            +---------------------------+--------------------------------------+
            | Variable_name             | Value                                |
            +---------------------------+--------------------------------------+
            | wsrep_local_state_uuid    | e4b724da-8efa-11ec-af60-76a360145ddc |
            | wsrep_local_state         | 0                                    |
            | wsrep_local_state_comment | Initialized                          |
            +---------------------------+--------------------------------------+
            3 rows in set (0.001 sec)
             
            MariaDB [(none)]> 
            

            ramesh Ramesh Sivaraman added a comment - Yurchenko As discussed, changes to the node state are not reflected in the wsrep_status_file when SST/IST is triggered or the node state is forcibly changed. The node wsrep state is always shown as disconnected in wsrep_status_file. "status" : { "state" : "DISCONNECTED" , "comment" : "Disconnected" , "progress" : { "from" : -1, "to" : -1, "total" : -1, "done" : -1, "indefinite" : -1 } }   MariaDB [(none)]> show status like '%wsrep_local_state%' ; + ---------------------------+--------------------------------------+ | Variable_name | Value | + ---------------------------+--------------------------------------+ | wsrep_local_state_uuid | e4b724da-8efa-11ec-af60-76a360145ddc | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | + ---------------------------+--------------------------------------+ 3 rows in set (0.001 sec)   MariaDB [(none)]> SET GLOBAL wsrep_provider_options = 'gmcast.isolate=1' ; Query OK, 0 rows affected (0.000 sec)   MariaDB [(none)]>   MariaDB [(none)]> show status like '%wsrep_local_state%' ; + ---------------------------+--------------------------------------+ | Variable_name | Value | + ---------------------------+--------------------------------------+ | wsrep_local_state_uuid | e4b724da-8efa-11ec-af60-76a360145ddc | | wsrep_local_state | 0 | | wsrep_local_state_comment | Initialized | + ---------------------------+--------------------------------------+ 3 rows in set (0.001 sec)   MariaDB [(none)]>

            ok to push

            ramesh Ramesh Sivaraman added a comment - ok to push
            sysprg Julius Goryavsky added a comment - - edited

            Added post-fix for SST scripts: https://github.com/MariaDB/server/commit/ab1a7925714727743e5dd7bb1b83014044fc5660

            1) Fixed a bug with incorrect calculation of the payload length when using the --innodb-undo-directory option on the command line (not in the .cnf file);
            2) Fixed a bug with incorrect processing of progress=none in the presence of *.qp-files;
            3) Progress processing for *.qp-files has been made more logically unified with the semantics of the adjust_progress function;
            4) Added forgotten option "--maxdepth 1" when searching for *.qp-files;
            5) Eliminated "cosmetic" differences with the SST scripts in previous versions, which otherwise will cause problems when automatically migrating future changes;
            6) Fixed a bug caused by automatic migration of the MDEV-27524.

            sysprg Julius Goryavsky added a comment - - edited Added post-fix for SST scripts: https://github.com/MariaDB/server/commit/ab1a7925714727743e5dd7bb1b83014044fc5660 1) Fixed a bug with incorrect calculation of the payload length when using the --innodb-undo-directory option on the command line (not in the .cnf file); 2) Fixed a bug with incorrect processing of progress=none in the presence of *.qp-files; 3) Progress processing for *.qp-files has been made more logically unified with the semantics of the adjust_progress function; 4) Added forgotten option "--maxdepth 1" when searching for *.qp-files; 5) Eliminated "cosmetic" differences with the SST scripts in previous versions, which otherwise will cause problems when automatically migrating future changes; 6) Fixed a bug caused by automatic migration of the MDEV-27524 .

            For the SST script you can use following in configuration file:

            [sst]
            progress=none
            

            jplindst Jan Lindström (Inactive) added a comment - For the SST script you can use following in configuration file: [sst] progress=none

            ralf.gebhardt As discussed, these are my observations regarding this feature.

            Test case 1
            ****************************************
            wsrep_debug=0
            wsrep_status_file=wsrep_status.json
             
            [sst]
            progress=none
             
            in this configuration wsrep_status_file will be generated irrespective of any progress options (none/1/<filename>) and wsrep-debug=on/off
             
            SST log
            -------
            {
                    "date": "2023-07-20 10:06:49.241",
                    "timestamp": 1689836809.24185443,
                    "errors": [
                    ],
                    "warnings": [
                            {
                                    "timestamp": 1689836683.00000000,
                                    "msg": "WSREP: Could not open state file for reading: '\/test\/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg\/node2\/\/grastate.dat'"
                            }
                    ],
                    "events": [
                    ],
                    "status": {
                            "state": "SYNCED",
                            "comment": "Operational",
                            "progress": { "from": -1, "to": -1, "total": 0, "done": 0, "indefinite": -1 }
                    }
            }
             
            IST log
            -------
             
            {
                    "date": "2023-07-20 10:26:03.868",
                    "timestamp": 1689837963.86892033,
                    "errors": [
                    ],
                    "warnings": [
                    ],
                    "events": [
                    ],
                    "status": {
                            "state": "JOINING",
                            "comment": "Receiving IST",
                            "progress": { "from": 1, "to": 3, "total": 44812, "done": 0, "undefined": -1 }
                    }
            }
             
            Test case 2
            ****************************************
            wsrep_debug=1
            wsrep_status_file=wsrep_status.json
             
            [sst]
            progress=none
             
            In this configuration, even after wsrep-debug=1 is enabled, SST progress is not written to the error log as  "progress=none" in the SST section 
             
            Test case 3
            ****************************************
            wsrep_debug=1
            wsrep_status_file=wsrep_status.json
             
            [sst]
            #progress=none
             
            SST progress is written to the error log as progress=none commented out in the SST section and wsrep-debug is enabled. 
             
             
            SREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:12603,reuseaddr stdio | { pv -f -i 1 -n -b 2>&3 | '/test/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg//bin/mbstream' -x; } 3>&1 | stdbuf -oL tr '\r' '\n' | xargs 
            -n1 echo complete; RC=( ${PIPESTATUS[@]} ) (20230720 10:32:13.916)
            WSREP_SST: [INFO] Proceeding with SST (20230720 10:32:13.916)
            2023-07-20 10:32:13 0 [Note] WSREP: (89a27fca-931d, 'tcp://127.0.0.1:12273') turning message relay requesting off
            WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20230720 10:32:13.939)
            WSREP_SST: [INFO] Waiting for SST streaming to complete! (20230720 10:32:14.047)
            2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44352189, "done": 44352189, "indefinite": -1 }'
            2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44644937, "done": 44644937, "indefinite": -1 }'
            2023-07-20 10:32:26 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44937938, "done": 44937938, "indefinite": -1 }'
            2023-07-20 10:32:27 0 [Note] WSREP: 0.0 (galapq): State transfer to 1.0 (galapq) complete.
             
             
            Test case 4
            ****************************************
            wsrep_debug=1
            wsrep_status_file=wsrep_status.json
             
            [sst]
            progress=sst_progress.txt
             
            SST progress is written to the file which is mentioned in progress option and wsrep-debug is enabled. 
             
            $ cat node2/sst_progress.txt 
               joiner: => Rate:[3.36MiB/s] Avg:[3.36MiB/s] Elapsed:0:00:13  Bytes: 45.6MiB 
            $ 
             
             
            This confirms that wsrep-debug must be enabled to get SST progress in the error log or in a separate file. 
            Also wsrep status JSON will be generated if we enable wsrep_status_file in configuration. It is not depened on wsrep-debug and SST progress option. This file is mainly used to get wsrep node state
            

            ramesh Ramesh Sivaraman added a comment - ralf.gebhardt As discussed, these are my observations regarding this feature. Test case 1 **************************************** wsrep_debug=0 wsrep_status_file=wsrep_status.json   [sst] progress=none   in this configuration wsrep_status_file will be generated irrespective of any progress options (none/1/<filename>) and wsrep-debug=on/off   SST log ------- { "date": "2023-07-20 10:06:49.241", "timestamp": 1689836809.24185443, "errors": [ ], "warnings": [ { "timestamp": 1689836683.00000000, "msg": "WSREP: Could not open state file for reading: '\/test\/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg\/node2\/\/grastate.dat'" } ], "events": [ ], "status": { "state": "SYNCED", "comment": "Operational", "progress": { "from": -1, "to": -1, "total": 0, "done": 0, "indefinite": -1 } } }   IST log -------   { "date": "2023-07-20 10:26:03.868", "timestamp": 1689837963.86892033, "errors": [ ], "warnings": [ ], "events": [ ], "status": { "state": "JOINING", "comment": "Receiving IST", "progress": { "from": 1, "to": 3, "total": 44812, "done": 0, "undefined": -1 } } }   Test case 2 **************************************** wsrep_debug=1 wsrep_status_file=wsrep_status.json   [sst] progress=none   In this configuration, even after wsrep-debug=1 is enabled, SST progress is not written to the error log as "progress=none" in the SST section   Test case 3 **************************************** wsrep_debug=1 wsrep_status_file=wsrep_status.json   [sst] #progress=none   SST progress is written to the error log as progress=none commented out in the SST section and wsrep-debug is enabled.     SREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:12603,reuseaddr stdio | { pv -f -i 1 -n -b 2>&3 | '/test/GAL_MD100723-mariadb-11.2.0-linux-x86_64-dbg//bin/mbstream' -x; } 3>&1 | stdbuf -oL tr '\r' '\n' | xargs -n1 echo complete; RC=( ${PIPESTATUS[@]} ) (20230720 10:32:13.916) WSREP_SST: [INFO] Proceeding with SST (20230720 10:32:13.916) 2023-07-20 10:32:13 0 [Note] WSREP: (89a27fca-931d, 'tcp://127.0.0.1:12273') turning message relay requesting off WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20230720 10:32:13.939) WSREP_SST: [INFO] Waiting for SST streaming to complete! (20230720 10:32:14.047) 2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44352189, "done": 44352189, "indefinite": -1 }' 2023-07-20 10:32:25 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44644937, "done": 44644937, "indefinite": -1 }' 2023-07-20 10:32:26 0 [Note] WSREP: REPORTING SST PROGRESS: '{ "from": 1, "to": 3, "total": 44937938, "done": 44937938, "indefinite": -1 }' 2023-07-20 10:32:27 0 [Note] WSREP: 0.0 (galapq): State transfer to 1.0 (galapq) complete.     Test case 4 **************************************** wsrep_debug=1 wsrep_status_file=wsrep_status.json   [sst] progress=sst_progress.txt   SST progress is written to the file which is mentioned in progress option and wsrep-debug is enabled.   $ cat node2/sst_progress.txt joiner: => Rate:[3.36MiB/s] Avg:[3.36MiB/s] Elapsed:0:00:13 Bytes: 45.6MiB $     This confirms that wsrep-debug must be enabled to get SST progress in the error log or in a separate file. Also wsrep status JSON will be generated if we enable wsrep_status_file in configuration. It is not depened on wsrep-debug and SST progress option. This file is mainly used to get wsrep node state

            People

              jplindst Jan Lindström (Inactive)
              Yurchenko Alexey
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.