[MXS-4472] Secondary monitor cannot get locks after Primary monitor power failed Created: 2023-01-09  Updated: 2023-02-14  Resolved: 2023-02-14

Status: Closed
Project: MariaDB MaxScale
Component/s: Monitor
Affects Version/s: 22.08.3
Fix Version/s: 6.4.5

Type: New Feature Priority: Major
Reporter: Maetee Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Two MaxScale nodes with cooperative monitoring. Three MariaDB backends.



 Description   

Either the primary MaxScale node lost power or it was disconnected from the network. The locks on the backend servers still exist, so the secondary MaxScale node cannot acquire them and cannot perform cluster operations such as automatic failover or a manual switchover.

  • Error when trying to switch over:
    Cannot perform cluster operation because this MaxScale does not have exclusive locks on a majority of servers. Run "SELECT IS_USED_LOCK('maxscale_mariadbmonitor');" on the servers to find out which connection id has a lock.
  • SELECT IS_USED_LOCK() at backend servers:
    • At the master MariaDB backend
      MariaDB [(none)]> SELECT IS_USED_LOCK('maxscale_mariadbmonitor'), IS_USED_LOCK('maxscale_mariadbmonitor_master')\G
      IS_USED_LOCK('maxscale_mariadbmonitor'): 26061
      IS_USED_LOCK('maxscale_mariadbmonitor_master'): 26061
    • At one of the slave MariaDB backend servers
      MariaDB [(none)]> SELECT IS_USED_LOCK('maxscale_mariadbmonitor'), IS_USED_LOCK('maxscale_mariadbmonitor_master')\G
      IS_USED_LOCK('maxscale_mariadbmonitor'): 26061
      IS_USED_LOCK('maxscale_mariadbmonitor_master'): NULL

This does not happen when the primary MaxScale is shut down gracefully.

I think setting a small wait_timeout in the monitor session should help remove the stale backend locks.



 Comments   
Comment by markus makela [ 2023-01-12 ]

If there are stale locks, it means the client connections holding them are still open. A lower wait_timeout seems like a pretty good idea and it could be set to something like (monitor_interval + timeouts) * 2.
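The suggested formula can be sketched as follows. This is a minimal illustration, not code from MaxScale; the function name and the example values (a 2-second monitor_interval and 3-second connect/read/write timeouts) are assumptions chosen for the example, not taken from this ticket.

```python
def suggested_wait_timeout(monitor_interval, connect_timeout,
                           read_timeout, write_timeout):
    """Server-side wait_timeout per the suggestion above:
    (monitor_interval + timeouts) * 2. All values in seconds."""
    return (monitor_interval + connect_timeout + read_timeout + write_timeout) * 2

# Example with illustrative settings: 2s interval, 3s/3s/3s backend timeouts.
print(suggested_wait_timeout(2, 3, 3, 3))  # -> 22
```

The doubling leaves headroom so a healthy monitor, which pings its connections every monitor_interval, never comes close to the timeout, while a dead monitor's locks expire within roughly two intervals.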

Comment by Maetee [ 2023-02-02 ]

I tried setting wait_timeout, hard-coded to 30 seconds.
The locks no longer go stale.

diff --git a/server/core/monitor.cc b/server/core/monitor.cc
index b4c7af7..ed88b21 100644
--- a/server/core/monitor.cc
+++ b/server/core/monitor.cc
@@ -1144,6 +1144,7 @@ MonitorServer::ping_or_connect_to_db(const MonitorServer::ConnectionSettings& se
         mysql_optionsv(pConn, MYSQL_OPT_WRITE_TIMEOUT, &sett.write_timeout);
         mysql_optionsv(pConn, MYSQL_PLUGIN_DIR, mxs::connector_plugindir());
         mysql_optionsv(pConn, MARIADB_OPT_MULTI_STATEMENTS, nullptr);
+        mysql_optionsv(pConn, MYSQL_INIT_COMMAND, (void *)"SET SESSION wait_timeout=30");
 
         if (server.proxy_protocol())
         {

Comment by Esa Korhonen [ 2023-02-02 ]

Check the "Releasing locks" section in the monitor documentation.

Comment by Esa Korhonen [ 2023-02-14 ]

Handled with TCP connection timeout settings.
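The resolution is terse; it presumably means the stale locks are released once each backend server detects that the dead MaxScale's TCP connection is gone, which can be accelerated with the server-side TCP keepalive variables. A hedged config sketch, assuming MariaDB 10.3 or later (where these variables exist); the values are illustrative, not taken from this ticket:

```ini
# my.cnf on each backend server (MariaDB 10.3+); values are examples only
[mariadbd]
tcp_keepalive_time=30      # seconds idle before the first keepalive probe
tcp_keepalive_interval=10  # seconds between unanswered probes
tcp_keepalive_probes=3     # probes before the connection is declared dead
```

With settings like these, a connection to a powered-off MaxScale would be dropped (and its GET_LOCK locks released) within roughly a minute, without relying on wait_timeout.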

Generated at Thu Feb 08 04:28:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.