Details
Bug
Status: Closed
Major
Resolution: Fixed
25.01.6, 25.10.2
None
MXS-SPRINT-270
Description
During cooperative_monitoring, the monitor sets wait_timeout on its server connections so that, if the network breaks, MariaDB Server quickly closes the client connection and releases the lock. The value of wait_timeout is roughly monitor_interval + 2 * backend_timeout. The assumption is that a normal monitor tick cannot take longer than wait_timeout, even if a server is slow to respond to monitor queries.
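As a rough illustration of the arithmetic above, the following Python sketch computes the approximate wait_timeout from the two settings. The function name and the sample values (2 s interval, 3 s backend timeout) are assumptions made for the example; the exact value used by the monitor may differ, since the description only says "roughly".

```python
def estimate_wait_timeout(monitor_interval_s: float, backend_timeout_s: float) -> float:
    """Approximate wait_timeout as described: monitor_interval + 2 * backend_timeout.

    Hypothetical helper for illustration; not the monitor's actual code.
    """
    return monitor_interval_s + 2 * backend_timeout_s


# Illustrative values only: a 2 s monitor interval and 3 s backend timeouts.
print(estimate_wait_timeout(2.0, 3.0))  # 8.0
```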
This assumption may not hold in all cases. A monitor update of one server consists of several steps; if every step is slow (up to the configured timeout), the update as a whole can take longer than wait_timeout. If one server finishes its update quickly but another is slow, the fast server's connection may hit wait_timeout before the next tick begins.
A backend_connect_attempts value greater than 1 can also cause issues: if one server responds slowly or drops connection attempts, the monitor tick may take longer than wait_timeout, because backend_connect_attempts does NOT affect the wait_timeout value that is used.
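The two failure modes above can be sketched with a simple worst-case model. This is a simplification under stated assumptions: each connection attempt and each query step is assumed to block for at most backend_timeout, and the number of query steps per update (`query_steps`) is a hypothetical parameter, since the description does not say how many steps an update has.

```python
def worst_case_update_s(backend_timeout_s: float, query_steps: int,
                        connect_attempts: int = 1) -> float:
    """Upper bound on one server update if every step runs into the timeout.

    Assumed model: each connection attempt and each query step can block
    for up to backend_timeout_s.
    """
    connect_phase = connect_attempts * backend_timeout_s
    query_phase = query_steps * backend_timeout_s
    return connect_phase + query_phase


def can_exceed_wait_timeout(monitor_interval_s: float, backend_timeout_s: float,
                            query_steps: int, connect_attempts: int = 1) -> bool:
    """True if the worst-case update is longer than the approximate wait_timeout."""
    # wait_timeout ignores connect_attempts, as noted in the description.
    wait_timeout_s = monitor_interval_s + 2 * backend_timeout_s
    worst = worst_case_update_s(backend_timeout_s, query_steps, connect_attempts)
    return worst > wait_timeout_s


# Illustrative values: 2 s interval, 3 s timeouts, so wait_timeout is about 8 s.
print(can_exceed_wait_timeout(2.0, 3.0, query_steps=1))                      # False (6 s)
print(can_exceed_wait_timeout(2.0, 3.0, query_steps=4))                      # True (15 s)
print(can_exceed_wait_timeout(2.0, 3.0, query_steps=1, connect_attempts=3))  # True (12 s)
```

In this model a single slow step stays within the limit, while either several slow steps or repeated connection attempts push the update past it.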
To protect against these issues, once the monitor has completed an update on a server, it should keep pinging that server until the updates of all servers are complete.
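The effect of the proposed pinging can be sketched as a small simulation of one monitor tick. It is not MaxScale code: the server names, finish times and the `ping_interval_s` parameter are assumptions for illustration, and the description does not specify how often the monitor would ping.

```python
from typing import Dict, Optional


def max_idle_gap(finish_times_s: Dict[str, float],
                 ping_interval_s: Optional[float] = None) -> Dict[str, float]:
    """Longest idle period per connection between its update finishing and
    the slowest server's update finishing.

    finish_times_s maps a server name to the time (seconds after tick start)
    at which its update completed. If ping_interval_s is given, finished
    servers are assumed to be pinged at that interval until the tick ends,
    so no idle period exceeds the interval.
    """
    tick_end = max(finish_times_s.values())
    gaps = {}
    for name, done in finish_times_s.items():
        idle = tick_end - done
        if ping_interval_s is not None:
            idle = min(idle, ping_interval_s)
        gaps[name] = idle
    return gaps


# Illustrative tick: server1 finishes after 1 s, server2 takes 12 s.
finish = {"server1": 1.0, "server2": 12.0}
wait_timeout_s = 8.0  # e.g. 2 s interval + 2 * 3 s backend timeout

without_ping = max_idle_gap(finish)
with_ping = max_idle_gap(finish, ping_interval_s=1.0)

print(without_ping["server1"] > wait_timeout_s)  # True: connection would time out
print(with_ping["server1"] > wait_timeout_s)     # False: pings keep it alive
```

Without pinging, the fast server's connection is idle for 11 s and exceeds the 8 s wait_timeout; with periodic pings the idle period is bounded by the ping interval.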
Issue Links
- relates to MXS-6192 Log message when a server monitor update exceeds wait_timeout (Closed)