Details
-
Task
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
25.01.5
-
None
-
None
Description
During cooperative_monitoring, the monitor sets wait_timeout on its server connections so that, if the network connection breaks, MariaDB Server quickly closes the client connection and releases the lock. The value of wait_timeout is roughly monitor_interval + 2 * backend_timeout. The assumption is that a normal monitor tick cannot take longer than wait_timeout, even if the server is slow to respond to monitor queries.
This assumption may not hold in all cases. A monitor update of one server consists of several steps. If every step is slow (up to the configured timeout), the update as a whole can take longer than wait_timeout. If one server finishes its update quickly but another is slow, the fast server's connection sits idle and may hit wait_timeout before the next tick begins.
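To illustrate the arithmetic, here is a minimal Python sketch. It uses the approximate formula from the description. The step count and the timeout values are hypothetical numbers chosen for the example, not MaxScale defaults, and the function names are made up for illustration.

```python
def wait_timeout(monitor_interval: float, backend_timeout: float) -> float:
    """Approximate wait_timeout as given in the description."""
    return monitor_interval + 2 * backend_timeout


def worst_case_update(steps: int, backend_timeout: float) -> float:
    """Upper bound for one server update if every step runs up to the timeout."""
    return steps * backend_timeout


# Hypothetical values, in seconds.
monitor_interval = 2.0
backend_timeout = 3.0
steps = 4  # assumed number of sequential steps in one server update

limit = wait_timeout(monitor_interval, backend_timeout)
worst = worst_case_update(steps, backend_timeout)

# With these numbers the slow update (12 s) outlasts wait_timeout (8 s),
# so a server that finished early could be disconnected while idle.
print(f"wait_timeout={limit}s, worst-case update={worst}s, exceeded={worst > limit}")
```

With the approximate formula, any update with more than two fully slow steps can already exceed wait_timeout once the extra time is larger than monitor_interval.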
For now, just detecting this situation and logging a warning would make the problem easier to diagnose. To alleviate the issue, the fast server could be pinged while waiting for the slow server to finish. In 25.10 and later, this issue is less severe, as the monitor combines multiple queries into one multiquery, which reduces the number of roundtrips per monitor tick.
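The detection step could look roughly like the following Python sketch. It is not the monitor's actual code: the function name, the logger name, and the per-server timestamp map are assumptions made for illustration. The idea is to compare each connection's idle time against wait_timeout at the end of a tick and warn about any server that has probably been disconnected.

```python
import logging

log = logging.getLogger("monitor_sketch")  # hypothetical logger name


def find_idle_servers(last_query_time: dict[str, float], now: float,
                      wait_timeout: float) -> list[str]:
    """Return servers whose connection has been idle for wait_timeout or longer.

    last_query_time maps a server name to the time (in seconds) of the last
    query sent on its connection. All names here are illustrative.
    """
    at_risk = []
    for name, last in sorted(last_query_time.items()):
        idle = now - last
        if idle >= wait_timeout:
            log.warning(
                "Connection to '%s' was idle for %.1fs, which reaches "
                "wait_timeout (%.1fs). The server may have closed it.",
                name, idle, wait_timeout)
            at_risk.append(name)
    return at_risk


# Hypothetical tick: 'fast' finished at 0.5 s, 'slow' finished at 11.5 s.
print(find_idle_servers({"fast": 0.5, "slow": 11.5}, now=12.0, wait_timeout=8.0))
```

The same idle-time check could also decide when to ping a fast server, for example once its idle time passes some fraction of wait_timeout.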