Details
- Task
- Status: In Progress
- Minor
- Resolution: Unresolved
- 25.01.5
- None
- MXS-SPRINT-269
Description
During cooperative_monitoring, the monitor sets wait_timeout on its server connections so that, if the network breaks, the MariaDB Server quickly closes the client connection and releases the lock. The value of wait_timeout is roughly monitor_interval + 2 * backend_timeout. The assumption is that a normal monitor tick cannot take longer than wait_timeout, even if a server is slow to respond to monitor queries.
This assumption may not hold in all cases. A monitor update of one server consists of several steps, and if every step is slow (up to the configured timeout), the update as a whole can take longer than wait_timeout. If one server finishes its update quickly but another is slow, the connection to the fast server may hit wait_timeout before the next tick begins.
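The timing argument can be illustrated with a small calculation. This is only a sketch: it treats wait_timeout as exactly monitor_interval + 2 * backend_timeout (the description says "roughly"), assumes every step of an update can take up to backend_timeout, and assumes the next tick begins once both the interval has elapsed and the slowest server has finished. The function names, the step count and the example values are hypothetical and are not taken from the MaxScale code.

```python
def approx_wait_timeout(monitor_interval_s: float, backend_timeout_s: float) -> float:
    """Approximate wait_timeout as described: monitor_interval + 2 * backend_timeout."""
    return monitor_interval_s + 2 * backend_timeout_s


def worst_case_update_s(steps: int, backend_timeout_s: float) -> float:
    """Worst-case duration of one server update if every step runs up to the timeout."""
    return steps * backend_timeout_s


def fast_server_idle_s(monitor_interval_s: float, slow_update_s: float,
                       fast_update_s: float) -> float:
    """Idle time of the fast server's connection before the next tick.

    Assumption: the next tick starts when the interval has elapsed and the
    slowest update has finished, whichever is later.
    """
    return max(monitor_interval_s, slow_update_s) - fast_update_s


# Hypothetical values: 2 s interval, 3 s backend timeout, 4 steps per update.
wait_timeout = approx_wait_timeout(2.0, 3.0)    # 2 + 2 * 3 = 8 s
slow_update = worst_case_update_s(4, 3.0)       # 4 * 3 = 12 s
idle = fast_server_idle_s(2.0, slow_update, 0.5)

# With these numbers the fast server's connection idles longer than wait_timeout.
lock_at_risk = idle > wait_timeout
```

Under these assumed values, four slow steps are enough for the idle period on the fast server to exceed wait_timeout, which is the situation the description warns about.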
For now, just detecting this situation and adding warning log messages would simplify problem diagnostics. In 25.10 and later, this issue is less severe, because the monitor combines multiple queries into one multiquery, which reduces the number of round trips per monitor tick.
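A detection check of the kind proposed here could look roughly like the following. This is a hedged sketch rather than the monitor's implementation: the function name, the logger name and the message wording are hypothetical, and it assumes the monitor can measure how long each server connection was idle between its last query and the start of the next tick.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("monitor_sketch")


def warn_if_lock_at_risk(server: str, idle_s: float, wait_timeout_s: float) -> bool:
    """Log a warning if a connection was idle for at least wait_timeout.

    Returns True when a warning was logged, False otherwise.
    """
    if idle_s >= wait_timeout_s:
        logger.warning(
            "Server '%s': connection was idle for %.1f s during cooperative "
            "monitoring, which reaches wait_timeout (%.1f s). The server may "
            "have closed the connection and released the lock.",
            server, idle_s, wait_timeout_s,
        )
        return True
    return False
```

Calling `warn_if_lock_at_risk("server1", 11.5, 8.0)` would log one warning and return True, while an idle time of 1.5 seconds against the same limit would return False without logging.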
Issue Links
- relates to: MXS-6463 Ping servers to maintain cooperative monitoring locks (Open)