[MXS-4474] MaxScale hangs with warning about "Worker 1 attempted to send a message to worker 1" Created: 2023-01-09  Updated: 2023-01-10  Resolved: 2023-01-10

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.5.24
Fix Version/s: 2.5.25, 6.4.5

Type: Bug Priority: Major
Reporter: markus makela Assignee: markus makela
Resolution: Fixed Votes: 1
Labels: None


 Description   

An existing bug in older 2.5 versions manifested as a lost event. In 2.5.24 this manifests as a hang which will, under intensive workloads, end up causing the systemd watchdog to kill MaxScale.

The reason for this is the inappropriate use of internal messages for tasks for which other mechanisms should be used. Previously this misuse manifested as an error about the failure to write a message to a worker, which also caused hanging client connections.
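
To illustrate why a worker sending a message to itself can stall, here is a minimal sketch. It is not MaxScale code: the `Worker` class, its `post` method and the queue capacity are hypothetical names chosen for the example, and the bounded queue only stands in for the worker's message pipe. The point is that when the sender is also the only reader, nothing drains the queue once it is full.

```python
import queue


class Worker:
    """Hypothetical worker with a bounded inbox (stand-in for a message pipe)."""

    def __init__(self, capacity):
        self.inbox = queue.Queue(maxsize=capacity)

    def post(self, message):
        """Try to enqueue a message without blocking; return False if the inbox is full."""
        try:
            self.inbox.put(message, block=False)
            return True
        except queue.Full:
            return False


if __name__ == "__main__":
    worker = Worker(capacity=2)
    # The worker posts to its own inbox while it is busy, so nothing reads it.
    for i in range(3):
        print(f"message {i} accepted: {worker.post(i)}")
    # A blocking put() at this point would wait forever, because the only
    # reader of the inbox is the same thread that is stuck writing to it.
```

With a non-blocking write the failure shows up as an error, and with a blocking or retrying write it shows up as a hang, which is consistent with the two symptoms described above.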



 Comments   
Comment by markus makela [ 2023-01-10 ]

Although the problem can theoretically be encountered in 2.5, in practice it seems extremely unlikely. It only became far more likely to occur in the current 6.4 development version, which means this should not impact any released versions in any meaningful way.

Versions 22.08 and newer are not affected by this bug.

Comment by markus makela [ 2023-01-10 ]

It also turns out that 2.5 is unnecessarily pessimistic about writes into pipes and uses O_DIRECT to put the pipe into a "message mode". This causes each message to take a page of memory instead of the actual message size (24 bytes), which in turn causes the queue to fill up much faster. O_DIRECT isn't required, as not all systems support it, and disabling it increases the effective pipe size by a factor of roughly 170 (assuming 4096-byte pages, 4096 / 24 is about 170).
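
The effect can be observed with a small Linux-only experiment. This is a sketch, not the MaxScale implementation: the helper names `messages_until_full` and `page_ratio` are made up for the example, and the exact counts depend on the kernel's page size and pipe buffer limits. It fills a non-blocking pipe with 24-byte messages, once with O_DIRECT and once without, and counts how many fit.

```python
import os

MESSAGE_SIZE = 24  # bytes per message, as stated in the comment above


def messages_until_full(extra_flags=0):
    """Write fixed-size messages into a fresh non-blocking pipe until it is full."""
    read_fd, write_fd = os.pipe2(os.O_NONBLOCK | extra_flags)
    payload = b"x" * MESSAGE_SIZE
    count = 0
    try:
        while True:
            try:
                os.write(write_fd, payload)
            except BlockingIOError:
                # The pipe has no room for another message.
                return count
            count += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)


def page_ratio(page_size, message_size=MESSAGE_SIZE):
    """How many messages fit in the page that one O_DIRECT message occupies."""
    return page_size // message_size


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    print("messages per page:", page_ratio(page_size))
    print("capacity without O_DIRECT:", messages_until_full())
    print("capacity with O_DIRECT:", messages_until_full(os.O_DIRECT))
```

On a system with 4096-byte pages, `page_ratio` gives 170, which matches the factor mentioned above. The measured capacities should differ by a similar ratio, although that depends on the kernel version and configuration.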

Generated at Thu Feb 08 04:28:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.