[MXS-4153] Graceful Restart Created: 2022-06-02 Updated: 2023-12-15 |
| Status: | Open |
| Project: | MariaDB MaxScale |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | New Feature | Priority: | Major |
| Reporter: | Rob Schwyzer | Assignee: | Joe Cotellese |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Description |
Markus put this well in MXS-4149:

In short, many customers have reported getting into states where it becomes necessary to restart MaxScale to avoid a crash or other issue, for example because of rising or runaway memory usage. In many cases, the conditions leading up to this are detectable via monitoring, e.g. by tracking a server's remaining free memory or storage space. This means customers can proactively trigger the restart rather than waiting for a crash or a true emergency. These customers are already using techniques like cooperative monitoring to obtain HA from multiple MaxScale nodes.

So why is a regular restart not good enough? Because a regular restart terminates the connections currently open on the MaxScale node being restarted. In these cases this makes MaxScale's HA setup appear, and behave, unreliably to applications and clients. It should instead be possible for MaxScale to be aware that it has "sister" nodes to which it could migrate connections or transactions. A "graceful restart" mechanism, in which MaxScale drains its active and future connections to a "sister" node before restarting, would resolve this concern and give customers and their operations teams a valuable tool for helping themselves.

Although beyond the initial scope, MXS-4149 is related to this issue, as MXS-4149 is the preferable, desired future state. However, the graceful restart functionality requested here is expected to be easier and quicker to implement, and should provide a manual solution that customers can benefit from as soon as possible and build around as necessary. MXS-4149 and other further improvements would then be ways for MaxScale to add more value. |
| Comments |
| Comment by markus makela [ 2022-06-10 ] |
In order for this to work, |
| Comment by Johan Wikman [ 2022-09-12 ] |
As commented above, this can't be implemented unless MDEV-15935 is implemented by the connectors. As there is currently apparently no activity on that front, the fix version is tentatively moved to 23.08. Since the need for this feature comes from dealing with "rising/runaway memory usage", which is often caused by MaxScale having been incorrectly configured (e.g. using threads=auto when MaxScale runs in a container that does not have as many CPUs and as much memory as the host computer the container is running on), MXS-4161 may help avoid the problem in the first place. |
| Comment by markus makela [ 2023-06-28 ] |
The connection_metadata parameter that was added for |