[MXS-4153] Graceful Restart - Jira

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: LongTerm
Component/s: None
Labels: None

Description

Markus put this well in MXS-4149:

An alternative way to deal with this sort of situation would be to have graceful shutdowns of MaxScale nodes. This would allow open connections to be migrated to a replacement node once they're done with their active transactions. This wouldn't save transactions that are lost due to unexpected outages, but the use case for "needing to restart" would be served quite well with this.

In short, many customers have reported reaching states where MaxScale must be restarted to avoid a crash or other issue. Rising or runaway memory usage is one example. In many cases, the conditions leading up to this are detectable via monitoring, for example by tracking a server's remaining free memory or storage space. This means customers can proactively trigger the restart rather than waiting for a crash or a true emergency.
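As a rough illustration of the monitoring-driven trigger described above, the following is a minimal sketch, not anything MaxScale provides. The function names and the 10% threshold are hypothetical; only `shutil.disk_usage` is a real standard-library call. A memory check would follow the same shape with a platform-specific source for the free/total figures.

```python
import shutil


def needs_proactive_restart(free_bytes: int, total_bytes: int,
                            min_free_fraction: float = 0.10) -> bool:
    """Return True when the free share of a resource drops below the threshold.

    The threshold is an illustrative assumption, not a MaxScale default.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def storage_needs_restart(path: str = ".",
                          min_free_fraction: float = 0.10) -> bool:
    """Apply the same check to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return needs_proactive_restart(usage.free, usage.total, min_free_fraction)
```

A monitoring job could call a check like this periodically and, when it returns True, start the restart during a quiet period instead of waiting for a crash.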

These customers are already using techniques like cooperative monitoring to obtain HA from multiple MaxScale nodes. So why is a regular restart not good enough? Because a regular restart terminates the connections currently open on the MaxScale node being restarted, forcing clients to reconnect. To applications and clients, this makes MaxScale's HA setup appear and behave unreliably.

Instead, MaxScale should be able to know that it has "sister" nodes to which it can migrate connections or transactions in these cases. A "graceful restart" mechanism in which MaxScale drains its active and future connections to a "sister" node before restarting would resolve this concern and give customers and their operations teams a tool for handling these situations themselves.
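The requested drain behavior can be modeled roughly as follows. This is a toy simulation under stated assumptions, not MaxScale code: `Session`, `Node`, `drain_step`, and `safe_to_restart` are hypothetical names, and a real implementation would depend on connection redirection support in the connectors (see the "is blocked by" links). It only shows the ordering: stop accepting new connections, hand off sessions once they are outside a transaction, and restart only when nothing remains.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    in_transaction: bool = False


@dataclass
class Node:
    name: str
    accepting: bool = True
    sessions: list = field(default_factory=list)


def drain_step(node: Node, sister: Node) -> int:
    """Run one drain pass and return how many sessions are still on `node`.

    New connections are refused from the first pass onward. Sessions that
    are not inside a transaction are handed to the sister node; sessions
    with an active transaction stay until a later pass.
    """
    node.accepting = False
    still_busy = []
    for session in node.sessions:
        if session.in_transaction:
            still_busy.append(session)
        else:
            sister.sessions.append(session)
    node.sessions = still_busy
    return len(still_busy)


def safe_to_restart(node: Node) -> bool:
    """A node can restart once it accepts nothing and holds no sessions."""
    return not node.accepting and not node.sessions
```

In this model an operator would repeat `drain_step` until it returns 0, possibly with a timeout, and then restart the node.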

This is beyond the initial scope, but once ~~MXS-3822~~ is implemented, future MaxScale versions will have plenty of room to enhance this feature, for example by enabling automatic graceful restarts.

MXS-4149 is related to this issue because it describes the preferred future state. However, the graceful restart functionality requested here is expected to be easier and quicker to implement, and it should give customers a manual solution they can benefit from as soon as possible and build around as necessary. MXS-4149 and other further improvements would be ways for MaxScale to add value on top of it.

Issue Links

blocks

MXS-4149 Cooperative Transaction Replay

Open

is blocked by

CONC-599 Add support for connection redirection on the Connector/C

Open

CONCPP-101 Add support for connection redirection on the Connector/C++

Open

CONJ-981 Add support for connection redirection

Closed

CONJS-207 Add support for connection redirection

Closed

CONPY-207 Add support for connection redirection on the Connector/Python

Open

MDEV-15935 Connection Redirection Mechanism in MariaDB Client/Server Protocol

Closed

ODBC-364 Add support for connection redirection on the Connector/ODBC

Open

R2DBC-66 Add support for connection redirection

Closed

is duplicated by

MXS-4737 Graceful shutdown of MaxScale when using more than one MaxScale instance

Closed

relates to

MDEV-32053 New features requested by customer on 2023-08-28

Open

MXS-4635 Provide load balancing metadata to connectors

Closed

MDEV-33926 Graceful shutdown proxy hint feature - MariaDB version of pxc_maint_mode

Open

People

Assignee: Unassigned

Reporter: Rob Schwyzer

Votes: 1

Watchers: 9

Dates

Created: 2022-06-02 18:57

Updated: 2024-10-04 09:02
