[MXS-2678] RSU on Galera via MaxScale Created: 2019-09-16 Updated: 2022-09-08 Resolved: 2022-09-08 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | N/A |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | New Feature | Priority: | Major |
| Reporter: | Assen Totin (Inactive) | Assignee: | Todd Stoffel (Inactive) |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | Galera | ||
| Epic Link: | Galera Compatibility |
| Description |
|
Long-running DDLs are known to cause problems on Galera clusters, hence we promote the concept of RSU (Rolling Schema Upgrade) in MariaDB Server as a replacement. However, traditional RSU requires external management of the process, and while this is doable via the MaxScale API, a native ability in MaxScale would be much appreciated. It may go along these lines (or anything similar, as long as it is easy to use and can be triggered by the client within its session without needing access to the MaxScale API):
The actual way the DDL is executed on the Galera node is left to the node itself, so we don't want to implement more complex stuff like the shadow DDL of pt-osc. Also, blocking the client is OK, as a non-blocking DDL on the client would probably be more complex and confusing. Keeping the connection intact is the client's responsibility; likely we only need to specify what happens if the connection is dropped before the process is completed, but I guess it would be OK to continue the RSU until completed (as our DDL is non-transactional). |
| Comments |
| Comment by markus makela [ 2019-09-26 ] |
|
Would executing SQL statements serially on all nodes solve this? What I'm thinking is that, with a few modifications, existing code can be changed to do this. For example, when running ALTER TABLE ... with three configured servers (A, B and C), the statement would first be sent to A. After A is done executing it, the statement would be sent to B, and after B is done, the statement would be sent to C. If this would be adequate, the Cat router could be modified to do this quite easily. |
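To make the proposed ordering concrete, here is a minimal Python sketch of the serial execution described above. It is not MaxScale or Cat router code: `run_serially` and the `execute` callable are hypothetical names introduced only for illustration, and `execute` is assumed to block until the server has finished the statement.

```python
from typing import Callable, List, Sequence


def run_serially(
    statement: str,
    servers: Sequence[str],
    execute: Callable[[str, str], None],
) -> List[str]:
    """Send `statement` to each server in order, one at a time.

    `execute(server, statement)` is a hypothetical blocking callable: it is
    assumed to return only once the server has finished the statement and to
    raise on error, which stops the run before the remaining servers.
    """
    completed: List[str] = []
    for server in servers:
        # The next server is not contacted until this call returns.
        execute(server, statement)
        completed.append(server)
    return completed


if __name__ == "__main__":
    order = []
    run_serially(
        "ALTER TABLE t1 ADD COLUMN c INT",
        ["A", "B", "C"],
        lambda server, sql: order.append(server),
    )
    print(order)
```

The client stays blocked for the whole loop, which matches the behaviour the description considers acceptable.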
| Comment by Assen Totin (Inactive) [ 2019-09-27 ] |
|
I think yes. My idea was to serialise DDL execution: at the expense of keeping the client blocked, but with the benefit - and this is the key here - of not blocking the cluster entirely during the DDL run, so that all other queries can run on the remaining nodes. This is why I thought of first removing a node (drain, maintenance mode, whatever), then applying the DDL, then rejoining it (hopefully fitting into the IST time window). I'm afraid that if we don't disconnect the node while the DDL is running, the queries it receives may get slow or be blocked completely. |
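The remove / apply / rejoin cycle could be sketched roughly as follows. This is only an illustration under assumptions: `rolling_ddl`, `drain`, `apply_ddl` and `rejoin` are hypothetical names, not MaxScale functions, and `apply_ddl` is assumed to run the DDL locally on the node without replicating it (for instance with `wsrep_OSU_method` set to `RSU` for that session). Rejoining a node even when its DDL fails is one possible design choice, not something the ticket specifies.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_ddl(
    statement: str,
    nodes: Sequence[str],
    drain: Callable[[str], None],
    apply_ddl: Callable[[str, str], None],
    rejoin: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Apply `statement` node by node: drain, run the DDL, rejoin.

    All three callables are hypothetical and assumed to block until done.
    Only one node is out of rotation at any time, so the remaining nodes
    keep serving queries while the DDL runs.
    """
    steps: List[Tuple[str, str]] = []
    for node in nodes:
        drain(node)  # e.g. put the node into maintenance mode
        steps.append(("drain", node))
        try:
            apply_ddl(node, statement)  # assumed local-only (RSU-style) DDL
            steps.append(("ddl", node))
        finally:
            # Assumption: the node is rejoined even if the DDL failed; the
            # exception then propagates and later nodes are left untouched.
            rejoin(node)
            steps.append(("rejoin", node))
    return steps
```

With nodes `["A", "B"]` and no failures, the returned steps are drain, ddl, rejoin for A followed by the same three for B. What should happen to nodes that were already altered when a later node fails is left open here, as it is in the ticket.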