[MXS-3412] Cancel drain causes connection problems Created: 2021-02-22  Updated: 2021-03-17  Resolved: 2021-03-17

Status: Closed
Project: MariaDB MaxScale
Component/s: maxctrl
Affects Version/s: 2.5.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Farkas Assignee: markus makela
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Debian 10, MariaDB 10.5, MaxScale 2.5.8~buster-1


Sprint: MXS-SPRINT-127

 Description   

If you run maxctrl drain server and cancel it (Ctrl-C) before it finishes, the temporary configuration file that MaxScale created, /var/lib/maxscale/maxscale.cnf.d/Splitter-Service.cnf, is left in place.

Its targets list contains only 2 of the 3 servers (the drained server is excluded). If the drained server is the master, MaxScale ends up in an inconsistent state: maxctrl list servers shows all 3 servers as running, but client connections fail.
The log shows:

error  : (1) [readwritesplit] (Splitter-Service) Couldn't find suitable Master from 2 candidates.

Restarting MaxScale does not fix the issue, because the persisted file is loaded again at startup. You have to either remove the .cnf file and restart MaxScale, or drain the server again and let the drain complete.

Step by step:

maxctrl list servers - WORKING
│ Server │ Address    │ Port │ Connections │ State                   │ GTID          │
│ db1    │ 10.0.0.31 │ 3306 │ 0           │ Slave, Synced, Running  │ 0-1-690759442 │
│ db2    │ 10.0.0.32 │ 3306 │ 0           │ Slave, Synced, Running  │ 0-2-690033556 │
│ db3    │ 10.0.0.33 │ 3306 │ 0           │ Master, Synced, Running │ 0-3-690166487 │
 
maxctrl drain server db3
^C
maxctrl
 maxctrl list servers
│ Server │ Address    │ Port │ Connections │ State                   │ GTID          │
│ db1    │ 10.0.0.31 │ 3306 │ 0           │ Slave, Synced, Running  │ 0-1-690759442 │
│ db2    │ 10.0.0.32 │ 3306 │ 0           │ Slave, Synced, Running  │ 0-2-690033556 │
│ db3    │ 10.0.0.33 │ 3306 │ 0           │ Master, Synced, Running │ 0-3-690166511 │
 
# cat /var/lib/maxscale/maxscale.cnf.d/Splitter-Service.cnf | grep targets
targets=db1,db2
 
# mysql -u user -h 10.0.0.10 -p
Enter password: 
ERROR 1815 (HY000): Internal error: Session creation failed
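Before deleting the file, it can help to confirm which servers are missing from the persisted target list. The following is a minimal Python sketch, assuming the file under /var/lib/maxscale/maxscale.cnf.d/ is INI-style with a comma-separated targets= line, as the grep output above suggests. The function name missing_targets is hypothetical, and the expected server list is supplied by the caller.

```python
import configparser
from typing import Dict, Iterable, Set


def missing_targets(cnf_text: str, expected: Iterable[str]) -> Dict[str, Set[str]]:
    """For every section that has a targets= key, return the expected
    servers that are absent from it. Sections with no gaps are omitted."""
    wanted = set(expected)
    # interpolation=None so a '%' in a stored password is not misparsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    gaps: Dict[str, Set[str]] = {}
    for section in parser.sections():
        if parser.has_option(section, "targets"):
            raw = parser.get(section, "targets")
            listed = {name.strip() for name in raw.split(",") if name.strip()}
            if wanted - listed:
                gaps[section] = wanted - listed
    return gaps


# File contents as left behind by the interrupted drain. The router and
# targets values come from this report; type=service is illustrative.
sample = """
[Splitter-Service]
type=service
router=readwritesplit
targets=db1,db2
"""

print(missing_targets(sample, ["db1", "db2", "db3"]))  # {'Splitter-Service': {'db3'}}
```

To check the real file, pass pathlib.Path("/var/lib/maxscale/maxscale.cnf.d/Splitter-Service.cnf").read_text() instead of the sample string.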

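I think some kind of trap (for example, a Ctrl-C signal handler) or lock is needed so that an interrupted drain cannot leave the configuration half-applied.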



 Comments   
Comment by markus makela [ 2021-02-22 ]

This happens because the drain command is implemented as multiple HTTP calls instead of a single atomic operation. The code that removes the server from the services should have some kind of rollback step to prevent exactly this from happening.
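The rollback idea can be illustrated with a small model in which each sub-operation is paired with an undo action. If any step raises, including the KeyboardInterrupt that Python raises on Ctrl-C, the steps already started are undone in reverse order. This is a hypothetical Python sketch, not maxctrl or MaxScale code. Step, run_with_rollback and the in-memory targets set are stand-ins for the HTTP calls, and the real drain sequence is not reproduced here.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    """One hypothetical sub-operation of a multi-call command."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]  # must be safe to call even if apply never ran


def run_with_rollback(steps: List[Step]) -> None:
    """Run steps in order. On any exception, including the KeyboardInterrupt
    that Ctrl-C raises, undo every started step in reverse order, then re-raise."""
    started: List[Step] = []
    try:
        for step in steps:
            # Record the step before applying it, so an interrupt that lands
            # mid-step still triggers its (idempotent) undo.
            started.append(step)
            step.apply()
    except BaseException:
        for step in reversed(started):
            step.undo()
        raise


# Simulated state: the target list of Splitter-Service (db3 is the master).
targets = {"db1", "db2", "db3"}


def simulated_ctrl_c() -> None:
    raise KeyboardInterrupt  # stands in for the user pressing Ctrl-C


steps = [
    Step("remove db3 from targets",
         lambda: targets.discard("db3"), lambda: targets.add("db3")),
    Step("wait for sessions to close", simulated_ctrl_c, lambda: None),
]

try:
    run_with_rollback(steps)
except KeyboardInterrupt:
    print("drain interrupted")

print(sorted(targets))  # ['db1', 'db2', 'db3']
```

Even in this model, two gaps remain: a second Ctrl-C during the undo loop interrupts the rollback itself, and in a real client each undo would be another HTTP call that can fail. That is one reason such a rollback is harder than it first looks.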

Meanwhile, you can use maxctrl set server <name> drain to drain the server more gracefully. The drain command is an older implementation and should probably be deprecated in favor of the explicit Drain state.

Comment by markus makela [ 2021-03-17 ]

Since the fix isn't simple and there's a better way to do the same thing, we won't be fixing this as we'll be deprecating maxctrl drain server in 2.6.

Generated at Thu Feb 08 04:21:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.