Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not a Bug
- Affects Version/s: 1.2.1, 1.3.0
- Fix Version/s: None
Description
My understanding is that when a user connects to MaxScale using the readwritesplit router, slave failures should be handled gracefully. Please correct me if I'm wrong.
If that's the case, I can easily reproduce crashes using:
while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=2 --max-requests=0 --max-time=0 --mysql-host=172.30.4.15 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=4008 --oltp-tables-count=32 --report-interval=10 --oltp-skip-trx=on --oltp-table-size=1000000 run ; done
on MaxScale 1.2.1 and 1.3.0 using attached maxscale.cnf.
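The attached maxscale.cnf is not reproduced here. For context, a minimal readwritesplit setup on MaxScale 1.x has roughly the following shape; the section names, server addresses, and credentials below are assumptions, with only listener port 4008 taken from the sysbench command above:

```ini
[maxscale]
threads=4

# one master and two slaves; addresses are placeholders
[server1]
type=server
address=172.30.4.11
port=3306
protocol=MySQLBackend

[server2]
type=server
address=172.30.4.12
port=3306
protocol=MySQLBackend

[server3]
type=server
address=172.30.4.13
port=3306
protocol=MySQLBackend

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=server1,server2,server3
user=maxmon
passwd=maxmon

[RW Split Router]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxuser
passwd=maxpwd

[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4008
```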
It's enough to restart slave A and then, once it has recovered, slave B; after one of the restarts, errors like the following appear:
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 2
Report intermediate results every 10 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
ALERT: mysql_drv_query() returned error 2003 (Lost connection to backend server.) for query 'SELECT c FROM sbtest1 WHERE id=501119'
ALERT: mysql_drv_query() returned error 2003 (Lost connection to backend server.) for query 'SELECT c FROM sbtest8 WHERE id=502367'
Looking at Com_select on both slaves, it seems that MaxScale picks one of them as an 'active' slave, and that slave's failure impacts backend availability. Restarting the 'non-active' slave does not impact the application. Let me know if there's anything wrong with my setup; from what I remember, this worked correctly in MaxScale 1.0.
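The Com_select comparison above can be sketched as follows: sample SHOW GLOBAL STATUS LIKE 'Com_select' on each slave twice while sysbench runs, and compare the deltas; the slave whose counter grows is the one receiving reads. The sample values below are made up for illustration; in practice each line would come from the mysql client against each slave:

```shell
# Extract the counter value from a "SHOW GLOBAL STATUS LIKE 'Com_select'"
# output line of the form: Com_select <value>
com_select() {
  echo "$1" | awk '$1 == "Com_select" { print $2 }'
}

# Hypothetical samples taken before and after a sysbench interval.
# Normally obtained via:  mysql -h <slave> -N -e "SHOW GLOBAL STATUS LIKE 'Com_select'"
a1="Com_select 100"; a2="Com_select 900"   # slave A
b1="Com_select 100"; b2="Com_select 105"   # slave B

delta_a=$(( $(com_select "$a2") - $(com_select "$a1") ))
delta_b=$(( $(com_select "$b2") - $(com_select "$b1") ))

# A large delta on one slave and a near-zero delta on the other suggests
# reads are going to a single 'active' slave.
echo "slave A delta: $delta_a, slave B delta: $delta_b"
```

With these sample numbers the output is "slave A delta: 800, slave B delta: 5", i.e. slave A would be the 'active' slave whose restart breaks client queries.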