Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.3.9
- Fix Version/s: None
- Environment: CentOS 7, Percona XtraDB Cluster 5.7.26, MaxScale 2.3.9
Description
Markus, we have a potential customer who is having problems with MaxScale's galeramon and readwritesplit modules. It looks as though transaction_replay is not functioning as expected in a Percona XtraDB Cluster (Galera 3). Here is the MaxScale configuration:
[MaxScale]
threads=2
admin_host=0.0.0.0

[GaleraMonitor]
type=monitor
module=galeramon
servers=db1,db2,db3
user=maxscale
password=demo_password
available_when_donor=false
monitor_interval=100

[Splitter]
type=service
router=readwritesplit
servers=db1,db2,db3
user=maxscale
password=demo_password
transaction_replay=true

[SplitterListener]
type=listener
service=Splitter
protocol=MariaDBClient
port=3306

[db1]
type=server
address=10.10.10.10
port=3306
protocol=MariaDBBackend

[db2]
type=server
address=10.10.10.11
port=3306
protocol=MariaDBBackend

[db3]
type=server
address=10.10.10.12
port=3306
protocol=MariaDBBackend
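For clarity, this is the behaviour I understand transaction_replay=true should give the application. The sketch below is purely illustrative and makes several assumptions on my part (PyMySQL as the client library, the Splitter listener on 127.0.0.1:3306, the dba user from the sysbench command further down, and the sbtest1 table that sysbench creates); it is not part of the actual test:

import pymysql

# Connect through the MaxScale Splitter listener, not directly to a node.
conn = pymysql.connect(host="127.0.0.1", port=3306, user="dba",
                       password="demo_password", database="sbtest")

conn.begin()
with conn.cursor() as cur:
    cur.execute("UPDATE sbtest1 SET k = k + 1 WHERE id = 1")
    # If the current master is lost at this point, my understanding is
    # that readwritesplit should reconnect to the new master, replay the
    # open transaction, and the COMMIT below should succeed without the
    # client ever seeing an error.
conn.commit()
conn.close()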
I ran the following benchmark test:
sysbench /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=127.0.0.1 --mysql-user=dba --mysql-password=demo_password --mysql-db=sbtest --oltp-tables-count=2 --oltp-table-size=500000 --report-interval=5 --oltp-skip-trx=on --oltp-read-only=off --mysql-ignore-errors=1062 --rand-init=on --max-requests=0 --time=300 --threads=100 run
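If it is useful for reproduction, the same kind of write load can be generated without sysbench. The sketch below only approximates that workload and again assumes PyMySQL, the dba credentials from the command above, and the sbtest1 table created by sysbench; it is not the tool I actually ran:

import random
import threading

import pymysql

def worker():
    # Autocommit single-statement updates, roughly matching --oltp-skip-trx=on.
    conn = pymysql.connect(host="127.0.0.1", port=3306, user="dba",
                           password="demo_password", database="sbtest",
                           autocommit=True)
    with conn.cursor() as cur:
        for _ in range(10000):
            row_id = random.randint(1, 500000)
            cur.execute("UPDATE sbtest1 SET k = k + 1 WHERE id = %s", (row_id,))
    conn.close()

# 100 concurrent client threads, like the --threads=100 sysbench run.
threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()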
maxctrl list servers
┌────────┬─────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server │ Address     │ Port │ Connections │ State                   │ GTID │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db1    │ 10.10.10.10 │ 3306 │ 100         │ Slave, Synced, Running  │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db2    │ 10.10.10.11 │ 3306 │ 100         │ Master, Synced, Running │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db3    │ 10.10.10.12 │ 3306 │ 100         │ Slave, Synced, Running  │      │
└────────┴─────────────┴──────┴─────────────┴─────────────────────────┴──────┘
Then I stop the mysql process on node 2 (db2) to simulate a server crash:
systemctl stop mysql
All connections from MaxScale to all nodes drop.
Then I get a fatal error from sysbench:
FATAL: `thread_run' function failed: /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua:103: db_query() failed
Error in my_thread_global_end(): 21 threads didn't exit
maxctrl list servers now shows:
┌────────┬─────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server │ Address     │ Port │ Connections │ State                   │ GTID │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db1    │ 10.10.10.10 │ 3306 │ 0           │ Master, Synced, Running │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db2    │ 10.10.10.11 │ 3306 │ 0           │ Down                    │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db3    │ 10.10.10.12 │ 3306 │ 0           │ Slave, Synced, Running  │      │
└────────┴─────────────┴──────┴─────────────┴─────────────────────────┴──────┘
In maxscale.log I get a couple of errors like this, but not much else:
2019-07-16 04:01:54 error : Failed to execute query on server 'db2' ([10.10.10.11]:3306): Can't connect to MySQL server on '10.10.10.11' (115)
I tried this test with mysqlslap and got the same results. With transaction_replay enabled, the application should not be aware that any node in the cluster went down, and it certainly should not have its connection dropped.
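To pin down exactly when the application notices the failure, a single-connection canary along these lines could be left running while the node is stopped (same assumptions as the sketches above: PyMySQL, the dba credentials, the sbtest1 table). If transaction replay worked the way I expect, it would print nothing during the failover:

import time

import pymysql

def connect():
    return pymysql.connect(host="127.0.0.1", port=3306, user="dba",
                           password="demo_password", database="sbtest",
                           autocommit=True)

conn = connect()
while True:
    try:
        with conn.cursor() as cur:
            cur.execute("UPDATE sbtest1 SET k = k + 1 WHERE id = 1")
    except pymysql.MySQLError as exc:
        # Any line printed here means the failover was visible to the client.
        print(time.strftime("%H:%M:%S"), exc)
        try:
            conn = connect()
        except pymysql.MySQLError:
            pass  # keep retrying on the next iteration
    time.sleep(1)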
Any ideas?