MariaDB MaxScale / MXS-2603

MaxScale causes connections to break in Percona PXC Cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.9
    • Fix Version/s: 2.4.1
    • Component/s: galeramon, readwritesplit
    • Labels:
      None
    • Environment:
      CentOS 7, Percona XtraDB Cluster 5.7.26, MaxScale 2.3.9

      Description

      Markus, we have a potential customer who is having problems with the MaxScale galeramon monitor and the readwritesplit router. It looks as though transaction_replay is not functioning as expected with Percona XtraDB Cluster (Galera 3). This is the configuration in use:

      [MaxScale]
      threads=2
      admin_host=0.0.0.0
       
      [GaleraMonitor]
      type=monitor
      module=galeramon
      servers=db1,db2,db3
      user=maxscale
      password=demo_password
      available_when_donor=false
      monitor_interval=100
       
      [Splitter]
      type=service
      router=readwritesplit
      servers=db1,db2,db3
      user=maxscale
      password=demo_password
      transaction_replay=true
       
      [SplitterListener]
      type=listener
      service=Splitter
      protocol=MariaDBClient
      port=3306
       
      [db1]
      type=server
      address=10.10.10.10
      port=3306
      protocol=MariaDBBackend
       
      [db2]
      type=server
      address=10.10.10.11
      port=3306
      protocol=MariaDBBackend
       
      [db3]
      type=server
      address=10.10.10.12
      port=3306
      protocol=MariaDBBackend
      

      I ran the following benchmark test:

      sysbench /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=127.0.0.1 --mysql-user=dba --mysql-password=demo_password --mysql-db=sbtest --oltp-tables-count=2 --oltp-table-size=500000 --report-interval=5 --oltp-skip-trx=on --oltp-read-only=off --mysql-ignore-errors=1062 --rand-init=on --max-requests=0 --time=300 --threads=100 run
      

      Output of maxctrl list servers while the benchmark was running:

      ┌────────┬─────────────┬──────┬─────────────┬─────────────────────────┬──────┐
      │ Server │ Address     │ Port │ Connections │ State                   │ GTID │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db1    │ 10.10.10.10 │ 3306 │ 100         │ Slave, Synced, Running  │      │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db2    │ 10.10.10.11 │ 3306 │ 100         │ Master, Synced, Running │      │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db3    │ 10.10.10.12 │ 3306 │ 100         │ Slave, Synced, Running  │      │
      └────────┴─────────────┴──────┴─────────────┴─────────────────────────┴──────┘
      

      Then I stopped the mysql service on node 2 (db2, the master at that point) to simulate a server crash:

      systemctl stop mysql

      All connections from MaxScale to all nodes were dropped, and sysbench failed with a fatal error:

      FATAL: `thread_run' function failed: /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua:103: db_query() failed
      Error in my_thread_global_end(): 21 threads didn't exit

      Output of maxctrl list servers after the failure:
      

      ┌────────┬─────────────┬──────┬─────────────┬─────────────────────────┬──────┐
      │ Server │ Address     │ Port │ Connections │ State                   │ GTID │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db1    │ 10.10.10.10 │ 3306 │ 0           │ Master, Synced, Running │      │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db2    │ 10.10.10.11 │ 3306 │ 0           │ Down                    │      │
      ├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
      │ db3    │ 10.10.10.12 │ 3306 │ 0           │ Slave, Synced, Running  │      │
      └────────┴─────────────┴──────┴─────────────┴─────────────────────────┴──────┘
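
      To compare the two snapshots without reading the tables by eye, the output can be parsed with a few lines of Python. This is only a rough sketch: the function names parse_maxctrl_table and summarize are made up for illustration, and it assumes the box-drawn layout shown above (it does not call maxctrl itself).

```python
# Hypothetical helper: parses the box-drawn table printed by
# `maxctrl list servers` (layout assumed to match the report above).

SNAPSHOT_AFTER = """
┌────────┬─────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server │ Address     │ Port │ Connections │ State                   │ GTID │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db1    │ 10.10.10.10 │ 3306 │ 0           │ Master, Synced, Running │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db2    │ 10.10.10.11 │ 3306 │ 0           │ Down                    │      │
├────────┼─────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ db3    │ 10.10.10.12 │ 3306 │ 0           │ Slave, Synced, Running  │      │
└────────┴─────────────┴──────┴─────────────┴─────────────────────────┴──────┘
"""


def parse_maxctrl_table(text):
    """Return one dict per server row, keyed by the column headers."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("│"):
            continue  # border line or blank line
        # Drop the outer bars, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        rows.append(cells)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def summarize(rows):
    """Return (names of servers flagged Master, total connection count)."""
    masters = [r["Server"] for r in rows if "Master" in r["State"].split(", ")]
    total = sum(int(r["Connections"]) for r in rows)
    return masters, total


if __name__ == "__main__":
    masters, total = summarize(parse_maxctrl_table(SNAPSHOT_AFTER))
    print("masters:", masters, "total connections:", total)
```

      Run against both snapshots, this makes the symptom explicit: the Master role moves from db2 to db1, but the connection count on every node falls to 0 instead of staying at 100 on the surviving nodes.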
      

      maxscale.log contains a couple of errors like the following, but not much else:

      2019-07-16 04:01:54   error  : Failed to execute query on server 'db2' ([10.10.10.11]:3306): Can't connect to MySQL server on '10.10.10.11' (115)

      I repeated the test with mysqlslap and got the same results. The application should not notice that a node in the cluster went down, and its connection should certainly not be dropped.

      Any ideas?

            People

            Assignee:
            markus makela
            Reporter:
            Todd Stoffel (toddstoffel)