MariaDB MaxScale / MXS-3441

MaxScale loses connection to master when mysqldump is run


    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Not a Bug
    • Affects Version/s: 2.5.5
    • Fix Version/s: N/A
    • Component/s: Monitor
    • Labels:
      None
    • Environment:
      Debian buster with 1 MaxScale host, 1 master MariaDB and 1 slave MariaDB.
      Maxscale 2.5.5
      DB version: mysqldump Ver 10.17 Distrib 10.4.15-MariaDB, for Linux (x86_64)

      Description

      Hi, we have a fairly large database, 500G, and when we run mysqldump against the MaxScale host, after about 30 seconds it exits with an error.
      MaxScale prints this:
      2021-03-13 14:40:26 error : Monitor timed out when connecting to server dbA[xx.xx.xx.01:3370] : ''
      2021-03-13 14:40:26 notice : Server changed state: dbA[xx.xx.xx.01:3370]: master_down. [Master, Running] -> [Down]
      2021-03-13 14:40:33 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|
      2021-03-13 14:40:33 notice : Executed monitor script on event 'master_down'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 14:40:33 warning: [mariadbmon] Master has failed. If master does not return in 4 monitor tick(s), failover begins.
      2021-03-13 14:40:33 error : (46) [readwritesplit] (rwSplitter) Lost connection to the master server, closing session. Lost connection to master server while waiting for a result. Connection has been idle for 52 seconds. Error caused by: #HY000: (Generated event) backend server: connection closed by peer. Last close reason: <none>. Last error:
      2021-03-13 14:40:33 error : (42) [readwritesplit] (rwSplitter) Server 'dbA' was lost in the middle of a resultset, cannot continue the session: #HY000: (Generated event) backend server: connection closed by peer
      2021-03-13 14:40:33 error : (54) [readwritesplit] (rwSplitter) Couldn't find suitable Master from 2 candidates.
      2021-03-13 14:40:33 notice : Server changed state: dbA[xx.xx.xx.01:3370]: master_up. [Down] -> [Master, Running]
      2021-03-13 14:40:33 error : (54) (rwSplitter) Failed to create new router session for service 'rwSplitter'. See previous errors for more details.
      2021-03-13 14:40:33 error : (54) [mariadbclient] Failed to create session for 'someuser'@'::ffff:xx.xx.xx.01'.
      2021-03-13 14:40:36 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|INFO SignalAccount - Config file is in use by another instance, waiting…

      At the same time, the master MariaDB server prints:
      2021-03-13 9:40:20 10 [Warning] Aborted connection 10 to db: 'unconnected' user: 'orchestrator' host: 'maxScaleHostIP' (Got an error reading communication packets)
      2021-03-13 9:40:23 40 [Warning] Aborted connection 40 to db: 'unconnected' user: 'unauthenticated' host: 'maxScaleHostIP' (This connection closed normally without authentication)
      2021-03-13 9:40:26 41 [Warning] Aborted connection 41 to db: 'unconnected' user: 'unauthenticated' host: 'maxScaleHostIP' (This connection closed normally without authentication)

      After further testing, the behavior looks the same whether mysqldump is run against the individual MariaDB instances or even if the mariabackup tool is used. In either case the MaxScale monitor reports that the master is down.

      Here is the MaxScale log while running mariabackup:
      2021-03-13 15:34:29 error : Monitor timed out when connecting to server dbA[10.68.200.107:3370] : ''
      2021-03-13 15:34:29 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_down. [Slave, Running] -> [Down]
      2021-03-13 15:34:32 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649671531
      2021-03-13 15:34:32 notice : Executed monitor script on event 'slave_down'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 15:34:38 notice : Server 'dbA' charset: latin1
      2021-03-13 15:34:38 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_up. [Down] -> [Slave, Running]
      2021-03-13 15:34:41 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649680507
      2021-03-13 15:34:41 notice : Executed monitor script on event 'slave_up'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 15:35:34 error : Monitor timed out when connecting to server dbA[xx.xx.xx.01:3370] : ''
      2021-03-13 15:35:34 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_down. [Slave, Running] -> [Down]
      2021-03-13 15:35:36 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649736263
      2021-03-13 15:35:36 notice : Executed monitor script on event 'slave_down'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 15:36:00 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_up. [Down] -> [Slave, Running]
      2021-03-13 15:36:03 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649762662
      2021-03-13 15:36:03 notice : Executed monitor script on event 'slave_up'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 15:36:12 error : Monitor timed out when connecting to server dbA[xx.xx.xx.01:3370] : ''
      2021-03-13 15:36:12 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_down. [Slave, Running] -> [Down]
      2021-03-13 15:36:14 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649774428
      2021-03-13 15:36:14 notice : Executed monitor script on event 'slave_down'. Script was: '/opt/runtime/maxscaleNotification.sh'
      2021-03-13 15:36:38 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_up. [Down] -> [Slave, Running]
      2021-03-13 15:36:41 notice : /opt/runtime/maxscaleNotification.sh: OK|nil|1615649800784
      2021-03-13 15:36:41 notice : Executed monitor script on event 'slave_up'. Script was: '/opt/runtime/maxscaleNotification.sh'

      UPDATE: I upgraded to MaxScale 2.5.9 and I am getting a slightly different error:
      2021-03-13 15:49:55 error : Monitor timed out when connecting to server dbA[10.68.200.107:3370] : 'Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 110'
      2021-03-13 15:49:55 notice : Server changed state: dbA[xx.xx.xx.01:3370]: slave_down. [Slave, Running] -> [Down]

      Any suggestions?
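
      The "Monitor timed out when connecting" errors suggest the monitor's connection attempts exceed their timeout while the server is saturated by the dump, and after enough failed ticks the monitor marks the server down. A minimal sketch of monitor settings that could be relaxed in maxscale.cnf is below; the section name, server list, credentials, and the chosen values are illustrative assumptions, while the parameter names are standard mariadbmon settings:

      ```ini
      # Hypothetical monitor section; adapt names and values to your setup.
      [MariaDB-Monitor]
      type=monitor
      module=mariadbmon
      servers=dbA,dbB
      user=orchestrator
      password=secret

      # Tolerate slow handshakes while the backend is under heavy I/O load.
      backend_connect_timeout=10s
      backend_read_timeout=10s
      backend_write_timeout=10s

      # Require more consecutive failed monitor ticks before declaring Down.
      failcount=10
      ```

      Since the log also warns that failover will begin after a few ticks, raising failcount would additionally reduce the risk of a spurious failover being triggered during a long dump.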

            People

            Assignee:
            Unassigned
            Reporter:
            vitalyb Vitaly
