Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-4777

Maxscale crash due to systemd timeout

    XMLWordPrintable

Details

    Description

      This happened during a usage spike, ooms occurred on the underlying galera nodes during this time as well.

      maxscale 1 lost sight to all db nodes

      2023-09-25 13:19:18.614   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:19:32.314   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:23:00.946   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:23:16.987   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server at 'handshake: reading initial communication packet', system error: 110
      2023-09-25 13:23:45.717   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:23:56.736   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:24:11.889   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:24:20.371   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:24:30.386   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:26:20.262   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:26:31.049   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:27:03.119   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:27:22.378   error  : (mxs_mysql_real_connect): Failed to set latin1 character set: Lost connection to server during query
      2023-09-25 13:27:41.054   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:28:14.823   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:28:47.975   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:28:58.986   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB03' ([10.225.27.183]:3306): Lost connection to server during query
      2023-09-25 13:29:09.953   error  : (mxs_mysql_real_connect): Failed to set latin1 character set: Lost connection to server during query
      2023-09-25 13:32:34.385   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:41:44.306   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:42:05.403   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:42:19.746   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:42:30.649   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:42:44.061   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 13:43:09.442   error  : (mxs_mysql_real_connect): Failed to set latin1 character set: Lost connection to server during query
      2023-09-25 13:43:18.010   error  : (mxs_mysql_real_connect): Failed to set latin1 character set: Lost connection to server during query
      2023-09-25 13:43:18.027   notice : (log_state_change): Server changed state: L1LPVERGENTDB04[10.225.27.118:3306]: slave_down. [Slave, Synced, Running] -> [Down]
      2023-09-25 13:43:18.027   error  : (983789) [readwritesplit] (rwsplit-service); (handleError): Server 'L1LPVERGENTDB04' was lost in the middle of a resultset, cannot continue the session: #HY000: Lost connection to backend server: connection closed by peer (L1LPVERGENTDB04: Generated event)
      2023-09-25 13:43:22.377   notice : (log_state_change): Server changed state: L1LPVERGENTDB04[10.225.27.118:3306]: slave_up. [Down] -> [Slave, Synced, Running]
      2023-09-25 14:07:29.827   error  : (mon_report_query_error): Failed to execute query on server 'L1LPVERGENTDB04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 14:08:10.003   notice : (log_state_change): Server changed state: L1LPVERGENTDB04[10.225.27.118:3306]: slave_down. [Slave, Synced, Running] -> [Down]
      2023-09-25 14:09:06.851   error  : (985612) (L1LPVERGENTDB03); (socket_write): Write to Backend DCB 10.225.27.183 in state DCB::State::POLLING failed: 104, Connection reset by peer
      2023-09-25 14:09:30.579   error  : [galeramon] (post_tick): There are no cluster members
      

      Maxscales 2 and 3 became unresponsive and hit a signal 6

      2023-09-25 13:24:34.499   notice : (log_state_change): Server changed state: L1LPVERGENTDB03[10.225.27.183:3306]: new_slave. [Running] -> [Slave, Synced, Running]
      2023-09-25 13:25:29.198   warning: (notify_systemd_watchdog): Thread 'Worker-01' has not reported back in 30 seconds.
      2023-09-25 13:25:59.198   warning: (notify_systemd_watchdog): Thread 'Worker-01' has not reported back in 30 seconds.
      alert  : MaxScale 6.4.10 received fatal signal 6. Commit ID: e21972b152407ccc41a5c835c91774578c64c472 System name: Linux Release string: PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
       
       
      2023-09-25 13:25:59.268   alert  : (sigfatal_handler): MaxScale 6.4.10 received fatal signal 6. Commit ID: e21972b152407ccc41a5c835c91774578c64c472 System name: Linux Release string: PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
      2023-09-25 13:25:59.268   alert  : (sigfatal_handler): Statement currently being classified: none/unknown
      2023-09-25 13:25:59.268   notice : (sigfatal_handler): For a more detailed stacktrace, install GDB and add 'debug=gdb-stacktrace' under the [maxscale] section.
        /lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x56) [0x7f0e0895dd56]: <binutils not installed>
        /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x104) [0x7f0e09412b34]: <binutils not installed>
        /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x4f) [0x7f0e09412e9f]: <binutils not installed>
        /usr/bin/maxscale(main+0x1c87) [0x55bb71fb9fe7]: <binutils not installed>
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f0e08885d0a]: <binutils not installed>
        /usr/bin/maxscale(_start+0x2a) [0x55bb71fba5ea]: <binutils not installed>
      alert  : Writing core dump.
       
       
      MariaDB MaxScale  /var/log/maxscale/maxscale.log  Mon Sep 25 14:26:07 2023
      ----------------------------------------------------------------------------
      2023-09-25 14:26:07   notice : (WatchdogNotifier): The systemd watchdog is Enabled. Internal timeout = 30s
      

      We are trying to determine why maxscale would become unresponsive after issues on 1 maxscale node and OOMs on the monitored DBs

      Attachments

        Activity

          People

            markus makela markus makela
            bbancroft Bryan Bancroft (Inactive)
            Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.