MariaDB Server / MDEV-11306

Galera hangs on INSERT queries after CONNECT table is unable to connect to remote server


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.1.19
    • Fix Version/s: N/A
    • Component/s: Galera
    • Environment: Gentoo Linux, MariaDB Galera 10.1.19 - 3 nodes
    • Sprint: 10.1.20

    Description

      Hi,

      We run MariaDB 10.1.19 in a three-node Galera cluster.

      Today, shortly after midnight, our first Galera node hung on a large number of simple inserts into one unimportant table.

      A MySQL event runs after midnight on all Galera nodes and calls a procedure. This procedure fills some tables from several CONNECT tables. These CONNECT tables point to a remote MariaDB server, which was temporarily down.

      The procedure exited with this error at 00:00:06:

      2016-11-18  0:00:06 140300245776128 [ERROR] Event Scheduler: [root@localhost][project_cz.dbsync_updateMemAndMvwTables] Got error 174 '(2003) Can't connect to MySQL server on '10.234.4.28' (111 "Connection refused")' from CONNECT
      2016-11-18  0:00:06 140300245776128 [Note] Event Scheduler: [root@localhost].[project_cz.dbsync_updateMemAndMvwTables] event execution failed.
      

      Immediately after that, all inserts into other tables hung in the "query end" state.

      After 2-3 minutes, the database was inaccessible with "Too many connections" and remained hung.

      I think that after the error from the CONNECT engine, some Galera state was not cleaned up properly, so Galera blocks all changes.

      I am attaching a ZIP file with the complete MySQL processlist and status taken about 1 minute after the hang.

      However, on the second Galera node, at the same moment, queries from this procedure were hanging:

      +------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
      | Id   | User                 | Host      | db         | Command | Time | State                           | Info                                                                   | Progress |
      +------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
      | 1    | system user          |           |            | Sleep   | 2232 | wsrep aborter idle              |                                                                        | 0.000    |
      | 2    | system user          |           |            | Sleep   | 52   | committed 1112277               |                                                                        | 0.000    |
      | 4    | event_scheduler      | localhost |            | Daemon  | 2196 | Waiting for next activation     |                                                                        | 0.000    |
      | 5    | system user          |           | project_cz | Sleep   | 50   | Waiting for table metadata lock | TRUNCATE TABLE connect__dbsync_localhost_mem                           | 0.000    |
      | 6    | system user          |           |            | Sleep   | 50   | applied write set 1112281       |                                                                        | 0.000    |
      | 7    | system user          |           |            | Sleep   | 50   | applied write set 1112280       |                                                                        | 0.000    |
      | 8    | system user          |           |            | Sleep   | 50   | applied write set 1112282       |                                                                        | 0.000    |
      | 9    | system user          |           |            | Sleep   | 52   | committed 1112276               |                                                                        | 0.000    |
      | 3169 | root                 | localhost | project_cz | Connect | 56   | query end                       | INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw | 0.000    |
      | 3244 | monitoring           | localhost |            | Query   | 0    | init                            | show processlist                                                       | 0.000    |
      | 3245 | unauthenticated user | localhost |            | Connect | 0    | Reading from net                |                                                                        | 0.000    |
      +------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
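The applier states in this processlist can be cross-checked with a short script. The following is a minimal sketch in Python, not part of the original report. It assumes that the numbers in the `committed N` and `applied write set N` states are Galera sequence numbers, and the helper name `seqno_gap` is hypothetical. It lists the sequence numbers that lie between the highest committed and the highest applied write set but are not reported by any thread.

```python
import re

# State column values copied from the second node's processlist above.
states = [
    "wsrep aborter idle",
    "committed 1112277",
    "Waiting for table metadata lock",
    "applied write set 1112281",
    "applied write set 1112280",
    "applied write set 1112282",
    "committed 1112276",
]


def seqno_gap(states):
    """Return seqnos between the highest committed and the highest applied
    write set that no thread state mentions (hypothetical helper)."""
    committed = [int(m.group(1)) for s in states
                 if (m := re.fullmatch(r"committed (\d+)", s))]
    applied = [int(m.group(1)) for s in states
               if (m := re.fullmatch(r"applied write set (\d+)", s))]
    if not committed or not applied:
        return []
    seen = set(committed) | set(applied)
    return [n for n in range(max(committed) + 1, max(applied)) if n not in seen]


print(seqno_gap(states))  # [1112278, 1112279]
```

On this snapshot the result is 1112278 and 1112279. One of them may belong to the TRUNCATE that is waiting for the metadata lock, but the report does not confirm that.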
      

      The third node was OK:

      +---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
      | Id      | User            | Host      | db | Command | Time   | State                  | Info             | Progress |
      +---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
      | 1       | system user     |           |    | Sleep   | 813926 | wsrep aborter idle     |                  | 0.000    |
      | 2       | system user     |           |    | Sleep   | 51     | committed 1112309      |                  | 0.000    |
      | 4       | event_scheduler | localhost |    | Daemon  | 813923 | Waiting on empty queue |                  | 0.000    |
      | 5       | system user     |           |    | Sleep   | 51     | committed 1112313      |                  | 0.000    |
      | 6       | system user     |           |    | Sleep   | 51     | committed 1112312      |                  | 0.000    |
      | 7       | system user     |           |    | Sleep   | 51     | committed 1112310      |                  | 0.000    |
      | 8       | system user     |           |    | Sleep   | 51     | committed 1112308      |                  | 0.000    |
      | 9       | system user     |           |    | Sleep   | 51     | committed 1112311      |                  | 0.000    |
      | 1023837 | monitoring      | localhost |    | Query   | 0      | init                   | show processlist | 0.000    |
      +---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
      

      We save the MySQL processlist and status every 5-10 seconds on all nodes. The attached ZIP is from the first Galera node only. If you also need the ZIPs from the same time on the second and third nodes, let me know.

      I think this is a bug caused by an unexpected, unhandled state.
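Saved processlist snapshots like the ones above can be scanned for stuck threads automatically. The following is a minimal sketch in Python under stated assumptions: the snapshot is the ASCII table printed by the mysql client, no cell contains a `|` character, and the 30-second threshold as well as the names `parse_processlist`, `hung_threads`, and `SUSPECT_STATES` are illustrative choices, not something taken from the report.

```python
# Rows copied (with padding trimmed) from the second node's processlist.
SNAPSHOT = """
+------+-------------+-----------+------------+---------+------+---------------------------------+----------------------------------------------+----------+
| Id   | User        | Host      | db         | Command | Time | State                           | Info                                         | Progress |
+------+-------------+-----------+------------+---------+------+---------------------------------+----------------------------------------------+----------+
| 2    | system user |           |            | Sleep   | 52   | committed 1112277               |                                              | 0.000    |
| 5    | system user |           | project_cz | Sleep   | 50   | Waiting for table metadata lock | TRUNCATE TABLE connect__dbsync_localhost_mem | 0.000    |
| 3169 | root        | localhost | project_cz | Connect | 56   | query end                       | INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw | 0.000 |
| 3244 | monitoring  | localhost |            | Query   | 0    | init                            | show processlist                             | 0.000    |
+------+-------------+-----------+------------+---------+------+---------------------------------+----------------------------------------------+----------+
"""

# States treated as suspicious when they persist (an assumption for this sketch).
SUSPECT_STATES = {"query end", "Waiting for table metadata lock"}


def parse_processlist(text):
    """Parse a mysql-client ASCII table into a list of dicts keyed by header."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # first data line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def hung_threads(rows, min_seconds=30):
    """Return rows stuck in a suspect state for at least min_seconds."""
    return [r for r in rows
            if r["State"] in SUSPECT_STATES and int(r["Time"]) >= min_seconds]


for row in hung_threads(parse_processlist(SNAPSHOT)):
    print(row["Id"], row["State"], row["Time"])
```

For this snapshot the script reports threads 5 and 3169, the TRUNCATE waiting for the metadata lock and the INSERT stuck in "query end".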

      Thank you for your help.

          People

            Assignee: sachin.setiya.007 Sachin Setiya (Inactive)
            Reporter: jan.reges Ján Regeš