Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.1.19
Environment: Gentoo Linux, MariaDB Galera 10.1.19 - 3 nodes
Fix Version/s: 10.1.20
Description
Hi,
We use MariaDB 10.1.19 in a 3-node Galera cluster.
Today, shortly after midnight, our first Galera node hung on a lot of simple INSERTs into one unimportant table.
A MySQL event runs after midnight on all Galera nodes and calls a procedure. The procedure fills some tables from CONNECT tables that point to a remote MariaDB server, and that remote server was temporarily down.
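For illustration, the setup looks roughly like the sketch below. Only the object names that appear in the error log and processlist (project_cz.dbsync_updateMemAndMvwTables, connect__dbsync_localhost_mem, dbsync_data_vw, the IP 10.234.4.28) are real; the procedure name, column discovery, credentials, remote table name and schedule are placeholders.
```sql
-- Rough sketch of the setup (reconstructed for illustration; credentials, the remote
-- table, the procedure name and the schedule are placeholders, only the names from
-- the logs/processlist are real).

-- A CONNECT table pointing at the remote MariaDB that was temporarily down;
-- dbsync_data_vw presumably reads from tables like this one (hypothetical name).
CREATE TABLE dbsync_remote_src
ENGINE=CONNECT TABLE_TYPE=MYSQL
CONNECTION='mysql://dbsync@10.234.4.28:3306/project_cz/dbsync_data';

-- Procedure called by the event (hypothetical name); the TRUNCATE and
-- INSERT ... SELECT are the statements visible in the processlist on node 2.
DELIMITER //
CREATE PROCEDURE project_cz.dbsync_refresh()
BEGIN
  TRUNCATE TABLE connect__dbsync_localhost_mem;
  INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw;
END//
DELIMITER ;

-- The event from the error log, running shortly after midnight on every node
-- (the schedule shown here is a placeholder).
CREATE EVENT project_cz.dbsync_updateMemAndMvwTables
  ON SCHEDULE EVERY 1 DAY STARTS '2016-11-19 00:00:00'
  DO CALL project_cz.dbsync_refresh();
```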
The procedure exited with this error at 00:00:06:
```
2016-11-18 0:00:06 140300245776128 [ERROR] Event Scheduler: [root@localhost][project_cz.dbsync_updateMemAndMvwTables] Got error 174 '(2003) Can't connect to MySQL server on '10.234.4.28' (111 "Connection refused")' from CONNECT
2016-11-18 0:00:06 140300245776128 [Note] Event Scheduler: [root@localhost].[project_cz.dbsync_updateMemAndMvwTables] event execution failed.
```
Immediately after that, all INSERTs into other tables hung with the status "query end".
After 2-3 minutes the database was inaccessible with "Too many connections", and it stayed hung.
I think that after the error from the CONNECT engine, some Galera state was not cleaned up properly and Galera then blocked all further changes.
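When this happens, the stuck state can be seen by comparing the processlist with the wsrep status variables. The statements below are only an illustration of such checks, not output from this incident; the last one needs the metadata_lock_info plugin.
```sql
-- Illustrative checks on a node suspected to be stuck (not output from the incident).
-- Many sessions sitting in "query end" while applier threads wait suggests commits are blocked.
SHOW FULL PROCESSLIST;

-- wsrep_ready, wsrep_local_state_comment and the queue/flow-control counters show
-- whether the node still accepts and applies write sets.
SHOW GLOBAL STATUS LIKE 'wsrep_%';

-- Metadata locks held at that moment (requires the metadata_lock_info plugin).
SELECT * FROM information_schema.METADATA_LOCK_INFO;
```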
I attach a ZIP file with the complete MySQL processlist and status, captured about one minute after the hang.
BUT on the second Galera node, at the same moment, there were hung queries from this procedure:
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
| Id   | User                 | Host      | db         | Command | Time | State                           | Info                                                                   | Progress |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
|    1 | system user          |           |            | Sleep   | 2232 | wsrep aborter idle              |                                                                        |    0.000 |
|    2 | system user          |           |            | Sleep   |   52 | committed 1112277               |                                                                        |    0.000 |
|    4 | event_scheduler      | localhost |            | Daemon  | 2196 | Waiting for next activation     |                                                                        |    0.000 |
|    5 | system user          |           | project_cz | Sleep   |   50 | Waiting for table metadata lock | TRUNCATE TABLE connect__dbsync_localhost_mem                           |    0.000 |
|    6 | system user          |           |            | Sleep   |   50 | applied write set 1112281       |                                                                        |    0.000 |
|    7 | system user          |           |            | Sleep   |   50 | applied write set 1112280       |                                                                        |    0.000 |
|    8 | system user          |           |            | Sleep   |   50 | applied write set 1112282       |                                                                        |    0.000 |
|    9 | system user          |           |            | Sleep   |   52 | committed 1112276               |                                                                        |    0.000 |
| 3169 | root                 | localhost | project_cz | Connect |   56 | query end                       | INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw |    0.000 |
| 3244 | monitoring           | localhost |            | Query   |    0 | init                            | show processlist                                                       |    0.000 |
| 3245 | unauthenticated user | localhost |            | Connect |    0 | Reading from net                |                                                                        |    0.000 |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
And the third node was OK:
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
| Id      | User            | Host      | db | Command | Time   | State                  | Info             | Progress |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
|       1 | system user     |           |    | Sleep   | 813926 | wsrep aborter idle     |                  |    0.000 |
|       2 | system user     |           |    | Sleep   |     51 | committed 1112309      |                  |    0.000 |
|       4 | event_scheduler | localhost |    | Daemon  | 813923 | Waiting on empty queue |                  |    0.000 |
|       5 | system user     |           |    | Sleep   |     51 | committed 1112313      |                  |    0.000 |
|       6 | system user     |           |    | Sleep   |     51 | committed 1112312      |                  |    0.000 |
|       7 | system user     |           |    | Sleep   |     51 | committed 1112310      |                  |    0.000 |
|       8 | system user     |           |    | Sleep   |     51 | committed 1112308      |                  |    0.000 |
|       9 | system user     |           |    | Sleep   |     51 | committed 1112311      |                  |    0.000 |
| 1023837 | monitoring      | localhost |    | Query   |      0 | init                   | show processlist |    0.000 |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
We save the MySQL processlist/status every 5-10 seconds on all nodes. The attached ZIP is only from the first Galera node. If you also need the same ZIPs from the same time from the second and third nodes, let me know.
I think it is a bug caused by an unexpected, unhandled state.
Thank you for your help.