[MDEV-11306] Galera hangs on INSERT queries after CONNECT table unable to connect to remote server Created: 2016-11-18  Updated: 2017-06-26  Resolved: 2017-06-26

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.19
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Ján Regeš Assignee: Sachin Setiya (Inactive)
Resolution: Incomplete Votes: 1
Labels: galera, need_feedback
Environment:

Gentoo Linux, MariaDB Galera 10.1.19 - 3 nodes


Attachments: Zip Archive MariaDB_Galera_hangs_2016-11-18.zip    
Sprint: 10.1.20

 Description   

Hi,

We use MariaDB 10.1.19 in our 3-node Galera cluster.

Today, shortly after midnight, our first Galera node hung on a large number of simple inserts into one unimportant table.

A MySQL event runs after midnight on all Galera nodes and calls a procedure. This procedure fills some tables from CONNECT tables. These CONNECT tables point to a remote MariaDB server, which was temporarily down.

The procedure exited with this error at 00:00:06:

2016-11-18  0:00:06 140300245776128 [ERROR] Event Scheduler: [root@localhost][project_cz.dbsync_updateMemAndMvwTables] Got error 174 '(2003) Can't connect to MySQL server on '10.234.4.28' (111 "Connection refused")' from CONNECT
2016-11-18  0:00:06 140300245776128 [Note] Event Scheduler: [root@localhost].[project_cz.dbsync_updateMemAndMvwTables] event execution failed.

Immediately after that, all inserts into other tables hung in the state "query end".

After 2-3 minutes, the databases were inaccessible with "Too many connections".

It hangs

I think that after the error from the CONNECT engine, some Galera state was not cleaned up properly and Galera now blocks all changes.

I attach a ZIP file with the complete MySQL processlist and status captured about 1 minute after the hang.

However, on the second Galera node, at the same moment, queries from this procedure were hanging:

+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
| Id   | User                 | Host      | db         | Command | Time | State                           | Info                                                                   | Progress |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
| 1    | system user          |           |            | Sleep   | 2232 | wsrep aborter idle              |                                                                        | 0.000    |
| 2    | system user          |           |            | Sleep   | 52   | committed 1112277               |                                                                        | 0.000    |
| 4    | event_scheduler      | localhost |            | Daemon  | 2196 | Waiting for next activation     |                                                                        | 0.000    |
| 5    | system user          |           | project_cz | Sleep   | 50   | Waiting for table metadata lock | TRUNCATE TABLE connect__dbsync_localhost_mem                           | 0.000    |
| 6    | system user          |           |            | Sleep   | 50   | applied write set 1112281       |                                                                        | 0.000    |
| 7    | system user          |           |            | Sleep   | 50   | applied write set 1112280       |                                                                        | 0.000    |
| 8    | system user          |           |            | Sleep   | 50   | applied write set 1112282       |                                                                        | 0.000    |
| 9    | system user          |           |            | Sleep   | 52   | committed 1112276               |                                                                        | 0.000    |
| 3169 | root                 | localhost | project_cz | Connect | 56   | query end                       | INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw | 0.000    |
| 3244 | monitoring           | localhost |            | Query   | 0    | init                            | show processlist                                                       | 0.000    |
| 3245 | unauthenticated user | localhost |            | Connect | 0    | Reading from net                |                                                                        | 0.000    |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+

The third node was OK:

+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
| Id      | User            | Host      | db | Command | Time   | State                  | Info             | Progress |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
| 1       | system user     |           |    | Sleep   | 813926 | wsrep aborter idle     |                  | 0.000    |
| 2       | system user     |           |    | Sleep   | 51     | committed 1112309      |                  | 0.000    |
| 4       | event_scheduler | localhost |    | Daemon  | 813923 | Waiting on empty queue |                  | 0.000    |
| 5       | system user     |           |    | Sleep   | 51     | committed 1112313      |                  | 0.000    |
| 6       | system user     |           |    | Sleep   | 51     | committed 1112312      |                  | 0.000    |
| 7       | system user     |           |    | Sleep   | 51     | committed 1112310      |                  | 0.000    |
| 8       | system user     |           |    | Sleep   | 51     | committed 1112308      |                  | 0.000    |
| 9       | system user     |           |    | Sleep   | 51     | committed 1112311      |                  | 0.000    |
| 1023837 | monitoring      | localhost |    | Query   | 0      | init                   | show processlist | 0.000    |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+

We save the MySQL processlist and status every 5-10 seconds on all nodes. The attached ZIP is from the first Galera node only. If you also need the same ZIP for the same time from the second and third servers, let me know.

I think it's a bug caused by an unexpected, unhandled state.
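
For anyone trying to catch the same state, a query along the following lines can be run from a monitoring account to list threads stuck in "query end" and to capture the Galera status at the same moment. This is only a sketch: the 30-second threshold is an arbitrary assumption, and it is not the exact query used to produce the attached snapshots.

```sql
-- Threads that have been sitting in "query end" for a while.
-- The 30-second threshold is an assumed value; tune it to the workload.
SELECT ID, USER, DB, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE STATE = 'query end'
  AND TIME > 30
ORDER BY TIME DESC;

-- Galera status counters at the same moment (node state, queues, flow control).
SHOW GLOBAL STATUS LIKE 'wsrep_%';
```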

Thank you for your help.



 Comments   
Comment by Sachin Setiya (Inactive) [ 2016-12-02 ]

Hi,

I am not able to reproduce it. I have defined an event which calls a procedure that fills table t2 from CONNECT table r_t1.
CREATE PROCEDURE work()
INSERT INTO t2 select * from r_t1;
CREATE EVENT test_event_03
ON SCHEDULE EVERY 1 SECOND
STARTS CURRENT_TIMESTAMP
DO
call work();

And one more event which simply inserts into t3.
CREATE EVENT test_event_01
ON SCHEDULE EVERY 1 SECOND
STARTS CURRENT_TIMESTAMP
DO
insert into t3 values(1);
I have tried turning the remote CONNECT server on and off, but all three nodes work fine.
It would be good if you could provide a reproducible test case.
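
The events above presuppose a CONNECT table r_t1 that points at a remote server, and an enabled event scheduler. A minimal sketch of that setup follows; the host, credentials, remote schema and table name, and the column of t3 are all hypothetical placeholders, not values from this report or from the test that was actually run.

```sql
-- Load the CONNECT storage engine if it is not already installed.
INSTALL SONAME 'ha_connect';

-- r_t1: CONNECT table of type MYSQL; columns are discovered from the remote table.
-- The connection string is a placeholder, not a value from this report.
CREATE TABLE r_t1
  ENGINE=CONNECT
  TABLE_TYPE=MYSQL
  CONNECTION='mysql://test_user:test_pass@192.0.2.10:3306/test/t1';

-- t3: plain local table for the second event (column name is assumed).
CREATE TABLE t3 (a INT) ENGINE=InnoDB;

-- Events only fire when the scheduler is running.
SET GLOBAL event_scheduler = ON;
```

Stopping the remote server while both events are running should then produce the same error 174 from CONNECT that appears in the original log; whether the hang follows is exactly what remains to be reproduced.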

Comment by Sergei Golubchik [ 2017-06-26 ]

No feedback for half a year. Closing...
