Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.1.19
Environment: Gentoo Linux, MariaDB Galera 10.1.19 - 3 nodes
Fix Version/s: 10.1.20
Description
Hi,
We use MariaDB 10.1.19 in a 3-node Galera cluster.
Today, shortly after midnight, our first Galera node hung on a lot of simple INSERTs into one unimportant table.
A MySQL event runs after midnight on all Galera nodes and calls a procedure. The procedure fills some tables from CONNECT tables that point to a remote MariaDB server, and that remote server was temporarily down.
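For illustration, the setup looks roughly like the sketch below. Only the object names that appear in the error log and processlist (project_cz.dbsync_updateMemAndMvwTables, connect__dbsync_localhost_mem, dbsync_data_vw, the IP 10.234.4.28) are real; the procedure name, column discovery, credentials, remote table name and schedule are placeholders.
```sql
-- Rough sketch of the setup (reconstructed for illustration; credentials, the remote
-- table, the procedure name and the schedule are placeholders, only the names from
-- the logs/processlist are real).

-- A CONNECT table pointing at the remote MariaDB that was temporarily down;
-- dbsync_data_vw presumably reads from tables like this one (hypothetical name).
CREATE TABLE dbsync_remote_src
ENGINE=CONNECT TABLE_TYPE=MYSQL
CONNECTION='mysql://dbsync@10.234.4.28:3306/project_cz/dbsync_data';

-- Procedure called by the event (hypothetical name); the TRUNCATE and
-- INSERT ... SELECT are the statements visible in the processlist on node 2.
DELIMITER //
CREATE PROCEDURE project_cz.dbsync_refresh()
BEGIN
  TRUNCATE TABLE connect__dbsync_localhost_mem;
  INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw;
END//
DELIMITER ;

-- The event from the error log, running shortly after midnight on every node
-- (the schedule shown here is a placeholder).
CREATE EVENT project_cz.dbsync_updateMemAndMvwTables
  ON SCHEDULE EVERY 1 DAY STARTS '2016-11-19 00:00:00'
  DO CALL project_cz.dbsync_refresh();
```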
The procedure exited with this error at 00:00:06:
```
2016-11-18 0:00:06 140300245776128 [ERROR] Event Scheduler: [root@localhost][project_cz.dbsync_updateMemAndMvwTables] Got error 174 '(2003) Can't connect to MySQL server on '10.234.4.28' (111 "Connection refused")' from CONNECT
2016-11-18 0:00:06 140300245776128 [Note] Event Scheduler: [root@localhost].[project_cz.dbsync_updateMemAndMvwTables] event execution failed.
```
Immediately after that, all INSERTs into other tables hung with the status "query end".
After 2-3 minutes the database was inaccessible with "Too many connections", and it stayed hung.
I think that after the error from the CONNECT engine, some Galera state was not cleaned up properly and Galera then blocked all further changes.
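When this happens, the stuck state can be seen by comparing the processlist with the wsrep status variables. The statements below are only an illustration of such checks, not output from this incident; the last one needs the metadata_lock_info plugin.
```sql
-- Illustrative checks on a node suspected to be stuck (not output from the incident).
-- Many sessions sitting in "query end" while applier threads wait suggests commits are blocked.
SHOW FULL PROCESSLIST;

-- wsrep_ready, wsrep_local_state_comment and the queue/flow-control counters show
-- whether the node still accepts and applies write sets.
SHOW GLOBAL STATUS LIKE 'wsrep_%';

-- Metadata locks held at that moment (requires the metadata_lock_info plugin).
SELECT * FROM information_schema.METADATA_LOCK_INFO;
```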
I attach a ZIP file with the complete MySQL processlist and status, captured about one minute after the hang.
BUT on the second Galera node, at the same moment, there were hung queries from this procedure:
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
| Id   | User                 | Host      | db         | Command | Time | State                           | Info                                                                   | Progress |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
|    1 | system user          |           |            | Sleep   | 2232 | wsrep aborter idle              |                                                                        |    0.000 |
|    2 | system user          |           |            | Sleep   |   52 | committed 1112277               |                                                                        |    0.000 |
|    4 | event_scheduler      | localhost |            | Daemon  | 2196 | Waiting for next activation     |                                                                        |    0.000 |
|    5 | system user          |           | project_cz | Sleep   |   50 | Waiting for table metadata lock | TRUNCATE TABLE connect__dbsync_localhost_mem                           |    0.000 |
|    6 | system user          |           |            | Sleep   |   50 | applied write set 1112281       |                                                                        |    0.000 |
|    7 | system user          |           |            | Sleep   |   50 | applied write set 1112280       |                                                                        |    0.000 |
|    8 | system user          |           |            | Sleep   |   50 | applied write set 1112282       |                                                                        |    0.000 |
|    9 | system user          |           |            | Sleep   |   52 | committed 1112276               |                                                                        |    0.000 |
| 3169 | root                 | localhost | project_cz | Connect |   56 | query end                       | INSERT INTO connect__dbsync_localhost_mem SELECT * FROM dbsync_data_vw |    0.000 |
| 3244 | monitoring           | localhost |            | Query   |    0 | init                            | show processlist                                                       |    0.000 |
| 3245 | unauthenticated user | localhost |            | Connect |    0 | Reading from net                |                                                                        |    0.000 |
+------+----------------------+-----------+------------+---------+------+---------------------------------+------------------------------------------------------------------------+----------+
And the third node was OK:
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
| Id      | User            | Host      | db | Command | Time   | State                  | Info             | Progress |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
|       1 | system user     |           |    | Sleep   | 813926 | wsrep aborter idle     |                  |    0.000 |
|       2 | system user     |           |    | Sleep   |     51 | committed 1112309      |                  |    0.000 |
|       4 | event_scheduler | localhost |    | Daemon  | 813923 | Waiting on empty queue |                  |    0.000 |
|       5 | system user     |           |    | Sleep   |     51 | committed 1112313      |                  |    0.000 |
|       6 | system user     |           |    | Sleep   |     51 | committed 1112312      |                  |    0.000 |
|       7 | system user     |           |    | Sleep   |     51 | committed 1112310      |                  |    0.000 |
|       8 | system user     |           |    | Sleep   |     51 | committed 1112308      |                  |    0.000 |
|       9 | system user     |           |    | Sleep   |     51 | committed 1112311      |                  |    0.000 |
| 1023837 | monitoring      | localhost |    | Query   |      0 | init                   | show processlist |    0.000 |
+---------+-----------------+-----------+----+---------+--------+------------------------+------------------+----------+
We save the MySQL processlist/status every 5-10 seconds on all nodes. The attached ZIP is only from the first Galera node. If you also need the same ZIPs from the same time from the second and third nodes, let me know.
I think it is a bug caused by an unexpected, unhandled state.
Thank you for your help.