[MDEV-12369] Crash on idle Galera node (libgalera_smm, libssl, libcrypto) Created: 2017-03-27 Updated: 2019-05-20 Resolved: 2019-05-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, SSL |
| Affects Version/s: | 10.1.22 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Ján Regeš | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | Crash, Galera |
| Environment: |
Gentoo, kernel 4.4.6, 8x Intel Xeon E5-2643 0 @ 3.30GHz, 8GB RAM, 3 Galera nodes on the same stable network (no geo-replication) |
||
| Attachments: |
|
| Description |
|
Hi, this morning at 08:10:24 one node of our 3-node Galera cluster crashed. All 3 nodes were nearly idle, averaging about 10 selects/s and 1 insert or update/s. CPU, RAM, and IO were all idle. I attach the crash log below. If you need it, I can send my.cnf from all 3 nodes. Thank you for your support.
|
| Comments |
| Comment by Daniel Black [ 2017-03-27 ] |
|
A config file from the crashed node would be great, assuming all nodes are similar with respect to Galera configuration. This looks like a multicast network, right? Did other nodes crash in the same way? Was there anything odd in the logs of the other servers? Also, what Galera version do you have? What OpenSSL build? Do you have debug symbols for Galera and/or OpenSSL, so that this stack trace can be mapped to line numbers? Can you tell what update/insert was happening (from binary logs or otherwise)? |
| Comment by Ján Regeš [ 2017-03-27 ] |
|
Hi Daniel, I attach my.cnf and the other requested information. Config file: 2017-03-27_ab-arbitrator-crash_my.cnf. I have no debug symbols for now. For further debugging, I will enable the "debug" USE flag for MariaDB in Gentoo and reinstall all 3 nodes. The other 2 nodes (our names: master and slave) kept working fine after the third node (our name: arbitrator) crashed. Just for clarification, all 3 nodes are full MariaDB data instances in a multi-master setup (there is no dummy "arbitrator node"). The log from our "master" server from the time of the third node's crash is below. It looks like an SSL error?
|
| Comment by Daniel Black [ 2017-03-27 ] |
|
For debug symbols: FEATURES=splitdebug. SSL seems a likely factor: the crash was in ssl3_dispatch_alert. Can you validate your SSL certs with gnutls-cli or with openssl s_client/s_server against each other? Having said that, it shouldn't crash regardless of the state of the certs. |
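As a rough illustration of the certificate check suggested above, here is a minimal Python sketch that attempts a TLS handshake against another node and reports the outcome instead of raising. It is not the `openssl s_client` or `gnutls-cli` procedure itself. The function name `check_tls_handshake`, the `HandshakeResult` type, and the idea of passing the shared certificate as the trust anchor are assumptions made for this example, and whether a Galera replication listener completes a handshake with a generic TLS client is also assumed rather than confirmed in this thread.

```python
import socket
import ssl
from typing import NamedTuple, Optional


class HandshakeResult(NamedTuple):
    """Outcome of one handshake attempt (hypothetical helper type)."""
    ok: bool
    detail: str


def check_tls_handshake(host: str, port: int,
                        ca_file: Optional[str] = None,
                        timeout: float = 5.0) -> HandshakeResult:
    """Try a TLS handshake with host:port and summarise the result.

    ca_file is assumed to be the PEM certificate the peer should
    present (for a shared self-signed cert, the cert itself).
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # A certificate shared by all nodes will not match each hostname,
    # so only the chain is verified here, not the name.
    context.check_hostname = False
    if ca_file is not None:
        context.load_verify_locations(cafile=ca_file)
    else:
        context.load_default_certs()

    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw) as tls:
                cipher_name = tls.cipher()[0]
                return HandshakeResult(True, f"{tls.version()} {cipher_name}")
    except OSError as exc:
        # Covers refused connections, timeouts, and ssl.SSLError
        # (including certificate verification failures).
        return HandshakeResult(False, f"{type(exc).__name__}: {exc}")
```

A call such as `check_tls_handshake("node2.example", 4567, ca_file="/path/to/shared-cert.pem")` (hostname, port, and path are placeholders) would be run from each node against the others. A result with `ok` set to `False` and a certificate verification error in `detail` would point at the certificates rather than at the network.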
| Comment by Ján Regeš [ 2017-03-28 ] |
|
All 3 nodes use the same shared certificate, generated by the commands below. It looks fine. In the past we used WAN replication, which was the reason for using SSL for replication traffic. Now all nodes are on the same LAN, so I will remove SSL.
|
| Comment by Jan Lindström (Inactive) [ 2019-05-20 ] |
|
This does not look like a Galera problem, but rather an external library problem. |