Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
- Affects Version: 10.1.13
- Environment: Ubuntu Xenial [using Wily packages], Ubuntu Trusty
Description
(Sorry, this is a bit longish.)
Basic setup: Three vservers ("vserver03", "vserver04", "vserver05"), using internal IPv4 addresses 172.16.0.[123] with the following settings (excerpt):
wsrep_node_name=vserver04
wsrep_node_address=172.16.0.1
wsrep_cluster_address="gcomm://"
## other nodes have 'wsrep_cluster_address="gcomm://172.16.0.1"' or the like in place
wsrep_provider_options="gmcast.listen_addr=tcp://172.16.0.1:4567; gcache.size=128M; gcache.name=/tmp/galera.cache; gcache.page_size=128M"
## other nodes use 'gmcast.listen_addr=tcp://172.16.0.[23]:4567' here
|
(Note: The above works when using the vservers' public IP addresses, so the basic mariadb/galera configuration appears to be ok.)
The above obviously implies that the separate machines cannot contact each other directly on these internal addresses. To remedy this, stunnel is used: in short, stunnel instances on vserver03/vserver05 listen on ports 172.16.0.1:[4444,4567,4568], whereas another stunnel instance on vserver04 listens (client=no) on 172.16.0.[23]:... and connects (client=yes) to 172.16.0.1:... in order to forward everything sent from vserver03/vserver05. All directions are covered, of course.
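For illustration, one plausible stunnel layout on vserver04 might look like the fragment below. Service names, the certificate path, and the public addresses are placeholders; only the port/direction scheme is taken from the description above and the netstat output further below:

```
; Hypothetical stunnel fragment on vserver04 (names and addresses are
; placeholders, not taken from the actual setup).

; Incoming leg: accept TLS from the other nodes, hand the plaintext
; to the local mysqld on its internal address.
[galera-in]
client = no
accept  = <vserver04 public address>:4567
connect = 172.16.0.1:4567

; Outgoing leg: the local mysqld connects to the internal stand-in
; address for vserver05; stunnel wraps the traffic in TLS towards
; that node's public address.
[galera-out-vserver05]
client = yes
accept  = 172.16.0.3:4567
connect = <vserver05 public address>:4567
```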
Example: After starting MariaDB on vserver04 using "service mysql start --wsrep-new-cluster", it listens on port 4567 using IP address 172.16.0.1:
[2016-04-13T16:06:55] root@vserver04:/etc/mysql# netstat -tulpn | grep 4567
tcp 0 0 172.16.0.1:4567 0.0.0.0:* LISTEN 32676/mysqld
tcp 0 0 172.16.0.3:4567 0.0.0.0:* LISTEN 20363/stunnel4
tcp 0 0 172.16.0.2:4567 0.0.0.0:* LISTEN 20363/stunnel4
However, after waiting a few seconds and then starting MariaDB on vserver05 using "service mysql start", vserver04's /var/log/syslog shows the following:
Apr 13 16:06:44 vserver04 systemd[1]: Started MariaDB database server.
Apr 13 16:07:13 vserver04 stunnel: LOG5[56]: Service [vserver04-galera_p1-ipv6] accepted connection from 2a03:nnnn:nnnn:nnnn::1:52568
[...]
Apr 13 16:07:13 vserver04 stunnel: LOG6[56]: SSL accepted: new session negotiated
Apr 13 16:07:13 vserver04 stunnel: LOG6[56]: Negotiated TLSv1.2 ciphersuite ECDHE-RSA-AES256-GCM-SHA384 (256-bit encryption)
Apr 13 16:07:13 vserver04 stunnel: LOG6[56]: s_connect: connecting 172.16.0.1:4567
Apr 13 16:07:13 vserver04 stunnel: LOG5[56]: s_connect: connected 172.16.0.1:4567
Apr 13 16:07:13 vserver04 stunnel: LOG6[56]: persistence: 172.16.0.1:4567 cached
Apr 13 16:07:13 vserver04 stunnel: LOG5[56]: Service [vserver04-galera_p1-ipv6] connected remote server from 172.16.0.1:53170
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [ERROR] WSREP: exception from gcomm, backend must be restarted: Trying to add self addr tcp://172.16.0.1:4567 to addr list (FATAL)
Apr 13 16:07:13 vserver04 mysqld[26689]: #011 at gcomm/src/gmcast.cpp:insert_address():762
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [Note] WSREP: gcomm: terminating thread
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [Note] WSREP: gcomm: joining thread
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [Note] WSREP: gcomm: closing backend
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [Note] WSREP: Forced PC close
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570907375360 [Note] WSREP: gcomm: closed
Apr 13 16:07:13 vserver04 mysqld[26689]: 2016-04-13 16:07:13 140570898982656 [ERROR] WSREP: gcs/src/gcs_core.cpp:core_handle_comp_msg():718: Malformed component message. Ignoring
Apr 13 16:07:17 vserver04 stunnel: LOG6[56]: SSL closed (SSL_read)
Apr 13 16:07:18 vserver04 stunnel: LOG5[57]: Service [vserver04-galera_p1-ipv6] accepted connection from 2a03:nnnn:nnnn:nnnn::1:52578
Apr 13 16:07:18 vserver04 stunnel: LOG6[57]: SSL accepted: previous session reused
Apr 13 16:07:18 vserver04 stunnel: LOG6[57]: s_connect: connecting 172.16.0.1:4567
Apr 13 16:07:18 vserver04 stunnel: LOG3[57]: s_connect: connect 172.16.0.1:4567: Connection refused (111)
Apr 13 16:07:18 vserver04 stunnel: LOG5[57]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
Apr 13 16:07:19 vserver04 stunnel: LOG5[58]: Service [vserver04-galera_p1-ipv6] accepted connection from 2a03:nnnn:nnnn:nnnn::1:52582
[...]
|
For some reason, the second node is unable to join. This might be related to the use of stunnel (which works elsewhere, by the way), but the issue reported here is that the local instance should not be affected by this, yet it is, as follows:
At this point, MariaDB is still up and running on vserver04 (queries work flawlessly), but port 172.16.0.1:4567 is now closed and does not come back up (see the example "Connection refused" message near the end of the log above; all subsequent connection attempts from other nodes fail the same way).
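One way to confirm that the wsrep listener is gone while mysqld itself survives is to look for a mysqld-owned LISTEN socket on port 4567. A minimal sketch of that check, run here against a captured netstat snapshot (so the parsing is reproducible; on a live system you would feed it the real `netstat -tulpn` output instead):

```shell
# Snapshot of `netstat -tulpn | grep 4567` after the crash: only the
# stunnel listeners remain; mysqld's 172.16.0.1:4567 socket is gone.
snapshot='tcp 0 0 172.16.0.3:4567 0.0.0.0:* LISTEN 20363/stunnel4
tcp 0 0 172.16.0.2:4567 0.0.0.0:* LISTEN 20363/stunnel4'
# On a live system, use instead:  snapshot=$(netstat -tulpn | grep 4567)

# Extract the local address of any listener owned by mysqld
# (last netstat column is PID/program name).
galera_listener=$(printf '%s\n' "$snapshot" | awk '$NF ~ /mysqld/ {print $4}')

if [ -z "$galera_listener" ]; then
  echo "mysqld no longer listens on the Galera port"
else
  echo "mysqld listens on $galera_listener"
fi
```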
It is also not possible to cleanly shut down MariaDB anymore: "service mysql stop" takes three minutes to return on the command line, with /var/log/syslog showing the following:
Apr 13 16:08:32 vserver04 systemd[1]: Stopping MariaDB database server...
Apr 13 16:08:32 vserver04 mysqld[26689]: 2016-04-13 16:08:32 140571185154816 [Note] /usr/sbin/mysqld: Normal shutdown
Apr 13 16:08:32 vserver04 mysqld[26689]: 2016-04-13 16:08:32 140571185154816 [Note] WSREP: Stop replication
Apr 13 16:08:32 vserver04 mysqld[26689]: 2016-04-13 16:08:32 140571185154816 [Note] WSREP: Closing send monitor...
Apr 13 16:08:32 vserver04 mysqld[26689]: 2016-04-13 16:08:32 140571185154816 [Note] WSREP: Closed send monitor.
Apr 13 16:08:32 vserver04 mysqld[26689]: 2016-04-13 16:08:32 140571185154816 [Warning] WSREP: gcomm: backend already closed
[...]
Apr 13 16:10:02 vserver04 systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Apr 13 16:11:33 vserver04 systemd[1]: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Apr 13 16:11:33 vserver04 systemd[1]: Stopped MariaDB database server.
Apr 13 16:11:33 vserver04 systemd[1]: mariadb.service: Unit entered failed state.
Apr 13 16:11:33 vserver04 systemd[1]: mariadb.service: Failed with result 'timeout'.
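The repeated "Skipping SIGKILL." entries are expected behavior here: the mariadb.service unit ships with SendSIGKILL=no, so systemd never force-kills a hung mysqld. If force-killing were acceptable, a drop-in along these lines would change that (file name and values are illustrative only, not a recommendation from this report):

```
# /etc/systemd/system/mariadb.service.d/override.conf (hypothetical)
[Service]
# Allow systemd to SIGKILL mysqld once the stop timeout expires,
# at the cost of skipping a clean InnoDB shutdown.
SendSIGKILL=yes
TimeoutStopSec=120
```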
What's worse, this also leaves behind an instance of /usr/sbin/mysqld that still holds its locks until it is terminated with "kill -9":
[2016-04-13T16:15:04] root@vserver04:/etc/mysql# ps axu | grep 26689
mysql 26689 0.1 2.8 1187736 172348 ? Ssl 16:06 0:01 /usr/sbin/mysqld
Trying to re-execute "service mysql start --wsrep-new-cluster" without killing the above yields:
Apr 13 16:04:31 vserver04 mysqld[25641]: 2016-04-13 16:04:31 140137967790336 [ERROR] mysqld: Can't lock aria control file '/var/lib/mysql/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds
[...]
Apr 13 16:05:02 vserver04 mysqld[25641]: 2016-04-13 16:05:02 140137967790336 [ERROR] Plugin 'Aria' init function returned error.
Apr 13 16:05:02 vserver04 mysqld[25641]: 2016-04-13 16:05:02 140137967790336 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
[...]
Apr 13 16:05:02 vserver04 mysqld[25641]: 2016-04-13 16:05:02 140137967790336 [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11
Apr 13 16:05:02 vserver04 mysqld[25641]: 2016-04-13 16:05:02 140137967790336 [Note] InnoDB: Retrying to lock the first data file
[...]
Apr 13 16:06:07 vserver04 mysqld[25641]: 2016-04-13 16:06:07 140137967790336 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
Apr 13 16:06:07 vserver04 mysqld[25641]: 2016-04-13 16:06:07 140137967790336 [ERROR] Aborting
Apr 13 16:06:12 vserver04 systemd[1]: mariadb.service: Main process exited, code=killed, status=9/KILL
Apr 13 16:06:12 vserver04 systemd[1]: Failed to start MariaDB database server.
Apr 13 16:06:12 vserver04 systemd[1]: mariadb.service: Unit entered failed state.
Apr 13 16:06:12 vserver04 systemd[1]: mariadb.service: Failed with result 'signal'.
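The "error: 11" in these messages is EAGAIN/EWOULDBLOCK: the leftover mysqld still holds exclusive locks on aria_log_control and ibdata1, so the new instance's non-blocking lock attempts fail. The conflict can be reproduced in isolation with flock (a sketch; the temp file stands in for the real lock files):

```shell
# Reproduce the "error: 11" lock conflict: a second exclusive,
# non-blocking lock request fails while the first holder (here a
# background subshell, standing in for the leftover mysqld) is alive.
lockfile=$(mktemp)

(
  flock -x 9      # take the exclusive lock, like the old mysqld did
  sleep 2         # ...and keep holding it
) 9>"$lockfile" &

sleep 0.5         # give the background holder time to acquire the lock

if flock -x -n "$lockfile" -c true; then
  lock_status="lock acquired"
else
  lock_status="lock busy"   # the situation the new mysqld retries on
fi
echo "$lock_status"

wait
rm -f "$lockfile"
```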
(The above is not tied to a particular vserver or to the use of Ubuntu Xenial vs. Trusty; every variation of the above yields the very same results.)