Details
-
Bug
-
Status: Closed (View Workflow)
-
Trivial
-
Resolution: Fixed
-
10.0.26-galera
-
None
-
Ubuntu 14.04
Description
When MariaDB starts or - more frequently - restarts and needs a state transfer, there is a recurring issue when port 4567 is already taken by a stray socat instance.
I've solved this by adding "killall -9 socat" to the start) section of the init script. It works for me, but it's of course not an universal solution. My init script hack is overwritten by MariaDB upgrades so I'd like to have this issue solved upstream.
What do you think would be the correct solution? I'm thinking pgrep -U mysql socat, but that may still not be robust enough.
Attachments
Issue Links
- relates to
-
MDEV-14394 Nodes can't join with socat Address already in use
-
- Closed
-
- links to
Activity
the same Error was observed on when joining new node to the cluster with srver 10.2.12
on CentOS 7.4
2018-01-22 15:25:21 140167331440384 [Note] WSREP: STATE EXCHANGE: got state msg: b1f7499a-ff77-11e7-85b9-ee24c1b0321e from 1 (t4w5)
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 253,
members = 1/2 (joined/total),
act_id = 31,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = ca2bb6d4-ff5d-11e7-8871-7f48103f96c0
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Flow-control interval: [23, 23]
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Trying to continue unpaused monitor
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 31)
2018-01-22 15:25:21 140167538136832 [Note] WSREP: State transfer required:
Group state: ca2bb6d4-ff5d-11e7-8871-7f48103f96c0:31
Local state: 00000000-0000-0000-0000-000000000000:-1
2018-01-22 15:25:21 140167538136832 [Note] WSREP: New cluster view: global state: ca2bb6d4-ff5d-11e7-8871-7f48103f96c0:31, view# 254: Primary, number of nodes: 2, my index: 0, protocol version 3
2018-01-22 15:25:21 140167538136832 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-01-22 15:25:21 140167323047680 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.195' --datadir '/var/lib/mysql/' --parent '14715' '' '
WSREP_SST: [INFO] Streaming with xbstream (20180122 15:25:21.348)
WSREP_SST: [INFO] Using socat as streamer (20180122 15:25:21.351)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20180122 15:25:21.357)
2018-01-22 15:25:21 140167538136832 [Note] WSREP: Prepared SST request: mariabackup|192.168.104.195:4444/xtrabackup_sst//1
2018-01-22 15:25:21 140167538136832 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-22 15:25:21 140167538136832 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-01-22 15:25:21 140167538136832 [Note] WSREP: Assign initial position for certification: 31, protocol version: 3
2018-01-22 15:25:21 140167580100352 [Note] WSREP: Service thread queue flushed.
2018-01-22 15:25:21 140167538136832 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ca2bb6d4-ff5d-11e7-8871-7f48103f96c0): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Member 0.0 (t4w5) requested state transfer from 'any'. Selected 1.0 (t4w5)(SYNCED) as donor.
2018-01-22 15:25:21 140167331440384 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 31)
2018-01-22 15:25:21 140167538136832 [Note] WSREP: Requesting state transfer: success, donor: 1
2018-01-22 15:25:21 140167538136832 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> ca2bb6d4-ff5d-11e7-8871-7f48103f96c0:31
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20180122 15:25:21.412)
2018/01/22 15:25:21 socat[15000] E bind(6,
, 16): Address already in use
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 1 0 (20180122 15:25:21.422)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20180122 15:25:21.425)
2018-01-22 15:25:21 140167323047680 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.195' --datadir '/var/lib/mysql/' --parent '14715' '' : 32 (Broken pipe)
2018-01-22 15:25:21 140167323047680 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2018-01-22 15:25:21 140167672174720 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2018-01-22 15:25:21 140167672174720 [ERROR] Aborting
2018-01-22 15:25:21 140167331440384 [Warning] WSREP: 1.0 (t4w5): State transfer to 0.0 (t4w5) failed: -32 (Broken pipe)
2018-01-22 15:25:21 140167331440384 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
Note2 :
add port 4444/tcp as permanent to firewall or iptables rules
avoids the Errors socat[15000] E bind(6,
, 16): Address already in use
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 1 0 (20180122 15:25:21.422)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20180122 15:25:21.425)
2018-01-22 15:25:21 140167323047680 [ERROR] WSREP: Process completed with error:
winstone can you provide the log from where socat was first launched successfully?
If this is repeatable and you are using systemd, can you try setting 'killmode=control-group' (https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=) by following https://mariadb.com/kb/en/library/systemd/ . Please tell us if this corrects the problem.
actually the issue
socat[28262] E bind(6, {AF=2 0.0.0.0:4444}, 16): Address already in use
|
was observed with both service or systemctl utilities ; miss to mention that it's was used wsrep_sst_method mariabackup ; and also that mysqld is restating endless after the received failure ; --A
However as mentioned yesterday add port 4444/tcp as permanent to firewall or iptables rules resolves the problem --B
and 1 more note – if Node is joining for the first time to the cluster start mysql with option will prevent from socat[28262] E bind(6,
, 16): Address already in use --C
A.
service mysql start
|
Starting mysql (via systemctl): Job for mariadb.service failed because the cont rol process exited with error code. See "systemctl status mariadb.service" and " journalctl -xe" for details.
|
[FAILED]
|
|
systemctl start mysql
|
Job for mariadb.service failed because a fatal signal was delivered to the contr ol process. See "systemctl status mariadb.service" and "journalctl -xe" for deta ils.
|
|
B. enable port 4444
[root@t4w1 ~]# firewall-cmd --zone=public --add-port=4444/tcp --permanent
|
success
|
[root@t4w1 ~]# firewall-cmd --reload
|
success
|
|
C. start mysql service with option pouting out address of cluster active Node
[root@t4w1 ~]# service mysql start --wsrep_cluster_address=gcomm://192.168.104.1
|
Starting mysql (via systemctl): [ OK ]
|
|
then node is joining and updating db successfully
[root@t4w1 ~]# service mysql restart
|
Restarting mysql (via systemctl): [ OK ]
|
[root@t4w1 ~]# service mysql stop
|
Stopping mysql (via systemctl): [ OK ]
|
[root@t4w1 ~]# service mysql start
|
Starting mysql (via systemctl): [ OK ]
|
[root@t4w1 ~]# mysql -u root
|
|
problem Address already in use" on restart is observed again on 10.3.5 and the problem derives from
mysqld --wsrep process was not stopped upon restart command
- systemctl restart mariadb.service
or - service mysql restart
2018-04-02 17:24:05 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
|
2018-04-02 17:24:05 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
|
2018-04-02 17:24:05 0 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
|
2018-04-02 17:24:05 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
|
2018-04-02 17:24:05 0 [Note] WSREP: Found saved state: dff6e041-1005-11e8-85c9-965f304f37bc:-1, safe_to_bootstrap: 1
|
2018-04-02 17:24:05 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.104.196; base_port = 4567; cert.log_con flicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.ina ctive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.susp ect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.me m_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gc s.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q _hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.bootstrap = YES; pc
|
2018-04-02 17:24:05 0 [Note] WSREP: GCache history reset: dff6e041-1005-11e8-85c9-965f304f37bc:0 -> dff6e041-1005-11e8-85c9-965f304f37bc:131618
|
2018-04-02 17:24:05 0 [Note] WSREP: Assign initial position for certification: 131618, protocol version: -1
|
2018-04-02 17:24:05 0 [Note] WSREP: wsrep_sst_grab()
|
2018-04-02 17:24:05 0 [Note] WSREP: Start replication
|
2018-04-02 17:24:05 0 [Note] WSREP: Setting initial position to dff6e041-1005-11e8-85c9-965f304f37bc:131618
|
2018-04-02 17:24:05 0 [Note] WSREP: protonet asio version 0
|
2018-04-02 17:24:05 0 [Note] WSREP: Using CRC-32C for message checksums.
|
2018-04-02 17:24:05 0 [Note] WSREP: backend: asio
|
2018-04-02 17:24:05 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
|
2018-04-02 17:24:05 0 [Note] WSREP: restore pc from disk successfully
|
2018-04-02 17:24:05 0 [Note] WSREP: GMCast version 0
|
2018-04-02 17:24:05 0 [Note] WSREP: (de19f79a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
|
2018-04-02 17:24:05 0 [Note] WSREP: (de19f79a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
|
2018-04-02 17:24:05 0 [Note] WSREP: EVS version 0
|
2018-04-02 17:24:05 0 [Note] WSREP: gcomm: connecting to group 'cluster1', peer '192.168.104.193:,192.168.104.195:,192.168.104.196:'
|
2018-04-02 17:24:05 0 [ERROR] WSREP: bind: Address already in use
|
2018-04-02 17:24:05 0 [ERROR] WSREP: failed to open gcomm backend connection: 98: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_bloc king=1', asio error 'bind: Address already in use': 98 (Address already in use)
|
at gcomm/src/asio_tcp.cpp:listen():830
|
2018-04-02 17:24:05 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -98 (Address already in use)
|
2018-04-02 17:24:05 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'cluster1' at 'gcomm://192.168.104.193,192.168.104.19 5,192.168.104.196': -98 (Address already in use)
|
2018-04-02 17:24:05 0 [ERROR] WSREP: gcs connect failed: Address already in use
|
2018-04-02 17:24:05 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.104.193,192.168.104.195,192.168.104.196) failed: 7
|
2018-04-02 17:24:05 0 [ERROR] Aborting
|
|
|
|
A good view of the current and previous mysqld processes will be required.
Switching between init systems is bound to cause troubles as they both (should) prevent a duplicate instance in different ways.
Related is a systemd bug I filed that can cause it on that side of things: https://github.com/systemd/systemd/issues/8630. SendSIGKILL=yes would be a workaround (https://mariadb.com/kb/en/mariadb/systemd/).
Maybe for sysv init scripts - pkill --parent
{mysqldpid}(before kiling mysqld otherwise the parent pid will go to 1)
Hi,
Experiencing the same problem, nodes can't join on Ubuntu Xenial.
mysqld stops with:
socat[21170] E bind(6, {AF=2 0.0.0.0:4444}, 16): Address already in use
and socat, xbstream, wsrep_sst_xtrabackup-v2 keep running, parenthood inherited by 'init'.
As soon as you kill all the sst processes systemd restarts mysqld and the problems is back, mysqld stops, sst processes keep running.
10.1.28 seems not to contain this fix.
In sql/wsrep_utils.cc:
}
err_ = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF |
POSIX_SPAWN_SETSIGMASK |
POSIX_SPAWN_USEVFORK);
{