Details
Bug
Status: Stalled
Major
Resolution: Unresolved
10.2.22
None
debian 9
Description
During SST, once the donor completes the transfer, the socat receiver on the joiner hangs with its TCP connection still in the ESTABLISHED state, even though socat on the donor has already exited. The options are to wait 7200 seconds, which is the default timeout for dead TCP connections, or to manually kill socat on the joiner, after which SST continues. My current workaround is the following configuration in my.cnf on the joiner:
[sst]
sockopt=,keepalive,keepidle=10,keepintvl=10,keepcnt=2
This setting closes the dead TCP connection. I suggest putting these keepalive options directly into /usr/bin/wsrep_sst_mariabackup.
I also suggest investigating why socat on the donor does not send a FIN (or otherwise signal EOF) over the network to the joiner.
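For illustration, here is a minimal Python sketch of what the keepalive options in the workaround mean at the socket level. It is not taken from wsrep_sst_mariabackup or socat; the function names are hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT constants are assumed to be available, as they are on Linux. With keepidle=10, keepintvl=10 and keepcnt=2, an idle connection to a dead peer should be dropped after roughly 10 + 10 * 2 = 30 seconds.

```python
import socket


def keepalive_detection_seconds(keepidle, keepintvl, keepcnt):
    """Rough time until an idle connection to a dead peer is dropped:
    idle time before the first probe, plus one interval per unanswered probe."""
    return keepidle + keepintvl * keepcnt


def enable_keepalive(sock, keepidle=10, keepintvl=10, keepcnt=2):
    """Apply the same options as sockopt=,keepalive,keepidle=10,keepintvl=10,keepcnt=2."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning constants are platform-specific (assumed: Linux),
    # so each one is applied only if this platform exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", keepidle),
        ("TCP_KEEPINTVL", keepintvl),
        ("TCP_KEEPCNT", keepcnt),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(keepalive_detection_seconds(10, 10, 2))
    s.close()
```

The estimate is approximate; the exact moment the kernel gives up depends on when the last data was exchanged.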
Here are some logs:
on Donor:
Feb 18 04:04:50 s1 -innobackupex-backup: [00] 2019-02-18 04:04:50 completed OK!
Feb 18 04:04:50 s1 -wsrep-sst-donor: Total time on donor: 0 seconds
Feb 18 04:04:50 s1 -wsrep-sst-donor: Cleaning up temporary directories
on Joiner:
Feb 18 02:20:31 s3 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Feb 18 04:08:04 s3 -wsrep-sst-joiner: 2019/02/18 04:08:04 socat[20811] E read(7, 0x55845e0c55b0, 8192): Connection timed out
Feb 18 04:08:04 s3 -wsrep-sst-joiner: [00] 2019-02-18 04:08:04 xb_stream_read_chunk(): my_read() failed.
Feb 18 04:08:04 s3 -wsrep-sst-joiner: Error while getting data from donor node: exit codes: 1 1
Feb 18 04:08:04 s3 -wsrep-sst-joiner: Preparing the backup at /data/mysql//.sst
Feb 18 04:08:04 s3 -wsrep-sst-joiner: Evaluating /usr//bin/mariabackup --innobackupex --apply-log $rebuildcmd ${DATA} 2>&1 | logger -p daemon.err -t -innobackupex-apply
Feb 18 04:08:04 s3 -innobackupex-apply: 190218 04:08:04 innobackupex: Starting the apply-log operation
Without the extra socket options (,keepalive,keepidle=10,keepintvl=10,keepcnt=2), the timeout occurs only after 2 hours rather than this quickly.