|
It appears to be a duplicate of MDEV-15254.
|
|
Not necessary a duplicate because previous version of SST scripts do not work on 10.2.13 which seemed to be the work around on MDEV-15254
|
|
Good Day
We've also hit the bug
Log from joining node:
2018-03-01 11:05:17 139928247703296 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-01 11:05:17 139928247703296 [Note] WSREP: Trying to continue unpaused monitor
2018-03-01 11:05:17 139928247703296 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2018-03-01 11:05:17 139929437337344 [Note] WSREP: State transfer required:
Group state: 43f70abb-1d27-11e8-9ae9-078170937149:0
Local state: 00000000-0000-0000-0000-000000000000:-1
2018-03-01 11:05:17 139929437337344 [Note] WSREP: New cluster view: global state: 43f70abb-1d27-11e8-9ae9-078170937149:0, view# 3: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-03-01 11:05:17 139929437337344 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-03-01 11:05:17 139928239310592 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.129' --datadir '/var/lib/mysql/' --parent '27035' --binlog '/var/log/mysql/mariadb-bin' '
WSREP_SST: [INFO] Streaming with xbstream (20180301 11:05:17.873)
WSREP_SST: [INFO] Using socat as streamer (20180301 11:05:17.875)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20180301 11:05:17.903)
2018-03-01 11:05:19 139928256096000 [Note] WSREP: (46ff4a4e, 'tcp://0.0.0.0:4572') connection to peer 46ff4a4e with addr tcp://192.168.0.129:4572 timed out, no messages seen in PT3S
2018-03-01 11:05:19 139928256096000 [Note] WSREP: (46ff4a4e, 'tcp://0.0.0.0:4572') turning message relay requesting off
Terminated
WSREP_SST: [ERROR] Removing /tmp/tmp.SfeqE6ca6Q/xtrabackup_galera_info file due to signal (20180301 11:06:46.657)
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 143 143 (20180301 11:06:46.661)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20180301 11:06:46.663)
2018-03-01 11:06:46 139928239310592 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.129' --datadir '/var/lib/mysql/' --parent '27035' --binlog '/var/log/mysql/mariadb-bin'
Read: '(null)'
2018-03-01 11:06:46 139928239310592 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.129' --datadir '/var/lib/mysql/' --parent '27035' --binlog '/var/log/mysql/mariadb-bin' : 32 (Broken pipe)
2018-03-01 11:06:46 139929437337344 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-03-01 11:06:46 139929437337344 [ERROR] Aborting
Error in my_thread_global_end(): 1 threads didn't exit
Log from donor node:
2018-03-01 11:05:17 139958632097536 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-03-01 11:05:17 139958632097536 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 474c5f92-1d27-11e8-98e3-268766cef611
2018-03-01 11:05:17 139958632097536 [Note] WSREP: STATE EXCHANGE: sent state msg: 474c5f92-1d27-11e8-98e3-268766cef611
2018-03-01 11:05:17 139958632097536 [Note] WSREP: STATE EXCHANGE: got state msg: 474c5f92-1d27-11e8-98e3-268766cef611 from 0 (galeratest-n1)
2018-03-01 11:05:17 139958632097536 [Note] WSREP: STATE EXCHANGE: got state msg: 474c5f92-1d27-11e8-98e3-268766cef611 from 1 (garb)
2018-03-01 11:05:17 139958632097536 [Note] WSREP: STATE EXCHANGE: got state msg: 474c5f92-1d27-11e8-98e3-268766cef611 from 2 (galeratest-n2)
2018-03-01 11:05:17 139958632097536 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 2,
members = 2/3 (joined/total),
act_id = 0,
last_appl. = 0,
protocols = 0/8/3 (gcs/repl/appl),
group UUID = 43f70abb-1d27-11e8-9ae9-078170937149
2018-03-01 11:05:17 139958632097536 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-01 11:05:17 139958632097536 [Note] WSREP: Trying to continue unpaused monitor
2018-03-01 11:05:17 139958623704832 [Note] WSREP: New cluster view: global state: 43f70abb-1d27-11e8-9ae9-078170937149:0, view# 3: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-03-01 11:05:17 139958623704832 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-03-01 11:05:17 139958623704832 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-03-01 11:05:17 139958623704832 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2018-03-01 11:05:17 139958674061056 [Note] WSREP: Service thread queue flushed.
2018-03-01 11:05:20 139958640490240 [Note] WSREP: (43f6a5f6, 'tcp://0.0.0.0:4572') turning message relay requesting off
2018-03-01 11:06:48 139958640490240 [Note] WSREP: declaring 44d5dbcd at tcp://192.168.0.152:4572 stable
2018-03-01 11:06:48 139958640490240 [Note] WSREP: forgetting 46ff4a4e (tcp://192.168.0.129:4572)
2018-03-01 11:06:48 139958640490240 [Note] WSREP: Node 43f6a5f6 state prim
2018-03-01 11:06:48 139958640490240 [Note] WSREP: view(view_id(PRIM,43f6a5f6,4) memb
{
43f6a5f6,0
44d5dbcd,0
}
joined {
} left {
} partitioned
{
46ff4a4e,0
}
)
2018-03-01 11:06:48 139958640490240 [Note] WSREP: save pc into disk
2018-03-01 11:06:48 139958640490240 [Note] WSREP: forgetting 46ff4a4e (tcp://192.168.0.129:4572)
2018-03-01 11:06:48 139958632097536 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2018-03-01 11:06:48 139958632097536 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 7dd64ee1-1d27-11e8-ae63-1e794acab87f
2018-03-01 11:06:48 139958632097536 [Note] WSREP: STATE EXCHANGE: sent state msg: 7dd64ee1-1d27-11e8-ae63-1e794acab87f
2018-03-01 11:06:48 139958632097536 [Note] WSREP: STATE EXCHANGE: got state msg: 7dd64ee1-1d27-11e8-ae63-1e794acab87f from 0 (galeratest-n1)
2018-03-01 11:06:48 139958632097536 [Note] WSREP: STATE EXCHANGE: got state msg: 7dd64ee1-1d27-11e8-ae63-1e794acab87f from 1 (garb)
2018-03-01 11:06:48 139958632097536 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 3,
members = 2/2 (joined/total),
act_id = 0,
last_appl. = 0,
protocols = 0/8/3 (gcs/repl/appl),
group UUID = 43f70abb-1d27-11e8-9ae9-078170937149
2018-03-01 11:06:48 139958632097536 [Note] WSREP: Flow-control interval: [23, 23]
2018-03-01 11:06:48 139958632097536 [Note] WSREP: Trying to continue unpaused monitor
2018-03-01 11:06:48 139958623704832 [Note] WSREP: New cluster view: global state: 43f70abb-1d27-11e8-9ae9-078170937149:0, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 3
2018-03-01 11:06:48 139958623704832 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-03-01 11:06:48 139958623704832 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-03-01 11:06:48 139958623704832 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2018-03-01 11:06:48 139958674061056 [Note] WSREP: Service thread queue flushed.
2018-03-01 11:06:54 139958640490240 [Note] WSREP: cleaning up 46ff4a4e (tcp://192.168.0.129:4572)
|
|
Can you try the fix from MDEV-15254?
https://jira.mariadb.org/secure/attachment/45259/45259_wsrep_sst_xtrabackup-v2.sh
|
|
I have already used script from mariadb 10.2.8 and it works fine. Thank you. I just want to have this bug fixed in the next release
|
|
It will be. MDEV-15254 already lists 10.2.14 as the fixed version.
toddstoffel your description of previous versions do not work is a bit vague. Does the fixed script complete a SST?
|
|
@danblack,
As I stated previously, we tried the SST scripts from several previous versions. The link you provided is a little late as the client has already rolled back their install to 10.2.12.
Creating custom shell scripts as a "workaround", or offering ANY "workaround" for a product that was broken on release is not really adequate. Let's hope 10.2.14 goes better than 10.2.13 or 10.2.10* went.
*See (https://jira.mariadb.org/browse/MDEV-14256)
|
|
Agree. FYI MDEV-15409 is blocker
|
|
FOUND : the problem come from --defaults-file with : (look it's not loaded in particular order or combined with an other option).
First I tried to upgrade my galera cluster from 10.2.12 to 10.2.13, i stopped one node to upgrade and put back, but the node fail to IST and then fail to SST. (I pushed a bug about this : https://jira.mariadb.org/browse/MDEV-15453?filter=-2)
Then when we tried to deploy a new cluster :
At begining we asked ourself why the upgrade of password failed with SALT (https://docs.saltstack.com/en/latest/ref/states/all/salt.states.mysql_user.html). in fact I think this trouble and the sst is exactly the same the cannot connect to mysql (particular sign, you can see : ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
(or whatever you put in /root/.my.cnf)
It's mean that all stuff witch use salt are falling also. I don't know where it is in the code but i know it's linked with : "libmariadbclient18" and special options of client to connect to MySQL/MariaDB
Work :
ii libmariadbclient18:amd64 10.1.26-0+deb9u1 amd64 MariaDB database client library
|
(i don't know why Debian installed this version for mariadb 10.2.13)
Not Work :
(with apt-get upgrade)
ii libmariadbclient18 10.2.13+maria~stretch amd64 Virtual package to satisfy external depends
|
Full listing :
Work :
dpkg -l | grep -E "mysql|maria"
|
ii libdbd-mysql-perl 4.041-2 amd64 Perl5 database interface to the MariaDB/MySQL database
|
ii libmariadbclient18:amd64 10.1.26-0+deb9u1 amd64 MariaDB database client library
|
ii mariadb-backup-10.2 10.2.13+maria~stretch amd64 Backup tool for MariaDB server
|
ii mariadb-client-10.2 10.2.13+maria~stretch amd64 MariaDB database client binaries
|
ii mariadb-client-core-10.2 10.2.13+maria~stretch amd64 MariaDB database core client binaries
|
ii mariadb-common 10.2.13+maria~stretch all MariaDB database common files (e.g. /etc/mysql/conf.d/mariadb.cnf)
|
ii mariadb-server-10.2 10.2.13+maria~stretch amd64 MariaDB database server binaries
|
ii mariadb-server-core-10.2 10.2.13+maria~stretch amd64 MariaDB database core server files
|
ii mysql-common 10.2.13+maria~stretch all MariaDB database common files (e.g. /etc/mysql/my.cnf)
|
ii python-mysqldb 1.3.7-1.1 amd64 Python interface to MySQL
|
not work :
ii libdbd-mysql-perl 4.041-2 amd64 Perl5 database interface to the MariaDB/MySQL database
|
ii libmariadb3 10.2.13+maria~stretch amd64 MariaDB database client library
|
ii libmariadbclient18 10.2.13+maria~stretch amd64 Virtual package to satisfy external depends
|
ii mariadb-backup-10.2 10.2.13+maria~stretch amd64 Backup tool for MariaDB server
|
ii mariadb-client-10.2 10.2.13+maria~stretch amd64 MariaDB database client binaries
|
ii mariadb-client-core-10.2 10.2.13+maria~stretch amd64 MariaDB database core client binaries
|
ii mariadb-common 10.2.13+maria~stretch all MariaDB database common files (e.g. /etc/mysql/conf.d/mariadb.cnf)
|
ii mariadb-server-10.2 10.2.13+maria~stretch amd64 MariaDB database server binaries
|
ii mariadb-server-core-10.2 10.2.13+maria~stretch amd64 MariaDB database core server files
|
ii mysql-common 10.2.13+maria~stretch all MariaDB database common files (e.g. /etc/mysql/my.cnf)
|
ii python-mysqldb 1.3.7-1.1 amd64 Python interface to MySQL
|
|
|
any news ?
|
|
Hi,
i tried this with the new version
libmariadbclient18 10.2.14+maria~stretch
|
the salt.states.mysql_user can't connect with this lib client with any parameters: connection_host, connection_port, connection_user, connection_pass or connection_default_file
but it's working with :
libmariadbclient18:amd64 10.1.26-0+deb9u1
|
About the SST all working well with lib "libmariadbclient18 10.2.14+maria~stretch"
|
|
Resolutions:
1. Disable SELinux on all nodes
2. Temporary disable firewall on all nodes
2. Adjust timeout of mysql.service (useful when you already have huge databases)
4. Upgrade to 10.2.16 and install "socat" and "nc" . If you are planning to use "mariabackup" please do install "MariaDB-backup".
5. On your MariaDB config under galera section
wsrep_debug=1
#mariabackup or xtrabackup-v2
wsrep_sst_method=mariabackup
5. On your MariaDB config under client section
user=root
#enclosed your password with double qoutes here
password="password-here"
Hope someone find it useful.
Actually this one gave me a headache for almost a week.
I haven't tested yet if this will work with user with this server privileges: RELOAD , PROCESS , LOCK TABLES , REPLICATION SLAVE , REPLICATION CLIENT .
I find it strange that "wsrep_sst_auth" is not being used by mariabackup and xtrabackup-v2 even if you have defined it in your configuration, according to the MariaDB documentation this should be sufficient.
|
|
Hi
Sure it doesnt use wsrep_sst_auth ?
I have just upgraded from 10.2.9 -A 10.2.18 and would claim that mariabackup indeed does use the wsrep_sst_auth values.
I admit that wsrep_sst_method=mariabackup is still, however, not working in any way. It tries to read and use the info from that _auth variable. But if you enclose it in ".." it seems to end up trying to set user = nothing .... And when removing the double quotes, it just fails with "access denied" ...
Frustrating as he11.
I'm still trying to figure out where "I took a wrong turn", because it *did" work in my test environment ...
|
|
mariabackup working for me. don't this problem anymore in 10.3.9
|
|
Never mind - I found the reason for the last two weeks fighting with my Galera/MariaDB after the latest (or, first ever) upgrade since going into production!
I have now created a /etc/systemd/system/mariadb.service.d/timeout.conf
with the sole content:
[Service]
TimeoutSec=infinity "Yes, "I N F I N I T Y" , not "zero" like so many other notes say, no: infinity.
- Would it have been an idea to write something in startup log, to someone using an OLD value for this parameter, if you decide to change the design?? (And that's a low kick to the systemd devels, not the MariaDB ones, you go that, right )
- It could have saved us a LOT of headache, until I stumbled over somewhere stating that "systemd" from versionx is using "infinity and not 0 ..
Fairly on the "uncool" side of documentation, as documentation goes
Could I have suggestions on what to troubleshoot on my GaleraCluster, because I am lost for ideas now.
I have a test env with 3 nodes (ubuntu (16.04) / MariaDB (10.2.18) ) It works fine (On VirtualBox).
But I have a production system with similar versions, on VMWare ESX as only difference, and it just cannot do SST with neither rsync nor mariabackup.
It used to be able to do SST (with the default option rsync), before the latest apt update/apt upgrade and reboots, then Galera died (apt sessions were successful, on all three nodes)
Now neither rsync nor mariabackup can do SST !?
It just gets connection aborted/ pipe broken. I suspect it to be some kind of timeout function somewhere, but can't find where it could be,
Both servers are on same VMware host, so network is not the cause- I have tried sending 10GB over a "nc" stream on port 4444, and it finished successfully, but MariaDB/Galera fails doing "the same", so I suspect reason is somewhere in OS/MariaDB/Galera settings.
On DONOR log it will die on " -wsrep-sst-donor: socat[10625] E write(6, 0x68dd30, 8192): Broken pipe"
On JOINER log it will report:
"Oct 15 16:20:00 DB-6 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Oct 15 16:20:05 DB-6 systemd[1]: mariadb.service: Start operation timed out. Terminating."
All ports required, are open in firewall on both servers
the libmariadbclient mentioned here in ticket is apparently libmysqlclient20:amd64 (5.7.23-0ubuntu0.16.04.1)
I will try to paste the results of trying an (mariadb) SST run here, and hope some one will feel like running through and maybe suggest what I could look for next?
sst_mariabackup_fail.txt
|
|
systemd code around TimeoutSec (before and after this change, https://github.com/systemd/systemd/commit/fb27be3f60147685fe5272ce3fdb1384f92390b5) seems to indicate 0 is treated as infinity.
|
|
MDEV-15607 will extend the timeout of systemd without a configuration change in systemd v236+
|
|
So the joiner is timing out and causing the donor to die. Can you show `systemctl show mariadb.service` on DB-6?
|
|
Thanks for pointing to the code, Daniel - accustomed to Oracle, I completely forgot that's possible with OpenSource ....
However, followiing multiple boots, trying to figure out what's causing my grief, I'd say that the TimeoutSec=0 in my /etc/mysql/conf.d/galera.cnf definitely did not do the trick, until I edited it to TimeoutSec=infinity, reloaded systemd and started MariaDB.
The only thing I wonder is, will systemd be able to ignore a change to its config files if one does not run the daemon-reload? That could explain it - because I don't recall if I remembered doing exactly that reload or simply rebooted Ubuntu...
Yes, the Joiner timed out, but only caused the Join to fail. The DONOR was unaffected the whole time.
Before setting timoutsec=infinity, I remember checking the "systemctl show mariadb.service|grep -i timeout" and it responded "1m 30s" - despite that config file having TimeoutSec=0
Thanks,
Brian
|
|
Ahh, you said MDEV-15607 , in v236+ will extend timeout w/o config change, so I tried cheking my systemd version:
it is apparently "229-4ubuntu21.4" - would that explain something too ?
Could you please reveal how I check that I would need 236+, to have this fix, I don't think I can see references in the https://jira.mariadb.org/browse/MDEV-15607 ?
I just figured out that 236+ seems to be for Ubuntu 18, so I can assume that this will not be fixed for the Ubuntu 16.04 that I'm using?
|
|
v236 is from when the systemd feature was added that extends timeouts: https://github.com/systemd/systemd/commit/a327431bd168b2f327f3cd422379e213c643f2a5 . The Timeoutsec= wil suffice in the meantime.
|
|
Hi, is this issue resolved if not can you try with more recent version of mariadb 10.2 server and Galera library,
|