Input from a community user about an improvement to the rsync.sh script, which synchronizes the front-end UM data from um1 (pm1) to the other nodes when MySQL replication is enabled.
I noticed that when we call mcsadmin enableMySQLReplication, one of the steps is a call to the script /usr/local/mariadb/columnstore/bin/rsync.sh
If I understand correctly, this is used to create the "zero point" each time, before activating standard MySQL replication.
What I don't understand is why this script does not use rsync's delete option.
Here is the problem:
1) I disable MySQL replication with mcsadmin disableMySQLReplication
2) On the primary front-end MariaDB ColumnStore module (pm1 / um1) I run a simple DROP TABLE or DROP DATABASE
3) I enable MySQL replication with mcsadmin enableMySQLReplication
In this case I expect the table or database that I dropped to be removed on the other UMs as well (or PMs, in a combined setup).
Instead, all the .ibd and .frm files remain in place.
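The effect described in the steps above can be reproduced locally without a cluster. The sketch below is only an illustration: two temporary directories stand in for the mysql/db directory on um1 and on a second UM, and the function name demo_stale_files and the file names are hypothetical. It shows that a plain rsync leaves files on the target after they were removed on the source, while rsync's --delete option removes them.

```shell
#!/bin/sh
# Local demonstration only. The two temporary directories are stand-ins for
# the mysql/db directory on um1 (source) and on another UM (target).
demo_stale_files() {
    src=$(mktemp -d) || return 1
    dst=$(mktemp -d) || return 1

    # Simulate an existing table on the primary and take the initial copy.
    mkdir "$src/testdb"
    touch "$src/testdb/t1.frm" "$src/testdb/t1.ibd"
    rsync -a "$src/" "$dst/"

    # Simulate DROP TABLE on the primary: the files disappear there.
    rm "$src/testdb/t1.frm" "$src/testdb/t1.ibd"

    # Re-sync without --delete: the dropped files stay on the target.
    rsync -a "$src/" "$dst/"
    echo "without --delete: $(ls "$dst/testdb" | tr '\n' ' ')"

    # Re-sync with --delete: the target matches the source again.
    rsync -a --delete "$src/" "$dst/"
    echo "with --delete: $(ls "$dst/testdb" | tr '\n' ' ')"

    rm -rf "$src" "$dst"
}
```

Calling demo_stale_files prints the target listing after each of the two syncs, so the leftover t1.frm and t1.ibd are visible in the first line and gone in the second.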
I suggest updating the rsync.sh script to add the "--delete" option, use the "-axvz" options instead of "-vopgr", and remove the --exclude=mysql/ and --exclude=test/ clauses, something like:
Nico
added a comment - edited
According to MCOL-1063, and after some testing, this works better:
set COMMAND "rsync -az --stats --delete -e ssh --exclude=mysql/ --exclude=infinidb_vtable/ --exclude=infinidb_querystats/ --exclude=calpontsys/ --include=*/ --include=*/* --exclude=* $INSTALLDIR/mysql/db/ $USERNAME@$SERVER:$INSTALLDIR/mysql/db/"
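For anyone who wants to try the same options outside the expect script, here is a rough shell equivalent of the command above. It is a sketch, not the shipped rsync.sh: the function name sync_db_dir and the RSYNC_BIN override are hypothetical, and the comments describe the filter rules as I read the rsync documentation (the first matching rule wins). The patterns are quoted so the shell does not expand them.

```shell
#!/bin/sh
# sync_db_dir INSTALLDIR USERNAME SERVER [extra rsync options...]
# Hypothetical helper. RSYNC_BIN is an illustrative override (for example
# RSYNC_BIN=echo to print the command), not a ColumnStore setting.
sync_db_dir() {
    installdir=$1; username=$2; server=$3
    shift 3   # anything left in "$@" is passed through as extra options

    # Archive mode, compression, statistics, and removal of files on the
    # target that no longer exist on the source.
    set -- -az --stats --delete -e ssh "$@"

    # System schemas are skipped.
    set -- "$@" --exclude=mysql/ --exclude=infinidb_vtable/ \
        --exclude=infinidb_querystats/ --exclude=calpontsys/

    # Descend into every remaining directory and take the files inside it.
    set -- "$@" '--include=*/' '--include=*/*'

    # Everything not matched so far (files directly under db/) is skipped.
    set -- "$@" '--exclude=*'

    set -- "$@" "$installdir/mysql/db/" "$username@$server:$installdir/mysql/db/"
    "${RSYNC_BIN:-rsync}" "$@"
}
```

A safe first run is `sync_db_dir /usr/local/mariadb/columnstore root um2 --dry-run`, where root and um2 are placeholder values; rsync's --dry-run option lists what would be transferred or deleted without changing the target.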
Nico
added a comment
After some testing I refactored rsync.sh a bit:
rsync.sh
In this version, "expect" no longer waits for the timeout to expire in some cases where it has sent the password. In the previous version, the second expect declaration never matched anything in the 'ssh' case, so it waited for the timeout.
I'm also increasing the timeout to 600 seconds. The previous value can cause problems on big databases.
I still have a problem when I call it with ma enableMySQLReplication:
If the rsync takes more than X seconds (I think a minute), the command above returns:
**** enableRep Failed : API Failure return in enableMySQLRep API
I checked debug.log and found this:
{{Jan 25 09:40:49 prod-cs-1121 ProcessManager[2929]: 49.305394 |0|0|0| E 17 CAL0000: line: 6901 sendMsgProcMon: ProcMon Msg timeout on module pm1
Jan 25 09:40:49 prod-cs-1121 ProcessManager[2929]: 49.305508 |0|0|0| E 17 CAL0000: line: 11251 setMySQLReplication: ERROR: Error getting MySQL Replication Master Information
Jan 25 09:40:49 prod-cs-1121 ProcessManager[2929]: 49.305547 |0|0|0| I 17 CAL0000: Enable MySQL Replication status: 1}}
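To check whether other runs hit the same timeout, the log can be filtered for the ProcMon message shown above. This is a small sketch: the function name procmon_timeouts is hypothetical, the log file path is passed in as an argument because it depends on the installation, and the field positions assume the syslog-style layout of the lines quoted here.

```shell
#!/bin/sh
# procmon_timeouts LOGFILE
# Prints the timestamp and the module name for every
# "ProcMon Msg timeout" entry found in the given log file.
procmon_timeouts() {
    grep 'sendMsgProcMon: ProcMon Msg timeout on module' "$1" |
        awk '{ print $1, $2, $3, $NF }'   # month, day, time, last field (module)
}
```

Run it as `procmon_timeouts debug.log`, using the full path of the debug.log on your system.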
When I call ma enableMySQLReplication several more times, rsync takes progressively less time and the replication finishes correctly.
So I think you need to increase the timeout for this command; it is still too short. I think 600 seconds is enough.
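Until the timeout is raised, the repeated calls can be scripted. The helper below is only a workaround sketch, not a fix: the name retry is hypothetical, and it assumes the wrapped command returns a non-zero exit status when it fails, which I have not verified for mcsadmin.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Assumption: COMMAND signals failure through a non-zero exit status.
retry() {
    max=$1; delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0                          # success: stop here
        [ "$attempt" -ge "$max" ] && return 1     # out of attempts
        echo "attempt $attempt of $max failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `retry 5 10 mcsadmin enableMySQLReplication` would try up to five times with ten seconds between attempts; both numbers are arbitrary.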
I hope this can be included in the next releases.
Thanks