[MCOL-2140] timeout for replication of 1 minute is too small - timing out on system with 4 nodes Created: 2019-02-05 Updated: 2023-03-20 Resolved: 2023-03-06 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hill (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 1 |
| Labels: | None | ||
| Environment: |
2um 2pm with local query |
||
| Description |
|
Customer reported that replication wasn't working and the slaves weren't being set up on their 2 PM / 2 UM system with local query. It turns out that the distribute request failed due to a timeout: ProcMgr on PM1 timed out waiting for ProcMon on UM1. The distribute command took longer than 1 minute (about 81 seconds, 15:29:45 to 15:31:06) on a 4-node system, where it has to distribute to 3 slave nodes.

PM1:
Feb 4 15:29:45 usfit-scdb6 ProcessManager[171189]: 45.342198 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module um1

UM1 (15:29:45 to 15:31:06):
Feb 4 15:29:45 usfit-scdb1 ProcessMonitor[101017]: 45.338509 |0|0|0| I 18 CAL0000: MSG RECEIVED: Run Master DB Distribute command
Feb 4 15:29:45 usfit-scdb1 ProcessMonitor[101017]: 45.350897 |0|0|0| D 18 CAL0000: cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 192.168.212.39 ssh /usr/local/mariadb/columnstore 1 > /scdbprd_tmp//master-dist_um2.log
Feb 4 15:30:08 usfit-scdb1 ProcessMonitor[101017]: 08.522949 |0|0|0| D 18 CAL0000: cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 192.168.212.47 ssh /usr/local/mariadb/columnstore 1 > /scdbprd_tmp//master-dist_pm1.log
Feb 4 15:30:38 usfit-scdb1 ProcessMonitor[101017]: 38.745774 |0|0|0| D 18 CAL0000: cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 192.168.212.48 ssh /usr/local/mariadb/columnstore 1 > /scdbprd_tmp//master-dist_pm2.log
Feb 4 15:31:06 usfit-scdb1 ProcessMonitor[101017]: 06.437841 |0|0|0| I 18 CAL0000: MASTERDIST: runMasterRep - ACK back to ProcMgr return status = 0 |
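To make the timing concrete, here is a small Python sketch that recomputes the step durations from the UM1 ProcessMonitor timestamps above. It assumes each rsync runs until the next log entry is written (the log only records when each command starts), and PROCMGR_TIMEOUT_S is an illustrative name for the 1-minute ProcMgr timeout described here, not an actual ColumnStore setting.

```python
from datetime import datetime

# Timestamps from the UM1 ProcessMonitor log in this ticket: the syslog
# HH:MM:SS combined with the SS.ffffff field of each entry.
EVENTS = [
    ("request received", "15:29:45.338509"),
    ("rsync to um2 (192.168.212.39)", "15:29:45.350897"),
    ("rsync to pm1 (192.168.212.47)", "15:30:08.522949"),
    ("rsync to pm2 (192.168.212.48)", "15:30:38.745774"),
    ("ACK back to ProcMgr", "15:31:06.437841"),
]

# Illustrative name for the 1-minute ProcMgr wait (assumption, not a real setting).
PROCMGR_TIMEOUT_S = 60


def parse(ts: str) -> datetime:
    """Parse an HH:MM:SS.ffffff timestamp (all events fall on the same day)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def step_durations(events):
    """Return (label, seconds) per step, assuming a step lasts until the next event."""
    times = [parse(ts) for _, ts in events]
    return [
        (events[i][0], (times[i + 1] - times[i]).total_seconds())
        for i in range(len(events) - 1)
    ]


def total_seconds(events):
    """Seconds from the first event to the last one."""
    return (parse(events[-1][1]) - parse(events[0][1])).total_seconds()


if __name__ == "__main__":
    for label, secs in step_durations(EVENTS):
        print(f"{label:35s} {secs:6.1f} s")
    total = total_seconds(EVENTS)
    print(f"total {total:.1f} s, timeout {PROCMGR_TIMEOUT_S} s, "
          f"exceeded: {total > PROCMGR_TIMEOUT_S}")
```

Run as a script, it reports about 23.2, 30.2 and 27.7 seconds for the three rsync steps and about 81.1 seconds in total. Because the three targets are synced one after another, the total grows with every slave node, which is why a fixed 60-second wait fails on a 4-node system.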
| Comments |
| Comment by David Hill (Inactive) [ 2019-02-05 ] |
|
Also, the timeouts in the rsync script itself need to be increased if a single rsync is taking 30 seconds. The script currently has: set timeout 20 |
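A rough sizing sketch in Python, assuming sequential distribution as in the UM1 log: it flags which targets took longer than the script's 20-second value and estimates a wait long enough for all targets. The per-node figures are rounded from the log timestamps; the helper names and the 1.5 safety factor are hypothetical, not ColumnStore settings. The ticket does not show how rsync.sh enforces its 20-second timeout (the ACK still reports return status = 0), so the comparison only indicates how little headroom there is.

```python
import math

# Value quoted from the rsync script: "set timeout 20".
RSYNC_SCRIPT_TIMEOUT_S = 20

# Per-target rsync durations in seconds, rounded from the UM1 log timestamps
# (assumes each rsync lasted until the next log entry).
OBSERVED_RSYNC_S = {"um2": 23.2, "pm1": 30.2, "pm2": 27.7}


def steps_over_timeout(observed, timeout_s):
    """Return the targets whose observed rsync time exceeds a per-command timeout."""
    return sorted(node for node, secs in observed.items() if secs > timeout_s)


def sequential_budget(per_node_s, targets, safety=1.5):
    """Hypothetical helper: whole-second wait for syncing `targets` nodes one
    after another, padded by a safety factor."""
    return math.ceil(per_node_s * targets * safety)


if __name__ == "__main__":
    print("over script timeout:",
          steps_over_timeout(OBSERVED_RSYNC_S, RSYNC_SCRIPT_TIMEOUT_S))
    worst = max(OBSERVED_RSYNC_S.values())
    print("suggested sequential wait:",
          sequential_budget(worst, len(OBSERVED_RSYNC_S)), "s")
```

With the observed worst case of about 30 seconds per node and 3 slave nodes, the estimate comes out well above the current 60-second wait. Scaling the wait with the number of targets, rather than using a fixed value, is one way to keep larger systems from hitting the same timeout.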
| Comment by Nico [ 2019-03-19 ] |
|
I noticed the same issue in |
| Comment by Todd Stoffel (Inactive) [ 2023-03-06 ] |
|
This ticket was opened prior to convergence with the server. It may have been rendered obsolete. If this issue still exists in a modern version, please open a new request. |