Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.4.0
-
None
-
2020-5, 2020-6, 2020-7
Description
The load_brm program does not use the correct path when a non-primary node starts up. As a result, a node that was down or out of service fails to start when ColumnStore restarts.
Example to reproduce:
Set up a 3-PM combined UM/PM system.
Take PM3 out of service with an ungraceful shutdown.
Wait for the system to normalize.
Bring PM3 back online.
Errors occur when PM3 attempts to download the BRM_saves files and run load_brm, because it is not looking in the correct path.
The log files show errors like the following:
Mar 26 15:01:37 testPM3 ProcessMonitor[1616]: 37.822073 |0|0|0| D 18 CAL0000: BRM reset_locks script run
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.260824 |0|0|0| D 18 CAL0000: Clear Shared Memory script run
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.260944 |0|0|0| D 18 CAL0000: load_brm cmd = load_brm /var/lib/columnstore/data1/systemFiles/dbrm/0a30099b-a5ae-40d7-a7ef-420a71886490/BRM_saves > /var/log/mariadb/columnstore/load_brm.log1 2>&1
Mar 26 15:01:38 testPM3 IDBFile[4447]: 38.307307 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_journal, exception: unable to open Buffered file
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.313567 |0|0|0| E 18 CAL0000: Error return DBRM load_brm
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.314009 |0|0|0| D 18 CAL0000: Send SET Alarm ID 27 on device DBRM
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.314762 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMWorkerNode State = 7 PID = 0
Mar 26 15:01:38 testPM3 ProcessMonitor[1616]: 38.317351 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr, return status = 1
Mar 26 15:01:39 testPM3 ServerMonitor[4420]: 39.844808 |0|0|0| I 09 CAL0000: processInitComplete Successfully Called
Mar 26 15:01:44 testPM3 ProcessMonitor[1616]: 44.620073 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Mar 26 15:01:44 testPM3 ProcessMonitor[1616]: 44.620501 |0|0|0| I 18 CAL0000: UPDATECONFIGFILE: Completed
[root@testPM3 ~]# cat /var/log/mariadb/columnstore/load_brm.log1
Error opening journal file /var/lib/columnstore/data1/systemFiles/dbrm/0a30099b-a5ae-40d7-a7ef-420a71886490/BRM_saves_journal
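The path mismatch visible in the excerpt above (load_brm is given a prefix inside the UUID directory, while the failed open targets BRM_saves_journal directly under dbrm/) can be spotted mechanically. The following is a minimal Python sketch, not part of ColumnStore. The function name find_path_mismatches and both regular expressions are hypothetical and derived only from the log lines quoted in this report. It also assumes the journal is expected at "<prefix>_journal", as the load_brm.log1 output suggests.

```python
import re

# Patterns are based only on the log excerpts in this report; other
# ColumnStore versions may word these messages differently.
LOAD_CMD = re.compile(r"load_brm cmd = load_brm (\S+)")
OPEN_FAIL = re.compile(r"Failed to open file: ([^,\s]+)")


def find_path_mismatches(log_lines):
    """Return (expected_journal, failed_path) pairs where the two differ.

    Assumption: the journal lives at "<prefix>_journal", where <prefix>
    is the argument passed to load_brm.
    """
    expected = None
    mismatches = []
    for line in log_lines:
        cmd = LOAD_CMD.search(line)
        if cmd:
            # Remember the journal path implied by the latest load_brm call.
            expected = cmd.group(1) + "_journal"
            continue
        fail = OPEN_FAIL.search(line)
        if not (fail and expected):
            continue
        failed_path = fail.group(1)
        if failed_path.endswith("_journal") and failed_path != expected:
            mismatches.append((expected, failed_path))
    return mismatches
```

Passing the lines of the log file that holds the ProcessMonitor and IDBFile messages returns one pair per failed journal open whose path differs from the one implied by the preceding load_brm command. An empty list means no mismatch was detected by this heuristic.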
To recover, run the following on PM1:
mcsadmin alterSystem-enableModule pm3
mcsadmin restartsystem y
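If the recovery has to be repeated across test runs, the two commands above could be wrapped in a small script. This is only a sketch: the helper names recovery_commands and run_recovery are hypothetical, the mcsadmin arguments are copied verbatim from this report, and the default is a dry run that prints the commands without executing anything. It would need to run on PM1 with mcsadmin on the PATH.

```python
import subprocess


def recovery_commands(module="pm3"):
    """Build the two mcsadmin invocations quoted in this report."""
    return [
        ["mcsadmin", "alterSystem-enableModule", module],
        ["mcsadmin", "restartsystem", "y"],
    ]


def run_recovery(module="pm3", dry_run=True):
    """Print each command; execute them in order only if dry_run is False."""
    for cmd in recovery_commands(module):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the
            # first command that exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Calling run_recovery() only prints the commands. Calling run_recovery(dry_run=False) actually restarts the system, so it should be used with the same care as the manual commands.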
This issue is related to failures with glusterfs failovers observed in 1.4: MCOL-3842