[MCOL-892] 1.0.11 upgrade failed when base directory is nfs mounted Created: 2017-08-25  Updated: 2023-10-26  Resolved: 2017-08-31

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.0.11
Fix Version/s: 1.0.12, 1.1.0

Type: Bug Priority: Critical
Reporter: David Hill (Inactive) Assignee: David Hill (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Environment:

PanFS storage operating system


Sprint: 2017-17

 Description   

During both postConfigure and startSystem, ProcMgr would hang when trying to take a file lock (newly added in 1.0.11) on AlarmConfig.xml. The user had upgraded from 1.0.10 to 1.0.11 on a system set up on the PanFS storage operating system, which is an NFS-type setup.

No workaround: the user fell back to 1.0.10.



 Comments   
Comment by David Hill (Inactive) [ 2017-08-29 ]

Core file for the 1.0.11 CentOS 7 build:

ftp://ftp.mariadb.com/downloads/core.WriteEngineServ.6265

Comment by David Hill (Inactive) [ 2017-08-29 ]

Getting a handle on this issue.

1. Originally, the problem with 1.0.11-1 was that ProcMgr called the OAM API to update the AlarmConfig.xml file. A change had been made to take a flock around this file, and that caused the original problem in 1.0.11 on NFS systems.
2. Next, I updated the OAM API to use fcntl instead of flock, which fixed the original issue.
3. Also in this process, I saw that flock was being used in alarmmanager.cpp when accessing the alarm log file that resides in /var/log/mariadb/columnstore. I originally switched this code from flock to fcntl as well, and it was THAT change that was causing crashes in WES and other processes. Not sure why, just strange. BUT when I backed out these changes in alarmmanager.cpp and went back to using flock, WES and the other processes stopped crashing at startup.
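Items 1 and 2 describe replacing flock with fcntl record locking around the AlarmConfig.xml update. A minimal C sketch of that pattern follows, assuming an exclusive whole-file lock is held while the file is rewritten; `lock_whole_file`, `update_config` and the file path are hypothetical names for illustration, not ColumnStore's actual OAM code. For background, the Linux flock(2) man page notes that flock() behaviour over NFS has changed across kernel versions, whereas fcntl byte-range locks are the kind NFS clients hand to the server's lock manager.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set or release a POSIX record lock covering the whole file. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;       /* F_WRLCK to take an exclusive lock, F_UNLCK to release */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file", i.e. the whole file */

    /* F_SETLKW blocks until the lock is granted; retry if a signal interrupts the wait. */
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: replace a config file's contents while holding the lock.
 * Returns 0 on success, -1 on any failure. */
static int update_config(const char *path, const char *contents)
{
    /* O_RDWR is required: an F_WRLCK lock needs a descriptor open for writing. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    if (lock_whole_file(fd, F_WRLCK) == -1) {
        close(fd);
        return -1;
    }

    int rc = 0;
    size_t len = strlen(contents);
    if (ftruncate(fd, 0) == -1 || pwrite(fd, contents, len, 0) != (ssize_t)len)
        rc = -1;

    /* Release explicitly; closing the descriptor would also drop the lock. */
    lock_whole_file(fd, F_UNLCK);
    close(fd);
    return rc;
}
```

Usage: `update_config("/tmp/mcol892_lock_demo.xml", "<Columnstore/>\n")` returns 0 on success. Because F_SETLKW waits until the lock is granted, an unresponsive NFS lock manager can still stall the caller; using F_SETLK instead makes the call fail immediately with EACCES or EAGAIN when a conflicting lock is held.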

So it now looks like develop-1.0, which I'm calling 1.0.11-2, is working with the NFS setup. I plan to test further. If all looks good, I'm thinking about providing ABS global a link to this 1.0.11-2 version to see if they can test it for us on their Panasas NFS system and make sure it works there.

BUT the good news is I have a handle on things now.

Comment by David Hill (Inactive) [ 2017-08-30 ]

Issue fixed in the develop-1.0 build called 1.0.11-2. Provided this version to the customer to try on their Panasas setup.

Comment by David Hill (Inactive) [ 2017-08-30 ]

Fixes checked into 1.1.0.

Comment by David Hill (Inactive) [ 2017-08-31 ]

Tested by the customer and in-house.

Generated at Thu Feb 08 02:24:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.