[MCOL-1149] ColumnStore not starting in CentOS 6 Created: 2018-01-08  Updated: 2018-04-20  Resolved: 2018-04-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: installation, ProcMgr
Affects Version/s: 1.0.13, 1.1.3
Fix Version/s: 1.0.13, 1.1.3

Type: Bug Priority: Critical
Reporter: Andrew Hutchings (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

postConfigure doesn't appear to start ProcMgr on CentOS 6, so the "please wait" step hangs forever. If ProcMgr is started manually, the newly spawned instance is killed and ColumnStore starts its own. Startup then continues but hangs around the DMLProc start.



 Comments   
Comment by David Hill (Inactive) [ 2018-01-08 ]

Could not reproduce on the build machine (CentOS 6.7).

System built and installed:

Running the MariaDB ColumnStore setup scripts

post-mysqld-install Successfully Completed
post-mysql-install Successfully Completed

Starting MariaDB Columnstore Database Platform

MariaDB ColumnStore Database Platform Starting, please wait ....... DONE

System Catalog Successfull Created

MariaDB ColumnStore Install Successfully Completed, System is Active

Enter the following command to define MariaDB ColumnStore Alias Commands

. /usr/local/mariadb/columnstore/bin/columnstoreAlias

Enter 'mcsmysql' to access the MariaDB ColumnStore SQL console
Enter 'mcsadmin' to access the MariaDB ColumnStore Admin console

[root@ip-172-30-0-72 bin]# cat /etc/issue

Comment by Andrew Hutchings (Inactive) [ 2018-01-08 ]

Triggered by this commit:

26f7344dc05bc7fcf9111d75420e91e20536dfbc is the first bad commit
commit 26f7344dc05bc7fcf9111d75420e91e20536dfbc
Author: Ben Thompson <ben.thompson@mariadb.com>
Date:   Tue Dec 5 16:59:45 2017 -0600
 
    MCOL-445: Modify getConfig and setConfig to be case insensitive on variable names.

Comment by David Hill (Inactive) [ 2018-01-10 ]

The problem is related to the initial system status setting, which is set to DOWN (9), and the step in main.cpp that reads and checks the status to determine whether it needs to launch processes, shown here:

main.cpp:736 if ( systemstatus.SystemOpState != MAN_OFFLINE && !DISABLED) {

At the time this line is hit, the system status is MAN_OFFLINE (0). It should be DOWN (9).

[root@ip-172-30-0-167 ~]# ma getsystemi
getsysteminfo Wed Jan 10 15:15:37 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System MAN_OFFLINE

So we need to see why it's not DOWN.

-----------------------------------

Breakpoint 1, main (argc=1, argv=0x7fffffffe648) at /home/builder/mariadb-columnstore-server/mariadb-columnstore-engine/procmon/main.cpp:736
736 if ( systemstatus.SystemOpState != MAN_OFFLINE && !DISABLED) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64 libgcc-4.4.7-18.el6.x86_64 libstdc++-4.4.7-18.el6.x86_64 libxml2-2.7.6-21.el6_8.1.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) p systemstatus.SystemOpState
$1 = 0

Comment by David Hill (Inactive) [ 2018-01-10 ]

OK, this explains it. The thread that sets up the shared memory and initializes the status to DOWN finishes after the code in main.cpp, which launched that thread, has already made the call to get the status. The code sleeps for 6 seconds to allow the shared memory setup to complete, but that is not long enough: in the log below, the status is read at 17:10:21 while the shared memory is not initialized until 17:10:49.
So we either need to increase the sleep or, better yet, set an initialization flag so that main continues only once the shared memory has been initialized.

Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.163827 |0|0|0| D 18 CAL0000:
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.163864 |0|0|0| D 18 CAL0000: *********Process Monitor Started*********
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.166618 |0|0|0| D 18 CAL0000: ProcMon: Starting as ACTIVE Parent
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.166743 |0|0|0| D 18 CAL0000: createDataDirs called
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.166879 |0|0|0| D 18 CAL0000: Message Thread started ..
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.184297 |0|0|0| D 18 CAL0000: checkDataMount called
Jan 10 17:10:15 ip-172-30-0-167 ProcessMonitor[11724]: 15.194652 |0|0|0| D 18 CAL0000: statusControlThread Thread started ..
Jan 10 17:10:21 ip-172-30-0-167 ProcessMonitor[11724]: 21.195821 |0|0|0| D 18 CAL0000: SYSTEM STATUS = 0
Jan 10 17:10:21 ip-172-30-0-167 ProcessMonitor[11724]: 21.195952 |0|0|0| D 18 CAL0000: StatusUpdate of Process ProcessMonitor State = 1 PID = 11724
Jan 10 17:10:21 ip-172-30-0-167 ProcessMonitor[11724]: 21.197916 |0|0|0| D 18 CAL0000: mysqld Monitoring Thread started ..
Jan 10 17:10:49 ip-172-30-0-167 ProcessMonitor[11724]: 49.086310 |0|0|0| D 18 CAL0000: Process Status shared Memory allocated and Initialized
Jan 10 17:10:49 ip-172-30-0-167 ProcessMonitor[11724]: 49.086575 |0|0|0| D 18 CAL0000: SET System Status TO 9
Jan 10 17:10:49 ip-172-30-0-167 ProcessMonitor[11724]: 49.086622 |0|0|0| D 18 CAL0000: System/Module Status shared Memory allociated and Initialized

Comment by David Hill (Inactive) [ 2018-01-10 ]

Pull request is done, please review.

Code change: replaced the sleep with a "resume main thread" flag to resolve the problem.
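
For readers following along, the idea of swapping a fixed sleep for a readiness flag can be sketched as below. This is not the actual patch from the pull request; it is a minimal illustration assuming C++11 threading primitives, and `StatusArea`, `statusControlThread` (as written here) and `runStartupSketch` are hypothetical names. Only the values MAN_OFFLINE = 0 and DOWN = 9 come from this ticket.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Values taken from the comments in this ticket: MAN_OFFLINE = 0, DOWN = 9.
enum OpState { MAN_OFFLINE = 0, DOWN = 9 };

// Hypothetical stand-in for the status shared memory plus its ready flag.
struct StatusArea {
    std::mutex mtx;
    std::condition_variable readyCv;
    bool initialized = false;         // the "resume main thread" flag
    int systemOpState = MAN_OFFLINE;  // uninitialized memory reads as 0
};

// Stand-in for the status thread: slow setup, then publish DOWN and the flag.
void statusControlThread(StatusArea& area, std::chrono::milliseconds setupTime) {
    std::this_thread::sleep_for(setupTime);  // simulates slow allocation
    {
        std::lock_guard<std::mutex> lock(area.mtx);
        area.systemOpState = DOWN;
        area.initialized = true;
    }
    area.readyCv.notify_all();
}

// Stand-in for main(): wait on the flag instead of sleeping a fixed time,
// then read the status. Returns the status the main thread observed.
int runStartupSketch(std::chrono::milliseconds setupTime) {
    StatusArea area;
    std::thread worker(statusControlThread, std::ref(area), setupTime);
    int observed;
    {
        std::unique_lock<std::mutex> lock(area.mtx);
        area.readyCv.wait(lock, [&area] { return area.initialized; });
        observed = area.systemOpState;
    }
    worker.join();
    return observed;
}
```

With this shape, `runStartupSketch` returns DOWN no matter how long the simulated setup takes, whereas a fixed sleep shorter than the setup time would read MAN_OFFLINE, which matches the behaviour seen in the log above.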

Comment by David Hill (Inactive) [ 2018-01-10 ]

Testing should be done on CentOS 6.9, but the fix wasn't specifically related to 6.9; the problem was a timing issue.

Just perform a binary install and run postConfigure for a single-server install, then make sure postConfigure completes and the system is ACTIVE.

Comment by Andrew Hutchings (Inactive) [ 2018-04-20 ]

Closing this issue as it was released months ago and slipped through the cracks.

Generated at Thu Feb 08 02:26:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.