[MCOL-3564] um1 can't recover from `columnstore stop && columnstore start` Created: 2019-10-17 Updated: 2019-12-17 Resolved: 2019-11-27 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | MariaDB Server |
| Affects Version/s: | 1.4.0 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jens Röwekamp (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | SkySQLMVP | ||
| Environment: |
CentOS 7 - 1UM 2PM multi node cluster - on VirtualBox gitversionEngine: 1f47534 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
After creating a 1UM 2PM multi-node cluster (root installation), UM1 does not recover from a restart of the ColumnStore daemon. Executed commands:
|
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2019-11-12 ] | ||
|
Marked as duplicate of | ||
| Comment by Andrew Hutchings (Inactive) [ 2019-11-20 ] | ||
|
Reopened as Jens is still hitting this after the "stop" fix. | ||
| Comment by Andrew Hutchings (Inactive) [ 2019-11-21 ] | ||
|
My notes: In the provided logs ProcMon on UM1 appears to get started twice on the "start" command. This causes a fight between the two and lots of process restarts. | ||
| Comment by Andrew Hutchings (Inactive) [ 2019-11-22 ] | ||
|
The "columnstore" script will now make sure ProcMon and ProcMgr are shut down during 'stop', and also during 'start' if they have been left behind. | ||
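For illustration only, a minimal sketch of how leftover or duplicated ProcMon/ProcMgr processes could be spotted in a process listing. This is not the logic of the actual "columnstore" script; the function name `find_leftover_daemons`, the sample listing, and the assumed `ps -eo pid,comm`-style input are all hypothetical.

```python
from collections import defaultdict


def find_leftover_daemons(ps_output, names=("ProcMon", "ProcMgr")):
    """Map each daemon name to the PIDs found in `ps -eo pid,comm`-style text.

    Hypothetical helper: the real "columnstore" script may detect
    leftover processes in a completely different way.
    """
    found = defaultdict(list)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any malformed lines.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, comm = int(parts[0]), parts[1].strip()
        if comm in names:
            found[comm].append(pid)
    return dict(found)


# Made-up listing showing ProcMon running twice, as described in the notes above.
sample = """\
  PID COMMAND
 1201 ProcMon
 1350 mysqld
 1422 ProcMon
"""

leftovers = find_leftover_daemons(sample)
# More than one PID for the same daemon suggests it was started twice.
duplicates = {name: pids for name, pids in leftovers.items() if len(pids) > 1}
print(leftovers)
print(duplicates)
```

The sketch only reports PIDs and does not stop anything; a cleanup step would have to decide separately which processes are safe to shut down.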
| Comment by Jens Röwekamp (Inactive) [ 2019-11-25 ] | ||
|
Hi LinuxJedi I tested the latest nightly RPMs (gitversionEngine: 6b91667) against | ||
| Comment by Jens Röwekamp (Inactive) [ 2019-11-26 ] | ||
|
For future reference: "mcsadmin getsysteminfo" will confirm that ColumnStore is in an ACTIVE state. Unfortunately DML on UM 1 just hangs. ColumnStore support report and notes to reproduce on virtual machines attached. | ||
| Comment by Daniel Lee (Inactive) [ 2019-11-27 ] | ||
|
Build verified: 1.4.1-1
UM1
[root@localhost bin]# ./columnstore stop
WARNING: running on non Parent OAM Module, can't make configuration changes in this session.
stopmodule Wed Nov 27 17:01:29 2019
Stopping Module(s)
On PM1
Password is actually correct. First startsystem returned an error: ERROR: Connection refused
mcsadmin> startsystem vagrant
startSystem command, 'columnstore' service is down, sending command to System being started, please wait...ERROR: Connection refused
Invalid Password when running 'columnstore start' on module um1, can retry by providing password as the second argument
Invalid Password when running 'columnstore start' on module pm2, can retry by providing password as the second argument
startSystem command, 'columnstore' service is down, sending command to System being started, please wait................
Serious issue. inserted row lost
MariaDB [mytest]> create table t1 (c1 int) engine=columnstore;
MariaDB [mytest]> insert into t1 values (1);
MariaDB [mytest]> select * from t1;
------
------
At this point, stopped columnstore service on um1 and restarted stack from pm1
MariaDB [mytest]> select * from t1;
MariaDB [mytest]>
Row is not missing
insert would hang
MariaDB [mytest]> insert into t1 values (2); | ||
| Comment by Andrew Hutchings (Inactive) [ 2019-11-27 ] | ||
|
This behaviour is not supported in 1.4 | ||
| Comment by Daniel Lee (Inactive) [ 2019-11-27 ] | ||
|
mcsadmin> shutdownsystem
This command stops the processing of applications on all Modules within the MariaDB ColumnStore System
Checking for active transactions
Stopping System...
Shutting Down System...
mcsadmin> startsystem vagrant
startSystem command, 'columnstore' service is down, sending command to System being started, please wait.........................................
TIMEOUT: ProcMon not responding to getSystemStatus
|