[MCOL-3829] ColumnStore did not start correctly when server is rebooted without executing shutdownsystem first
Created: 2020-02-21  Updated: 2023-11-27  Resolved: 2020-02-24

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.4.2, 1.4.3
Fix Version/s: 1.4.3

Type: Bug Priority: Major
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: debug.log

 Description   

Build testing: 1.4.3-1 Azure build 20200220

While ColumnStore was up and running, I issued "reboot" to restart the VM. (Running shutdownsystem before the reboot does not cause this issue.) After the reboot finished, ColumnStore was in the following state:

getprocessstatus   Fri Feb 21 19:48:46 2020

MariaDB ColumnStore Process statuses

Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Fri Feb 21 19:48:05 2020  1489
ProcessManager      pm1       ACTIVE            Fri Feb 21 19:48:11 2020  11319
DBRMControllerNode  pm1       FAILED            Fri Feb 21 19:48:16 2020
ServerMonitor       pm1       ACTIVE            Fri Feb 21 19:48:18 2020  11705
DBRMWorkerNode      pm1       ACTIVE            Fri Feb 21 19:48:19 2020  11726
PrimProc            pm1       ACTIVE            Fri Feb 21 19:48:23 2020  11779
ExeMgr              pm1       ACTIVE            Fri Feb 21 19:48:27 2020  11835
WriteEngineServer   pm1       ACTIVE            Fri Feb 21 19:48:31 2020  11911
DDLProc             pm1       ACTIVE            Fri Feb 21 19:48:35 2020  11966
DMLProc             pm1       FAILED            Fri Feb 21 19:48:42 2020
mysqld              pm1       ACTIVE            Fri Feb 21 19:48:14 2020  11561
mcsadmin>
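When repeating the reboot test, the status table above can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming only that each process row begins with the process name, module, and status separated by whitespace. failed_processes is a hypothetical helper for illustration, not part of mcsadmin.

```python
from typing import List, Tuple

# Sample copied from the getprocessstatus output in this report.
STATUS_OUTPUT = """\
getprocessstatus   Fri Feb 21 19:48:46 2020

MariaDB ColumnStore Process statuses

Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Fri Feb 21 19:48:05 2020  1489
ProcessManager      pm1       ACTIVE            Fri Feb 21 19:48:11 2020  11319
DBRMControllerNode  pm1       FAILED            Fri Feb 21 19:48:16 2020
ServerMonitor       pm1       ACTIVE            Fri Feb 21 19:48:18 2020  11705
DBRMWorkerNode      pm1       ACTIVE            Fri Feb 21 19:48:19 2020  11726
PrimProc            pm1       ACTIVE            Fri Feb 21 19:48:23 2020  11779
ExeMgr              pm1       ACTIVE            Fri Feb 21 19:48:27 2020  11835
WriteEngineServer   pm1       ACTIVE            Fri Feb 21 19:48:31 2020  11911
DDLProc             pm1       ACTIVE            Fri Feb 21 19:48:35 2020  11966
DMLProc             pm1       FAILED            Fri Feb 21 19:48:42 2020
mysqld              pm1       ACTIVE            Fri Feb 21 19:48:14 2020  11561
mcsadmin>
"""


def failed_processes(output: str) -> List[Tuple[str, str]]:
    """Return (process, module) pairs whose Status column reads FAILED.

    Hypothetical helper: assumes each process row starts with
    '<Process> <Module> <Status>' separated by whitespace. Header,
    separator, title, and prompt lines never have FAILED as their
    third field, so they are skipped naturally.
    """
    failed = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "FAILED":
            failed.append((fields[0], fields[1]))
    return failed


if __name__ == "__main__":
    print(failed_processes(STATUS_OUTPUT))
```

Because the helper splits on any whitespace, it works on both the aligned table and a flattened copy of it. On this sample it reports DBRMControllerNode and DMLProc on pm1; an empty list after a reboot would suggest the problem did not reproduce.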

Running shutdownsystem followed by startsystem brings ColumnStore back to an operational state.

Further testing shows that the issue occurs only on the first reboot without shutdownsystem after installation. Subsequent reboots without shutdownsystem did not reproduce it.

1.2.5-1 does not have this issue
1.4.2-1 also has this issue

[root@localhost columnstore]# cat crit.log
Feb 21 20:07:38 localhost DMLProc[11875]: 38.019910 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.919530 |0|0|0| C 17 CAL0000: startSystemThread: Module failed, Set System State to FAILED: pm1
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.934460 |0|0|0| C 17 CAL0000: startMgrProcessThread Exit with a failure, error returned from startSystemThread

[root@localhost columnstore]# cat err.log
Feb 21 20:07:14 localhost ProcessMonitor[1486]: 14.359121 |0|0|0| E 18 CAL0000: Error return DBRM load_brm
Feb 21 20:07:38 localhost DMLProc[11875]: 38.019910 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.919530 |0|0|0| C 17 CAL0000: startSystemThread: Module failed, Set System State to FAILED: pm1
Feb 21 20:07:39 localhost controllernode[11245]: 39.925319 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::clearSystemState() failed (network)
Feb 21 20:07:39 localhost controllernode[11245]: 39.926380 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::clearSystemState() failed (network)
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.934460 |0|0|0| C 17 CAL0000: startMgrProcessThread Exit with a failure, error returned from startSystemThread
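Because crit.log repeats entries that also appear in err.log, merging the two and sorting by timestamp makes the startup sequence easier to follow. The Python sketch below does this. The field layout in LOG_LINE, including reading the single letter after the |0|0|0| block as a severity such as E or C, is an assumption inferred from these excerpts, not a documented ColumnStore log format.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed line layout, inferred from the crit.log/err.log excerpts in this report:
#   <Mon DD HH:MM:SS> <host> <process>[<pid>]: <sec.usec> |x|x|x| <sev> <n> <msgid>: <text>
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: (?P<frac>[\d.]+) \|\S*\| "
    r"(?P<sev>[A-Z]) \d+ (?P<msgid>CAL\d+): (?P<text>.*)$"
)


class Entry(NamedTuple):
    stamp: str
    frac: str
    proc: str
    pid: int
    sev: str
    msgid: str
    text: str


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry for every line matching LOG_LINE; other lines are skipped."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            yield Entry(m["stamp"], m["frac"], m["proc"], int(m["pid"]),
                        m["sev"], m["msgid"], m["text"])


def timeline(*logs: str) -> List[Entry]:
    """Merge several logs, drop exact duplicates, and sort by timestamp."""
    unique = set()
    for log in logs:
        unique.update(parse(log.splitlines()))
    # Sorting (stamp, frac) as strings is only valid within one day, as in these logs.
    return sorted(unique, key=lambda e: (e.stamp, e.frac))


CRIT_LOG = """\
Feb 21 20:07:38 localhost DMLProc[11875]: 38.019910 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.919530 |0|0|0| C 17 CAL0000: startSystemThread: Module failed, Set System State to FAILED: pm1
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.934460 |0|0|0| C 17 CAL0000: startMgrProcessThread Exit with a failure, error returned from startSystemThread
"""

ERR_LOG = """\
Feb 21 20:07:14 localhost ProcessMonitor[1486]: 14.359121 |0|0|0| E 18 CAL0000: Error return DBRM load_brm
Feb 21 20:07:38 localhost DMLProc[11875]: 38.019910 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.919530 |0|0|0| C 17 CAL0000: startSystemThread: Module failed, Set System State to FAILED: pm1
Feb 21 20:07:39 localhost controllernode[11245]: 39.925319 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::clearSystemState() failed (network)
Feb 21 20:07:39 localhost controllernode[11245]: 39.926380 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::clearSystemState() failed (network)
Feb 21 20:07:39 localhost ProcessManager[11245]: 39.934460 |0|0|0| C 17 CAL0000: startMgrProcessThread Exit with a failure, error returned from startSystemThread
"""

if __name__ == "__main__":
    for e in timeline(CRIT_LOG, ERR_LOG):
        print(e.stamp, e.proc, e.sev, e.msgid, e.text)
```

On these excerpts the merged timeline holds six distinct entries, and the earliest is ProcessMonitor's "Error return DBRM load_brm" at 20:07:14, roughly 24 seconds before DMLProc reports that DBRM is in read-only state. The ordering alone does not prove that the failed load_brm caused the read-only state.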

Please also see the attached debug.log file.



 Comments   
Comment by Daniel Lee (Inactive) [ 2020-02-24 ]

Build verified: 1.4.3-2 BB (2020/02/24)
Engine commit: 295ba657246d96dce2b97400b46f27e992cac774

Repeated the original test five times; the reported issue no longer occurs.

Generated at Thu Feb 08 02:45:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.