Handling multi-server ColumnStore failover
(MCOL-1466)
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Sub-Task | Priority: | Critical |
| Reporter: | Developer | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None |
| Description |
|
Hi David, |
| Comments |
| Comment by David Hill (Inactive) [ 2018-07-25 ] |
|
So the same issue showed up in the second scenario. The system was in a non-functioning state because there were missing DBRM files. This is the set of DBRM files from pm2 off of dbroot 1. As shown in the other MCOL, it looks like just a startup set of files. The OID file is missing, which is causing the system not to start up. total 16

Again, I'm not sure what is happening to these files during the pm1-to-pm2 failover. I'm trying to reproduce the issue. Also, here is some info on how PM DBROOT assignments work and how they are handled on failover. After a normal install, PM1 is the Parent and has DBROOT 1 assigned to it. DBROOT 1 is "ALWAYS" assigned to the Parent Module. PM2 has DBROOT 2 and PM3 has DBROOT 3. In this case, PM2 is the hot-standby Parent.

When PM1 goes down:

When PM1 recovers:

So this failover process is all working as designed. But in your case, the DBRM files on DBROOT 1 get deleted or lost somehow, leaving the system with partial DBRM files and failing to start up. So that is your issue; I will see if I can reproduce it. |
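The symptom described above (a startup-looking BRM_saves set with the OID file missing) can be spotted with a small check script. This is only a sketch: the required file names below are assumptions based on a typical ColumnStore 1.x dbrm directory, and `check_dbrm_files` is a hypothetical helper, not a shipped tool. Adjust the list to whatever your install actually writes.

```shell
#!/bin/sh
# Hedged sketch: verify that a dbrm directory contains a complete-looking
# DBRM file set. The file names are assumptions for ColumnStore 1.x;
# "oidbitmap" stands in for the OID file this ticket reports as missing.

check_dbrm_files() {
  dbrm_dir="$1"
  missing=0
  for f in BRM_saves_current BRM_saves_em BRM_saves_vbbm BRM_saves_vss oidbitmap; do
    if [ ! -f "$dbrm_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return $missing
}
```

Run against the directory seen later in this ticket, e.g. `check_dbrm_files /home/mariadb-user/mariadb/columnstore/data1/systemFiles/dbrm`; a non-zero exit plus a "missing:" line flags an incomplete set before attempting startup.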
| Comment by David Hill (Inactive) [ 2018-07-25 ] |
|
I couldn't reproduce the issue. The process shown below shows the DBRM files before the outage, with PM1 down, and after recovery. There were no DBRM file issues, and the system recovered with pm1 up and running again.

– Disk BRM Data files –
total 16
/home/mariadb-user/mariadb/columnstore/data1/systemFiles/dbrm/BRM_saves

---------------------------------
new system install - dbrm files from PM1 / DBROOT1
ll

PM1 STOP INSTANCE - FROM PM2: DBROOT 1,2 ASSIGNED AND ALL DBRM FILES EXIST
Performance Module (DBRoot) Storage Type = external
ll

PM1 START INSTANCE
Performance Module (DBRoot) Storage Type = external
Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot1: vol-0dbf71303b5d79d46, /dev/sdg, /dev/xvdg

[mariadb-user@ip-172-31-46-54 dbrm]$ home
System 1.1.5-ebs

System and Module statuses
Component    Status    Last Status Change
Module um1   ACTIVE    Wed Jul 25 20:20:14 2018

Active Parent OAM Performance Module is 'pm2'

MariaDB ColumnStore Process statuses
Process         Module   Status   Last Status Change         Process ID
ProcessMonitor  pm1      ACTIVE   Wed Jul 25 20:18:57 2018   1006
ProcessMonitor  pm2      ACTIVE   Wed Jul 25 20:03:27 2018   2099
ProcessMonitor  pm3      ACTIVE   Wed Jul 25 20:03:28 2018   2001

Active Alarm Counts: Critical = 2, Major = 0, Minor = 0, Warning = 0, Info = 0 |
| Comment by Developer [ 2018-07-25 ] |
|
Hi David, I still can't understand the reason for the case below. Please review it. We found the system had moved the parent OAM to another PM and PM1 became disabled, but its DBRoot had not moved. We also noticed the database became read-only: only SELECT operations are allowed, while CREATE TABLE, UPDATE, INSERT, and DELETE stopped working. Why? Also, can you check these 2 more questions? 3 > Can you also provide us details about which EBS volume type (gp2, io1, sc1, st1, standard) is best suited for a large amount of data? We have some tables with more than 50 million records. Thanks. |
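The read-only behavior the reporter describes is consistent with ColumnStore dropping its Block Resolution Manager (BRM) into read-only mode when DBRM state looks inconsistent, which blocks DML while leaving SELECT working. A hedged way to inspect that state is sketched below; the `dbrmctl` path and the exact output format vary by build and are assumptions here, and `brm_state` is a hypothetical wrapper, not a shipped command.

```shell
#!/bin/sh
# Hedged sketch: query the controller node's BRM state. In ColumnStore 1.x
# builds, `dbrmctl status` reports whether the BRM is OK or read-only; the
# install path below is an assumption and may differ on your system.

DBRMCTL=${DBRMCTL:-/usr/local/mariadb/columnstore/bin/dbrmctl}

brm_state() {
  # Prints the BRM status; a read-only result explains SELECT-only behavior.
  "$DBRMCTL" status
}
```

If the BRM is read-only, restoring a complete DBRM file set on dbroot1 and restarting the system is the first thing to verify before retrying DML.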
| Comment by Developer [ 2018-07-26 ] |
|
Also, I want to know how you are generating a failover on PM1 (ParentOAM). We are stopping the PM1 instance from the AWS Console to generate the failover. Is there any problem with that? |
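Stopping the PM1 instance from the console can also be done repeatably from the AWS CLI, which makes the failover test scriptable. This is a sketch assuming a configured AWS CLI; the instance id is a placeholder, and `simulate_pm1_failover` is a hypothetical helper for this ticket, not part of ColumnStore.

```shell
#!/bin/sh
# Hedged sketch: trigger a PM1 failover by stopping its EC2 instance.
# Assumes the AWS CLI is installed and configured; the instance id passed
# in is a placeholder for whatever hosts PM1.

simulate_pm1_failover() {
  instance_id="$1"
  # Stop the instance hosting PM1; ColumnStore should promote the
  # hot-standby parent (pm2) once PM1 stops responding.
  aws ec2 stop-instances --instance-ids "$instance_id"
  # Block until AWS reports the instance as fully stopped.
  aws ec2 wait instance-stopped --instance-ids "$instance_id"
}
```

Usage would look like `simulate_pm1_failover i-0123456789abcdef0`, after which the system status (e.g. via the OAM console) should show pm2 as the active parent.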
| Comment by Todd Stoffel (Inactive) [ 2021-04-05 ] |
|
OAM has been deprecated and all of these old bash scripts were removed as part of a cleanup sweep that was done recently. |