[MCOL-3308] Cannot move DBRoot to resurrected PM after automatic fail-over Created: 2019-05-14 Updated: 2023-10-26 Resolved: 2022-02-16 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | 1.2.3 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Critical |
| Reporter: | Assen Totin (Inactive) | Assignee: | Unassigned |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Description |
|
Newly installed system to test a setup for a prospect: 1 UM + 3 PMs, with 3 DBRoots, initially one per PM. PM1 is the OAM module.

Turning off PM3 resulted in automatic fail-over, so DBRoot3 got attached to PM1 and queries were processed properly. (We only have one database with one small test table.)

After PM3 was booted again, it came up in MAN_DISABLED state (I assume this was expected?). To initiate moving DBRoot3 back to PM3, we first had to re-enable the module with "alterSystem-EnableModule pm3", after which PM3 changed to MAN_OFFLINE state. To be able to initiate a DBRoot move, we next had to stop processing with "stopSystem", after which the whole system state became MAN_OFFLINE:

Component    Status        Last Status Change
Module um1   MAN_OFFLINE   Tue May 14 17:43:31 2019

We then triggered the move (DBRoot3 from PM1 to PM3):

mcsadmin> movePmDbrootConfig pm1 3 pm3
DBRoot IDs currently assigned to 'pm1' = 1, 3
DBroot IDs being moved, please wait...
DBRoot IDs newly assigned to 'pm1' = 1, 3

As can be seen, the DBRoot was not moved. Starting the system was not possible, because PM3 had no DBRoot attached:

May 14 17:19:54 p2w1 ProcessManager[12373]: 54.438248 |0|0|0| C 17 CAL0000: startSystemThread failed: Module 'pm3' has no DBRoots assigned to it

We had to manually disable PM3 again in order to start the system, which then came up and began processing queries.

The error log has no entries related to the movePmDbrootConfig command. The debug log has some, which seem to suggest that the move was successful. One line stands out (I put it in bold): after the unmountDBRoot for DBRoot3 is sent to PM1 (correct), the mountDBRoot is sent to pm1 again (?!). Am I missing anything here?

May 14 17:46:09 p2w1 oamcpp[6898]: 09.591518 |0|0|0| D 08 CAL0000: manualMovePmDbroot: 3 from pm1 to pm3 |
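For reference, the rejoin sequence we attempted can be summarized as the following mcsadmin session (command names exactly as quoted above; this is the sequence that failed for us, not a verified working recipe):

```
mcsadmin> alterSystem-EnableModule pm3    # pm3: MAN_DISABLED -> MAN_OFFLINE
mcsadmin> stopSystem                      # whole system -> MAN_OFFLINE
mcsadmin> movePmDbrootConfig pm1 3 pm3    # expected: DBRoot3 moves to pm3 (did not happen)
mcsadmin> startSystem                     # fails: 'pm3' has no DBRoots assigned to it
```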
| Comments |
| Comment by David Hill (Inactive) [ 2019-05-14 ] |
|
This is a BUG. A previous MCOL issue was opened for it. |
| Comment by Assen Totin (Inactive) [ 2019-05-15 ] |
|
It is not a bug, it is completely missing functionality. Check the Oam::manualMovePmDbroot function: it only modifies todbrootConfigList and residedbrootConfigList if DataRedundancyConfig is set, which in turn is only true when Gluster is enabled. In our case we don't have Gluster (Gluster is slow, and we are testing a solution that will have to ingest 50K rows per second continuously). Was this ever working? How can we have automatic fail-over that works with NFS (i.e. does move the DBRoots properly), but no manual move when we need to rejoin a resurrected node? |
| Comment by Roman [ 2019-07-29 ] |
|
I agree. This looks like missing functionality. |
| Comment by Roman [ 2022-02-16 ] |
|
We don't have OAM anymore. CMAPI CLI will have this functionality. |