[MCOL-5058] CMAPI and local smcat runs can access Storage Manager too early causing assertion in SM runtime Created: 2022-04-18 Updated: 2022-06-27 |
|
| Status: | Open |
| Project: | MariaDB ColumnStore |
| Component/s: | cmapi, Storage Manager |
| Affects Version/s: | 5.6.5, 6.2.3 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Roman | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Consider part of the startup procedure for an S3-based installation [1]. When CMAPI receives a cluster/start REST call, it issues node/start calls on all nodes. node/start in turn begins by starting mcs-workernode@1 | mcs-workernode@2. These two units trigger startup of the mcs-loadbrm systemd unit, which in turn starts its local SM by running systemctl start mcs-storagemanager. While SM bootstraps itself, there is a period during which its internal prefix cache structure [2] is not yet populated. If an SM request [4] arrives during that period, SM throws an assert exception [3]. This failure causes the mcs-workernode@{1,2} units to fail [5].

The most severe consequence is that non-primary nodes may appear healthy while holding reduced and corrupted extent maps in /dev/shm, so any extent map write operation distributed by the controllernode puts the cluster into read-only mode.

Together with Alan, we introduced explicit delays between SM startup and the actual extent map image load at the customer's site. However, this workaround is not an appropriate long-term solution. IMHO there are two long-term solution options:
The second approach doesn't solve the issue of ahead-of-time local smcat runs, though, so the first one looks more appropriate.

[1] Here I consider systemd startup; however, the logic is the same for non-systemd container startup.

CMAPI REST endpoints. |
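For illustration only, the fixed delay used in the workaround can be contrasted with a bounded readiness poll placed between SM startup and the extent map image load. This is a minimal sketch under stated assumptions, not either of the proposed long-term options and not existing CMAPI or Storage Manager code: `wait_until_ready` and its `probe` argument are hypothetical names, and how a caller would actually detect that the SM prefix cache is populated is left open.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    `probe` is a hypothetical readiness check supplied by the caller
    (for example, "has SM finished filling its prefix cache?").
    Returns True when the probe succeeds, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Assumption: a probe that raises means "not ready yet",
            # so the error is swallowed and the poll continues.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run the extent map load only when the function returns True and fail the unit start otherwise, so a node never proceeds with a partially loaded extent map. The `sleep` and `clock` parameters exist only to make the helper testable without real waiting.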