[MCOL-5594] Interactive "mcs cluster stop" command for CMAPI - Jira

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 23.10.2
Component/s: cmapi
Labels:
- cmapi

Sprint:
2023-11, 2023-12, 2024-1

Description

Right now the `mcs cluster stop` command has a fixed 5-minute timeout while waiting for transactions to finish or roll back. For better UX, `mcs cluster stop` must interactively ask the user whether to initiate a force shutdown after a minute of waiting for the cluster to stop. There must also be command parameters to set the timeout and to enable force shutdown without confirmation, e.g.

```
mcs cluster shutdown
... after a minute
There were data changing operations running on the cluster that are now rolling back. Do you want to initiate a force shutdown?
!!! Force shutdown might affect the availability of tables used by the mentioned operations !!!
```

Issue Links

relates to

- MCOL-5105 Reduced systemd timeouts results in corrupted EM (Closed)
- MCOL-5617 Add working timeout for non interactive cluster stop. (Open)
- MCOL-5773 Time out for non-interactive cluster stop. (Closed)

Activity

Susil Behera added a comment - 2024-04-24 14:13

The fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3150 has now been merged into develop, so I did a fresh round of testing. Here are some findings (potential bugs):

1. mcs cluster stop -t, --timeout behaves more like a wait than a timeout.
The following result is from a fresh cluster with no active writes going on; the command still took about 5 minutes (13:59:32 to 14:04:37) to return:

[root@mcs1 /]# ps aux | grep DMLProc; date; mcs cluster stop -i -t 300; date;
root 1532 0.1 0.7 456248 114668 ? Sl 13:57 0:00 /usr/bin/DMLProc
root 1827 0.0 0.0 12128 1016 ? S+ 13:59 0:00 grep --color=auto DMLProc
Wed Apr 24 13:59:32 UTC 2024

{ "timestamp": "2024-04-24 13:59:33.099015" }

Wed Apr 24 14:04:37 UTC 2024

2. In interactive mode, if I choose no, it keeps asking the same question again and again.
Ideally it should exit after the first No.

mcs cluster stop -i -t 5
DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] y
{ "timestamp": "2024-04-24 13:21:36.284813" }

3. The --timeout option should not be accepted in non-interactive mode (the default).

mcs cluster stop -t 5
{ "timestamp": "2024-04-24 13:37:54.499705" }
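
To make the distinction in finding 1 concrete, here is a minimal Python sketch of timeout semantics as opposed to a fixed wait. It is not taken from CMAPI: `wait_until`, `condition`, and `poll_interval` are hypothetical names chosen for illustration. The idea is only that a timeout is an upper bound, so the call should return as soon as the condition holds rather than sleeping for the full duration.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float,
               poll_interval: float = 0.1) -> bool:
    """Return True as soon as condition() holds, or False once timeout elapses.

    Hypothetical helper: `condition` stands in for a check such as
    "no data-changing operations are still running".
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True  # done early, no need to wait out the timeout
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # upper bound reached
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

With this shape, a call such as `wait_until(lambda: True, timeout=300)` returns immediately on an idle system, whereas a plain `time.sleep(300)` would always take the full five minutes.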

Susil Behera added a comment - 2024-05-02 14:37

Here is the fourth problem.
4. When a write is in progress, 'mcs cluster stop -i -t' did not show any warning prompt and went ahead and stopped the cluster.

Repro
------
Initiate a write operation that takes more than 10 seconds:
MariaDB [test]> insert into mcst2 select * from mcst2;
Query OK, 18048 rows affected (14.843 sec)
Records: 18048 Duplicates: 0 Warnings: 0

MariaDB [test]> select now(); insert into mcst2 select * from mcst2; select now();
+---------------------+
| now()               |
+---------------------+
| 2024-05-02 14:25:39 |
+---------------------+
1 row in set (0.001 sec)

ERROR 1815 (HY000): Internal error: MCS-2004: Cannot connect to ExeMgr.
+---------------------+
| now()               |
+---------------------+
| 2024-05-02 14:25:55 |
+---------------------+
1 row in set (0.000 sec)

While the above write operation is running, initiate cluster stop:

date; mcs cluster stop -i -t 5; date;
Thu May 2 14:25:42 UTC 2024
{ "timestamp": "2024-05-02 14:25:43.164337" }
Thu May 2 14:25:52 UTC 2024

Note that 'cluster stop' did not show any warning prompt.

Alan Mologorsky added a comment - 2024-06-05 05:01

susil.behera

1. mcs cluster stop -t, --timeout behaves more like a wait instead of timeout.

The following result is on a fresh cluster where there is no active writes going on,

In interactive mode it does behave more like a wait, but in non-interactive mode it is technically a timeout (not yet released).

2. In interactive mode if I choose no, it continues asking the same again again.

Ideally it should exit after first No.

This is the expected logic. If you would prefer to change it, let's discuss it with drrtuy and leonid.fedorov.

3. --timeout option should not be accepted in the non-interactive mode (default)

It changes nothing right now, but it should start working in the near future.

4. When there is a write going on 'mcs cluster stop -i -t' didn't prompt any warning message and went ahead and stopped the cluster.

drrtuy, this is the same problem I reported to you a long time ago: the graceful stop of DMLProc does not work. We should think about how to fix it as soon as possible.
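
For reference in the discussion of point 2, the two behaviors being compared can be sketched as follows. This is a minimal illustration and not the CMAPI implementation: `interactive_stop`, `is_busy`, `ask`, and `exit_on_no` are hypothetical names, and only the prompt text is copied from the output quoted earlier in this issue. With `exit_on_no=False` the loop waits another timeout period and asks again after each "No" (the behavior described as expected); with `exit_on_no=True` it gives up after the first "No" (the behavior suggested in the finding).

```python
import time
from typing import Callable

PROMPT = ("DMLProc is still running. Do you want to force stop? "
          "WARNING: Could cause data loss and/or broken cluster. [y/N] ")


def interactive_stop(is_busy: Callable[[], bool],
                     ask: Callable[[str], str],
                     timeout: float,
                     poll_interval: float = 0.1,
                     exit_on_no: bool = False) -> str:
    """Return 'graceful', 'forced', or 'aborted'.

    Hypothetical sketch: `is_busy` stands in for "DMLProc is still running",
    `ask` for whatever reads the user's answer (for example, `input`).
    """
    while True:
        deadline = time.monotonic() + timeout
        while is_busy():
            if time.monotonic() >= deadline:
                break  # timeout elapsed while still busy: go ask the user
            time.sleep(poll_interval)
        else:
            # is_busy() became False before the deadline.
            return "graceful"
        answer = ask(PROMPT).strip().lower()
        if answer in ("y", "yes"):
            return "forced"
        if exit_on_no:
            return "aborted"
        # Otherwise wait for another timeout period and ask again.
```

Passing `input` as `ask` gives a terminal prompt; passing a function that returns canned answers makes the loop easy to exercise without a terminal.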

Susil Behera added a comment - 2024-06-07 08:30

1. mcs cluster stop -t, --timeout behaves more like a wait instead of timeout.

In interactive mode it seems to be more like a wait but in non interactive mode it's technically timeout (yet not released).

alan.mologorsky So, are we going to make interactive mode behave the same way as non-interactive mode? I mean in the upcoming release.

Susil Behera added a comment - 2024-06-21 06:36

Closing this issue.
