  1. MariaDB ColumnStore
  2. MCOL-5594

Interactive "mcs cluster stop" command for CMAPI

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 23.10.2
    • cmapi
    • 2023-11, 2023-12, 2024-1

    Description

      Right now the `mcs cluster stop` command has a fixed 5-minute timeout while it waits for transactions to finish or roll back. For better UX, `mcs cluster stop` must interactively ask the user whether to initiate a force shutdown after a minute of waiting for the cluster to stop. There must also be command parameters to set the timeout and to enable force shutdown without confirmation, e.g.

      ```
      mcs cluster shutdown
      ... after a minute
      There were data changing operations running on the cluster that are now rolling back. Do you want to initiate a force shutdown?
      !!! Force shutdown might affect the availability of tables used by the mentioned operations !!!
      ```
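
The flow requested above can be sketched roughly as follows. This is only an illustration of the described behavior, not CMAPI code: `stop_cluster`, `is_stopped`, `confirm`, and the other parameter names are hypothetical. The choice to wait for another period and ask again after the user declines is also an assumption, since the description does not say what should happen in that case.

```python
import time
from typing import Callable


def stop_cluster(
    is_stopped: Callable[[], bool],
    confirm: Callable[[str], bool],
    timeout: float = 60.0,
    force: bool = False,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "graceful" if the cluster stopped by itself, else "forced".

    Hypothetical sketch: is_stopped() and confirm() stand in for whatever
    the real tool uses to query the cluster state and to ask the user.
    """
    prompt = (
        "There were data changing operations running on the cluster that "
        "are now rolling back. Do you want to initiate a force shutdown?"
    )
    while True:
        deadline = clock() + timeout
        # Poll until the cluster reports stopped or the waiting period ends.
        while clock() < deadline:
            if is_stopped():
                return "graceful"
            sleep(poll_interval)
        if is_stopped():
            return "graceful"
        # Waiting period expired: force at once, or ask the user first.
        if force or confirm(prompt):
            return "forced"
        # Assumption: a declined prompt means "keep waiting and ask again".
```

Passing `force=True` corresponds to the requested "force shutdown without confirmation" parameter, and `timeout` to the configurable waiting period. The clock and sleep functions are injectable only so that the sketch can be exercised without real delays.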

          Activity

            susil.behera Susil Behera added a comment -

            The fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3150 has now been merged into develop, so I did a fresh round of testing. Here are some findings (potential bugs):

            1. mcs cluster stop -t, --timeout behaves more like a wait than a timeout.
            The following result is from a fresh cluster with no active writes:

            [root@mcs1 /]# ps aux | grep DMLProc; date; mcs cluster stop -i -t 300; date;
            root 1532 0.1 0.7 456248 114668 ? Sl 13:57 0:00 /usr/bin/DMLProc
            root 1827 0.0 0.0 12128 1016 ? S+ 13:59 0:00 grep --color=auto DMLProc
            Wed Apr 24 13:59:32 UTC 2024

            { "timestamp": "2024-04-24 13:59:33.099015" }

            Wed Apr 24 14:04:37 UTC 2024

            2. In interactive mode, if I choose No, it keeps asking the same question again and again.
            Ideally it should exit after the first No.

            # mcs cluster stop -i -t 5
              DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
              DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
              DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
              DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] N
              DMLProc is still running. Do you want to force stop? WARNING: Could cause data loss and/or broken cluster. [y/N] y
              { "timestamp": "2024-04-24 13:21:36.284813" }

            3. The --timeout option should not be accepted in non-interactive mode (the default).

            # mcs cluster stop -t 5
              { "timestamp": "2024-04-24 13:37:54.499705" }
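
To make finding 1 concrete, the snippet below computes the elapsed time between the two `date` outputs above and contrasts it with what a timeout would normally do. The helper `wait_until` and its parameters are hypothetical and only illustrate the usual meaning of "timeout" (return as soon as the condition holds, give up only when the limit is reached); it is not taken from CMAPI.

```python
import time
from datetime import datetime
from typing import Callable

# Timestamps printed by `date` before and after `mcs cluster stop -i -t 300`.
started = datetime(2024, 4, 24, 13, 59, 32)
finished = datetime(2024, 4, 24, 14, 4, 37)
elapsed = (finished - started).total_seconds()
print(elapsed)  # 305.0, i.e. roughly the full 300 s passed to -t


def wait_until(
    done: Callable[[], bool],
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as done() holds, False once timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

With timeout semantics like these, a stop on an idle cluster would return almost immediately instead of taking the whole 300 seconds, which is the difference the finding points at.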
            susil.behera Susil Behera added a comment -

            Here is a fourth problem.
            4. When a write is in progress, 'mcs cluster stop -i -t' did not prompt with any warning message and went ahead and stopped the cluster.

            Repro
            ------
            Initiate a write operation that takes more than 10 seconds:
            MariaDB [test]> insert into mcst2 select * from mcst2;
            Query OK, 18048 rows affected (14.843 sec)
            Records: 18048 Duplicates: 0 Warnings: 0

            MariaDB [test]> select now(); insert into mcst2 select * from mcst2; select now();
            +---------------------+
            | now()               |
            +---------------------+
            | 2024-05-02 14:25:39 |
            +---------------------+
            1 row in set (0.001 sec)

            ERROR 1815 (HY000): Internal error: MCS-2004: Cannot connect to ExeMgr.
            +---------------------+
            | now()               |
            +---------------------+
            | 2024-05-02 14:25:55 |
            +---------------------+
            1 row in set (0.000 sec)

            While the above write operation is running, initiate the cluster stop:

            # date; mcs cluster stop -i -t 5; date;
              Thu May 2 14:25:42 UTC 2024
              { "timestamp": "2024-05-02 14:25:43.164337" }

              Thu May 2 14:25:52 UTC 2024

            Note that 'cluster stop' did not prompt with any warning message.


            alan.mologorsky Alan Mologorsky added a comment -

            susil.behera

            1. mcs cluster stop -t, --timeout behaves more like a wait instead of timeout.
            The following result is on a fresh cluster where there is no active writes going on,

            In interactive mode it seems to behave more like a wait, but in non-interactive mode it is technically a timeout (not yet released).

            2. In interactive mode if I choose no, it continues asking the same again again.
            Ideally it should exit after first No.

            That is the expected logic. If you would prefer to change it, let's discuss with drrtuy and leonid.fedorov.

            3. --timeout option should not be accepted in the non-interactive mode (default)

            It changes nothing right now, but it should start working in the near future.

            4. When there is a write going on 'mcs cluster stop -i -t' didn't prompt any warning message and went ahead and stopped the cluster.

            drrtuy This is the same problem I reported to you a long time ago: the graceful stop of DMLProc does not work. We should think about how to fix it ASAP.
            susil.behera Susil Behera added a comment -

            1. mcs cluster stop -t, --timeout behaves more like a wait instead of timeout.

            In interactive mode it seems to be more like a wait but in non interactive mode it's technically timeout (yet not released).

            alan.mologorsky So, are we going to make interactive mode behave the same way as non-interactive mode? I mean in the upcoming release.

            susil.behera Susil Behera added a comment -

            Closing this issue.


            People

              alan.mologorsky Alan Mologorsky
              drrtuy Roman
              Roman Roman
              Susil Behera Susil Behera