MariaDB ColumnStore / MCOL-4939

Add a method to disable failover facility in CMAPI.

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 6.1.1
    • Fix Version: cmapi-6.4.1
    • Component: cmapi
    • Labels: None
    • Sprint: 2021-15, 2021-16, 2021-17

    Description

      There are known customer installations that do not use shared storage, so the failover mechanism might break such clusters.
      There must be a knob in the CMAPI configuration file to disable the failover facility if needed.

      The following changes have been made:

      • An [application] section with an auto_failover = False parameter has been added to the default cmapi_server.conf.
      • Failover is now turned off by default, even if no [application] section or auto_failover parameter exists in cmapi_server.conf.
      • Failover now has three distinct logical states (see the sketch after this list):
        • turned off: no failover thread is started. To turn it on, set auto_failover = True in the [application] section of the cmapi_server.conf file on each node and restart CMAPI.
        • turned on and inactive: a failover thread exists, but it does nothing. It becomes active automatically once the node count is >= 3.
        • turned on and active: the failover thread is running and monitoring the cluster. It is deactivated automatically if the node count drops below 3.
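
      A minimal sketch of how such a knob and activation threshold can be read, assuming Python's standard configparser; the function names and the MIN_NODES_FOR_FAILOVER constant are hypothetical illustrations, not the actual CMAPI source:

      import configparser

      CMAPI_CONF = '/etc/columnstore/cmapi_server.conf'
      MIN_NODES_FOR_FAILOVER = 3  # activation threshold described above

      def auto_failover_enabled(path: str = CMAPI_CONF) -> bool:
          """Return True only if [application] auto_failover is explicitly True."""
          parser = configparser.ConfigParser()
          parser.read(path)  # a missing file is treated like an empty one
          # fallback=False reproduces the documented default: failover stays
          # off when the [application] section or the parameter is missing.
          return parser.getboolean('application', 'auto_failover', fallback=False)

      def failover_state(node_count: int, path: str = CMAPI_CONF) -> str:
          """Map the configuration plus node count to the three states above."""
          if not auto_failover_enabled(path):
              return 'turned off'            # no failover thread started
          if node_count >= MIN_NODES_FOR_FAILOVER:
              return 'turned on and active'  # thread running and monitoring
          return 'turned on and inactive'    # thread exists but does nothing

      With the default file shipped by this change, failover_state(3) returns 'turned off' until auto_failover = True is set on each node and CMAPI is restarted.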

          Activity

            alexey.vorovich alexey vorovich (Inactive) added a comment - edited

            Guys,

            1. I tend to agree that the default file section created at a new install should be empty.
            2. Let's go back to the Test4 failure in https://jira.mariadb.org/browse/MCOL-4939?focusedCommentId=219832&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-219832

            I am trying to reproduce it myself, but the results are inconclusive so far.

            dleeyh and toddstoffel

            Besides the discrepancy between MCS and MXS with respect to master node choice, what issues with DDL/DML updates do we observe?
            Please list what has been found. My understanding is that MaxScale will direct updates to PM2.

            Also, Daniel, for whatever symptoms we see, please confirm in which old release we did not see them.

            alan.mologorsky drrtuy gdorman FYI


            alexey.vorovich alexey vorovich (Inactive) added a comment

            alan.mologorsky dleeyh I opened a new issue, https://jira.mariadb.org/browse/MCOL-5052, for that mismatch discussion.

            The only remaining item here is for Alan and is described above.

            dleeyh Daniel Lee (Inactive) added a comment

            Build tested: 6.3.1-1 (#4234), CMAPI-1.6.3-1 (#623)

            Preliminary test results for failover behavior. More functional tests will be done.

            3-node cluster, with gluster, schema replication, MaxScale

            For each of the following tests, a newly installed 3-node cluster was used.

            Test #1
            Default installation; the auto_failover parameter was removed from /etc/columnstore/cmapi_server.conf. The default behavior observed was auto failover enabled.

            Failover worked the same way as before. When PM1 was put back online, PM2 remained the master node, in sync with MaxScale.

            Test #2
            On each node, added the following to /etc/columnstore/cmapi_server.conf

            [application]
            auto_failover = False
            

            and restarted cmapi

            systemctl restart mariadb-columnstore-cmapi
            

            mcsStatus on all three (3) nodes showed only one (1) node (pm1) in the cluster; pm2 and pm3 were no longer part of the cluster. The output looked like the following:

            [rocky8:root~]# mcsStatus
            {
              "timestamp": "2022-04-14 00:43:02.932548",
              "s1pm1": {
                "timestamp": "2022-04-14 00:43:02.938951",
                "uptime": 1149,
                "dbrm_mode": "master",
                "cluster_mode": "readwrite",
                "dbroots": [],
                "module_id": 1,
                "services": [
                  {
                    "name": "workernode",
                    "pid": 9290
                  },
                  {
                    "name": "controllernode",
                    "pid": 9301
                  },
                  {
                    "name": "PrimProc",
                    "pid": 9317
                  },
                  {
                    "name": "ExeMgr",
                    "pid": 9365
                  },
                  {
                    "name": "WriteEngine",
                    "pid": 9382
                  },
                  {
                    "name": "DDLProc",
                    "pid": 9413
                  }
                ]
              },
              "num_nodes": 1
            }
            

            I tried the same test again and all nodes returned something like the following:

            [rocky8:root~]# mcsStatus
            {
              "timestamp": "2022-04-14 01:46:02.956786",
              "s1pm1": {
                "timestamp": "2022-04-14 01:46:02.963366",
                "uptime": 1631,
                "dbrm_mode": "offline",
                "cluster_mode": "readonly",
                "dbroots": [],
                "module_id": 1,
                "services": []
              },
              "num_nodes": 1
            }
            

            Failover was not tested since only one node remained in the cluster.
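
            As a side note, the same status check can be done programmatically. Below is a minimal sketch against the CMAPI REST endpoint that mcsStatus wraps; the port (8640), endpoint path, and API key shown are assumptions based on a standard CMAPI deployment, not values taken from this ticket:

            import json

            import requests  # third-party: pip install requests

            CMAPI_PORT = 8640       # assumed default CMAPI port
            API_KEY = 'somekey123'  # assumed; must match the cluster's x-api-key

            def cluster_status(host: str) -> dict:
                """Fetch cluster status from one node's CMAPI endpoint."""
                url = f'https://{host}:{CMAPI_PORT}/cmapi/0.4.0/cluster/status'
                resp = requests.get(
                    url,
                    headers={'x-api-key': API_KEY},
                    verify=False,  # CMAPI typically ships a self-signed certificate
                    timeout=10,
                )
                resp.raise_for_status()
                return resp.json()

            if __name__ == '__main__':
                status = cluster_status('s1pm1')
                print('nodes in cluster:', status.get('num_nodes'))
                print(json.dumps(status, indent=2))

            Run against each node, this would surface the same "num_nodes": 1 result as above without shelling out to mcsStatus.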

            Test #3
            On each node, added the following to /etc/columnstore/cmapi_server.conf

            [application]
            auto_failover = True
            

            and restarted cmapi

            systemctl restart mariadb-columnstore-cmapi
            

            I got the same result as in Test #1 above.
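
            Tests #2 and #3 cover the two explicit settings; the default-off semantics from the change list can also be pinned down at the unit level. A hypothetical sketch (not part of this ticket's test plan), using only Python's standard library and mirroring the parsing sketch in the description:

            import configparser
            import unittest

            def auto_failover_enabled(conf_text: str) -> bool:
                """Parse cmapi_server.conf content; default to off when unset."""
                parser = configparser.ConfigParser()
                parser.read_string(conf_text)
                return parser.getboolean('application', 'auto_failover', fallback=False)

            class AutoFailoverDefaults(unittest.TestCase):
                def test_missing_section_defaults_to_off(self):
                    self.assertFalse(auto_failover_enabled(''))

                def test_explicit_false_is_off(self):
                    self.assertFalse(
                        auto_failover_enabled('[application]\nauto_failover = False\n'))

                def test_explicit_true_is_on(self):
                    self.assertTrue(
                        auto_failover_enabled('[application]\nauto_failover = True\n'))

            if __name__ == '__main__':
                unittest.main()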


            dleeyh Daniel Lee (Inactive) added a comment

            Build verified: ColumnStore 6.3.1-1 (#4278), cmapi (#625)

            Following the steps above and using the new cmapi build, test #2 worked as expected: failover did not take place, as it is disabled in the cmapi_server.conf file.
            dleeyh Daniel Lee (Inactive) added a comment - edited

            Build verified: ColumnStore 6.3.1-1 (#4299), cmapi 1.6.3 (#626)

            The cmapi package name has been corrected from 1.6.2 to 1.6.3: MariaDB-columnstore-cmapi-1.6.3-1.x86_64.rpm

            Verified along with the latest build of ColumnStore. Created a 3-node Docker cluster.


            People

              dleeyh Daniel Lee (Inactive)
              drrtuy Roman
              Votes: 0
              Watchers: 7

