  1. MariaDB ColumnStore
  2. MCOL-3945

load_brm will hang on dbroot1 failover

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.6, 1.4.4
    • Component/s: ?
    • Labels: None

    Description

      On failover, saveBRM runs before the dbroot is exchanged. On an OAM parent failure, this could lead to saveBRM being run before the brm_saves_journal file exists on the new primary module, which in turn could lead to load_brm hanging.

      To reproduce, set up a multi-node glusterfs installation and perform a large table import. After the import completes, kill PM1 and wait for PM2 to take over the primary role. The logging will show the save_brm command run first, then dbroot1 moved to PM2, and then load_brm called.

      The fix is to move dbroot1 first and then run saveBRM. This should allow load_brm to run successfully.
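
The ordering problem described above can be illustrated with a toy model. This is a minimal sketch, not ColumnStore code: `Module`, `move_dbroot1`, `save_brm`, and `failover` are hypothetical names invented for the illustration, and it assumes that the journal file only becomes visible to the new primary module once dbroot1 has been moved to it.

```python
# Toy model of the failover ordering; NOT ColumnStore code.
# Every name below is hypothetical and exists only for this illustration.

JOURNAL = "brm_saves_journal"


class Module:
    """A PM that only sees files on the dbroots currently assigned to it."""

    def __init__(self, name):
        self.name = name
        self.files = set()


def move_dbroot1(old_primary, new_primary):
    # Assumption: the journal file travels with dbroot1.
    new_primary.files |= old_primary.files
    old_primary.files.clear()


def save_brm(module):
    # True only if the journal was visible to the module when the save ran.
    return JOURNAL in module.files


def failover(old_primary, new_primary, move_first):
    """Run the two failover steps in the chosen order; report the save result."""
    if move_first:
        # Order proposed in the fix: move dbroot1, then save.
        move_dbroot1(old_primary, new_primary)
        return save_brm(new_primary)
    # Order described in the bug: save, then move dbroot1.
    saved = save_brm(new_primary)
    move_dbroot1(old_primary, new_primary)
    return saved


def run(move_first):
    pm1, pm2 = Module("PM1"), Module("PM2")
    pm1.files.add(JOURNAL)
    return failover(pm1, pm2, move_first)
```

In this model, `run(move_first=False)` reports that the save ran without the journal being visible, while `run(move_first=True)` reports that it was visible, which mirrors the reasoning behind the proposed fix.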

      Attachments

        Activity

          pleblanc Patrick LeBlanc (Inactive) added a comment - Looks ok. This will need to get into develop, and develop-1.{2,4} also.
          ben.thompson Ben Thompson (Inactive) added a comment (edited) - Part of this fix was reverted with other failover changes in MCOL-3842. This was all merged into 1.2.6 and 1.4.4 and will have been retested by MCOL-3842. Moving to test for 1.2.6 if necessary.

          ben.thompson Ben Thompson (Inactive) added a comment - This was all merged in 1.2.6 with MCOL-3842. Retesting of that MCOL in 1.2 should be sufficient for closing this, if already completed.

          dleeyh Daniel Lee (Inactive) added a comment - Build tested: 1.4.4-1 (Jenkins 20200601). Failover (PM1 to PM2) after a 10g lineitem import worked fine. According to the debug.log on PM2, save_brm is still being executed first, then dbroot moved.
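
A check like the one described in the test comment above (which step appears first in debug.log) could be scripted roughly as follows. This is a sketch under assumptions: the ticket does not show the actual log wording, so the marker strings passed in are placeholders that would need to be replaced with whatever text the real log lines contain.

```python
def first_occurrence_order(lines, markers):
    """Return the markers ordered by the first line in which each appears.

    Markers that never appear are omitted from the result.
    """
    first_seen = {}
    for index, line in enumerate(lines):
        for marker in markers:
            if marker in line and marker not in first_seen:
                first_seen[marker] = index
    return sorted(first_seen, key=first_seen.get)
```

The function accepts any iterable of lines, so an open file object can be passed directly, for example `first_occurrence_order(open(path), ["save_brm", "load_brm"])`, where `path` is whichever log file is being inspected.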

          People

            Assignee: ben.thompson Ben Thompson (Inactive)
            Reporter: ben.thompson Ben Thompson (Inactive)