  1. MariaDB ColumnStore
  2. MCOL-5702

Auto Resolution of DBRM Table Locks When Orphaned

Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 23.10
    • Component/s: None
    • Labels: None
    • Sprint: 2024-1

    Description

      cpimport creates table locks when inserting data into a ColumnStore table. However, during a network failure in a cluster, or an unforeseen circumstance such as a hardware issue, it is possible for the cpimport process to be killed and gone while its table lock still exists in viewtablelock.

      The current workaround is to restart ColumnStore, during which loadbrm/rollback resolution resolves and frees the locks. The desire is that, once the hardware issue is resolved, these locks resolve themselves automatically. This would increase the durability of the product and reduce the manual intervention needed to get back online.
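One way to picture what "orphaned" means here is a lock whose owning process no longer exists. The sketch below illustrates that check only; it is not how DBRM or viewtablelock work internally. The `TableLock` record and its fields are hypothetical, and the liveness test assumes a POSIX host and that the check runs on the same node where the owner process ran.

```python
import os
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TableLock:
    """Hypothetical lock record; the real DBRM lock entry may look different."""
    lock_id: int
    table: str
    owner_name: str
    owner_pid: int


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists on the local host (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_orphaned_locks(
    locks: Iterable[TableLock],
    is_alive: Callable[[int], bool] = pid_is_alive,
) -> List[TableLock]:
    """Return the locks whose owner process no longer exists."""
    return [lock for lock in locks if not is_alive(lock.owner_pid)]
```

A PID check alone is a weak signal: PIDs can be reused by unrelated processes, and in a cluster the PID is only meaningful on the node that started the import. Finding an orphaned lock also does not mean it is safe to release it.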

        Activity

          allen.herrera Allen Herrera created issue
          allen.herrera Allen Herrera made changes
          Field: Original Value -> New Value
          Rank: Ranked higher
          julien.fritsch Julien Fritsch made changes
          Priority: Major [ 3 ] -> Critical [ 2 ]
          julien.fritsch Julien Fritsch made changes
          Assignee: Allen Herrera [ JIRAUSER48651 ]
          julien.fritsch Julien Fritsch made changes
          Assignee: Allen Herrera [ JIRAUSER48651 ] -> Leonid Fedorov [ JIRAUSER48443 ]
          leonid.fedorov Leonid Fedorov made changes
          Sprint: 2024-1 [ 755 ]
          leonid.fedorov Leonid Fedorov made changes
          Fix Version/s: 23.10.2 [ 29807 ]

          leonid.fedorov Leonid Fedorov added a comment

          The discussion with drrtuy highlighted these details:

          • cpimport acquires several locks while importing data, and these locks have different semantics.
          • Blindly removing all the locks that cpimport left behind after being killed could result in the loss of the table, and the only recourse would be to restore the metadata from a backup.
          • Some of the locks (probably all of them if the table was empty, so there is no risk of losing data) could be released by a simple technique: rename the cpimport binary and put a bash script bearing the original name in its place. The script would dump all the lock names to a text file and then invoke table lock cleaning after the cpimport operation.

          It therefore seems possible to automate cleaning for some cases without a significant investment of development time. However, achieving a complete resolution of the request may prove tricky.
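The wrapper idea from the comment above can be sketched roughly as follows. The comment proposes a bash script; Python is used here only to keep the illustration testable. Everything ColumnStore-specific is an assumption: `list_lock_names`, `clean_lock`, and `is_safe_to_clean` are hypothetical hooks supplied by the caller, and `command` would be the renamed cpimport binary plus its arguments. The sketch also assumes no other import runs concurrently, since it treats every lock that appeared during the run and still exists afterwards as left behind by this command.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence, Set, Tuple


def run_import_with_lock_journal(
    command: Sequence[str],
    list_lock_names: Callable[[], Iterable[str]],
    clean_lock: Callable[[str], None],
    is_safe_to_clean: Callable[[str], bool],
    journal_path: str,
) -> Tuple[int, List[str]]:
    """Run the import, journal the locks it left behind, clean the safe ones.

    Returns the command's exit status and the names of the locks that were cleaned.
    All hooks are hypothetical; this does not call any real ColumnStore tool.
    """
    locks_before: Set[str] = set(list_lock_names())

    # The wrapper is a separate process, so it survives if the import is killed.
    return_code = subprocess.call(list(command))

    # Locks that appeared during the run and still exist after the command exited.
    leftover = sorted(set(list_lock_names()) - locks_before)

    # Dump the leftover lock names to a text file for later inspection.
    with open(journal_path, "w", encoding="utf-8") as journal:
        for name in leftover:
            journal.write(name + "\n")

    # Never clean blindly: only release locks the caller has declared safe.
    cleaned: List[str] = []
    for name in leftover:
        if is_safe_to_clean(name):
            clean_lock(name)
            cleaned.append(name)
    return return_code, cleaned
```

The `is_safe_to_clean` predicate is where the concern about lock semantics would have to be encoded, for example by allowing cleanup only when the target table was empty before the import. Locks that fail the predicate stay in place and remain listed in the journal file.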
          julien.fritsch Julien Fritsch made changes
          Labels: triage
          mariadb-jira-automation Jira Automation (IT) made changes
          Zendesk Related Tickets: 201635
          Zendesk active tickets: 201635
          leonid.fedorov Leonid Fedorov made changes
          Fix Version/s: 23.10.3 [ 29862 ]
          Fix Version/s: 23.10.2 [ 29807 ]
          julien.fritsch Julien Fritsch made changes
          Sprint: 2024-1 [ 755 ] -> 2024-1, 2024-2 [ 755, 764 ]
          julien.fritsch Julien Fritsch made changes
          Description: line breaks adjusted; wording unchanged
          julien.fritsch Julien Fritsch made changes
          Sprint: 2024-1, 2024-2 [ 755, 764 ] -> 2024-1 [ 755 ]
          julien.fritsch Julien Fritsch made changes
          Labels: triage
          leonid.fedorov Leonid Fedorov made changes
          Fix Version/s: 23.10 [ 28540 ]
          Fix Version/s: 23.10.3 [ 29862 ]

          People

            Assignee: leonid.fedorov Leonid Fedorov
            Reporter: allen.herrera Allen Herrera
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
