MariaDB ColumnStore / MCOL-5352

Truncate table failed after PrimProc restarted

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 22.08.4, 23.02.2
    • Fix Version/s: 23.10.2
    • Component/s: DMLProc
    • Labels: None
    • Sprint: 2023-12, 2024-1

    Description

      Build tested: 22.08.4, as well as the latest in develop

      engine: 15f65eff157f8fce48c0dfb30548dc787b259eb2
      server: d3049350bb5c61340f5a7518b155d3c9dacdcb33
      buildNo: 6257

      The TRUNCATE command fails after PrimProc is restarted on a single-node setup
      or on the primary node of a multi-node cluster. If PrimProc is restarted on a slave node, TRUNCATE still succeeds.

      MariaDB [mytest]> truncate lineitem;
      ERROR 1815 (HY000): Internal error: CAL0009: Truncate table failed:  MCS-2045: At least one PrimProc closed the connection unexpectedly.  
      

      Repeating the TRUNCATE command continues to return the same error until a CREATE TABLE statement has been processed.

      MariaDB [mytest]> truncate lineitem;
      ERROR 1815 (HY000): Internal error: CAL0009: Truncate table failed:  MCS-2045: At least one PrimProc closed the connection unexpectedly.  
      MariaDB [mytest]> create table t1 (c1 int) engine=columnstore;
      Query OK, 0 rows affected (0.085 sec)
       
      MariaDB [mytest]> truncate lineitem;
      Query OK, 0 rows affected (0.089 sec)
      

          Activity

            Build tested:

            23.02.3
            develop branch
            engine: a90535e1a7ffefa0e5ae808fdd0d38d30cffc017
            server: 805750b3a90ed4aecbf475025e63674aaab7f7f7
            buildNo: 7829

            systemctl restart mcs-primproc

            [rocky8:root@rocky8~]# mariadb mytest
            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A

            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 22
            Server version: 10.6.13-8-MariaDB MariaDB Server

            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

            MariaDB [mytest]> truncate lineitem;
            ERROR 1815 (HY000): Internal error: CAL0009: Truncate table failed: MCS-2045: At least one PrimProc closed the connection unexpectedly.

            Truncate table after "mcs cluster restart" works fine.

            Comment by dleeyh Daniel Lee (Inactive).
            drrtuy Roman added a comment (edited):

            After a discussion with leonid.fedorov, we concluded that the problem is caused by a TCP socket in DML/DDLProc that gets stuck when one of the PrimProc (PP) instances in a cluster restarts.
            DMLProc and DDLProc both establish connections with PP. When PP goes away, the TCP sockets used by DML/DDLProc presumably stick around for some time (they should go down, though). There are two approaches to resolving the root cause:

            • investigate why the socket to PP doesn't go away and which state it is in (TIME-WAIT or something else). We might add a listener for socket state-machine events.
            • if the socket is waiting for the remote end to be closed, there should be a way to notify all remote services that PP has been restarted. Controllernode is a good candidate to distribute this event; another way to distribute this information reliably is via dKVS.
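            The two approaches above can be illustrated with a minimal Python sketch (ColumnStore itself is written in C++; Python is used here only for brevity). For the first approach, note that when the remote side closes cleanly, the local socket normally sits in CLOSE-WAIT, not TIME-WAIT (TIME-WAIT belongs to the side that closed first), until the application reads EOF and closes it. `peer_has_closed` detects that condition with a non-blocking `MSG_PEEK` read, assuming a Linux/BSD platform where `MSG_DONTWAIT` exists. For the second approach, `PrimProcRegistry` is a hypothetical stand-in for controllernode or dKVS that bumps a restart generation each time a PrimProc starts, and `PrimProcHandle` is a hypothetical DML/DDLProc-side handle that reconnects when the generation it connected to is stale. All names here, including the node name `pm1` used in testing, are illustrative and not ColumnStore APIs.

```python
import socket
import threading
from typing import Dict, Optional


# Approach 1: probe a connected TCP socket without consuming any data.
def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the remote end has closed (FIN) or reset (RST) the connection."""
    try:
        # MSG_PEEK leaves pending bytes in the receive queue;
        # MSG_DONTWAIT (Linux/BSD) makes this single call non-blocking.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # connection open, nothing to read yet
    except (ConnectionResetError, ConnectionAbortedError):
        return True  # peer reset the connection
    return data == b""  # zero bytes means EOF: the peer closed its side


# Approach 2: hypothetical registry standing in for controllernode or dKVS.
class PrimProcRegistry:
    """Records a restart generation per PrimProc, bumped each time one starts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._generation: Dict[str, int] = {}

    def announce_start(self, pp: str) -> int:
        with self._lock:
            self._generation[pp] = self._generation.get(pp, 0) + 1
            return self._generation[pp]

    def generation(self, pp: str) -> int:
        with self._lock:
            return self._generation.get(pp, 0)


class PrimProcHandle:
    """Hypothetical DML/DDLProc-side connection handle for one PrimProc."""

    def __init__(self, registry: PrimProcRegistry, pp: str) -> None:
        self.registry = registry
        self.pp = pp
        self.connected_generation: Optional[int] = None
        self.connects = 0

    def ensure_fresh(self) -> bool:
        """(Re)connect if PrimProc restarted since we connected; return True if we did."""
        current = self.registry.generation(self.pp)
        if self.connected_generation == current:
            return False
        # A real implementation would close the stale socket and open a new one here.
        self.connected_generation = current
        self.connects += 1
        return True
```

            Polling `peer_has_closed` before each request would catch a PrimProc whose process exited and closed its sockets, but not a host that vanished without sending FIN or RST; the generation check does not depend on the socket at all, which is why a notification channel is the more robust option.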

            Verified on develop, 11 April 2024.

            The first TRUNCATE fails, but the second one works.

            Comment by kirill.perov@mariadb.com Kirill Perov (Inactive).
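            The behavior verified on develop (first TRUNCATE fails with MCS-2045, second succeeds) suggests a client-side stopgap until a build with the fix is deployed: retry the statement once on that specific error. This is a minimal sketch, not a fix for the underlying socket problem. It assumes the client's database driver raises an exception whose message contains the MCS error code, and `execute` is a hypothetical callable standing in for whatever runs a statement. On builds that behave as in the original description (repeated failures until a CREATE TABLE is processed), a single retry would not help.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Error code from this report; assumed to appear in the driver's exception text.
STALE_PRIMPROC_ERROR = "MCS-2045"


def execute_with_one_retry(execute: Callable[[str], T], statement: str) -> T:
    """Run `statement`; if it fails with MCS-2045, run it exactly once more.

    Only sensible for statements that are safe to repeat, such as TRUNCATE.
    """
    try:
        return execute(statement)
    except Exception as exc:
        if STALE_PRIMPROC_ERROR not in str(exc):
            raise  # unrelated error: propagate unchanged
        return execute(statement)  # a second failure propagates to the caller
```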

            What is the exact fix version? Is it 23.10.x? Also, I do not see any related commits, and I wonder why.

            Comment by valerii Valerii Kravchuk.

            People

              denis0x0D Denis Khalikov (Inactive)
              dleeyh Daniel Lee (Inactive)
              Leonid Fedorov Leonid Fedorov
              Kirill Perov Kirill Perov (Inactive)
              Votes: 1
              Watchers: 8
