MariaDB ColumnStore / MCOL-5279

MCS Cluster: After software update from 6.x to 22.08, queries hang on one node

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 22.08.2
    • 22.08.3
    • None
    • 2022-22

    Description

      "Where this join query hangs forever on the master node, but runs fine on the replicas"

      SELECT ff.acct_id, COUNT(1)
      FROM ff JOIN dd ON ff.acct_sk = dd.acct_sk
      AND ff.db_source_sk = dd.db_source_sk
      GROUP BY ff.acct_id LIMIT 10;
      

      # Create the test database.
      mariadb -e "CREATE DATABASE test;"

      # Create both ColumnStore tables. Identifiers are left unquoted: inside double quotes the shell would treat backticks as command substitution.
      mariadb test -e "CREATE TABLE dd ( acct_sk INT(11) UNSIGNED NOT NULL, acct_id VARCHAR(128) NOT NULL DEFAULT 'None', db_source_sk INT(11) UNSIGNED NOT NULL ) ENGINE=Columnstore DEFAULT CHARSET=utf8mb4;"
      mariadb test -e "CREATE TABLE ff ( db_source_sk INT(11) UNSIGNED NOT NULL, acct_id VARCHAR(128) NOT NULL DEFAULT 'None', acct_sk INT(11) UNSIGNED NOT NULL DEFAULT 0 ) ENGINE=Columnstore DEFAULT CHARSET=utf8mb4;"

      # Load 300000 random rows per table; seq_1_to_300000 is a virtual table provided by MariaDB's Sequence storage engine.
      mariadb test -e "INSERT INTO dd SELECT ROUND(RAND() * 10, 2), SUBSTRING(MD5(RAND()), 1, 1), ROUND(RAND() * 100, 2) FROM seq_1_to_300000;"
      mariadb test -e "INSERT INTO ff SELECT ROUND(RAND() * 10, 2), SUBSTRING(MD5(RAND()), 1, 1), ROUND(RAND() * 100, 2) FROM seq_1_to_300000;"
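
Because the symptom is a query that never returns on one node, it can help to run the same statement against every node with a client-side timeout. The following is a minimal sketch, not part of the original report: it shells out to the `mariadb` client, and the helper names (`run_with_timeout`, `client_command`, `check_nodes`) and any host names passed in are hypothetical.

```python
import subprocess

# The join query from the description, run against the `test` database.
QUERY = (
    "SELECT ff.acct_id, COUNT(1) "
    "FROM ff JOIN dd ON ff.acct_sk = dd.acct_sk "
    "AND ff.db_source_sk = dd.db_source_sk "
    "GROUP BY ff.acct_id LIMIT 10"
)


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns "ok" (exit code 0), "error" (non-zero exit or the command could
    not be started), or "hang" (no exit within timeout_s seconds).
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hang"
    except OSError:
        # For example, the client binary is not installed.
        return "error"
    return "ok" if result.returncode == 0 else "error"


def client_command(host, database="test", query=QUERY):
    """Build the mariadb client command line for one node."""
    return ["mariadb", "-h", host, database, "-e", query]


def check_nodes(hosts, timeout_s=60):
    """Return {host: "ok" | "error" | "hang"} for every host in hosts."""
    return {h: run_with_timeout(client_command(h), timeout_s) for h in hosts}
```

Calling `check_nodes` with the cluster's node names would, under these assumptions, report `"hang"` only for the node on which the query fails to return within the timeout; the timeout value itself is a judgment call and should comfortably exceed the normal run time of the query.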
      
      

          Activity

            alexey.vorovich alexey vorovich (Inactive) added a comment - dleeyh, please try to reproduce this in your multi-node tests, and then we will add the test.

            David.Hall David Hall (Inactive) added a comment - Here's a test for this edge case:

            # This test makes sense when run on a multi-node cluster.
            # Assumed setup (not in the original snippet): the cleanup below drops mcol_5279, so create and select it first.
            CREATE DATABASE mcol_5279;
            USE mcol_5279;

            CREATE TABLE `dd` ( `acct_sk` INT(11) UNSIGNED NOT NULL, `acct_id` VARCHAR(128) NOT NULL DEFAULT 'None', `db_source_sk` INT(11) UNSIGNED NOT NULL ) ENGINE=Columnstore DEFAULT CHARSET=utf8mb4;
            CREATE TABLE `ff` ( `db_source_sk` INT(11) UNSIGNED NOT NULL, `acct_id` VARCHAR(128) NOT NULL DEFAULT 'None', `acct_sk` INT(11) UNSIGNED NOT NULL DEFAULT 0 ) ENGINE=Columnstore DEFAULT CHARSET=utf8mb4;

            INSERT INTO dd SELECT ROUND(RAND() * 10, 2), SUBSTRING(MD5(RAND()), 1, 1), ROUND(RAND() * 100, 2) FROM seq_1_to_300000;
            INSERT INTO ff SELECT ROUND(RAND() * 10, 2), SUBSTRING(MD5(RAND()), 1, 1), ROUND(RAND() * 100, 2) FROM seq_1_to_300000;

            # The purpose is to check that the statement runs without hanging.
            SELECT * FROM (SELECT ff.acct_id, COUNT(1) FROM ff JOIN dd ON ff.acct_sk = dd.acct_sk AND ff.db_source_sk = dd.db_source_sk GROUP BY ff.acct_id LIMIT 10) s LIMIT 0;

            # Clean up
            DROP DATABASE mcol_5279;

            dleeyh Daniel Lee (Inactive) added a comment - Build verified: 22.08.3 (RC from Jenkins)

            The query no longer hangs. MTR tests were able to complete.

            alexey.vorovich alexey vorovich (Inactive) added a comment (edited):

            An additional scenario was discovered and fixed.

            A sporadic hang, reproduced in a VM or k8s but not in Docker Compose:
            in the sequence below, the SELECT on the next host sporadically hangs.

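                    # Excerpt from a test method; connect(), create_table() and ex_sql() are defined elsewhere (not shown here).
                    # Step 1: on the first host, create the table, insert one row, commit, and count.
                    self.connect(self.HOST_0)
                    self.create_table()
                    self.ex_sql("insert into t1 (f1) values (1)", silent=False)
                    self.connection.commit()
                    self.ex_sql("select count(*) from t1")

                    # Step 2: on the next host, the same count sporadically hangs.
                    self.connect(self.HOST_1)
                    self.ex_sql("use d1")
                    self.ex_sql("select count(*) from t1")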
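
The excerpt above depends on test-class helpers that are not shown. As a rough, self-contained sketch of the same write-on-one-host, read-on-every-host sequence, the function below takes a connection factory as a parameter. Everything in it is an assumption made for illustration: the function name `cross_node_count`, the `connect(host)` callable (any DB-API 2.0 driver could supply it), and the schema of `t1`, since the original `create_table()` is not part of the report.

```python
def cross_node_count(connect, hosts, database="d1", table="t1"):
    """Write one row through hosts[0], then read COUNT(*) through every host.

    connect(host) must return a DB-API 2.0 (PEP 249) connection.
    Returns a dict mapping each host to the row count it reported.
    """
    writer = connect(hosts[0])
    try:
        cur = writer.cursor()
        # Hypothetical schema: the original create_table() helper is not shown.
        cur.execute(f"CREATE DATABASE IF NOT EXISTS {database}")
        cur.execute(f"USE {database}")
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (f1 INT) ENGINE=Columnstore"
        )
        cur.execute(f"INSERT INTO {table} (f1) VALUES (1)")
        writer.commit()
    finally:
        writer.close()

    counts = {}
    for host in hosts:
        # Use a fresh connection per host so each read goes through that node.
        conn = connect(host)
        try:
            cur = conn.cursor()
            cur.execute(f"USE {database}")
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[host] = cur.fetchone()[0]
        finally:
            conn.close()
    return counts
```

On its own this function would simply block if a node hangs. To turn a hang into a test failure, the `connect` factory would need to configure a client-side read timeout, if the chosen driver offers one, and because the hang is described as sporadic the sequence would likely have to be repeated many times.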
            


            People

              drrtuy Roman
              edward Edward Stoever
              Daniel Lee Daniel Lee (Inactive)
              Votes: 1
              Watchers: 8
