MariaDB Server / MDEV-12647

Galera + LOCK TABLES deadlock

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not a Bug
    • None
    • N/A
    • Galera
    • None
    • MariaDB 10.1.18 / Galera 25.3.17

    Description

      With the following sequence of commands, node2 always gets stuck and all writes to this node hang.

      Node1:

      mariadb[node1]> CREATE TABLE t1 (id INT PRIMARY KEY);
      Query OK, 0 rows affected (0.02 sec)
       
      mariadb[node1]> LOCK TABLES t1 WRITE;
      Query OK, 0 rows affected (0.00 sec)
      
      

      Node2:

      mariadb[node2]> LOCK TABLE t1 WRITE;
      Query OK, 0 rows affected (0.00 sec)
       
      mariadb[node2]> SELECT * FROM t1;
      Empty set (0.00 sec)
      

      Node1:

      mariadb[node1]> INSERT INTO t1 VALUES (1);
      Query OK, 1 row affected (0.01 sec)
      
      

      Node2:

      mariadb[node2]> INSERT INTO t1 VALUES (2);
      -- ^ never returns 
      
      

      Node1:

      mariadb[node1]> UNLOCK TABLES;
      Query OK, 0 rows affected (0.00 sec)
       
      mariadb[node1]> SELECT * FROM t1;
      +----+
      | id |
      +----+
      |  1 |
      |  2 |
      +----+
      2 rows in set (0.00 sec)
      

      At this point, all writes on node2 hang. Even after UNLOCK TABLES on node1, the INSERT on node2 stays hung, and the connection on node2 that holds the table lock cannot be terminated with KILL. I've attached gdb "thread apply all bt" output in case it is useful.
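
      For anyone reproducing this, the stuck sessions on node2 can be inspected from a separate connection, roughly as sketched below. These are standard statements, but the connection id 42 is a placeholder for illustration only; use the Id that SHOW FULL PROCESSLIST reports for the session holding the table lock.

      ```sql
      -- On node2, from a new connection:
      -- list all sessions with their Id, State and current statement.
      SHOW FULL PROCESSLIST;

      -- Try to terminate the session holding the table lock.
      -- 42 is a hypothetical Id taken from the processlist output above.
      -- In this report, KILL does not terminate that connection.
      KILL 42;
      ```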

      I do see in my error log on node2 that an abort was attempted:

      [Note] WSREP: MDL conflict db=foo table=t1 ticket=7 solved by abort 
      

      I do expect that abort, but it did not unstick this particular case. Also worth mentioning: if I continue to write to the other cluster nodes, wsrep_local_recv_queue rises on node2. I was expecting flow control to kick in at some point, given that this node's state is reported as "Synced" and it is using defaults (i.e. gcs.fc_limit=16).
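
      The receive queue and flow control observations above can be checked on node2 with the standard wsrep status variables. This is a minimal sketch of what to look at, not output captured from the affected environment.

      ```sql
      -- Run on node2 while the other nodes keep receiving writes.
      SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';  -- node state, e.g. Synced
      SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';     -- current receive queue length
      SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';    -- flow control pause events sent by this node
      SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';  -- fraction of time replication was paused

      -- The provider option string includes gcs.fc_limit.
      SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
      ```

      If flow control were engaging, wsrep_flow_control_sent on node2 would be expected to increase once the receive queue exceeds the limit.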

      Also reproduced under MariaDB 10.1.22 (tested with the latest Galera, 25.3.20), but the attached logs are from an older MariaDB 10.1.18 (Galera 25.3.17) environment.

          People

            Assignee: Unassigned
            Reporter: Andrew Garner (andrew.garner)