  1. MariaDB Server
  2. MDEV-22052

Server with wsrep enabled doesn't respect lock wait timeouts under FLUSH TABLE WITH READ LOCK

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
    • Fix Version/s: 10.4(EOL)
    • Component/s: Locking, wsrep
    • Labels: None

    Description

      Note: might be related to MDEV-22051. The test case is similar, but the one in MDEV-22051 shows what happens if DDL is attempted in the thread holding the lock, while this one – what happens if DDL is attempted concurrently with the lock.

      --source include/galera_cluster.inc
      --source include/have_innodb.inc
       
      FLUSH TABLES WITH READ LOCK;
       
      --connect (con1,localhost,root,,)
      SET lock_wait_timeout= 1;
      --error ER_LOCK_WAIT_TIMEOUT
      CREATE TABLE t1 (a INT) ENGINE=InnoDB;
      

      The expected result is that the CREATE attempt fails after 1 second, when the lock_wait_timeout set for the session expires. Instead, the statement hangs seemingly forever.

      Reproducible on 10.1-10.5, debug and non-debug alike.
      Not reproducible if wsrep is disabled.
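
      For comparison, a minimal sketch of the same scenario with wsrep disabled, i.e. without include/galera_cluster.inc. This is an illustration based on the statement above, not a test from the source tree, and the cleanup lines at the end are an addition. Here the CREATE is expected to fail with ER_LOCK_WAIT_TIMEOUT after about 1 second:

      --source include/have_innodb.inc

      # Default connection takes the global read lock.
      FLUSH TABLES WITH READ LOCK;

      # Second connection attempts DDL with a 1-second timeout.
      --connect (con1,localhost,root,,)
      SET lock_wait_timeout= 1;
      --error ER_LOCK_WAIT_TIMEOUT
      CREATE TABLE t1 (a INT) ENGINE=InnoDB;

      # Cleanup (added for illustration): release the lock.
      --disconnect con1
      --connection default
      UNLOCK TABLES;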

          Activity

            jplindst Jan Lindström (Inactive) added a comment - Fixed on edc3899d9781e98b4328931884527913ebffb11f
            elenst Elena Stepanova added a comment - - edited

            It still doesn't respect lock wait timeouts.

            Only now, if it's started with a configuration like the one the galera test suite provides (whatever it is), the DDL fails with ER_UNKNOWN_COM_ERROR ("Aborting TOI: Global Read-Lock (FTWRL) in place") right away, regardless of the configured lock_wait_timeout; and if the server is started as a single node with WSREP enabled, e.g. with

            --wsrep_on=ON --wsrep_cluster_address=gcomm:// --wsrep_provider=/home/elenst/galera/galera-4.so --innodb_autoinc_lock_mode=2 --innodb_doublewrite=1 --binlog-format=row
            

            startup options, then it hangs seemingly forever as it did before, also regardless of the lock_wait_timeout.

            Also, here is another test case which still hangs even with the patch above, also with the standard MTR configuration under suite/galera:

            --source include/galera_cluster.inc
             
            CREATE TABLE t1 (a INT) ENGINE=InnoDB;
            LOCK TABLE t1 WRITE;
             
            --connect (con1,localhost,root,,test)
            SET lock_wait_timeout= 1;
            --error ER_LOCK_WAIT_TIMEOUT
            CREATE VIEW v1 AS SELECT * FROM t1;
            

            jplindst Jan Lindström (Inactive) added a comment - elenst Does it really need to wait up to lock_wait_timeout seconds? In my opinion, if we detect problematic usage, it is correct to return right away. This is the only way, at least for brute force transactions, as they can't wait.
            elenst Elena Stepanova added a comment - - edited

            No, it doesn't have to wait if a problem is revealed right away in a consistent manner (and whether it really is the kind of problem which should make Galera fail, I'll leave to Galera experts to decide).
            But please read the whole comment. This FTWRL detection only covers a subset of the problem. The comment shows at least two cases where the check doesn't help, which is why it is not a proper fix.

            lock_wait_timeout has a widespread effect on server operation. If it isn't respected, there are probably numerous other cases where that would be visible; a specific hack around FTWRL won't fix the whole issue.

            People

              janlindstrom Jan Lindström
              elenst Elena Stepanova