[MDEV-29456] Spider hangs in 'Waiting for table metadata lock' state on CREATE TABLE after LOCK TABLES Created: 2022-09-03 Updated: 2023-11-22 Resolved: 2023-11-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Storage Engine - Spider |
| Affects Version/s: | 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.1 |
| Fix Version/s: | 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Roel Van de Paar | Assignee: | Yuchen Pei |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang, locking, thread_hang | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
This is a serious locking bug where Spider fails to see that table t was already created previously (and not locked?).
Leads to:
|
| Comments |
| Comment by Roel Van de Paar [ 2022-09-03 ] | ||||||||||||||||||||||||||||||||||||||
|
Removing only table t1 references (which should make no difference) leads to this odd result (it seems that instead the server should report that table t already exists):
| ||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-05 ] | ||||||||||||||||||||||||||||||||||||||
|
The table t is locked by LOCK TABLES t2 WRITE because it is referenced by t2 and the lock is kept even the failing LOCK TABLES t1 READ is executed. The table metadata lock is acquired before the table existence check, and thus, it is natural for CREATE TABLE t (c INT) ENGINE=Spider to result in 'Waiting for table metadata lock.' One thing to be noted here is that all the table locks on the root user connection are released by LOCK TABLES t1 READ, but the ones on the Spider user connection are not released. The table locks on the Spider user connection might be better to be released by the failing LOCK TABLES t1 READ too, but I think this wouldn't be a serious bug. | ||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-09-06 ] | ||||||||||||||||||||||||||||||||||||||
|
nayuta-yanagisawa Thank you for the analysis. I would like to understand better and have three questions. The last question also has three parts. 1. If I create a similar setup in InnoDB;
And, perhaps, less relevant, but interesting;
An ERROR 1050 is returned when attempting to create a table which already exists - should Spider behave in the same way? 2. In the first comment below the description, removing the t1 references should not have made any difference? 3. In the same comment, it is shown that the LOCK TABLES t2 WRITE does not lock 'c' correctly based on the ERROR 1100? Also, why did the server not report that the table existed? Finally, why does it show the 1100 ERROR? Thank you! | ||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ] | ||||||||||||||||||||||||||||||||||||||
|
The point is that there are two connections. One is the connection by the MariaDB client. I call this "the root user connection". The other connection is by Spider. I call this "the Spider user connection" because it uses the user named Spider in the issue description.
| ||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-09-07 ] | ||||||||||||||||||||||||||||||||||||||
|
nayuta-yanagisawa Thank you for the input. So if I understand you correctly, for items 1+3, you are saying that LOCK TABLES t2 WRITE; - executed by the root at the CLI - will trigger the Spider engine to automatically lock table t - in a background (and non-user) thread, which uses the Spider UserID, and is done by the server/the Spider SE - because t is being referenced/linked in the t2 table definition? For item 2, "t1 references" means all SQL which refers to t1, i.e. the test as given in this comment. I was unaware that the write lock on t2 was supposed to be released by the read lock on t1. | ||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-09-07 ] | ||||||||||||||||||||||||||||||||||||||
|
Please see | ||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-09-07 ] | ||||||||||||||||||||||||||||||||||||||
|
Please note that SET GLOBAL spider_same_server_link=ON; is not used here. | ||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ] | ||||||||||||||||||||||||||||||||||||||
Mostly yes, but there is no background thread. The thread handling the user CLI connection also handles the Spider (background) connection. | ||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ] | ||||||||||||||||||||||||||||||||||||||
|
I believe that the following test case found in the investigation for
| ||||||||||||||||||||||||||||||||||||||
| Comment by Yuchen Pei [ 2023-05-16 ] | ||||||||||||||||||||||||||||||||||||||
|
I can reprod the hang at 11.1 (4e5b771e980edfdad5c5414aa62c81d409d585a4) when running the test case in the previous comment. | ||||||||||||||||||||||||||||||||||||||
| Comment by Yuchen Pei [ 2023-11-17 ] | ||||||||||||||||||||||||||||||||||||||
|
fixed by |