[MDEV-10789] Locking problems - queries hang in Waiting for global read lock, lock_wait_timeout has no effect Created: 2016-09-11  Updated: 2020-10-20  Resolved: 2020-10-20

Status: Closed
Project: MariaDB Server
Component/s: Locking
Affects Version/s: 10.2
Fix Version/s: 10.2.7

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mysqld.1.err     File mysqld.log     File t5.test     HTML File threads1     HTML File threads2    

 Description   

For now the problem is too obscure to formulate it more specifically.
I'm running the attached ugly test t5.test on 10.2 commit 9810d5ea132c028ac133ef19683d20b5bf005cab with one change (otherwise the test doesn't work):

diff --git a/client/mysqltest.cc b/client/mysqltest.cc
index b802642..922efa4 100644
--- a/client/mysqltest.cc
+++ b/client/mysqltest.cc
@@ -80,7 +80,7 @@ static my_bool non_blocking_api_enabled= 0;
 #define MAX_COLUMNS            256
 #define MAX_EMBEDDED_SERVER_ARGS 64
 #define MAX_DELIMITER_LENGTH 16
-#define DEFAULT_MAX_CONN        64
+#define DEFAULT_MAX_CONN        512
 
 #define DIE_BUFF_SIZE           8192

Here is how I run it:

perl ./mtr bug.t5 --mysqld=--lower-case-table-names=1 --mysqld=--partition --mysqld=--lock-wait-timeout=5 --mysqld=--innodb-lock-wait-timeout=3 --testcase-timeout=300 --mysqld=--plugin-load-add=metadata_lock_info

metadata_lock_info is not important for getting into trouble, it just helps to see what's happening when we get there.

At some point, the test stalls. Here is what I see in the processlist:

+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+
| Id  | User            | Host            | db   | Command | Time | State                        | Info                                                                                  | Progress |
+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+
|   3 | root            | localhost       | test | Sleep   | 2074 |                              | NULL                                                                                  |    0.000 |
|   4 | root            | localhost:54429 | test | Sleep   | 2074 |                              | NULL                                                                                  |    0.000 |
| 128 | event_scheduler | localhost       | NULL | Daemon  | 1853 | Waiting on empty queue       | NULL                                                                                  |    0.000 |
| 153 | root            | localhost:55322 | test | Query   | 1709 | Waiting for global read lock | DROP TABLE IF EXISTS transforms.where_updatedelete_2940 /* QUERY_NO 7392 CON_ID 11 */ |    0.000 |
| 154 | root            | localhost:55340 | test | Query   | 1720 | Waiting for global read lock | DROP TABLE IF EXISTS transforms.where_updatedelete_4388 /* QUERY_NO 4813 CON_ID 10 */ |    0.000 |
| 161 | root            | localhost:55353 | test | Sleep   | 1715 |                              | NULL                                                                                  |    0.000 |
| 167 | root            | localhost:55405 | test | Sleep   | 1715 |                              | NULL                                                                                  |    0.000 |
| 174 | root            | localhost:55480 | test | Sleep   | 1715 |                              | NULL                                                                                  |    0.000 |
| 187 | root            | localhost:59471 | NULL | Query   |    0 | init                         | show full processlist                                                                 |    0.000 |
+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+

Note that lock_wait_timeout is set to 5 seconds, but it doesn't seem to help. No connection at no point changed this value.

And here is what I see in metadata_lock_info:

+-----------+---------------------+---------------+---------------------+--------------+------------+
| THREAD_ID | LOCK_MODE           | LOCK_DURATION | LOCK_TYPE           | TABLE_SCHEMA | TABLE_NAME |
+-----------+---------------------+---------------+---------------------+--------------+------------+
|       174 | MDL_SHARED          | NULL          | Global read lock    |              |            |
|       174 | MDL_SHARED          | NULL          | Commit lock         |              |            |
|       153 | MDL_SHARED_READ     | NULL          | Table metadata lock | testdb_n     | t1_base3_n |
|       153 | MDL_SHARED_NO_WRITE | NULL          | User lock           | 4            |            |
+-----------+---------------------+---------------+---------------------+--------------+------------+

Note there is no schema with the name 4:

$ ml --port=16000 -e "show databases"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mtr                |
| mysql              |
| performance_schema |
| test               |
| testdb_n           |
| transforms         |
+--------------------+

Last query that thread 174 executed was FLUSH TABLES WITH READ LOCK, so I suppose it's okay for it to hold locks.

mysqld процесс doesn't consume any significant CPU.

Error log, general log and two consequent stack traces (with ~1 min interval) are attached.

It's quite possible that the test does something wrong, but it doesn't explain why lock_wait_timeout doesn't work, or what database 4 means.



 Comments   
Comment by Elena Stepanova [ 2020-10-20 ]

The problem is perfectly reproducible up to 10.2.6, but disappeared in 10.2.7, so I consider it fixed. I didn't look for the exact commit, as it's not easy to automate due to the nature of the failure.

Generated at Thu Feb 08 07:44:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.