Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.2(EOL)
-
None
Description
For now the problem is too obscure to formulate it more specifically.
I'm running the attached ugly test t5.test on 10.2 commit 9810d5ea132c028ac133ef19683d20b5bf005cab with one change (otherwise the test doesn't work):
diff --git a/client/mysqltest.cc b/client/mysqltest.cc
|
index b802642..922efa4 100644
|
--- a/client/mysqltest.cc
|
+++ b/client/mysqltest.cc
|
@@ -80,7 +80,7 @@ static my_bool non_blocking_api_enabled= 0;
|
#define MAX_COLUMNS 256
|
#define MAX_EMBEDDED_SERVER_ARGS 64
|
#define MAX_DELIMITER_LENGTH 16
|
-#define DEFAULT_MAX_CONN 64
|
+#define DEFAULT_MAX_CONN 512
|
|
#define DIE_BUFF_SIZE 8192
|
Here is how I run it:
perl ./mtr bug.t5 --mysqld=--lower-case-table-names=1 --mysqld=--partition --mysqld=--lock-wait-timeout=5 --mysqld=--innodb-lock-wait-timeout=3 --testcase-timeout=300 --mysqld=--plugin-load-add=metadata_lock_info
|
metadata_lock_info is not important for getting into trouble, it just helps to see what's happening when we get there.
At some point, the test stalls. Here is what I see in the processlist:
+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+
|
| 3 | root | localhost | test | Sleep | 2074 | | NULL | 0.000 |
|
| 4 | root | localhost:54429 | test | Sleep | 2074 | | NULL | 0.000 |
|
| 128 | event_scheduler | localhost | NULL | Daemon | 1853 | Waiting on empty queue | NULL | 0.000 |
|
| 153 | root | localhost:55322 | test | Query | 1709 | Waiting for global read lock | DROP TABLE IF EXISTS transforms.where_updatedelete_2940 /* QUERY_NO 7392 CON_ID 11 */ | 0.000 |
|
| 154 | root | localhost:55340 | test | Query | 1720 | Waiting for global read lock | DROP TABLE IF EXISTS transforms.where_updatedelete_4388 /* QUERY_NO 4813 CON_ID 10 */ | 0.000 |
|
| 161 | root | localhost:55353 | test | Sleep | 1715 | | NULL | 0.000 |
|
| 167 | root | localhost:55405 | test | Sleep | 1715 | | NULL | 0.000 |
|
| 174 | root | localhost:55480 | test | Sleep | 1715 | | NULL | 0.000 |
|
| 187 | root | localhost:59471 | NULL | Query | 0 | init | show full processlist | 0.000 |
|
+-----+-----------------+-----------------+------+---------+------+------------------------------+---------------------------------------------------------------------------------------+----------+
|
Note that lock_wait_timeout is set to 5 seconds, but it doesn't seem to help. No connection at no point changed this value.
And here is what I see in metadata_lock_info:
+-----------+---------------------+---------------+---------------------+--------------+------------+
|
| THREAD_ID | LOCK_MODE | LOCK_DURATION | LOCK_TYPE | TABLE_SCHEMA | TABLE_NAME |
|
+-----------+---------------------+---------------+---------------------+--------------+------------+
|
| 174 | MDL_SHARED | NULL | Global read lock | | |
|
| 174 | MDL_SHARED | NULL | Commit lock | | |
|
| 153 | MDL_SHARED_READ | NULL | Table metadata lock | testdb_n | t1_base3_n |
|
| 153 | MDL_SHARED_NO_WRITE | NULL | User lock | 4 | |
|
+-----------+---------------------+---------------+---------------------+--------------+------------+
|
Note there is no schema with the name 4:
$ ml --port=16000 -e "show databases"
|
+--------------------+
|
| Database |
|
+--------------------+
|
| information_schema |
|
| mtr |
|
| mysql |
|
| performance_schema |
|
| test |
|
| testdb_n |
|
| transforms |
|
+--------------------+
|
Last query that thread 174 executed was FLUSH TABLES WITH READ LOCK, so I suppose it's okay for it to hold locks.
mysqld процесс doesn't consume any significant CPU.
Error log, general log and two consequent stack traces (with ~1 min interval) are attached.
It's quite possible that the test does something wrong, but it doesn't explain why lock_wait_timeout doesn't work, or what database 4 means.