[MDEV-25023] MariaDB 10.5 - DROP DATABASE locked Created: 2021-03-01  Updated: 2022-11-25  Resolved: 2022-11-25

Status: Closed
Project: MariaDB Server
Component/s: Query Cache
Affects Version/s: 10.5.9, 10.5.12, 10.6.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Fabien CLERC Assignee: Unassigned
Resolution: Duplicate Votes: 3
Labels: None
Environment:

CPU: 2
RAM: 8 Go
SWAP: 2 Go

[root]# uname -a
Linux test09.intra-know.com 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[root]$ yum list installed | grep -i mariadb
MariaDB-client.x86_64 10.5.9-1.el7.centos @MariaDB
MariaDB-common.x86_64 10.5.9-1.el7.centos @MariaDB
MariaDB-compat.x86_64 10.5.9-1.el7.centos @MariaDB
MariaDB-server.x86_64 10.5.9-1.el7.centos @MariaDB
galera-4.x86_64 26.4.7-1.el7.centos @MariaDB

[root] cat /etc/my.cnf.d/ik.cnf
[client]
port = 3306
socket = /var/lib/mysql/mysql.sock

[mysqld]

#innodb_force_recovery=6
port = 3306
bind-address = 127.0.0.1
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mariadb/mariadb.pid

sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

# Logging configuration.
log-error = /var/log/mariadb/mariadb.log
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links = 0
# http://dev.mysql.com/doc/refman/5.5/en/performance-schema.html
;performance_schema
# Memory settings.
max_allowed_packet = 128M
table_open_cache = 256
read_buffer_size = 1M
thread_cache_size = 8
query_cache_size = 16M
max_connections = 300
tmp_table_size = 768M
max_heap_table_size = 768M
# InnoDB settings.
innodb_buffer_pool_size = 4096M
default-time-zone = Europe/Paris
character-set-server = utf8
ft_min_word_len = 3
slow_query_log = 1
slow_query_log_file = /var/log/mariadb/mariadb-slow.log
long_query_time = 2
# log-bin = mysqld-binlog
expire_logs_days = 7

[mysqldump]
quick
max_allowed_packet = 64M

[mysqld_safe]
pid-file = /var/run/mariadb/mariadb.pid


Attachments: PNG File Selection_800.png     File all.sql     File source_db_schema.sql    
Issue Links:
Duplicate
duplicates MDEV-29760 DROP DATABASE hangs when particular q... Closed

 Description   

Hello,
The application my firm is developing runs on MariaDB 5.5.
We are working on a migration to MariaDB 10.5.
Databases are copied between test servers via shell scripts:

  • mysqldump on source server
  • copy of dump file from source to final server
  • drop database on final server
  • create database on final server
  • import of dump file on final server
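The steps above can be sketched as a small shell script. This is a dry-run sketch that only prints the commands it would run; the host names (`source01`, `dest01`), database name (`app_db`), and the `--single-transaction` dump option are placeholders and assumptions, not taken from the report:

```shell
#!/bin/sh
# Dry-run sketch of the copy workflow: prints the commands instead of
# executing them. source01, dest01 and app_db are placeholder names.
set -eu
SRC_HOST=source01
DST_HOST=dest01
DB=app_db
DUMP=/tmp/${DB}.sql

cat <<EOF
# 1. mysqldump on the source server
mysqldump -h $SRC_HOST --single-transaction $DB > $DUMP
# 2. copy the dump file to the final server
scp $DUMP $DST_HOST:$DUMP
# 3. + 4. drop and recreate the database on the final server
mysql -h $DST_HOST -e "DROP DATABASE IF EXISTS $DB; CREATE DATABASE $DB"
# 5. import the dump file on the final server
mysql -h $DST_HOST $DB < $DUMP
EOF
```

On the affected 10.5 servers, the hang reported here occurs at the drop step of this sequence.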

This workflow has worked for years on MariaDB 5.5, but we are facing issues with some of our MariaDB 10.5 servers.
On some of these servers, DROP DATABASE stays locked for hours without any other queries running.

[root] mysqladmin processlist
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+------+-----------+-------------------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 113 | root | localhost | | Query | 380 | closing tables | drop database `XXX` | 0.000 |

When we kill the process, the workflow continues with CREATE DATABASE, which is also locked for hours, until we kill it.

[root] mysqladmin processlist
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+------+-----------+-------------------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 27343 | root | localhost | NULL | Query | 77 | Waiting for schema metadata lock | create database `XXX` | 0.000 |

Once done, it is impossible to stop MariaDB (systemctl stop mariadb) and we have to "kill -9" the process.

We can reproduce the issue every time on several MariaDB 10.5 servers.
On other servers (also on MariaDB 10.5) we have never seen the issue.

The servers are configured the same way (via Ansible).

We didn't identify differences between the servers.
We thought it was because some servers were configured with an XFS filesystem and others with EXT4, but we face the issue on EXT4 AND XFS and do not reproduce on EXT4.

For information, nearly all tables are InnoDB; a few are MyISAM.

When we run the following command

show engine innodb status;

we did not identify any problems:

=====================================
2021-03-01 23:18:08 0x7f5bcc0ec700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 54 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 273 srv_active, 0 srv_shutdown, 2344 srv_idle
srv_master_thread log flush and writes: 2617
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 8478
OS WAIT ARRAY INFO: signal count 3918
RW-shared spins 93, rounds 1674, OS waits 18
RW-excl spins 228, rounds 3549, OS waits 93
RW-sx spins 49, rounds 1278, OS waits 39
Spin rounds per wait: 18.00 RW-shared, 15.57 RW-excl, 26.08 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 24528239
Purge done for trx's n:o < 24528222 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421507228312104, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421507228307840, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421507228303576, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending normal aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
79872 OS file reads, 160773 OS file writes, 25006 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 3, seg size 5, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 460.49 non-hash searches/s
---
LOG
---
Log sequence number 93304660002
Log flushed up to   93304660002
Pages flushed up to 93299476812
Last checkpoint at  93299476812
0 pending log flushes, 0 pending chkp writes
11642 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 4328521728
Dictionary memory allocated 26658000
Buffer pool size   258080
Free buffers       74037
Database pages     184043
Old database pages 67917
Modified db pages  567
Percent of dirty pages(LRU & free pages): 0.220
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 124329, not young 22654
0.00 youngs/s, 0.00 non-youngs/s
Pages read 43330, created 143146, written 148307
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 184043, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 3734493, updated 367, deleted 122, read 29441894
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 15258.64 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

We tried activating performance_schema

performance_schema=ON
performance-schema-instrument='wait/lock/metadata/sql/mdl=ON'

but we didn't identify any locked requests.
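Besides performance_schema, MariaDB also bundles the METADATA_LOCK_INFO plugin, which exposes current metadata-lock holders through information_schema. A hedged sketch of how it could be queried (guarded so it only prints the statements when no local server is reachable); note that it shows MDL waits, not the query-cache mutex:

```shell
# Query MariaDB's METADATA_LOCK_INFO plugin for current MDL holders.
# Guarded: when no local server is reachable, just print the statements.
QUERY="SELECT thread_id, lock_mode, lock_type, table_schema, table_name
FROM information_schema.METADATA_LOCK_INFO"
if command -v mysql >/dev/null 2>&1 && mysql -e 'SELECT 1' >/dev/null 2>&1; then
    # load the plugin once; ignore the error if it is already loaded
    mysql -e "INSTALL SONAME 'metadata_lock_info'" 2>/dev/null || true
    mysql -e "$QUERY"
else
    printf '%s\n' "$QUERY"
fi
```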

Do you have any ideas for solving the issue?
Do you need more details?

Thanks and Regards



 Comments   
Comment by Daniel Black [ 2021-03-02 ]

It doesn't look like a filesystem issue; however, I had trouble parsing `face the issue on EXT4 ... AND do not reproduce on EXT4`.

Can you attempt to capture a full backtrace at the point it is stalled?

If you have a table name in that backtrace can you include the `show create table tablename` for it. If sensitive just replace the field/index names.

Comment by Fabien CLERC [ 2021-03-02 ]

Hello.
Sorry for my unclear description.
I wanted to explain that:

  • on servers "S1" and "S2", on XFS, I faced the issue
  • on a server "S3", on EXT4, I didn't face the issue

So I was thinking the issue was due to XFS, but I finally reproduced the issue on a server "S4" on EXT4, so it seems the issue is not due to the filesystem.

About the full backtrace: I'm not comfortable with gdb, but I ran both commands.
Each result is empty.

[root] gdb --batch --eval-command="thread apply all bt" /usr/sbin/mysqld $(pgrep -xn mysqld)  > /tmp/mysqld_bt_all_threads.txt
 
[root] gdb --batch --eval-command="thread apply all bt full" /usr/sbin/mysqld $(pgrep -xn mysqld)  > /tmp/mysqld_full_bt_all_threads.txt
 
[root]$ ll /tmp/mysqld_bt_all_threads.txt
-rw-r--r-- 1 yie0008 yieloo 0  2 mars  09:16 /tmp/mysqld_bt_all_threads.txt
 
[root]$ ll /tmp/mysqld_full_bt_all_threads.txt
-rw-r--r-- 1 yie0008 yieloo 0  2 mars  09:17 /tmp/mysqld_full_bt_all_threads.txt
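A possible explanation for the empty files is that gdb wrote its error (missing debuginfo packages, ptrace permissions, or a pid mismatch) to stderr, which `> file` alone does not capture. A hedged re-run that keeps stderr in the output file, using the same path as above:

```shell
# Same backtrace capture as above, but with stderr redirected into the
# file too, so gdb's own errors are not silently discarded.
OUT=/tmp/mysqld_bt_all_threads.txt
PID=$(pgrep -xn mysqld || true)
if command -v gdb >/dev/null 2>&1 && [ -n "$PID" ]; then
    gdb --batch -p "$PID" -ex 'thread apply all bt full' > "$OUT" 2>&1
else
    echo "gdb not installed or mysqld not running; nothing captured"
fi
```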

However, the DROP DATABASE is still running:

[yie0008@test09 ~]$ sudo mysqladmin processlist
+------+------+-----------+-------------------+---------+-------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id   | User | Host      | db                | Command | Time  | State                        | Info                                                                                                 | Progress |
+------+------+-----------+-------------------+---------+-------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 113  | root | localhost |                   | Query   | 38114 | closing tables               | drop database `ZZZ_yieloo`                                                                     | 0.000    |
| 506  | root | localhost |                   | Query   | 33804 | Waiting for query cache lock | FLUSH /*!40101 LOCAL */ TABLES                                                                       | 0.000    |
| 3514 | root | localhost | ZZZ_default | Sleep   | 158   |                              |                                                                                                      | 0.000    |
| 3515 | root | localhost | ZZZ_default | Sleep   | 158   |                              |                                                                                                      | 0.000    |
| 3516 | root | localhost | ZZZ_default | Sleep   | 157   |                              |                                                                                                      | 0.000    |
| 3517 | root | localhost | ZZZ_default | Query   | 0     | Waiting for query cache lock | SELECT id_champ_multiple, q_nc_champs_multiple.*, q_nc_champs_multiple.nom_table as 'table' FROM q_n | 0.000    |
| 3520 | root | localhost | ZZZ_default | Sleep   | 48    |                              |                                                                                                      | 0.000    |
| 3521 | root | localhost | ZZZ_default | Sleep   | 48    |                              |                                                                                                      | 0.000    |
| 3522 | root | localhost | ZZZ_default | Sleep   | 47    |                              |                                                                                                      | 0.000    |
| 3523 | root | localhost | ZZZ_default | Query   | 0     | Waiting for query cache lock | SELECT      q_nc_champ_lien_type.id_type
            FROM        q_nc_champ_lien_type
            IN | 0.000    |
| 3526 | root | localhost |                   | Query   | 0     | starting                     | show processlist                                                                                     | 0.000    |
+------+------+-----------+-------------------+---------+-------+------------------------------+------------------------------------------------------------------------------------------------------+----------+

Comment by Fabien CLERC [ 2021-03-02 ]

To complete: when DROP DATABASE is blocked, the CPU is at 100%.

Comment by Fabien CLERC [ 2021-03-02 ]

Another piece of information:
when DROP DATABASE is locked, the directory /var/lib/mysql/DATABASE_NAME is empty.

Comment by Jonny Wylie [ 2021-07-06 ]

Fabien, did you ever get this to work?
It looks very similar to a problem I was having: https://jira.mariadb.org/browse/MDEV-23527

It still seems to fail in the latest version of MariaDB too. I found the problem was introduced after 10.5.2; it might fail in 10.5.3, but the dockerhub image for that version seems broken so I couldn't test it.

Comment by Fabien CLERC [ 2021-07-06 ]

Hello,
We didn't find a solution and, for the moment, we decided to postpone the MariaDB upgrade to a future release of our application.
However, we suspect the issue is caused by the anti-virus.
We excluded the /var/lib/mysql directory from the anti-virus scans, but we haven't tested our application since this change.
Please confirm whether you have an anti-virus installed on your server and whether excluding /var/lib/mysql solved your issue.

Regards.

Comment by Jonny Wylie [ 2021-07-06 ]

Fabien, no, we don't have any anti-virus software on the server, so that's definitely not the cause for us. I'm sure it's something in MariaDB, because 10.5.2 works fine but 10.5.4 does not.
I found it is something to do with the query cache: disabling the query cache makes the problem go away.
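For reference, disabling the query cache as Jonny describes is a my.cnf change (the reporter's config above sets query_cache_size = 16M). A hedged sketch; typically both variables are set, so that no buffer is allocated and threads stop taking the query-cache lock:

```ini
[mysqld]
# Disable the query cache entirely.
query_cache_type = 0
query_cache_size = 0
```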

Comment by Jonny Wylie [ 2021-07-07 ]

Fabien, I found a workaround: if you execute the query `FLUSH TABLES` before dropping the database, then it seems to work properly.
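The workaround can be scripted into the drop step of the copy workflow. A minimal sketch, assuming a placeholder database name `destination_db` and guarded so nothing runs without a reachable server:

```shell
# Workaround sketch: FLUSH TABLES immediately before DROP DATABASE.
# destination_db is a placeholder; guarded so nothing runs without a server.
DB=destination_db
STMT="FLUSH TABLES; DROP DATABASE IF EXISTS $DB;"
if command -v mysql >/dev/null 2>&1 && mysql -e 'SELECT 1' >/dev/null 2>&1; then
    mysql -e "$STMT"
else
    printf '%s\n' "$STMT"   # no server reachable: just show the statements
fi
```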

Comment by Alex [ 2021-08-04 ]

Same error here: a deadlock when dropping a small 8 MB database with 33 tables.
It also happens on 10.6, so please add 10.6 to the affected versions.
Everything works without problems on 10.4.

Comment by Rémi Augier [ 2022-01-11 ]

Same issue here in 10.5.12 and 10.6.5

Comment by Neven Ivanov [ 2022-10-10 ]

I was able to reproduce the problem 100% of the time with MariaDB 10.5/10.6 (regardless of the minor version, but tested mostly with 10.6.10).

You will need the files all.sql and source_db_schema.sql, which are attached to this task.

Then just execute from a bash console:

mysql -e "create database source_db"
mysql source_db < source_db_schema.sql
mysql -e 'RESET QUERY CACHE'
mysql -e "create database destination_db"
mysqldump source_db | mysql destination_db
mysql source_db -e "select option_name from VbVoptions;"
mysql destination_db < all.sql
mysql -e "drop database destination_db"
echo "drop successfully"

In summary: if a specific query-cache entry is present, you are unable to drop the database; it just hangs (MariaDB uses 100% CPU) and no restart can be performed without "kill -9".

Comment by Alex [ 2022-10-28 ]

I can reproduce the deadlock with Neven's example every time, also with the latest MariaDB 10.6.10.
It only works if the query-cache reset is executed immediately before the drop.

As written already last year, please add 10.6 to affected versions!

Comment by Daniel Black [ 2022-11-25 ]

continuing in MDEV-29760

Generated at Thu Feb 08 09:34:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.