[MDEV-8069] DROP or rebuild of a large table may lock up InnoDB Created: 2015-04-28  Updated: 2023-11-27  Resolved: 2020-06-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0.16, 10.1.34, 10.2.27
Fix Version/s: 10.5.4

Type: Bug Priority: Critical
Reporter: Stoykov (Inactive) Assignee: Eugene Kosov (Inactive)
Resolution: Fixed Votes: 5
Labels: affects-tests, drop-table-optimiation, upstream, xtradb
Environment:

Red Hat Enterprise Linux


Issue Links:
Blocks
is blocked by MDEV-14585 Automatically remove #sql- tables in ... Closed
Relates
relates to MDEV-11655 Transactional data dictionary Open
relates to MDEV-18518 Implement atomic multi-table (or mult... Closed
relates to MDEV-18572 Thread executing DROP TABLE listed tw... Open
relates to MDEV-18613 Optimization for dropping table Closed
relates to MDEV-19552 DROP TABLE locks SHOW CREATE TABLE an... Closed
relates to MDEV-22456 Dropping the adaptive hash index may ... Closed
relates to MDEV-32786 Optimization of DROP TABLE for Galera Open
relates to MDEV-9459 Truncate table causes innodb stalls Closed
relates to MDEV-16796 TRUNCATE TABLE slowdown with innodb_f... Closed
Sprint: 10.3.1-1

 Description   

DROP DATABASE IF EXISTS executed on a database with more than several thousand tables and several TB of data in it is unreasonably slow.
A huge amount of writing to disk has been observed as well.

Crash description:
executed:

SET unique_checks = 0; SET foreign_key_checks = 0; SET GLOBAL innodb_stats_on_metadata = 0; DROP DATABASE IF EXISTS HUGE_TABLE

InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2015-04-27 12:21:26 7f07de3ff700  InnoDB: Assertion failure in thread 139671770232576 in file srv0srv.cc line 2196
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150427 12:21:26 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.0.16-MariaDB-log
key_buffer_size=53687091200
read_buffer_size=131072
max_used_connections=42
max_threads=402
thread_count=21
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 52900205 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb73d3b]
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x726518]
/lib64/libpthread.so.0(+0xf710)[0x7f5096c43710]
/lib64/libc.so.6(gsignal+0x35)[0x7f509529f625]
/lib64/libc.so.6(abort+0x175)[0x7f50952a0e05]
/usr/sbin/mysqld[0x936f9c]
/lib64/libpthread.so.0(+0x79d1)[0x7f5096c3b9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f50953558fd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150427 12:48:54 mysqld_safe Number of processes running now: 0
150427 12:48:54 mysqld_safe mysqld restarted
150427 12:49:16 [Note] InnoDB: Using mutexes to ref count buffer pool pages
150427 12:49:16 [Note] InnoDB: The InnoDB memory heap is disabled
150427 12:49:16 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
150427 12:49:16 [Note] InnoDB: Memory barrier is not used
150427 12:49:16 [Note] InnoDB: Compressed tables use zlib 1.2.3
150427 12:49:16 [Note] InnoDB: Using Linux native AIO
150427 12:49:16 [Note] InnoDB: Using CPU crc32 instructions
150427 12:49:16 [Note] InnoDB: Initializing buffer pool, size = 225.0G
150427 12:49:27 [Note] InnoDB: Completed initialization of buffer pool
150427 12:49:29 [Note] InnoDB: Highest supported file format is Barracuda.
150427 12:49:29 [Note] InnoDB: Log scan progressed past the checkpoint lsn 277389716102041
150427 12:49:29 [Note] InnoDB: Database was not shutdown normally!
150427 12:49:29 [Note] InnoDB: Starting crash recovery.
150427 12:49:29 [Note] InnoDB: Reading tablespace information from the .ibd files...
 

Please do not ignore the feature request for optimizing the DROP DATABASE operation, even though this ticket is marked as a bug.



 Comments   
Comment by Stoykov (Inactive) [ 2015-04-29 ]

Short update:
When a long-running DROP TABLE on a huge (400 GB) table was interrupted, the MariaDB instance crashed.
On recovery, only the .frm file was left.
This was worked around with ALTER TABLE xxx DISCARD TABLESPACE; DROP TABLE xxx;
This is quite an important issue too.

Comment by Jan Lindström (Inactive) [ 2015-05-05 ]

Let's consider the transactional way. If we have a log record type for DROP_DATABASE, it is enough to write that record with the database name to the redo log: if we crash in the middle of dropping the database, recovery may need to redo the operation. If we do not have a log record for DROP_DATABASE, we at least have one for DROP_TABLE. In that case we need to iterate over all tables in the database and write a DROP_TABLE log record for each (for the same crash-recovery redo).
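A minimal sketch of the recovery-side idea, using an invented record and catalog representation (this is not InnoDB's redo format): DROP records are idempotent, so recovery can simply replay them to finish an interrupted DROP DATABASE or DROP TABLE.

```python
from collections import namedtuple

Rec = namedtuple("Rec", "type name")

def apply_redo(records, catalog):
    """Replay DROP records; harmless if the work was already done."""
    for rec in records:
        if rec.type == "DROP_DATABASE":
            # drop every table belonging to the database
            for t in [t for t in catalog if t.startswith(rec.name + ".")]:
                catalog.discard(t)
        elif rec.type == "DROP_TABLE":
            catalog.discard(rec.name)   # no-op if already gone

catalog = {"db1.t1", "db1.t2", "db2.t1"}
catalog.discard("db1.t2")   # crash happened mid-way: db1.t2 already dropped
apply_redo([Rec("DROP_DATABASE", "db1")], catalog)
# catalog is now {'db2.t1'}
```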

Comment by Jan Lindström (Inactive) [ 2015-05-05 ]

http://bugs.mysql.com/bug.php?id=51325
http://bugs.mysql.com/bug.php?id=64284
and
http://www.percona.com/blog/2012/06/22/drop-table-and-stalls-lazy-drop-table-in-percona-server-and-the-new-fixes-in-mysql/

Comment by Jan Lindström (Inactive) [ 2015-05-24 ]

After some experiments, here is what I found. If a database is created containing 999 small tables and one big table, I can reproduce the slow execution time when the contents of the table are in the buffer pool. If I create the database, shut down, restart, and then drop the database, I could not reproduce it. If I create the database, shut down, restart, select, and then drop the database, I still can't really reproduce it.

MariaDB [(none)]> select count(1) from innodbs.t1;
+------------+
| count(1)   |
+------------+
| 1310720000 |
+------------+
1 row in set (5 min 31.55 sec)
 
MariaDB [(none)]> drop database innodbs;
Query OK, 1001 rows affected (22.28 sec)

Big table on this test was:

mariadb@lizard2:/mnt/raid/jplindst$ ls -l --block-size=G innodbs/t1.ibd
-rw-rw---- 1 mariadb mariadb 63G May 22 00:18 innodbs/t1.ibd

Comment by Marko Mäkelä [ 2017-03-24 ]

In MariaDB 10.x, row_drop_table_for_mysql() (which executes as part of DROP TABLE or DROP DATABASE) contains a non-ACID hack, the ‘background DROP TABLE queue’. Under certain circumstances, it is possible that the table will not be dropped immediately.

I think that the only scenarios under which the DROP TABLE queue can kick in are when there are FOREIGN KEY constraints associated with the table and another session is causing access to this table via those constraints. Another case is the error handling for CREATE TABLE…SELECT, but I think that it is unlikely here. In MySQL, WL#6049 should eventually fix the FOREIGN KEY problem by acquiring proper metadata locks for DML operations that may affect FOREIGN KEY tables. As far as I understand, it is planned to be fixed in MySQL 8.0.

This is not the only problem with the current-form DROP TABLE. If we implement a transactional data dictionary (MDEV-11655), in the DROP operations we would introduce a ‘post-commit’ step that actually purges the data (deletes files or frees pages in shared tablespaces) after the operation has been committed, while not holding any locks. This would also require the removal of the non-ACID ‘background DROP TABLE queue’ hack.

Comment by Marko Mäkelä [ 2017-03-24 ]

My current hypothesis is that row_drop_table_for_mysql() waited for more than 10 minutes for an exclusive dict_operation_lock, and finally the InnoDB watchdog kicked in, aborting the process.

This could be a starvation problem. InnoDB purge and rollback are acquiring shared dict_operation_lock. Maybe if there is constantly at least one S-lock holder, the X-lock request will never be fulfilled?

I was suspecting that InnoDB rollback would be holding dict_operation_lock in shared mode for the whole duration of the rollback, but that is not the case in MariaDB 10.0. The rw-lock is being acquired and released for each undo log record that is being processed.

The trick ALTER TABLE…DISCARD TABLESPACE is a good way of ensuring that rollback and purge immediately stop costly key lookups on the table.
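The starvation hypothesis above can be illustrated with a toy model (all names invented; this is not InnoDB code). With a reader-preferring rw-lock, S requests keep being granted even while an X request waits, so overlapping S holders can delay the writer indefinitely; a fair rw-lock would instead queue later S requests behind the waiting X request.

```python
class ReaderPreferringRWLock:
    """Toy reader-preferring rw-lock: S is always granted immediately."""
    def __init__(self):
        self.readers = 0

    def s_lock(self):
        self.readers += 1          # granted regardless of waiting writers

    def s_unlock(self):
        self.readers -= 1

    def try_x_lock(self):
        return self.readers == 0   # writer gets in only with no S holders

lock = ReaderPreferringRWLock()
lock.s_lock()                      # e.g. a purge thread holds S
for _ in range(1000):
    lock.s_lock()                  # the next S holder arrives...
    lock.s_unlock()                # ...before the previous one leaves
    assert not lock.try_x_lock()   # the DROP TABLE X request keeps waiting
```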

Comment by Jan Lindström (Inactive) [ 2017-06-01 ]

Test 1: TPC-C database. I created TPC-C database with 100 warehouses using https://github.com/Percona-Lab/tpcc-mysql

mysql -u root tpcc1000 < create_table.sql
mysql -u root tpcc1000 < add_fkey_idx.sql 
sh load.sh tpcc1000 100
mysql -u root tpcc1000
mysql> use tpcc1000;
mysql> create table stock2 select * from stock;
mysql>exit;
mysqladmin -u root shutdown

Similarly, I created a LinkBench database (https://github.com/facebookarchive/linkbench). I used LinkBench measure as the workload while doing the drop tests. For linktable I removed the partitioning. The first table to be removed, i.e. stock2, is about 34G. The first test is as follows:

show variables;
use tpcc1000;
select count(*) from stock2;
SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE '%wait/synch/mutex/innodb%';
UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name like 'events_waits%';
SELECT * FROM performance_schema.setup_consumers;
 
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 SUM_TIMER_WAIT_MS
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0 AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY COUNT_STAR DESC;
 
TRUNCATE performance_schema.events_waits_summary_global_by_event_name;
 
select count(*) from stock2;
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 SUM_TIMER_WAIT_MS
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0 AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY COUNT_STAR DESC;
 
TRUNCATE performance_schema.events_waits_summary_global_by_event_name;
 
drop table stock2;
 
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 SUM_TIMER_WAIT_MS
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0 AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY COUNT_STAR DESC;

I could not reproduce any crash or excessive mutex wait:

--------------
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 SUM_TIMER_WAIT_MS
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0 AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY COUNT_STAR DESC
--------------
 
EVENT_NAME	COUNT_STAR	SUM_TIMER_WAIT_MS
wait/synch/mutex/innodb/trx_mutex	83733657	13345.1610
wait/synch/mutex/innodb/fil_system_mutex	61226009	276454.7795
wait/synch/mutex/innodb/buf_pool_mutex	38631784	114774.8560
wait/synch/mutex/innodb/srv_threads_mutex	29184209	2349.1542
wait/synch/mutex/innodb/log_sys_mutex	19482445	41216.1871
wait/synch/mutex/innodb/lock_mutex	12895322	23029.2398
wait/synch/mutex/innodb/trx_sys_mutex	12527822	99514.6675
wait/synch/mutex/innodb/redo_rseg_mutex	11928226	7740.1521
wait/synch/mutex/innodb/flush_list_mutex	6718844	965.3505
wait/synch/mutex/innodb/log_sys_write_mutex	4079112	66560.1368
wait/synch/mutex/innodb/log_flush_order_mutex	2892767	954.3451
wait/synch/mutex/innodb/trx_undo_mutex	2880837	658.1209
wait/synch/mutex/innodb/purge_sys_pq_mutex	1674000	287.9366
wait/synch/mutex/innodb/srv_sys_mutex	517472	40.0854
wait/synch/mutex/innodb/dict_sys_mutex	175967	69.0008
wait/synch/mutex/innodb/autoinc_mutex	154831	50.1458
wait/synch/mutex/innodb/ibuf_mutex	76878	489.0882
wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex	15954	25.6063
wait/synch/mutex/innodb/ibuf_bitmap_mutex	2944	0.7571
wait/synch/mutex/innodb/lock_wait_mutex	1273	1.3469
wait/synch/mutex/innodb/noredo_rseg_mutex	672	0.0974
wait/synch/mutex/innodb/rw_lock_list_mutex	27	0.0049
wait/synch/mutex/innodb/innobase_share_mutex	10	0.0022
wait/synch/mutex/innodb/file_format_max_mutex	9	0.0023
wait/synch/mutex/innodb/trx_pool_mutex	5	0.0009
wait/synch/mutex/innodb/srv_innodb_monitor_mutex	4	0.0014
wait/synch/mutex/innodb/trx_pool_manager_mutex	2	0.0013
wait/synch/mutex/innodb/recalc_pool_mutex	1	0.0007

Comment by Jan Lindström (Inactive) [ 2017-06-01 ]

Test 2: while running the LinkBench measure workload, drop linktable (which the workload uses); linktable is about 80G. I still do not see any performance problems or crashes:

--------------
SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE '%wait/synch/mutex/innodb%'
--------------
 
+-------------------------------------------------------+---------+-------+
| NAME                                                  | ENABLED | TIMED |
+-------------------------------------------------------+---------+-------+
| wait/synch/mutex/innodb/commit_cond_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/pending_checkpoint_mutex      | YES     | YES   |
| wait/synch/mutex/innodb/innobase_share_mutex          | YES     | YES   |
| wait/synch/mutex/innodb/autoinc_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/buf_pool_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/buf_pool_zip_mutex            | YES     | YES   |
| wait/synch/mutex/innodb/cache_last_read_mutex         | YES     | YES   |
| wait/synch/mutex/innodb/dict_foreign_err_mutex        | YES     | YES   |
| wait/synch/mutex/innodb/dict_sys_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/recalc_pool_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/file_format_max_mutex         | YES     | YES   |
| wait/synch/mutex/innodb/fil_system_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/flush_list_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/fts_bg_threads_mutex          | YES     | YES   |
| wait/synch/mutex/innodb/fts_delete_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/fts_optimize_mutex            | YES     | YES   |
| wait/synch/mutex/innodb/fts_doc_id_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/log_flush_order_mutex         | YES     | YES   |
| wait/synch/mutex/innodb/hash_table_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_bitmap_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_mutex                    | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex | YES     | YES   |
| wait/synch/mutex/innodb/log_sys_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/log_sys_write_mutex           | YES     | YES   |
| wait/synch/mutex/innodb/mutex_list_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/page_zip_stat_per_index_mutex | YES     | YES   |
| wait/synch/mutex/innodb/purge_sys_pq_mutex            | YES     | YES   |
| wait/synch/mutex/innodb/recv_sys_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/recv_writer_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/redo_rseg_mutex               | YES     | YES   |
| wait/synch/mutex/innodb/noredo_rseg_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/rw_lock_list_mutex            | YES     | YES   |
| wait/synch/mutex/innodb/rw_lock_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/srv_dict_tmpfile_mutex        | YES     | YES   |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex      | YES     | YES   |
| wait/synch/mutex/innodb/srv_misc_tmpfile_mutex        | YES     | YES   |
| wait/synch/mutex/innodb/srv_monitor_file_mutex        | YES     | YES   |
| wait/synch/mutex/innodb/buf_dblwr_mutex               | YES     | YES   |
| wait/synch/mutex/innodb/trx_undo_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/trx_pool_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/trx_pool_manager_mutex        | YES     | YES   |
| wait/synch/mutex/innodb/srv_sys_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/lock_mutex                    | YES     | YES   |
| wait/synch/mutex/innodb/lock_wait_mutex               | YES     | YES   |
| wait/synch/mutex/innodb/trx_mutex                     | YES     | YES   |
| wait/synch/mutex/innodb/srv_threads_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/rtr_active_mutex              | YES     | YES   |
| wait/synch/mutex/innodb/rtr_match_mutex               | YES     | YES   |
| wait/synch/mutex/innodb/rtr_path_mutex                | YES     | YES   |
| wait/synch/mutex/innodb/rtr_ssn_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/trx_sys_mutex                 | YES     | YES   |
| wait/synch/mutex/innodb/zip_pad_mutex                 | YES     | YES   |
+-------------------------------------------------------+---------+-------+
52 rows in set (0.01 sec)
 
--------------
UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name like 'events_waits%'
--------------
 
Query OK, 0 rows affected (0.00 sec)
Rows matched: 3  Changed: 0  Warnings: 0
 
--------------
SELECT * FROM performance_schema.setup_consumers
--------------
 
+--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_stages_current          | NO      |
| events_stages_history          | NO      |
| events_stages_history_long     | NO      |
| events_statements_current      | YES     |
| events_statements_history      | NO      |
| events_statements_history_long | NO      |
| events_waits_current           | YES     |
| events_waits_history           | YES     |
| events_waits_history_long      | YES     |
| global_instrumentation         | YES     |
| thread_instrumentation         | YES     |
| statements_digest              | YES     |
+--------------------------------+---------+
12 rows in set (0.00 sec)
 
--------------
TRUNCATE performance_schema.events_waits_summary_global_by_event_name
--------------
 
Query OK, 0 rows affected (0.01 sec)
 
--------------
drop table linktable
--------------
 
Query OK, 0 rows affected (2.07 sec)
 
--------------
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 SUM_TIMER_WAIT_MS
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0 AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY COUNT_STAR DESC
--------------
 
+------------------------------------------------+------------+-------------------+
| EVENT_NAME                                     | COUNT_STAR | SUM_TIMER_WAIT_MS |
+------------------------------------------------+------------+-------------------+
| wait/synch/mutex/innodb/flush_list_mutex       |     517900 | 23.9005           |
| wait/synch/mutex/innodb/fil_system_mutex       |     508770 | 139.1637          |
| wait/synch/mutex/innodb/log_sys_mutex          |     143548 | 9.7131            |
| wait/synch/mutex/innodb/buf_pool_mutex         |     107412 | 64.3274           |
| wait/synch/mutex/innodb/redo_rseg_mutex        |      35401 | 6.5018            |
| wait/synch/mutex/innodb/log_sys_write_mutex    |      24531 | 2.1803            |
| wait/synch/mutex/innodb/log_flush_order_mutex  |       5817 | 0.9242            |
| wait/synch/mutex/innodb/lock_mutex             |       5323 | 0.7879            |
| wait/synch/mutex/innodb/trx_mutex              |       2983 | 0.3252            |
| wait/synch/mutex/innodb/ibuf_mutex             |        981 | 4.5049            |
| wait/synch/mutex/innodb/purge_sys_pq_mutex     |        600 | 0.0242            |
| wait/synch/mutex/innodb/trx_sys_mutex          |        468 | 0.2089            |
| wait/synch/mutex/innodb/dict_sys_mutex         |        152 | 0.0370            |
| wait/synch/mutex/innodb/trx_undo_mutex         |        119 | 0.0212            |
| wait/synch/mutex/innodb/innobase_share_mutex   |        100 | 0.0072            |
| wait/synch/mutex/innodb/rw_lock_list_mutex     |          4 | 0.0005            |
| wait/synch/mutex/innodb/autoinc_mutex          |          4 | 0.0007            |
| wait/synch/mutex/innodb/srv_sys_mutex          |          3 | 0.0004            |
| wait/synch/mutex/innodb/trx_pool_mutex         |          3 | 0.0005            |
| wait/synch/mutex/innodb/trx_pool_manager_mutex |          2 | 0.0004            |
| wait/synch/mutex/innodb/lock_wait_mutex        |          2 | 0.0009            |
| wait/synch/mutex/innodb/srv_threads_mutex      |          2 | 0.0002            |
| wait/synch/mutex/innodb/recalc_pool_mutex      |          1 | 0.0001            |
+------------------------------------------------+------------+-------------------+
23 rows in set (0.02 sec)

Comment by Jan Lindström (Inactive) [ 2017-06-05 ]

Last test: dropping a database of ~230G with 20003 tables while running LinkBench measure. Still can't reproduce any crash or significant slowness:

--------------
drop database linkdb
--------------
 
Query OK, 20003 rows affected (2 min 38.99 sec)
 
--------------
SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE '%wait/synch/mutex/innodb%'
--------------
 
+-------------------------------------------------------+------------+-------------------+
| EVENT_NAME                                            | COUNT_STAR | SUM_TIMER_WAIT_MS |
+-------------------------------------------------------+------------+-------------------+
| wait/synch/mutex/innodb/srv_threads_mutex             |   30069542 | 1547.3216         |
| wait/synch/mutex/innodb/fil_system_mutex              |   16019990 | 41574.3245        |
| wait/synch/mutex/innodb/trx_mutex                     |   14289712 | 2061.5040         |
| wait/synch/mutex/innodb/buf_pool_mutex                |    8860756 | 16410.3035        |
| wait/synch/mutex/innodb/redo_rseg_mutex               |    4702093 | 1359.1786         |
| wait/synch/mutex/innodb/log_sys_mutex                 |    3731063 | 5693.8245         |
| wait/synch/mutex/innodb/flush_list_mutex              |    3165656 | 469.6350          |
| wait/synch/mutex/innodb/lock_mutex                    |    2562113 | 3583.6684         |
| wait/synch/mutex/innodb/trx_sys_mutex                 |    2182337 | 15337.8729        |
| wait/synch/mutex/innodb/purge_sys_pq_mutex            |    1719600 | 126.1066          |
| wait/synch/mutex/innodb/log_sys_write_mutex           |    1270417 | 11030.6672        |
| wait/synch/mutex/innodb/log_flush_order_mutex         |     580398 | 149.8456          |
| wait/synch/mutex/innodb/trx_undo_mutex                |     551665 | 103.0134          |
| wait/synch/mutex/innodb/dict_sys_mutex                |     269384 | 29.3786           |
| wait/synch/mutex/innodb/autoinc_mutex                 |     118985 | 30.6980           |
| wait/synch/mutex/innodb/rw_lock_list_mutex            |      99220 | 10.3610           |
| wait/synch/mutex/innodb/srv_sys_mutex                 |      68944 | 5.9424            |
| wait/synch/mutex/innodb/trx_pool_mutex                |      40009 | 4.2988            |
| wait/synch/mutex/innodb/trx_pool_manager_mutex        |      20005 | 2.0145            |
| wait/synch/mutex/innodb/recalc_pool_mutex             |      20003 | 2.5704            |
| wait/synch/mutex/innodb/ibuf_mutex                    |      15972 | 89.8884           |
| wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex |       1930 | 3.5514            |
| wait/synch/mutex/innodb/noredo_rseg_mutex             |        736 | 0.0927            |
| wait/synch/mutex/innodb/ibuf_bitmap_mutex             |        412 | 0.1030            |
| wait/synch/mutex/innodb/innobase_share_mutex          |        391 | 0.0344            |
| wait/synch/mutex/innodb/lock_wait_mutex               |        337 | 1.9135            |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex      |          2 | 0.0003            |
+-------------------------------------------------------+------------+-------------------+

Comment by Jan Lindström (Inactive) [ 2017-06-05 ]

Firstly, is this repeatable with a more recent version of MariaDB? If it is, could you please provide exact steps to reproduce? As you can see, I could not reproduce the problem.

Comment by Jan Lindström (Inactive) [ 2017-06-07 ]

Could not reproduce the problem with 10.0.16. I used two ~100G LinkBench databases, with 10K generated small tables added to each. While running LinkBench measure on one database, dropping the other database naturally took quite long (~6 minutes), but dropping the same database the workload was using took only ~1 minute. No errors were found in the error log, and no hangs or assertions were seen.

Comment by Geoff Montee (Inactive) [ 2017-09-11 ]

Hi julien.fritsch,

No. The support issue has been closed because progress on this bug is being tracked in Jira.

Comment by Marko Mäkelä [ 2018-10-12 ]

I believe that the issue is that, on a fragmented file system, deleting a large file can take a lot of time.

Several years ago, a patch was contributed to MySQL that would replace the unlink() system call with a series of system calls shrinking the file piece by piece. I am not sure whether that is a sensible approach.
(Edit: MariaDB pull request #1021 resembles that patch.)
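A sketch of that contributed-patch idea (this is not the actual patch, and the function name is invented): shorten the file piece by piece with truncate() so the filesystem frees extents incrementally, rather than in one long unlink() stall.

```python
import os
import tempfile

def shrink_then_unlink(path, chunk=1024 * 1024):
    """Free the file's space in `chunk`-sized steps, then unlink it."""
    size = os.path.getsize(path)
    while size > 0:
        size = max(0, size - chunk)
        os.truncate(path, size)   # each call frees only ~`chunk` bytes
    os.unlink(path)               # now cheap: the file is already empty

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (3 * 1024 * 1024))   # stand-in for a huge .ibd file
os.close(fd)
shrink_then_unlink(path)
```

Whether this actually helps depends on the filesystem; some defer extent freeing regardless, which is part of why the comment above questions the approach.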

However, I am sure that InnoDB should not be holding any mutexes while deleting the file. I see that both row_drop_table_for_mysql() and row_discard_tablespace() are holding both dict_sys->mutex and dict_operation_lock exclusively while calling fil_delete_tablespace().

I believe that a reasonable fix would be as follows:

  1. Rename the table to an internal name starting with #sql-, so that MDEV-14585 will guarantee that the table will be dropped in case the server is killed.
  2. Commit the transaction and release the mutexes.
  3. Delete the #sql-*.ibd file while not blocking other threads.
  4. Delete the InnoDB data dictionary records for the #sql- table in a separate transaction.

This is technically doable in MariaDB Server 10.3, and also starting with 10.2.19 if the MDEV-14585 functionality is enabled.
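The four steps can be sketched as follows; every helper name here is invented for illustration, and this is not InnoDB code. The point is the shape: only the rename and the dictionary cleanup happen under the dict_sys latch, while the slow unlink() runs with no latches held.

```python
import os
import tempfile
import threading
import uuid

def drop_table(dict_latch, data_dir, table):
    tmp = "#sql-" + uuid.uuid4().hex[:8]
    with dict_latch:                         # steps 1-2: rename and commit
        os.rename(os.path.join(data_dir, table + ".ibd"),
                  os.path.join(data_dir, tmp + ".ibd"))
        # commit here; MDEV-14585 would remove leftover #sql- tables
        # if the server is killed at this point
    os.unlink(os.path.join(data_dir, tmp + ".ibd"))   # step 3: unlatched
    with dict_latch:                         # step 4: separate transaction
        pass  # delete the dictionary records for the #sql- table

data_dir = tempfile.mkdtemp()
open(os.path.join(data_dir, "t1.ibd"), "wb").close()
drop_table(threading.Lock(), data_dir, "t1")
```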

A more proper solution would involve writing undo log records for "delete file" operations, to be processed by rollback (rolling back CREATE TABLE) or purge (cleaning up after committing DROP TABLE). This cannot be done in a GA release, because it involves changing the InnoDB undo log format. Then it would be something like this:

  1. Rename the table to #sql- name.
  2. Write an undo log record: "on purge, delete #sql-*.ibd"
  3. Drop the table metadata.
  4. Commit the mini-transaction.
  5. Eventually, the purge of history will delete the file while not holding any locks.
Comment by Marko Mäkelä [ 2018-12-21 ]

I posted some review remarks to PR#1021 with ideas on how to fix this. It seems doable in 10.2 already.

Comment by Marko Mäkelä [ 2019-02-08 ]

MDEV-18518 is for the undo log format change.

Comment by Marko Mäkelä [ 2020-01-28 ]

MySQL Bug #91977 could be a report of the same problem.

Comment by Marko Mäkelä [ 2020-05-04 ]

There are two issues affecting DROP or rebuild operations of large partitions, tables or databases. (Tables or partitions are internally dropped as part of operations that rebuild the table or partition: TRUNCATE and some forms of OPTIMIZE or ALTER TABLE, sometimes even CREATE INDEX or DROP INDEX.)

The problem reported in this ticket is that the InnoDB data dictionary cache (dict_sys) is being locked while the data file is being deleted, and deleting a large data file may take a lot of time, especially for a fragmented data file.

A related problem affects not only dropping or rebuilding entire tables or partitions, but also DROP INDEX operations that are executed in-place:
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB

Comment by Marko Mäkelä [ 2020-05-20 ]

MDEV-22456 recently removed one bottleneck already. We still have two remaining:

InnoDB is invoking unlink() to delete the data file while holding some mutexes. This must be fixed as part of this ticket.

On some file systems (most notably, on Linux), unlink() of a large file can block any concurrent usage of the entire file system. A workaround for this may be implemented in MDEV-18613.

Comment by Marko Mäkelä [ 2020-05-26 ]

I think that we should try to rely on delete-on-close semantics. That is, copy the handle to the to-be-deleted file, hold it open across the deletion, and close the handle after releasing the dict_sys latches. As far as I understand, file system recovery should guarantee that the file be deleted.
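The delete-on-close semantics suggested above can be demonstrated with plain POSIX file operations: unlink() removes the name immediately, but the inode and its blocks survive until the last open descriptor is closed, so the expensive space reclamation can be paid at close(), outside the dict_sys latches.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"tablespace contents")
os.unlink(path)                    # the name is gone immediately
assert not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 64)             # still readable via the open handle
os.close(fd)                       # storage is actually reclaimed here
```

This matches the comment's point that filesystem recovery cleans up such orphaned inodes if the server dies between the unlink and the close.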

Comment by Matthias Leich [ 2020-06-10 ]

Results of RQG testing
-----------------------------------
origin/bb-10.5-kevgs 3b08527d8c0907b06dad4179b009ca76efdd4aad 2020-06-09T04:34:10+03:00 containing MDEV-8069, build with ASAN
versus
current 10.5, build with ASAN
1. Test battery for broad-range coverage:
   The tree containing MDEV-8069 performed neither better nor worse than current 10.5.
2. A new test where concurrent connections fiddle (CREATE OR REPLACE/ALTER/DROP)
   with one 500000-row table and several 1000-row tables per connection.
   innodb_fatal_semaphore_wait_threshold values tried were 2 and 200,
   with 100 RQG test runs per tree:
   current 10.5: hit the long semaphore wait two times (the threshold in effect was 2s)
   MDEV-8069: hit the long semaphore wait four times (the threshold in effect was 2s)
   IMHO the number of test runs does not allow the conclusion that MDEV-8069 is clearly better.

Generated at Thu Feb 08 07:24:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.