Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.1.29, 10.2.11, 10.3.3
Description
ccalender noted in MDEV-13328 that ALTER TABLE...DISCARD TABLESPACE still takes a long time in a large buffer pool.
I tried to reproduce the issue with the following:
source include/have_innodb.inc;
|
source include/have_sequence.inc;
|
|
create database p; |
use p; |
create table pad(a serial, |
b0 char(255) not null default '', |
b1 char(255) not null default '', |
b2 char(255) not null default '', |
b3 char(255) not null default '', |
b4 char(255) not null default '', |
b5 char(255) not null default '', |
b6 char(255) not null default '', |
key(b0),key(b1),key(b2),key(b3),key(b4),key(b5),key(b6)) ENGINE=InnoDB; |
|
insert into pad(a) select null from seq_1_to_1000000; |
|
let $n=1000;
|
while ($n) {
|
eval create table t$n(a serial) engine=innodb; |
dec $n; |
}
|
|
let $n=1000;
|
while ($n) {
|
eval alter table t$n discard tablespace; |
dec $n; |
}
|
|
drop database p; |
./mtr --mysqld=--innodb-flush-log-at-trx-commit=0 --mysqld=--innodb-max-dirty-pages-pct=99.999 --mysqld=--innodb-log-file-size=2g --mysqld=--innodb-page-size=4k --mysqld=--innodb-buffer-pool-size=8g innodb.discard,xtradb
|
I believe that the trick is to fill up the buffer pool with many dirty pages, so that the buf_pool->LRU and buf_pool->flush_list will be long. The reason for innodb_page_size=4k is to get as many pages as possible into a smaller buffer pool. The other parameters are mainly there to speed up the insert (do as little file I/O as possible). I ran the test in /dev/shm.
The DISCARD TABLESPACE were rather fast, maybe 0.1 seconds or less. While they were being run, sudo perf top showed the following:
10.1 bfed1bfe28980d3b1404f99a44712242a5108ef5 |
93,50% mysqld [.] buf_LRU_flush_or_remove_pages
|
0,60% mysqld [.] buf_calc_page_new_checksum
|
0,21% mysqld [.] ut_delay
|
0,21% libc-2.27.so [.] __memmove_avx_unaligned_erms
|
0,09% libshadow.so [.] shadowUpdatePacked
|
0,09% libc-2.27.so [.] __memchr_avx2
|
Yes, we still have to scan the buf_pool->flush_list and remove the pages of any discarded tables, so that the discarded data will not be written, or that we will not get flooded by error messages that there are unwritten changes to missing files.
I tried shortening the flush_list by adding the following before the CREATE TABLE loop:
FLUSH TABLES pad FOR EXPORT; |
UNLOCK TABLES;
|
Computing the page checksums for the flushing takes quite a bit of time, but to my surprise, the DISCARD TABLESPACE remained roughly as slow as before. I also tried skip_innodb_adaptive_hash_index (even though already with the previous fix, we would not touch the adaptive hash index). No change.
Here is the culprit:
#0 buf_LRU_flush_or_remove_pages (id=id@entry=5, trx=trx@entry=0x0,
|
drop_ahi=drop_ahi@entry=true)
|
at /mariadb/10.1/storage/xtradb/buf/buf0lru.cc:745
|
#1 0x0000555555dbf1dd in fil_delete_tablespace (id=id@entry=5,
|
drop_ahi=drop_ahi@entry=true)
|
at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:2939
|
#2 0x0000555555dbf693 in fil_discard_tablespace (id=5)
|
at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:3050
|
#3 0x0000555555cfba1a in row_discard_tablespace (table=0x7fffb40159c8,
|
trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3123
|
#4 row_discard_tablespace_for_mysql (name=<optimized out>,
|
trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3217
|
We are actually passing drop_ahi=true in DISCARD TABLESPACE, and calling the expensive function buf_LRU_drop_page_hash_for_tablespace() for no good reason.
The adaptive hash index entries should do no harm here; they will have to be dropped on IMPORT TABLESPACE or DROP TABLE, but not earlier.
It is also curious that we are searching for adaptive hash index entries to be dropped, even though skip_innodb_adaptive_hash_index is set. We should check if any AHI entries exist for any table (or even better, for any indexes of our table or partition), and only then invoke buf_LRU_drop_page_hash_for_tablespace().
Attachments
Issue Links
- includes
-
MDEV-13328 ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with large buffer pool (>128G)
- Closed
- relates to
-
MDEV-14489 rpl.rpl_stm_mixing_engines, rpl.rpl_non_direct_mixed_mixing_engines, rpl.rpl_mixed_mixing_engines, rpl.rpl_non_direct_stm_mixing_engines failed in buildbot, sync_slave_with_master failed
- Closed
-
MDEV-14491 rpl.rpl_row_mixing_engines, rpl.rpl_non_direct_row_mixing_engines failed in buildbot with long semaphore wait
- Closed
-
MDEV-14727 innodb_fts.fulltext2 failed in buildbot with long semaphore wait
- Closed
-
MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB
- Closed
-
MDEV-16515 InnoDB: Failing assertion: ++retries < 10000 in file dict0dict.cc line 2737
- Closed
-
MDEV-21549 IMPORT TABLESPACE fails to adjust all tablespace ID in root pages
- Closed