[MDEV-16283] ALTER TABLE...DISCARD TABLESPACE still takes long on a large buffer pool - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.29, 10.2.11, 10.3.3
Fix Version/s: 10.1.34, 10.2.16, 10.3.8
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Labels:
- ddl
- performance
- upstream

Description

ccalender noted in ~~MDEV-13328~~ that ALTER TABLE...DISCARD TABLESPACE still takes a long time in a large buffer pool.

I tried to reproduce the issue with the following:

source include/have_innodb.inc;

source include/have_sequence.inc;

create database p;

use p;

create table pad(a serial,

b0 char(255) not null default '',

b1 char(255) not null default '',

b2 char(255) not null default '',

b3 char(255) not null default '',

b4 char(255) not null default '',

b5 char(255) not null default '',

b6 char(255) not null default '',

key(b0),key(b1),key(b2),key(b3),key(b4),key(b5),key(b6)) ENGINE=InnoDB;

insert into pad(a) select null from seq_1_to_1000000;

let $n=1000;

while ($n) {

eval create table t$n(a serial) engine=innodb;

dec $n;

let $n=1000;

while ($n) {

eval alter table t$n discard tablespace;

dec $n;

drop database p;

./mtr --mysqld=--innodb-flush-log-at-trx-commit=0 --mysqld=--innodb-max-dirty-pages-pct=99.999 --mysqld=--innodb-log-file-size=2g --mysqld=--innodb-page-size=4k --mysqld=--innodb-buffer-pool-size=8g innodb.discard,xtradb

I believe that the trick is to fill up the buffer pool with many dirty pages, so that the buf_pool->LRU and buf_pool->flush_list will be long. The reason for innodb_page_size=4k is to get as many pages as possible into a smaller buffer pool. The other parameters are mainly there to speed up the insert (do as little file I/O as possible). I ran the test in /dev/shm.

The DISCARD TABLESPACE were rather fast, maybe 0.1 seconds or less. While they were being run, sudo perf top showed the following:

10.1 bfed1bfe28980d3b1404f99a44712242a5108ef5
93,50% mysqld [.] buf_LRU_flush_or_remove_pages
0,60% mysqld [.] buf_calc_page_new_checksum
0,21% mysqld [.] ut_delay
0,21% libc-2.27.so [.] __memmove_avx_unaligned_erms
0,09% libshadow.so [.] shadowUpdatePacked
0,09% libc-2.27.so [.] __memchr_avx2

Yes, we still have to scan the buf_pool->flush_list and remove the pages of any discarded tables, so that the discarded data will not be written, or that we will not get flooded by error messages that there are unwritten changes to missing files.

I tried shortening the flush_list by adding the following before the CREATE TABLE loop:

FLUSH TABLES pad FOR EXPORT;

UNLOCK TABLES;

Computing the page checksums for the flushing takes quite a bit of time, but to my surprise, the DISCARD TABLESPACE remained roughly as slow as before. I also tried skip_innodb_adaptive_hash_index (even though already with the previous fix, we would not touch the adaptive hash index). No change.

Here is the culprit:

#0  buf_LRU_flush_or_remove_pages (id=id@entry=5, trx=trx@entry=0x0,

    drop_ahi=drop_ahi@entry=true)

    at /mariadb/10.1/storage/xtradb/buf/buf0lru.cc:745

#1  0x0000555555dbf1dd in fil_delete_tablespace (id=id@entry=5,

    drop_ahi=drop_ahi@entry=true)

    at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:2939

#2  0x0000555555dbf693 in fil_discard_tablespace (id=5)

    at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:3050

#3  0x0000555555cfba1a in row_discard_tablespace (table=0x7fffb40159c8,

    trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3123

#4  row_discard_tablespace_for_mysql (name=<optimized out>,

    trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3217

We are actually passing drop_ahi=true in DISCARD TABLESPACE, and calling the expensive function buf_LRU_drop_page_hash_for_tablespace() for no good reason.

The adaptive hash index entries should do no harm here; they will have to be dropped on IMPORT TABLESPACE or DROP TABLE, but not earlier.

It is also curious that we are searching for adaptive hash index entries to be dropped, even though skip_innodb_adaptive_hash_index is set. We should check if any AHI entries exist for any table (or even better, for any indexes of our table or partition), and only then invoke buf_LRU_drop_page_hash_for_tablespace().

Attachments

Issue Links

includes

MDEV-13328 ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with large buffer pool (>128G)

Closed

relates to

MDEV-14489 rpl.rpl_stm_mixing_engines, rpl.rpl_non_direct_mixed_mixing_engines, rpl.rpl_mixed_mixing_engines, rpl.rpl_non_direct_stm_mixing_engines failed in buildbot, sync_slave_with_master failed

Closed

MDEV-14491 rpl.rpl_row_mixing_engines, rpl.rpl_non_direct_row_mixing_engines failed in buildbot with long semaphore wait

Closed

MDEV-14727 innodb_fts.fulltext2 failed in buildbot with long semaphore wait

Closed

MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB

Closed

MDEV-16515 InnoDB: Failing assertion: ++retries < 10000 in file dict0dict.cc line 2737

Closed

MDEV-21549 IMPORT TABLESPACE fails to adjust all tablespace ID in root pages

Closed

(2 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2018-05-29 08:45

I don’t think that we need to drop the adaptive hash index at all on DISCARD TABLESPACE. However, we must fix DROP TABLE so that it will drop the entries, even if the tablespace has been discarded.

On IMPORT TABLESPACE, since ~~MDEV-13328~~ we would free any pages with matching page numbers, and this will also drop the adaptive hash index entries. Only if the table would shrink as a result of the DISCARD TABLESPACE and IMPORT TABLESPACE, there could be adaptive hash index entries pointing to ‘phantom’ pages in the buffer pool. These entries must be removed. As a result of this, IMPORT TABLESPACE may become a little slower when replacing a table with a smaller one.

While developing the fix, I came across the code that emits messages like this:

InnoDB: Error: Waited for 140 secs for hash index ref_count (1) to drop to 0

I do not see any mechanism that would actively drop the adaptive hash index in this case. Because we are keeping the dictionary locked while waiting, the entire server could hang, and the InnoDB buffer pool could become idle, so that the entries are never dropped. I think that it is better to actively drop the adaptive hash index entries in this case.

Marko Mäkelä added a comment - 2018-05-29 08:45 I don’t think that we need to drop the adaptive hash index at all on DISCARD TABLESPACE . However, we must fix DROP TABLE so that it will drop the entries, even if the tablespace has been discarded. On IMPORT TABLESPACE , since MDEV-13328 we would free any pages with matching page numbers, and this will also drop the adaptive hash index entries. Only if the table would shrink as a result of the DISCARD TABLESPACE and IMPORT TABLESPACE , there could be adaptive hash index entries pointing to ‘phantom’ pages in the buffer pool. These entries must be removed. As a result of this, IMPORT TABLESPACE may become a little slower when replacing a table with a smaller one. While developing the fix, I came across the code that emits messages like this: InnoDB: Error: Waited for 140 secs for hash index ref_count (1) to drop to 0 I do not see any mechanism that would actively drop the adaptive hash index in this case. Because we are keeping the dictionary locked while waiting, the entire server could hang, and the InnoDB buffer pool could become idle, so that the entries are never dropped. I think that it is better to actively drop the adaptive hash index entries in this case.

Marko Mäkelä added a comment - 2018-05-30 07:17

The original fix (~~MDEV-13328~~) was that we would not evict the pages of the discarded tablespace from the buffer pool; we would do that on a subsequent IMPORT TABLESPACE only. But, we would still scan the buffer pool for adaptive hash index entries, to drop them. It actually suffices to drop the adaptive hash index entries on IMPORT TABLESPACE or on DROP TABLE or DROP INDEX. And there is no need to scan the buffer pool if index->search_info->ref_count==0 for all indexes of the table. ~~MDEV-16283~~ implemented these fixes.

Marko Mäkelä added a comment - 2018-05-30 07:17 The original fix ( MDEV-13328 ) was that we would not evict the pages of the discarded tablespace from the buffer pool; we would do that on a subsequent IMPORT TABLESPACE only. But, we would still scan the buffer pool for adaptive hash index entries, to drop them. It actually suffices to drop the adaptive hash index entries on IMPORT TABLESPACE or on DROP TABLE or DROP INDEX . And there is no need to scan the buffer pool if index->search_info->ref_count==0 for all indexes of the table. MDEV-16283 implemented these fixes.

Tim He added a comment - 2020-03-13 14:19

--innodb-buffer-pool-size=16G
------------------------------------ (change "seq_1_to_1000000" to "seq_1_to_2000000")
let $n=1000;
while ($n)

{ eval alter table t$n discard tablespace; dec $n; }
------------------------------------
71.09 sec

--innodb-buffer-pool-size=8G
------------------------------------
let $n=1000;
while ($n) { eval alter table t$n discard tablespace; dec $n; }

------------------------------------
41.65 sec

--innodb-buffer-pool-size=2G
------------------------------------
let $n=1000;
while ($n)

{ eval alter table t$n discard tablespace; dec $n; }
------------------------------------
16.24 sec

--innodb-buffer-pool-size=256M
------------------------------------
let $n=1000;
while ($n) { eval alter table t$n discard tablespace; dec $n; }

------------------------------------
3.6 sec

--innodb-buffer-pool-size=128M
------------------------------------
let $n=1000;
while ($n)

{ eval alter table t$n discard tablespace; dec $n; }

------------------------------------
2.75 sec

Tim He added a comment - 2020-03-13 14:19 --innodb-buffer-pool-size=16G ------------------------------------ (change "seq_1_to_1000000" to "seq_1_to_2000000") let $n=1000; while ($n) { eval alter table t$n discard tablespace; dec $n; } ------------------------------------ 71.09 sec --innodb-buffer-pool-size=8G ------------------------------------ let $n=1000; while ($n) { eval alter table t$n discard tablespace; dec $n; } ------------------------------------ 41.65 sec --innodb-buffer-pool-size=2G ------------------------------------ let $n=1000; while ($n) { eval alter table t$n discard tablespace; dec $n; } ------------------------------------ 16.24 sec --innodb-buffer-pool-size=256M ------------------------------------ let $n=1000; while ($n) { eval alter table t$n discard tablespace; dec $n; } ------------------------------------ 3.6 sec --innodb-buffer-pool-size=128M ------------------------------------ let $n=1000; while ($n) { eval alter table t$n discard tablespace; dec $n; } ------------------------------------ 2.75 sec

Marko Mäkelä added a comment - 2020-03-13 15:01

Tim He, with which version of MariaDB is that? Can you post some analysis based on perf record and perf report?

Marko Mäkelä added a comment - 2020-03-13 15:01 Tim He , with which version of MariaDB is that? Can you post some analysis based on perf record and perf report ?

MariaDB Server

ALTER TABLE...DISCARD TABLESPACE still takes long on a large buffer pool

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration