[MDEV-18613] Optimization for dropping table - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 10.5.4
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Labels:
- need_feedback

Description

When innodb_file_per_table=ON, each table has its own .ibd file, and the user thread has to unlink that .ibd file when DROP TABLE is executed. As a result, dropping a table with a large .ibd file takes a long time and stalls the whole system.

For details, please refer to: https://github.com/MariaDB/server/pull/1021

Issue Links

relates to

- MDEV-32786 Support NBO for DROP TABLE in Galera (Open)
- MDEV-8069 DROP or rebuild of a large table may lock up InnoDB (Closed)
- MDEV-9459 Truncate table causes innodb stalls (Closed)
- MDEV-16796 TRUNCATE TABLE slowdown with innodb_file_per_table=ON (Closed)

Activity

Marko Mäkelä added a comment - 2020-05-20 09:52

As far as I can tell, this basically is a work-around for an operating system deficiency that blocks any concurrent usage of the file system while a large file is being deleted. To my knowledge, it is most needed on Linux, and not at all needed on Microsoft Windows.

~~MDEV-8069~~ and ~~MDEV-22456~~ will remove some other bottlenecks related to InnoDB DDL operations that affect all environments.

Technically, if we implement a background task that piecewise shrinks a large file in order to work around the file system starvation bug, it would be preferable to do that on 10.5 or later, using the ~~MDEV-16264~~ infrastructure.

Marko Mäkelä added a comment - 2020-06-11 17:07

Now that ~~MDEV-8069~~ has been fixed, I would like to know if a ftruncate() workaround is actually needed to prevent stalls on some file systems.

Manjot Singh (Inactive) added a comment - 2020-06-12 15:54

How is this issue different from ~~MDEV-8069~~?

Manjot Singh (Inactive) added a comment - 2020-07-21 17:39

marko, in your comment in ~~MDEV-8069~~ on May 20, you mention that the unlink should still be fixed. Is the truncate here a different issue or the same issue?

Was this fixed in 8069?

Marko Mäkelä added a comment - 2020-07-27 12:19

manjot, in ~~MDEV-8069~~ we changed the logic so that at the time of the unlink() invocation, there will be an open file handle, which prevents the unlink() from performing any actual work. In this way, holding InnoDB mutexes at that point does not matter. We would close() the file handle only after releasing the InnoDB mutexes.
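The unlink-then-close ordering described above relies on standard POSIX behavior: unlink() on a file that is still open only removes the directory entry, and the storage is reclaimed when the last descriptor is closed. The following is a minimal stand-alone sketch of that behavior, not InnoDB code; the function name and the scratch path under /tmp are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Illustrative helper (hypothetical name, not part of MariaDB).
// Creates a scratch file, unlinks it while a descriptor is still open, and
// returns true if the name is gone but the data is still reachable via the fd.
bool unlink_defers_deletion_until_close()
{
  char path[] = "/tmp/unlink_demo_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0)
    return false;

  const char payload[] = "page data";
  if (write(fd, payload, sizeof payload) != (ssize_t) sizeof payload) {
    close(fd);
    unlink(path);
    return false;
  }

  // Corresponds to the step performed while mutexes are held:
  // only the directory entry is removed here.
  if (unlink(path) != 0) {
    close(fd);
    return false;
  }

  struct stat by_name, by_fd;
  const bool name_gone = stat(path, &by_name) != 0;
  const bool data_alive = fstat(fd, &by_fd) == 0 &&
                          by_fd.st_size == (off_t) sizeof payload;

  // Corresponds to the step performed after the mutexes are released:
  // closing the last descriptor lets the file system free the blocks.
  close(fd);
  return name_gone && data_alive;
}
```

Any time the file system needs to free a large file is therefore spent inside close(), which is why the order of releasing mutexes and closing the handle matters.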

I do not know whether any currently popular Linux file systems suffer from the problem that deleting a file (which in our case would occur at the time of the close() invocation) would prevent any concurrent operation on the file system. There are some hints that this was a problem with the ext3 file system, but not with ext4. I think that we will find it out when someone complains. I would expect the worst case to involve the deletion of large fragmented files. It might ‘help’ to fragment the files by enabling page_compressed when creating the tables.

If some file system turns out to suffer from that problem, we could try to work around that problem by repeatedly invoking ftruncate() to shrink the file before closing the file handle.

Marko Mäkelä added a comment - 2021-07-01 09:55

In ~~MDEV-25506~~ the DROP TABLE code was rewritten, but the basic idea of the ~~MDEV-8069~~ fix was preserved: we will unlink() the file while holding both some mutexes and an open file handle. Finally, we will release the mutexes and close the file. At this point, some time may be spent in the file system driver of the operating system kernel.

Should some file system really require a work-around to make the delete-on-close perform faster (without stalling other threads or processes that are competing for kernel resources), we could implement something that performs a piecewise ftruncate() of the file before finally closing the handle. The following just illustrates the idea; there are multiple occurrences of such code in the 10.6 server:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 1acb8ef5e20..ddb8422a553 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -2050,7 +2050,7 @@ static void drop_garbage_tables_after_restore()
     row_mysql_unlock_data_dictionary(trx);
     for (pfs_os_file_t d : deleted)
-      os_file_close(d);
+      os_file_truncate_and_close(d);
     mtr.start();
     btr_pcur_restore_position(BTR_SEARCH_LEAF, &pcur, &mtr);
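To make the piecewise ftruncate() idea concrete, here is a rough stand-alone sketch using plain POSIX calls. It is not the os_file_truncate_and_close() named in the diff, which is only an illustration there; the helper names, the chunk size, and the scratch path are all assumptions, and whether shrinking in steps actually avoids stalls on a given file system would have to be measured.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: shrinks the file behind fd by at most `chunk` bytes
// per ftruncate() call until it is empty, then closes the descriptor.
// Returns the number of ftruncate() calls, or -1 on error.
// The descriptor is closed in every case.
long truncate_piecewise_and_close(int fd, off_t chunk)
{
  assert(chunk > 0);

  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    return -1;
  }

  long calls = 0;
  for (off_t size = st.st_size; size > 0;) {
    size = size > chunk ? size - chunk : 0;
    if (ftruncate(fd, size) != 0) {
      close(fd);
      return -1;
    }
    ++calls;
    // A real implementation might yield or sleep here so that other
    // threads get access to the file system between steps.
  }

  return close(fd) == 0 ? calls : -1;
}

// Usage sketch: a 10 MiB sparse scratch file that has already been
// unlinked is shrunk in 4 MiB steps before the final close.
long demo_truncate_calls()
{
  char path[] = "/tmp/truncate_demo_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0)
    return -1;
  unlink(path);
  if (ftruncate(fd, (off_t) 10 << 20) != 0) {
    close(fd);
    return -1;
  }
  return truncate_piecewise_and_close(fd, (off_t) 4 << 20);
}
```

With these sizes the file goes from 10 MiB to 6 MiB, 2 MiB, and 0, so the final close() has nothing left to free. The chunk size trades the number of system calls against the amount of work each call hands to the file system.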

Marko Mäkelä added a comment - 2021-10-26 08:31

I believe that this has been fixed by ~~MDEV-8069~~ in MariaDB Server 10.5.4.

People

Assignee: Marko Mäkelä

Reporter: musazhang

Votes: 2

Watchers: 8

Dates

Created: 2019-02-18 07:55

Updated: 2024-09-06 10:00

Resolved: 2021-10-26 08:31
