MDEV-13328: ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with a large buffer pool (>128G)

Details

    Affects Version/s: 10.1.29

    Description

      ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with a large buffer pool (>128G).

      Steps to reproduce:

      1. Drop and re-create an InnoDB table.
      2. Discard its tablespace.
      3. Increase the buffer pool size.
      4. Once the buffer pool is 128G or larger, re-run steps 1 and 2.

      The discard time increases with the buffer pool size; a minimal SQL sketch follows.
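      A minimal SQL sketch of these steps (the table name t and the 256G value are illustrative, not from this report; the buffer pool can only be resized online in MariaDB 10.2+, so on 10.1 restart the server with a larger innodb_buffer_pool_size instead):

        DROP TABLE IF EXISTS t;
        CREATE TABLE t (id INT) ENGINE=InnoDB;
        ALTER TABLE t DISCARD TABLESPACE;  -- time this statement

        -- grow the buffer pool, then repeat and compare the timing
        SET GLOBAL innodb_buffer_pool_size = 274877906944;  -- 256G (10.2+ online)
        DROP TABLE IF EXISTS t;
        CREATE TABLE t (id INT) ENGINE=InnoDB;
        ALTER TABLE t DISCARD TABLESPACE;  -- noticeably slower before the fix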


          Activity

            Marko Mäkelä added a comment:

            If DISCARD TABLESPACE does not evict the pages from the buffer pool, then IMPORT TABLESPACE must do it. Simply moving the eviction call to the IMPORT TABLESPACE step would not be a useful fix.

            I tried simply removing the eviction in DISCARD TABLESPACE. If the DISCARD is soon followed by IMPORT, the IMPORT will adjust the imported file to the pre-existing tablespace ID. If the old pages were not evicted from the buffer pool at DISCARD, then after IMPORT we could incorrectly read the old pages from the buffer pool instead of the pages that exist in the imported file.

            IMPORT initially bypasses the buffer pool when reading pages for the adjustment phase. A possible fix is that in the adjustment phase, if a page already exists in the buffer pool, we replace it with the page from the imported file. This would allow a quick DISCARD TABLESPACE even for large tables, at the cost of a slightly slower IMPORT TABLESPACE (the slowdown being proportional to the file size).
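            For context, a sketch of the DISCARD/IMPORT sequence being discussed (the table name t and the file-copy step are illustrative):

              ALTER TABLE t DISCARD TABLESPACE;
              -- copy t.ibd (and the optional t.cfg metadata file) from the
              -- source server into this server's datadir
              ALTER TABLE t IMPORT TABLESPACE;
              SELECT * FROM t;  -- without eviction at DISCARD (or page
                                -- replacement at IMPORT), this could read
                                -- stale pages from the buffer pool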


            Marko Mäkelä added a comment:

            I have pushed this fix to 10.0 and merged it to 10.1 so far. I spent some time merging it to 10.2.

            In 10.2, there is a merge conflict in the TRUNCATE TABLE code, as anticipated. The solution ought to be good news for TRUNCATE performance: there should be no need to evict adaptive hash index entries or old pages from the buffer pool during TRUNCATE. I believe it suffices to ensure that buf_page_create() or an equivalent is called when initializing new pages after TRUNCATE, and that TRUNCATE edits the dict_index_t objects in place.
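            If that holds, one observable consequence (an illustrative check, not from this ticket; big_t is a hypothetical table) is that TRUNCATE time should no longer grow with the buffer pool size:

              -- populate a throwaway table (uses MariaDB's SEQUENCE engine)
              CREATE TABLE big_t (id INT PRIMARY KEY) ENGINE=InnoDB;
              INSERT INTO big_t SELECT seq FROM seq_1_to_1000000;
              -- time this with a small and a large innodb_buffer_pool_size;
              -- without a buffer pool scan the elapsed time should stay
              -- roughly constant
              TRUNCATE TABLE big_t;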


            Chris Calender added a comment:

            I need to re-open this one.

            After testing with 10.1.29, we are still seeing slowness with a 512G buffer pool:

            MariaDB [test]> drop table if exists t;
            Query OK, 0 rows affected, 1 warning (0.00 sec)
             
            MariaDB [test]> create table t(id int) engine=InnoDB;
            Query OK, 0 rows affected (0.00 sec)
             
            MariaDB [test]> alter table t discard tablespace;
            Query OK, 0 rows affected (7.34 sec)
             
            MariaDB [test]> drop table if exists t;
            Query OK, 0 rows affected (0.22 sec)
             
            MariaDB [test]> select @@version;
            +-----------------+
            | @@version       |
            +-----------------+
            | 10.1.29-MariaDB |
            +-----------------+
            1 row in set (0.01 sec)
             
            MariaDB [test]> select @@innodb_buffer_pool_size;
            +---------------------------+
            | @@innodb_buffer_pool_size |
            +---------------------------+
            |              549755813888 |
            +---------------------------+
            1 row in set (0.00 sec)
            

            Is there anything further that can be done here?


            Marko Mäkelä added a comment:

            ccalender, I filed MDEV-16283 for what I believe to be the remaining issue. With the smaller buffer pool that I used, DISCARD TABLESPACE was still rather fast.

            Let us track the progress in MDEV-16283.


            Marko Mäkelä added a comment:

            The original fix (MDEV-13328) was that we would not evict the pages of the discarded tablespace from the buffer pool; we would do that only on a subsequent IMPORT TABLESPACE. However, we would still scan the buffer pool for adaptive hash index entries in order to drop them. It actually suffices to drop the adaptive hash index entries on IMPORT TABLESPACE or on DROP TABLE or DROP INDEX, and there is no need to scan the buffer pool at all if index->search_info->ref_count==0 for all indexes of the table. MDEV-16283 implemented these fixes.
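            To observe the adaptive hash index in practice (my addition, not from the ticket): its current size and usage are reported in the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of the InnoDB status output, and it can also be disabled so that no AHI entries are built in the first place:

              SHOW ENGINE INNODB STATUS\G
              -- the AHI can be switched off at runtime:
              SET GLOBAL innodb_adaptive_hash_index = OFF;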


            People

              Assignee: Marko Mäkelä
              Reporter: Chris Calender
              Votes: 2
              Watchers: 6

