Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16283

ALTER TABLE...DISCARD TABLESPACE still takes long on a large buffer pool

    Details

      Description

      Chris Calender noted in MDEV-13328 that ALTER TABLE...DISCARD TABLESPACE still takes a long time in a large buffer pool.

      I tried to reproduce the issue with the following:

      source include/have_innodb.inc;
      source include/have_sequence.inc;
       
      create database p;
      use p;
      create table pad(a serial,
      b0 char(255) not null default '',
      b1 char(255) not null default '',
      b2 char(255) not null default '',
      b3 char(255) not null default '',
      b4 char(255) not null default '',
      b5 char(255) not null default '',
      b6 char(255) not null default '',
      key(b0),key(b1),key(b2),key(b3),key(b4),key(b5),key(b6)) ENGINE=InnoDB;
       
      insert into pad(a) select null from seq_1_to_1000000;
       
      let $n=1000;
      while ($n) {
      eval create table t$n(a serial) engine=innodb;
      dec $n;
      }
       
      let $n=1000;
      while ($n) {
      eval alter table t$n discard tablespace;
      dec $n;
      }
       
      drop database p;
      

      ./mtr --mysqld=--innodb-flush-log-at-trx-commit=0 --mysqld=--innodb-max-dirty-pages-pct=99.999 --mysqld=--innodb-log-file-size=2g --mysqld=--innodb-page-size=4k --mysqld=--innodb-buffer-pool-size=8g innodb.discard,xtradb
      

      I believe that the trick is to fill up the buffer pool with many dirty pages, so that the buf_pool->LRU and buf_pool->flush_list will be long. The reason for innodb_page_size=4k is to get as many pages as possible into a smaller buffer pool. The other parameters are mainly there to speed up the insert (do as little file I/O as possible). I ran the test in /dev/shm.

      The DISCARD TABLESPACE were rather fast, maybe 0.1 seconds or less. While they were being run, sudo perf top showed the following:

      10.1 bfed1bfe28980d3b1404f99a44712242a5108ef5

        93,50%  mysqld                         [.] buf_LRU_flush_or_remove_pages
         0,60%  mysqld                         [.] buf_calc_page_new_checksum
         0,21%  mysqld                         [.] ut_delay
         0,21%  libc-2.27.so                   [.] __memmove_avx_unaligned_erms
         0,09%  libshadow.so                   [.] shadowUpdatePacked
         0,09%  libc-2.27.so                   [.] __memchr_avx2
      

      Yes, we still have to scan the buf_pool->flush_list and remove the pages of any discarded tables, so that the discarded data will not be written, or that we will not get flooded by error messages that there are unwritten changes to missing files.

      I tried shortening the flush_list by adding the following before the CREATE TABLE loop:

      FLUSH TABLES pad FOR EXPORT;
      UNLOCK TABLES;
      

      Computing the page checksums for the flushing takes quite a bit of time, but to my surprise, the DISCARD TABLESPACE remained roughly as slow as before. I also tried skip_innodb_adaptive_hash_index (even though already with the previous fix, we would not touch the adaptive hash index). No change.

      Here is the culprit:

      #0  buf_LRU_flush_or_remove_pages (id=id@entry=5, trx=trx@entry=0x0, 
          drop_ahi=drop_ahi@entry=true)
          at /mariadb/10.1/storage/xtradb/buf/buf0lru.cc:745
      #1  0x0000555555dbf1dd in fil_delete_tablespace (id=id@entry=5, 
          drop_ahi=drop_ahi@entry=true)
          at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:2939
      #2  0x0000555555dbf693 in fil_discard_tablespace (id=5)
          at /mariadb/10.1/storage/xtradb/fil/fil0fil.cc:3050
      #3  0x0000555555cfba1a in row_discard_tablespace (table=0x7fffb40159c8, 
          trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3123
      #4  row_discard_tablespace_for_mysql (name=<optimized out>, 
          trx=0x7fffb40126f8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:3217
      

      We are actually passing drop_ahi=true in DISCARD TABLESPACE, and calling the expensive function buf_LRU_drop_page_hash_for_tablespace() for no good reason.

      The adaptive hash index entries should do no harm here; they will have to be dropped on IMPORT TABLESPACE or DROP TABLE, but not earlier.

      It is also curious that we are searching for adaptive hash index entries to be dropped, even though skip_innodb_adaptive_hash_index is set. We should check if any AHI entries exist for any table (or even better, for any indexes of our table or partition), and only then invoke buf_LRU_drop_page_hash_for_tablespace().

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                marko Marko Mäkelä
                Reporter:
                marko Marko Mäkelä
              • Votes:
                1 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: