[MDEV-13328] ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with large buffer pool (>128G) - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.1.20
Fix Version/s: 10.1.29, 10.2.11, 10.3.3
Component/s: Storage Engine - InnoDB
Labels:
None

Sprint:
10.1.29

Description

ALTER TABLE ... DISCARD TABLESPACE takes a lot of time with large buffer pool (>128G).

Steps to reproduce:

1. drop / create an InnoDB table
2. discard the table space
3. increase the buffer pool size.
4. Re-try once you have a 128G+ buffer pool, and re-run #1 & #2.

The discard time should be increasing.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

perf_10g.data
316 kB
2017-09-08 04:52
perf_5g.data
510 kB
2017-09-08 04:52

Issue Links

causes

MDEV-16759 InnoDB: Assertion failure in thread 139946191502080 in file row0ins.cc line 285

Closed

is part of

MDEV-16283 ALTER TABLE...DISCARD TABLESPACE still takes long on a large buffer pool

Closed

relates to

MDEV-14263 FLUSH TABLES FOR EXPORT may write the same pages multiple times

Confirmed

MDEV-9459 Truncate table causes innodb stalls

Closed

MDEV-14310 Possible corruption by table-rebuilding or index-creating ALTER TABLE…ALGORITHM=INPLACE

Closed

Sub-Tasks

Smoke test / sanity check for changes made in scope of MDEV-13328

Closed

Elena Stepanova

Activity

Ascending order - Click to sort in descending order

Chris Calender (Inactive) added a comment - 2017-07-14 19:07

https://jira.mariadb.org/browse/MDEV-11658
https://jira.mariadb.org/browse/MDEV-7894

These look potentially related at a quick glance (just ones we ran across while searching the bugs db for an existing bug about this).

Chris Calender (Inactive) added a comment - 2017-07-14 19:07 https://jira.mariadb.org/browse/MDEV-11658 https://jira.mariadb.org/browse/MDEV-7894 These look potentially related at a quick glance (just ones we ran across while searching the bugs db for an existing bug about this).

Elena Stepanova added a comment - 2017-07-14 23:25

I am not getting any difference for the given scenario neither on current 10.1 nor on 10.1.20. There must be more to it.
Please collect and provide complete cnf file(s), an example of the exact CREATE TABLE statement, and information whether the table is supposed to be empty.

DROP PROCEDURE IF EXISTS pr;

DELIMITER $$

CREATE PROCEDURE pr(n INT)

BEGIN

  DECLARE i INT DEFAULT 0;

  WHILE i < n

DO

    DROP TABLE IF EXISTS t1;

    CREATE TABLE t1 (i INT);

    ALTER TABLE t1 DISCARD TABLESPACE;

    SET i = i+1;

  END WHILE;

END$$

DELIMITER ;

128 MB
MariaDB [test]> SELECT @@innodb_buffer_pool_size;
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 134217728 \|
+---------------------------+
1 row in set (0.00 sec)

MariaDB [test]> CALL pr(20);
Query OK, 0 rows affected (0.09 sec)

16 GB
MariaDB [test]> SELECT @@innodb_buffer_pool_size;
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 17179869184 \|
+---------------------------+
1 row in set (0.00 sec)

MariaDB [test]> CALL pr(20);
Query OK, 0 rows affected (0.09 sec)

160 GB
MariaDB [test]> SELECT @@innodb_buffer_pool_size;
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 171798691840 \|
+---------------------------+
1 row in set (0.00 sec)

MariaDB [test]> CALL pr(20);
Query OK, 0 rows affected (0.09 sec)

Elena Stepanova added a comment - 2017-07-14 23:25 I am not getting any difference for the given scenario neither on current 10.1 nor on 10.1.20. There must be more to it. Please collect and provide complete cnf file(s), an example of the exact CREATE TABLE statement, and information whether the table is supposed to be empty. DROP PROCEDURE IF EXISTS pr; DELIMITER $$ CREATE PROCEDURE pr(n INT ) BEGIN DECLARE i INT DEFAULT 0; WHILE i < n DO DROP TABLE IF EXISTS t1; CREATE TABLE t1 (i INT ); ALTER TABLE t1 DISCARD TABLESPACE; SET i = i+1; END WHILE; END $$ DELIMITER ; 128 MB MariaDB [test]> SELECT @@innodb_buffer_pool_size; + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 134217728 | + ---------------------------+ 1 row in set (0.00 sec) MariaDB [test]> CALL pr(20); Query OK, 0 rows affected (0.09 sec) 16 GB MariaDB [test]> SELECT @@innodb_buffer_pool_size; + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 17179869184 | + ---------------------------+ 1 row in set (0.00 sec) MariaDB [test]> CALL pr(20); Query OK, 0 rows affected (0.09 sec) 160 GB MariaDB [test]> SELECT @@innodb_buffer_pool_size; + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 171798691840 | + ---------------------------+ 1 row in set (0.00 sec) MariaDB [test]> CALL pr(20); Query OK, 0 rows affected (0.09 sec)

Chris Calender (Inactive) added a comment - 2017-07-17 21:55

Actually, it is not only sufficient for the buffer pool to be big, but it also needs to be full of pages ...

– run some artificial load like sysbench so that the buffer pool is full of pages

drop database if exists test_discard;
create database test_discard;
show global variables like 'innodb_buffer_pool_size';
use test_discard;
create table t(id int primary key)engine=InnoDB;
alter table t discard tablespace;

Chris Calender (Inactive) added a comment - 2017-07-17 21:55 Actually, it is not only sufficient for the buffer pool to be big, but it also needs to be full of pages ... – run some artificial load like sysbench so that the buffer pool is full of pages drop database if exists test_discard; create database test_discard; show global variables like 'innodb_buffer_pool_size'; use test_discard; create table t(id int primary key)engine=InnoDB; alter table t discard tablespace;

Elena Stepanova added a comment - 2017-07-17 22:10

Do you mean that it actually needs to be full?
Cannot the slowness be caused by the mere existence of 128+G of tables?

Elena Stepanova added a comment - 2017-07-17 22:10 Do you mean that it actually needs to be full ? Cannot the slowness be caused by the mere existence of 128+G of tables?

Chris Calender (Inactive) added a comment - 2017-07-17 22:13

Yes, it must be full, according to the reporter.

Note that if they lower innodb_buffer_pool_size to, say 16G, then it is fast. So just having 128G+ stored in tables is not sufficient.

Chris Calender (Inactive) added a comment - 2017-07-17 22:13 Yes, it must be full, according to the reporter. Note that if they lower innodb_buffer_pool_size to, say 16G, then it is fast. So just having 128G+ stored in tables is not sufficient.

Elena Stepanova added a comment - 2017-07-18 15:33

With full buffer pool, I do see some difference depending on the size, but it does not seem huge. What kind of slowness we are talking about, is it more like a few seconds, or more like minutes? Maybe there is something more to it, server configuration, data specifics or the scenario?

I've created ~80 tables, each with 1,000,000 rows. Then every test (with different buffer pool size) goes like this:

start server with buffer pool size X;
run SELECT COUNT from every table, to populate buffer pool;
check buffer pool size and status;
run ALTER TABLE .. DISCARD TABLESPACE on one table.

The fact that after every test there would be one table less is irrelevant, because in all cases the buffer pool is sufficiently populated.
Results are below. In short, I'm getting ~0.5 seconds with 16G buffer pool, and ~3 seconds with 160G. And as far as I can tell, there is no jump after 128G, the time just grows along with the buffer pool size.

128 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+---------+
\| Variable_name \| Value \|
+-------------------------------+---------+
\| Innodb_buffer_pool_pages_data \| 8380407 \|
+-------------------------------+---------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 137438953472 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t79 DISCARD TABLESPACE;
Query OK, 0 rows affected (3.01 sec)

16 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+---------+
\| Variable_name \| Value \|
+-------------------------------+---------+
\| Innodb_buffer_pool_pages_data \| 1040375 \|
+-------------------------------+---------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 17179869184 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t78 DISCARD TABLESPACE;
Query OK, 0 rows affected (0.51 sec)

130 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+---------+
\| Variable_name \| Value \|
+-------------------------------+---------+
\| Innodb_buffer_pool_pages_data \| 8511479 \|
+-------------------------------+---------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 139586437120 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t77 DISCARD TABLESPACE;
Query OK, 0 rows affected (2.06 sec)

160 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+----------+
\| Variable_name \| Value \|
+-------------------------------+----------+
\| Innodb_buffer_pool_pages_data \| 10477559 \|
+-------------------------------+----------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 171798691840 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t76 DISCARD TABLESPACE;
Query OK, 0 rows affected (2.67 sec)

100 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+---------+
\| Variable_name \| Value \|
+-------------------------------+---------+
\| Innodb_buffer_pool_pages_data \| 6545371 \|
+-------------------------------+---------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 107374182400 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t75 DISCARD TABLESPACE;
Query OK, 0 rows affected (1.97 sec)

160 G
+-------------------------------+-------+
\| Variable_name \| Value \|
+-------------------------------+-------+
\| Innodb_buffer_pool_pages_free \| 8192 \|
+-------------------------------+-------+
+-------------------------------+----------+
\| Variable_name \| Value \|
+-------------------------------+----------+
\| Innodb_buffer_pool_pages_data \| 10477559 \|
+-------------------------------+----------+
+---------------------------+
\| @@innodb_buffer_pool_size \|
+---------------------------+
\| 171798691840 \|
+---------------------------+
MariaDB [test]> ALTER TABLE t74 DISCARD TABLESPACE;
Query OK, 0 rows affected (3.58 sec)

Elena Stepanova added a comment - 2017-07-18 15:33 With full buffer pool, I do see some difference depending on the size, but it does not seem huge. What kind of slowness we are talking about, is it more like a few seconds, or more like minutes? Maybe there is something more to it, server configuration, data specifics or the scenario? I've created ~80 tables, each with 1,000,000 rows. Then every test (with different buffer pool size) goes like this: start server with buffer pool size X; run SELECT COUNT from every table, to populate buffer pool; check buffer pool size and status; run ALTER TABLE .. DISCARD TABLESPACE on one table. The fact that after every test there would be one table less is irrelevant, because in all cases the buffer pool is sufficiently populated. Results are below. In short, I'm getting ~0.5 seconds with 16G buffer pool, and ~3 seconds with 160G. And as far as I can tell, there is no jump after 128G, the time just grows along with the buffer pool size. 128 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+---------+ | Variable_name | Value | + -------------------------------+---------+ | Innodb_buffer_pool_pages_data | 8380407 | + -------------------------------+---------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 137438953472 | + ---------------------------+ MariaDB [test]> ALTER TABLE t79 DISCARD TABLESPACE; Query OK, 0 rows affected (3.01 sec) 16 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+---------+ | Variable_name | Value | + -------------------------------+---------+ | Innodb_buffer_pool_pages_data | 1040375 | + -------------------------------+---------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 17179869184 | + ---------------------------+ MariaDB [test]> ALTER TABLE t78 DISCARD TABLESPACE; Query OK, 0 rows affected (0.51 sec) 130 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+---------+ | Variable_name | Value | + -------------------------------+---------+ | Innodb_buffer_pool_pages_data | 8511479 | + -------------------------------+---------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 139586437120 | + ---------------------------+ MariaDB [test]> ALTER TABLE t77 DISCARD TABLESPACE; Query OK, 0 rows affected (2.06 sec) 160 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+----------+ | Variable_name | Value | + -------------------------------+----------+ | Innodb_buffer_pool_pages_data | 10477559 | + -------------------------------+----------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 171798691840 | + ---------------------------+ MariaDB [test]> ALTER TABLE t76 DISCARD TABLESPACE; Query OK, 0 rows affected (2.67 sec) 100 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+---------+ | Variable_name | Value | + -------------------------------+---------+ | Innodb_buffer_pool_pages_data | 6545371 | + -------------------------------+---------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 107374182400 | + ---------------------------+ MariaDB [test]> ALTER TABLE t75 DISCARD TABLESPACE; Query OK, 0 rows affected (1.97 sec) 160 G + -------------------------------+-------+ | Variable_name | Value | + -------------------------------+-------+ | Innodb_buffer_pool_pages_free | 8192 | + -------------------------------+-------+ + -------------------------------+----------+ | Variable_name | Value | + -------------------------------+----------+ | Innodb_buffer_pool_pages_data | 10477559 | + -------------------------------+----------+ + ---------------------------+ | @@innodb_buffer_pool_size | + ---------------------------+ | 171798691840 | + ---------------------------+ MariaDB [test]> ALTER TABLE t74 DISCARD TABLESPACE; Query OK, 0 rows affected (3.58 sec)

Chris Calender (Inactive) added a comment - 2017-07-18 21:31

They are seeing in the 10+ second range. Here is an example from one of their servers:

MariaDB [(none)]> create database test_discard;

Query OK, 1 row affected (0.02 sec)

MariaDB [(none)]> show global variables like 'innodb_buffer_pool_size';

+-------------------------+--------------+

| Variable_name | Value |

+-------------------------+--------------+

| innodb_buffer_pool_size | 549755813888 |

+-------------------------+--------------+

1 row in set (0.02 sec)

MariaDB [(none)]> use test_discard;

Database changed

MariaDB [test_discard]> create table t(id int primary key)engine=InnoDB;

Query OK, 0 rows affected (0.03 sec)

MariaDB [test_discard]> alter table t discard tablespace;

Query OK, 0 rows affected (10.62 sec)

Customer believes that it is related to this bug, which it very well may be:

https://bugs.mysql.com/bug.php?id=68184

In which case Marko may have good idea about it. And it was fixed in MySQL 8.0.

Chris Calender (Inactive) added a comment - 2017-07-18 21:31 They are seeing in the 10+ second range. Here is an example from one of their servers: MariaDB [(none)]> create database test_discard; Query OK, 1 row affected (0.02 sec) MariaDB [(none)]> show global variables like 'innodb_buffer_pool_size'; +-------------------------+--------------+ | Variable_name | Value | +-------------------------+--------------+ | innodb_buffer_pool_size | 549755813888 | +-------------------------+--------------+ 1 row in set (0.02 sec) MariaDB [(none)]> use test_discard; Database changed MariaDB [test_discard]> create table t(id int primary key)engine=InnoDB; Query OK, 0 rows affected (0.03 sec) MariaDB [test_discard]> alter table t discard tablespace; Query OK, 0 rows affected (10.62 sec) Customer believes that it is related to this bug, which it very well may be: https://bugs.mysql.com/bug.php?id=68184 In which case Marko may have good idea about it. And it was fixed in MySQL 8.0.

Elena Stepanova added a comment - 2017-07-18 21:54

Okay, I don't have 512G to try, but i suppose if I get from 16G/0.5 sec to 100G/2 sec and 160G/3 sec, then I can reasonably extrapolate to 512G/10 sec. Thus I'll consider it confirmed.

Elena Stepanova added a comment - 2017-07-18 21:54 Okay, I don't have 512G to try, but i suppose if I get from 16G/0.5 sec to 100G/2 sec and 160G/3 sec, then I can reasonably extrapolate to 512G/10 sec. Thus I'll consider it confirmed.

Jan Lindström (Inactive) added a comment - 2017-09-08 04:50 - edited

I did two tests using 100x LinkBench database that amounts ~1T. I discarded the largest table i.e. linktable ~763G and in these tests I did not use partitioning. Used several range selects to linktable to get pages to buffer pool. Note that machine has 24G main memory and problem seems to be that mysql command line tool does not much know how to restrict its memory size.

Using 5G buffer pool discard took < 1s. Perf data is as follows:

+   63.95%    18.10%  mysqld   mysqld              [.] buf_LRU_block_remove_hashed

+   44.79%     0.13%  mysqld   [kernel.kallsyms]   [k] page_fault

+   44.71%     0.12%  mysqld   [kernel.kallsyms]   [k] do_page_fault

+   44.45%     0.24%  mysqld   [kernel.kallsyms]   [k] __do_page_fault

+   43.63%     0.72%  mysqld   [kernel.kallsyms]   [k] handle_mm_fault

+   42.22%     3.16%  mysqld   [kernel.kallsyms]   [k] do_numa_page

+   36.18%     0.14%  mysqld   [kernel.kallsyms]   [k] migrate_misplaced_page

+   32.04%     0.17%  mysqld   [kernel.kallsyms]   [k] migrate_pages

+   15.36%     0.33%  mysqld   [kernel.kallsyms]   [k] move_to_new_page

+   15.36%    11.15%  mysqld   mysqld              [.] buf_LRU_flush_or_remove_pages

+   12.05%     0.00%  mysqld   [kernel.kallsyms]   [k] migrate_page

+   10.11%    10.11%  mysqld   [kernel.kallsyms]   [k] copy_page_rep

+    8.18%     0.00%  mysqld   [unknown]           [k] 0000000000000000

+    5.89%     0.30%  mysqld   [kernel.kallsyms]   [k] putback_lru_page

+    5.30%     0.00%  mysqld   libaio.so.1.0.1     [.] 0xffff806b830a7644

+    5.15%     0.00%  mysqld   [kernel.kallsyms]   [k] lru_cache_add

+    5.15%     0.38%  mysqld   [kernel.kallsyms]   [k] __lru_cache_add

+    5.06%     0.05%  mysqld   [kernel.kallsyms]   [k] system_call_fastpath

Using 10G buffer pool discard took < 1s. Perf data as follows:

  58.67%    32.41%  mysqld   mysqld              [.] buf_LRU_block_remove_hashed

+   32.98%    22.19%  mysqld   mysqld              [.] buf_LRU_flush_or_remove_pages

+   28.21%     0.23%  mysqld   [kernel.kallsyms]   [k] page_fault

+   27.91%     0.46%  mysqld   [kernel.kallsyms]   [k] do_page_fault

+   27.27%     1.15%  mysqld   [kernel.kallsyms]   [k] __do_page_fault

+   24.26%     1.87%  mysqld   [kernel.kallsyms]   [k] handle_mm_fault

+   21.70%     8.62%  mysqld   [kernel.kallsyms]   [k] do_numa_page

+    7.81%     7.81%  mysqld   [kernel.kallsyms]   [k] irq_return

+    7.19%     0.00%  mysqld   [kernel.kallsyms]   [k] migrate_misplaced_page

+    6.80%     0.07%  mysqld   [kernel.kallsyms]   [k] migrate_pages

Perf files attached. Note the function buf_LRU_flush_or_remove_pages.

Jan Lindström (Inactive) added a comment - 2017-09-08 04:50 - edited I did two tests using 100x LinkBench database that amounts ~1T. I discarded the largest table i.e. linktable ~763G and in these tests I did not use partitioning. Used several range selects to linktable to get pages to buffer pool. Note that machine has 24G main memory and problem seems to be that mysql command line tool does not much know how to restrict its memory size. Using 5G buffer pool discard took < 1s. Perf data is as follows: + 63.95% 18.10% mysqld mysqld [.] buf_LRU_block_remove_hashed + 44.79% 0.13% mysqld [kernel.kallsyms] [k] page_fault + 44.71% 0.12% mysqld [kernel.kallsyms] [k] do_page_fault + 44.45% 0.24% mysqld [kernel.kallsyms] [k] __do_page_fault + 43.63% 0.72% mysqld [kernel.kallsyms] [k] handle_mm_fault + 42.22% 3.16% mysqld [kernel.kallsyms] [k] do_numa_page + 36.18% 0.14% mysqld [kernel.kallsyms] [k] migrate_misplaced_page + 32.04% 0.17% mysqld [kernel.kallsyms] [k] migrate_pages + 15.36% 0.33% mysqld [kernel.kallsyms] [k] move_to_new_page + 15.36% 11.15% mysqld mysqld [.] buf_LRU_flush_or_remove_pages + 12.05% 0.00% mysqld [kernel.kallsyms] [k] migrate_page + 10.11% 10.11% mysqld [kernel.kallsyms] [k] copy_page_rep + 8.18% 0.00% mysqld [unknown] [k] 0000000000000000 + 5.89% 0.30% mysqld [kernel.kallsyms] [k] putback_lru_page + 5.30% 0.00% mysqld libaio.so.1.0.1 [.] 0xffff806b830a7644 + 5.15% 0.00% mysqld [kernel.kallsyms] [k] lru_cache_add + 5.15% 0.38% mysqld [kernel.kallsyms] [k] __lru_cache_add + 5.06% 0.05% mysqld [kernel.kallsyms] [k] system_call_fastpath Using 10G buffer pool discard took < 1s. Perf data as follows: 58.67% 32.41% mysqld mysqld [.] buf_LRU_block_remove_hashed + 32.98% 22.19% mysqld mysqld [.] buf_LRU_flush_or_remove_pages + 28.21% 0.23% mysqld [kernel.kallsyms] [k] page_fault + 27.91% 0.46% mysqld [kernel.kallsyms] [k] do_page_fault + 27.27% 1.15% mysqld [kernel.kallsyms] [k] __do_page_fault + 24.26% 1.87% mysqld [kernel.kallsyms] [k] handle_mm_fault + 21.70% 8.62% mysqld [kernel.kallsyms] [k] do_numa_page + 7.81% 7.81% mysqld [kernel.kallsyms] [k] irq_return + 7.19% 0.00% mysqld [kernel.kallsyms] [k] migrate_misplaced_page + 6.80% 0.07% mysqld [kernel.kallsyms] [k] migrate_pages Perf files attached. Note the function buf_LRU_flush_or_remove_pages.

Marko Mäkelä added a comment - 2017-09-08 07:49

The entry point to these functions in ALTER TABLE…DISCARD TABLESPACE is fil_discard_tablespace(), which invokes fil_delete_tablespace(id, BUF_REMOVE_ALL_NO_WRITE), which in turn invokes buf_LRU_flush_or_remove_pages(id, BUF_REMOVE_ALL_NO_WRITE, NULL).

That will invoke buf_LRU_drop_page_hash_for_tablespace() and buf_LRU_remove_pages(buf_pool, id, BUF_REMOVE_ALL_NO_WRITE, NULL) on every buffer pool. That is where the time is consumed. The intention is to remove all traces of the discarded tablespace ID from the buffer pool.

For DROP TABLE, there were performance fixes in MySQL 5.5, especially MySQL Bug #51325.

Like I remembered, the fix to DROP TABLE performance issues was to defer the eviction of the pages from the buffer pool, and instead let the buffer pool LRU replacement mechanism get rid of the orphan pages. MySQL Bug #68184 probably extended this approach to TRUNCATE TABLE.
A similar approach could indeed be applied to DISCARD TABLESPACE.

Note: If there are adaptive hash index (AHI) entries for the table that is being dropped, truncated or discarded, these must be dropped. The design constraint of the AHI is that it must represent a subset of the index contents (all AHI entries must correspond to existing records). Furthermoer, the AHI depends on the dict_index_t objects of the table, and these would be freed at least by DROP TABLE.

Side note: While looking for the above-mentioned DROP TABLE performance bug, I found two other bugs that were fixed back then, related to ROW_FORMAT=COMPRESSED: MySQL Bug #35077, MySQL Bug #61188.

Side note: the refactored TRUNCATE TABLE in MySQL 5.7 (WL#6501) and in MariaDB 10.2.2+ probably cannot use this approach of deferred deletion, because no new tablespace ID would be assigned to the table. This TRUNCATE is not compatible with hot backup (see ~~MDEV-13564~~) and has introduced some performance issues. If I remember correctly, one of the issues was that some mutex was being held while dropping the adaptive hash index.

Marko Mäkelä added a comment - 2017-09-08 07:49 The entry point to these functions in ALTER TABLE…DISCARD TABLESPACE is fil_discard_tablespace(), which invokes fil_delete_tablespace(id, BUF_REMOVE_ALL_NO_WRITE), which in turn invokes buf_LRU_flush_or_remove_pages(id, BUF_REMOVE_ALL_NO_WRITE, NULL). That will invoke buf_LRU_drop_page_hash_for_tablespace() and buf_LRU_remove_pages(buf_pool, id, BUF_REMOVE_ALL_NO_WRITE, NULL) on every buffer pool. That is where the time is consumed. The intention is to remove all traces of the discarded tablespace ID from the buffer pool. For DROP TABLE, there were performance fixes in MySQL 5.5, especially MySQL Bug #51325 . Like I remembered, the fix to DROP TABLE performance issues was to defer the eviction of the pages from the buffer pool , and instead let the buffer pool LRU replacement mechanism get rid of the orphan pages. MySQL Bug #68184 probably extended this approach to TRUNCATE TABLE. A similar approach could indeed be applied to DISCARD TABLESPACE. Note: If there are adaptive hash index (AHI) entries for the table that is being dropped, truncated or discarded, these must be dropped. The design constraint of the AHI is that it must represent a subset of the index contents (all AHI entries must correspond to existing records). Furthermoer, the AHI depends on the dict_index_t objects of the table, and these would be freed at least by DROP TABLE. Side note: While looking for the above-mentioned DROP TABLE performance bug, I found two other bugs that were fixed back then, related to ROW_FORMAT=COMPRESSED: MySQL Bug #35077 , MySQL Bug #61188 . Side note: the refactored TRUNCATE TABLE in MySQL 5.7 ( WL#6501 ) and in MariaDB 10.2.2+ probably cannot use this approach of deferred deletion, because no new tablespace ID would be assigned to the table. This TRUNCATE is not compatible with hot backup (see MDEV-13564 ) and has introduced some performance issues. If I remember correctly, one of the issues was that some mutex was being held while dropping the adaptive hash index.

Marko Mäkelä added a comment - 2017-11-03 06:58

If DISCARD TABLESPACE does not evict the pages from the buffer pool, then IMPORT TABLESPACE must do it. It would not be a useful fix to simply move the call to the IMPORT TABLESPACE step.

I tried simply removing the eviction in DISCARD TABLESPACE. If the DISCARD is soon followed by IMPORT, the IMPORT would adjust the imported file to the pre-existing tablespace ID. If the old pages were not evicted from the buffer pool at DISCARD, then after IMPORT we could incorrectly read old pages from the buffer pool, instead of reading the pages that exist in the imported file.

Import is initially bypassing the buffer pool when reading pages for the adjustment phase. A possible fix is that in the adjustment phase, if a page exists in the buffer pool, we would replace it with the page from the imported file. This would allow quick DISCARD TABLESPACE even for large tables, and a slightly slower IMPORT TABLESPACE (the slowness being proportional to the file size).

Marko Mäkelä added a comment - 2017-11-03 06:58 If DISCARD TABLESPACE does not evict the pages from the buffer pool, then IMPORT TABLESPACE must do it. It would not be a useful fix to simply move the call to the IMPORT TABLESPACE step. I tried simply removing the eviction in DISCARD TABLESPACE. If the DISCARD is soon followed by IMPORT, the IMPORT would adjust the imported file to the pre-existing tablespace ID. If the old pages were not evicted from the buffer pool at DISCARD, then after IMPORT we could incorrectly read old pages from the buffer pool, instead of reading the pages that exist in the imported file. Import is initially bypassing the buffer pool when reading pages for the adjustment phase. A possible fix is that in the adjustment phase, if a page exists in the buffer pool, we would replace it with the page from the imported file. This would allow quick DISCARD TABLESPACE even for large tables, and a slightly slower IMPORT TABLESPACE (the slowness being proportional to the file size).

Marko Mäkelä added a comment - 2017-11-06 21:11

I have pushed this fix to 10.0 and merged to 10.1 so far.
I spent some time merging this to 10.2.

In 10.2, there is a merge conflict for the TRUNCATE TABLE code, as anticipated.
And the solution ought to be good news for TRUNCATE performance: There should be no need to evict adaptive hash index entries or old pages from the buffer pool during TRUNCATE. I believe that it suffices to ensure that buf_page_create() or equivalent is being called when initializing new pages after TRUNCATE, and that TRUNCATE will edit the dict_index_t objects in place.

Marko Mäkelä added a comment - 2017-11-06 21:11 I have pushed this fix to 10.0 and merged to 10.1 so far. I spent some time merging this to 10.2. In 10.2, there is a merge conflict for the TRUNCATE TABLE code, as anticipated. And the solution ought to be good news for TRUNCATE performance: There should be no need to evict adaptive hash index entries or old pages from the buffer pool during TRUNCATE. I believe that it suffices to ensure that buf_page_create() or equivalent is being called when initializing new pages after TRUNCATE, and that TRUNCATE will edit the dict_index_t objects in place.

Chris Calender (Inactive) added a comment - 2018-05-22 17:50

I need to re-open this one.

After testing with 10.1.29, we are still seeing slowness with 512G buffer pool size:

MariaDB [test]> drop table if exists t;

Query OK, 0 rows affected, 1 warning (0.00 sec)

MariaDB [test]> create table t(id int) engine=InnoDB;

Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> alter table t discard tablespace;

Query OK, 0 rows affected (7.34 sec)

MariaDB [test]> drop table if exists t;

Query OK, 0 rows affected (0.22 sec)

MariaDB [test]> select @@version;

+-----------------+

| @@version |

+-----------------+

| 10.1.29-MariaDB |

+-----------------+

1 row in set (0.01 sec)

MariaDB [test]> select @@innodb_buffer_pool_size;

+---------------------------+

| @@innodb_buffer_pool_size |

+---------------------------+

| 549755813888 |

+---------------------------+

1 row in set (0.00 sec)

Is there anything further that can be done here?

Chris Calender (Inactive) added a comment - 2018-05-22 17:50 I need to re-open this one. After testing with 10.1.29, we are still seeing slowness with 512G buffer pool size: MariaDB [test]> drop table if exists t; Query OK, 0 rows affected, 1 warning (0.00 sec) MariaDB [test]> create table t(id int) engine=InnoDB; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> alter table t discard tablespace; Query OK, 0 rows affected (7.34 sec) MariaDB [test]> drop table if exists t; Query OK, 0 rows affected (0.22 sec) MariaDB [test]> select @@version; +-----------------+ | @@version | +-----------------+ | 10.1.29-MariaDB | +-----------------+ 1 row in set (0.01 sec) MariaDB [test]> select @@innodb_buffer_pool_size; +---------------------------+ | @@innodb_buffer_pool_size | +---------------------------+ | 549755813888 | +---------------------------+ 1 row in set (0.00 sec) Is there anything further that can be done here?

Marko Mäkelä added a comment - 2018-05-24 12:22

ccalender, I filed ~~MDEV-16283~~ for what I believe to be the remaining issue. With the smaller buffer pool that I used, the DISCARD TABLESPACE was still rather fast.

Let us track the progress in ~~MDEV-16283~~.

Marko Mäkelä added a comment - 2018-05-24 12:22 ccalender , I filed MDEV-16283 for what I believe to be the remaining issue. With the smaller buffer pool that I used, the DISCARD TABLESPACE was still rather fast. Let us track the progress in MDEV-16283 .

Marko Mäkelä added a comment - 2018-05-30 07:16

The original fix (~~MDEV-13328~~) was that we would not evict the pages of the discarded tablespace from the buffer pool; we would do that on a subsequent IMPORT TABLESPACE only. But, we would still scan the buffer pool for adaptive hash index entries, to drop them. It actually suffices to drop the adaptive hash index entries on IMPORT TABLESPACE or on DROP TABLE or DROP INDEX. And there is no need to scan the buffer pool if index->search_info->ref_count==0 for all indexes of the table. ~~MDEV-16283~~ implemented these fixes.

Marko Mäkelä added a comment - 2018-05-30 07:16 The original fix ( MDEV-13328 ) was that we would not evict the pages of the discarded tablespace from the buffer pool; we would do that on a subsequent IMPORT TABLESPACE only. But, we would still scan the buffer pool for adaptive hash index entries, to drop them. It actually suffices to drop the adaptive hash index entries on IMPORT TABLESPACE or on DROP TABLE or DROP INDEX . And there is no need to scan the buffer pool if index->search_info->ref_count==0 for all indexes of the table. MDEV-16283 implemented these fixes.

People

Assignee:: Marko Mäkelä

Reporter:: Chris Calender (Inactive)

Votes:: 2 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2017-07-14 19:03

Updated:: 2024-07-08 00:40

Resolved:: 2018-05-24 12:22

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Sub-Tasks

Activity

People

Dates

Git Integration