[MDEV-23017] range query performance regression in 10.5.4 - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.4
Fix Version/s: 10.5.5
Component/s: Storage Engine - InnoDB
Labels: None

Description

The regression test suite reports severe performance regressions in MariaDB 10.5.4 compared to 10.5.3. The regression appears to affect the sysbench OLTP range queries in general. Example:

Test 't_collate_distinct_range_utf8_general' - sysbench OLTP readonly

selecting distinct rows from short range, collation utf8_general_ci

1 table, 1 million rows, engine InnoDB/XtraDB (builtin)

numbers are queries per second

#thread count        1        8       16       32       64      128      256
mariadb-10.5.3  7840.1    53249    92616   149360   149130   149435   148546
mariadb-10.5.4  7794.3    52208    91235   131122   137925   128168    94999
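
The size of the regression per thread count can be read off the table with a little arithmetic. The following is a minimal sketch, not part of the regression test suite: the helper names `percent_change` and `percent_changes` and the constants `kQps1053` and `kQps1054` are made up for illustration, and the numbers are copied from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Relative change of `after` versus `before`, in percent.
// A negative result means `after` is slower.
static double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// Element-wise percent change for two result rows of equal length.
static std::vector<double> percent_changes(const std::vector<double>& before,
                                           const std::vector<double>& after) {
    assert(before.size() == after.size());
    std::vector<double> out;
    out.reserve(before.size());
    for (std::size_t i = 0; i < before.size(); ++i) {
        out.push_back(percent_change(before[i], after[i]));
    }
    return out;
}

// Queries per second for 1, 8, 16, 32, 64, 128 and 256 threads,
// copied from the table in the description (illustrative constants).
static const std::vector<double> kQps1053 = {
    7840.1, 53249, 92616, 149360, 149130, 149435, 148546};
static const std::vector<double> kQps1054 = {
    7794.3, 52208, 91235, 131122, 137925, 128168, 94999};
```

Applied to the two rows, `percent_changes(kQps1053, kQps1054)` gives a drop of under 1% at 1 thread, roughly 12% at 32 threads, and roughly 36% at 256 threads, so the regression grows with concurrency.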

Attachments

MDEV-23017.pdf (29 kB, 2020-07-02 07:15)
range_distinct.png (5 kB, 2020-06-30 12:20)
range_ordered.png (5 kB, 2020-06-30 12:20)
range_simple.png (5 kB, 2020-06-30 12:20)
range_sum.png (5 kB, 2020-06-30 12:20)
simpe_range4.png (4 kB, 2020-07-01 08:34)

Issue Links

is caused by

MDEV-15053 Reduce buf_pool_t::mutex contention

Closed

relates to

MDEV-515 innodb bulk insert

Closed

MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC

Closed

Activity

Axel Schwenke added a comment - 2020-07-01 14:02

I verified that purge is not the cause. To rule it out, I started a dummy TRANSACTION WITH CONSISTENT SNAPSHOT before loading the tables, ran the benchmark, and only then killed the client holding the open transaction. This should reliably stop all purge activity during the benchmark.

Axel Schwenke added a comment - 2020-07-01 14:59

I traced the regression through the commits between tags mariadb-10.5.3 and mariadb-10.5.4. The first bad commit is b1ab211dee599eabd9a5b886fafa3adea29ae041.

Steps to reproduce. Load the sysbench tables:

sysbench --test=oltp.lua --mysql-table-engine=InnoDB --oltp_tables_count=1 \
  --oltp-table-size=200000 prepare

Run the benchmark:

sysbench --test=oltp.lua --oltp_tables_count=1 --oltp-table-size=200000 \
  --oltp-read-only=on --oltp_point_selects=0 --oltp_simple_ranges=10 \
  --oltp_sum_ranges=0 --oltp_order_ranges=0 --oltp_distinct_ranges=0 \
  --oltp_range_size=10 --num-threads=32 --max-requests=0 --max-time=30 \
  --report-interval=2 run

No special my.cnf is needed. A good commit gives ~20000 tps, a bad one ~3500 tps.

If the server is stopped and restarted with the existing datadir, a bad commit gives the same performance as a good one.

Vladislav Vaintroub added a comment - 2020-07-01 16:32

Oh, that's an elephant-sized commit.

Marko Mäkelä added a comment - 2020-07-01 16:43

Based on some perf record traces, the problem is in the following:

	const bool first_access = fix_block->page.set_accessed();

The condition is actually negated! And I do not think that buf_page_optimistic_get() ever needs to initiate read-ahead. axel, can you please test the following patch:

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index 620c224d4b6..4aaf374da95 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -3516,7 +3516,7 @@ buf_page_get_low(
 	      || mode == BUF_PEEK_IF_IN_POOL
 	      || fix_block->page.status != buf_page_t::FREED);
-	const bool first_access = fix_block->page.set_accessed();
+	const bool not_first_access = fix_block->page.set_accessed();
 	if (mode != BUF_PEEK_IF_IN_POOL) {
 		buf_page_make_young_if_needed(&fix_block->page);
@@ -3571,7 +3571,7 @@ buf_page_get_low(
 					      file, line);
 	}
-	if (mode != BUF_PEEK_IF_IN_POOL && first_access) {
+	if (!not_first_access && mode != BUF_PEEK_IF_IN_POOL) {
 		/* In the case of a first access, try to apply linear
 		read-ahead */
@@ -3678,7 +3678,7 @@ buf_page_optimistic_get(
 	buf_block_buf_fix_inc(block, file, line);
 	hash_lock->read_unlock();
-	const bool first_access = block->page.set_accessed();
+	block->page.set_accessed();
 	buf_page_make_young_if_needed(&block->page);
@@ -3723,13 +3723,6 @@ buf_page_optimistic_get(
 	ut_ad(block->page.buf_fix_count());
 	ut_ad(block->page.state() == BUF_BLOCK_FILE_PAGE);
-	if (first_access) {
-		/* In the case of a first access, try to apply linear
-		read-ahead */
-		buf_read_ahead_linear(block->page.id(), block->zip_size(),
-				      ibuf_inside(mtr));
-	}
-
 	buf_pool.stat.n_page_gets++;
 	return(TRUE);
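
The effect of the negated condition can be illustrated with a small stand-alone model. This is a sketch and not InnoDB code: `Page`, `get_page_buggy` and `get_page_fixed` are hypothetical names, read-ahead is reduced to a counter, and the assumption that `set_accessed()` returns true when the page had already been accessed is inferred from the rename to `not_first_access` in the patch above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a buffer pool page; not the real buf_page_t.
struct Page {
    std::atomic<bool> accessed{false};

    // Marks the page as accessed.
    // Assumed semantics: returns true if the page had ALREADY been
    // accessed before this call, i.e. "not the first access".
    bool set_accessed() {
        return accessed.exchange(true, std::memory_order_relaxed);
    }
};

// Buggy variant: treats the return value as "first access", so
// read-ahead is requested on every access except the first one.
static void get_page_buggy(Page& page, int& read_ahead_calls) {
    const bool first_access = page.set_accessed();  // misnamed
    if (first_access) {
        ++read_ahead_calls;  // stands in for linear read-ahead
    }
}

// Fixed variant: the flag is named for what it means and negated,
// so read-ahead is requested on the first access only.
static void get_page_fixed(Page& page, int& read_ahead_calls) {
    const bool not_first_access = page.set_accessed();
    if (!not_first_access) {
        ++read_ahead_calls;  // stands in for linear read-ahead
    }
}
```

Calling each function five times on a fresh `Page`, the buggy variant requests read-ahead on the four repeat accesses, while the fixed variant requests it once, on the first access. In this model, a workload that keeps re-reading hot pages pays the read-ahead cost on nearly every page get.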

Axel Schwenke added a comment - 2020-07-02 07:17

That looks good. I have rerun all the tests from the regression test suite that showed a regression, and all numbers are now back to normal. See attachment MDEV-23017.pdf.
