Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23017

range query performance regression in 10.5.4

Details

    Description

      The regression test suite reports rather severe performance regressions in MariaDB 10.5.4 vs. 10.5.3. It looks like it is genuinely for the sysbench OLTP range queries. Example:

      Test 't_collate_distinct_range_utf8_general' - sysbench OLTP readonly
      selecting distinct rows from short range, collation utf8_general_ci
      1 table, 1 mio rows, engine InnoDB/XtraDB (builtin)
      numbers are queries per second
       
      #thread count           1       8       16      32      64      128     256
      mariadb-10.5.3          7840.1  53249   92616   149360  149130  149435  148546
      mariadb-10.5.4          7794.3  52208   91235   131122  137925  128168  94999
      

      Attachments

        1. MDEV-23017.pdf
          29 kB
        2. range_distinct.png
          range_distinct.png
          5 kB
        3. range_ordered.png
          range_ordered.png
          5 kB
        4. range_simple.png
          range_simple.png
          5 kB
        5. range_sum.png
          range_sum.png
          5 kB
        6. simpe_range4.png
          simpe_range4.png
          4 kB

        Issue Links

          Activity

            axel Axel Schwenke added a comment -

            I verified that it is not purge. For that reason I started a dummy TRANSACTION WITH CONSISTENT SNAPSHOT even before loading the tables. Then I ran the benchmark and finally killed the client holding the open transaction. This should reliably stop all purge activity during the benchmark.

            axel Axel Schwenke added a comment - I verified that it is not purge. For that reason I started a dummy TRANSACTION WITH CONSISTENT SNAPSHOT even before loading the tables. Then I ran the benchmark and finally killed the client holding the open transaction. This should reliably stop all purge activity during the benchmark.
            axel Axel Schwenke added a comment -

            I traced the regression through the commits between tags mariadb-10.5.3 and mariadb-10.5.4. The first bad commit is b1ab211dee599eabd9a5b886fafa3adea29ae041.

            Steps to reproduce. Load sysbench tables:

            sysbench --test=oltp.lua --mysql-table-engine=InnoDB --oltp_tables_count=1 \
            --oltp-table-size=200000 prepare
            

            run benchmark:

            sysbench --test=oltp.lua --oltp_tables_count=1 --oltp-table-size=200000 \
            --oltp-read-only=on --oltp_point_selects=0 --oltp_simple_ranges=10 \
            --oltp_sum_ranges=0 --oltp_order_ranges=0 --oltp_distinct_ranges=0 \
            --oltp_range_size=10 --num-threads=32 --max-requests=0 --max-time=30 \
            --report-interval=2 run
            

            No special my.cnf needed. A good commit gives ~20000 tps, a bad one ~3500 tps.

            If the server is stopped and restarted with the existing datadir, a bad commit gives the same performance as a good one too.

            axel Axel Schwenke added a comment - I traced the regression through the commits between tags mariadb-10.5.3 and mariadb-10.5.4 . The first bad commit is b1ab211dee599eabd9a5b886fafa3adea29ae041 . Steps to reproduce. Load sysbench tables: sysbench --test=oltp.lua --mysql-table-engine=InnoDB --oltp_tables_count=1 \ --oltp-table-size=200000 prepare run benchmark: sysbench --test=oltp.lua --oltp_tables_count=1 --oltp-table-size=200000 \ --oltp-read-only=on --oltp_point_selects=0 --oltp_simple_ranges=10 \ --oltp_sum_ranges=0 --oltp_order_ranges=0 --oltp_distinct_ranges=0 \ --oltp_range_size=10 --num-threads=32 --max-requests=0 --max-time=30 \ --report-interval=2 run No special my.cnf needed. A good commit gives ~20000 tps, a bad one ~3500 tps. If the server is stopped and restarted with the existing datadir, a bad commit gives the same performance as a good one too.

            Oh, that's an elephant sized commit

            wlad Vladislav Vaintroub added a comment - Oh, that's an elephant sized commit

            Based on some perf record traces, the problem is in the following:

            	const bool first_access = fix_block->page.set_accessed();
            

            The condition is actually negated! And I do not think that buf_page_optimistic_get() ever needs to initiate read-ahead. axel, can you please test the following patch:

            diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
            index 620c224d4b6..4aaf374da95 100644
            --- a/storage/innobase/buf/buf0buf.cc
            +++ b/storage/innobase/buf/buf0buf.cc
            @@ -3516,7 +3516,7 @@ buf_page_get_low(
             	      || mode == BUF_PEEK_IF_IN_POOL
             	      || fix_block->page.status != buf_page_t::FREED);
             
            -	const bool first_access = fix_block->page.set_accessed();
            +	const bool not_first_access = fix_block->page.set_accessed();
             
             	if (mode != BUF_PEEK_IF_IN_POOL) {
             		buf_page_make_young_if_needed(&fix_block->page);
            @@ -3571,7 +3571,7 @@ buf_page_get_low(
             					      file, line);
             	}
             
            -	if (mode != BUF_PEEK_IF_IN_POOL && first_access) {
            +	if (!not_first_access && mode != BUF_PEEK_IF_IN_POOL) {
             		/* In the case of a first access, try to apply linear
             		read-ahead */
             
            @@ -3678,7 +3678,7 @@ buf_page_optimistic_get(
             	buf_block_buf_fix_inc(block, file, line);
             	hash_lock->read_unlock();
             
            -	const bool first_access = block->page.set_accessed();
            +	block->page.set_accessed();
             
             	buf_page_make_young_if_needed(&block->page);
             
            @@ -3723,13 +3723,6 @@ buf_page_optimistic_get(
             	ut_ad(block->page.buf_fix_count());
             	ut_ad(block->page.state() == BUF_BLOCK_FILE_PAGE);
             
            -	if (first_access) {
            -		/* In the case of a first access, try to apply linear
            -		read-ahead */
            -		buf_read_ahead_linear(block->page.id(), block->zip_size(),
            -				      ibuf_inside(mtr));
            -	}
            -
             	buf_pool.stat.n_page_gets++;
             
             	return(TRUE);
            

            marko Marko Mäkelä added a comment - Based on some perf record traces, the problem is in the following: const bool first_access = fix_block->page.set_accessed(); The condition is actually negated! And I do not think that buf_page_optimistic_get() ever needs to initiate read-ahead. axel , can you please test the following patch: diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc index 620c224d4b6..4aaf374da95 100644 --- a/storage/innobase/buf/buf0buf.cc +++ b/storage/innobase/buf/buf0buf.cc @@ -3516,7 +3516,7 @@ buf_page_get_low( || mode == BUF_PEEK_IF_IN_POOL || fix_block->page.status != buf_page_t::FREED); - const bool first_access = fix_block->page.set_accessed(); + const bool not_first_access = fix_block->page.set_accessed(); if (mode != BUF_PEEK_IF_IN_POOL) { buf_page_make_young_if_needed(&fix_block->page); @@ -3571,7 +3571,7 @@ buf_page_get_low( file, line); } - if (mode != BUF_PEEK_IF_IN_POOL && first_access) { + if (!not_first_access && mode != BUF_PEEK_IF_IN_POOL) { /* In the case of a first access, try to apply linear read-ahead */ @@ -3678,7 +3678,7 @@ buf_page_optimistic_get( buf_block_buf_fix_inc(block, file, line); hash_lock->read_unlock(); - const bool first_access = block->page.set_accessed(); + block->page.set_accessed(); buf_page_make_young_if_needed(&block->page); @@ -3723,13 +3723,6 @@ buf_page_optimistic_get( ut_ad(block->page.buf_fix_count()); ut_ad(block->page.state() == BUF_BLOCK_FILE_PAGE); - if (first_access) { - /* In the case of a first access, try to apply linear - read-ahead */ - buf_read_ahead_linear(block->page.id(), block->zip_size(), - ibuf_inside(mtr)); - } - buf_pool.stat.n_page_gets++; return(TRUE);
            axel Axel Schwenke added a comment -

            That looks good. I have rerun all the tests from the regression test suite that showed a regression and now all numbers are back to normal. See attachment MDEV-23017.pdf

            axel Axel Schwenke added a comment - That looks good. I have rerun all the tests from the regression test suite that showed a regression and now all numbers are back to normal. See attachment MDEV-23017.pdf

            People

              marko Marko Mäkelä
              axel Axel Schwenke
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.