MariaDB Server
MDEV-31939

Adaptive flush recommendation ignores dirty ratio and checkpoint age

Details

    Description

      The follow-up fix referenced in a comment on MDEV-26827 changed buf_flush_page_cleaner to pass an additional parameter pct_lwm to page_cleaner_flush_pages_recommendation, documented as "innodb_max_dirty_pages_pct_lwm, or 0 to ignore it".

      As far as I am able to tell from the code, there is no code path where pct_lwm can ever be anything but 0. Here is the relevant code as of 10.6.15, stripped down to relevant parts and control flow constructs:

          double pct_lwm= 0.0;
          // ...
          pct_lwm= srv_max_dirty_pages_pct_lwm;
          if (pct_lwm != 0.0)
          {
            // ...
            if (activity_count != last_activity_count)
            {
              // ...
              goto maybe_unemployed;
            }
            else if (buf_pool.page_cleaner_idle() && !os_aio_pending_reads())
            {
              // ...
              goto idle_flush;
            }
            else
            {
            maybe_unemployed:
              const bool below{dirty_pct < pct_lwm};
              pct_lwm= 0.0;
              if (below)
                goto possibly_unemployed;
            }
          }
          else if (dirty_pct < srv_max_buf_pool_modified_pct)
          possibly_unemployed:
            if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
              goto unemployed;
      

      These are the only assignments to pct_lwm, which is then passed to page_cleaner_flush_pages_recommendation shortly after this block.

      It appears the only way for pct_lwm to be non-zero at the end of this block is to:
      1. have srv_max_dirty_pages_pct_lwm set to a non-zero value, and
      2. make it out of the if (pct_lwm != 0.0) block without hitting the pct_lwm= 0.0 line in the else case.

      But #2 appears to be impossible: the inner if case does a goto into the body of the else case (which unconditionally zeros the value), and the else if case jumps to idle_flush, which bypasses the page_cleaner_flush_pages_recommendation call entirely.

      Inside page_cleaner_flush_pages_recommendation we have:

              double total_ratio;
              if (pct_lwm == 0.0 || max_pct == 0.0) {
                      total_ratio = 1;
              } else {
                      total_ratio = std::max(double(pct_for_lsn) / 100,
                                             (dirty_pct / max_pct));
              }
      

      If pct_lwm is always 0, then total_ratio is always 1 and dirty_pct and pct_for_lsn have no effect.

      I had a little bit of trouble tracking all the possible control flow paths in buf_flush_page_cleaner due to all the gotos, so I tested this theory by adding DBUG_ASSERT(pct_lwm == 0.0) immediately before the previous snippet. The debug build with that assert passes:

      ./mtr --mariadbd=--innodb-max-dirty-pages-pct-lwm=0.001 --mariadbd=--innodb-adaptive-flushing-lwm=0.001 --suites=innodb-
      

      which I would expect to trip that assert fairly quickly?

      I started looking into this because I noticed that flushing on our databases (on 10.6.14) didn't seem to be speeding up as the checkpoint age approached max_modified_age_async, leading to periodic spikes of rapid flushing. I went reading through the code to try to understand what was going on and noticed this, but I'm not sure it's the only factor.

      Attachments

        Issue Links

          Activity

            Thank you for the report. I can confirm this logic flaw.

            In this commit I eliminated the local variable bool idle_flush of buf_flush_page_cleaner(), trying to assign pct_lwm=0.0 whenever that variable would have been set.

            I am considering the following fix. I have not tested it yet. I would appreciate your feedback.

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index 8e2b80b9512..1ac8fd15a57 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -2442,10 +2442,11 @@ static void buf_flush_page_cleaner()
                   else
                   {
                   maybe_unemployed:
            -        const bool below{dirty_pct < pct_lwm};
            -        pct_lwm= 0.0;
            -        if (below)
            +        if (dirty_pct < pct_lwm)
            +        {
            +          pct_lwm= 0.0;
                       goto possibly_unemployed;
            +        }
                   }
                 }
                 else if (dirty_pct < srv_max_buf_pool_modified_pct)
            

            Note that there is also another adaptive flushing based on the log checkpoint age, that is, the difference between the latest checkpoint LSN (more precisely, the oldest pending modification in the buffer pool) and the current LSN. The condition related to that would be af_needed_for_redo(oldest_lsn).

            marko Marko Mäkelä added a comment

            wlad, can you please review the above patch? I tested it with some additional output during a 60-second Sysbench oltp_update_index workload:

            @@ -2486,6 +2487,8 @@ static void buf_flush_page_cleaner()
                   mysql_mutex_lock(&buf_pool.mutex);
                   last_pages= n_flushed= buf_flush_list_holding_mutex(n);
                   page_cleaner.flush_time+= ut_time_ms() - tm;
            +      sql_print_information("InnoDB: flush %zu, %zu, %g",
            +                            n, page_cleaner.flush_time, pct_lwm);
                   MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
                                                MONITOR_FLUSH_ADAPTIVE_COUNT,
                                                MONITOR_FLUSH_ADAPTIVE_PAGES,
            

            With the patch, all calls will display pct_lwm=10, which corresponds to the non-default parameter innodb_max_dirty_pages_pct_lwm=10 that I had specified. If I specify innodb_adaptive_flushing=OFF, then there will be no output. If I revert the patch, all output will be with pct_lwm=0, as reported, and the number of pages flushed will be smaller:

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the output

            Version: '10.6.17-MariaDB'  socket: '/dev/shm/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-12-05 14:05:18 0 [Note] InnoDB: flush 200, 3, 0
            2023-12-05 14:05:19 0 [Note] InnoDB: flush 200, 15, 0
            2023-12-05 14:05:20 0 [Note] InnoDB: flush 67, 17, 0
            2023-12-05 14:05:21 0 [Note] InnoDB: flush 67, 19, 0
            2023-12-05 14:05:22 0 [Note] InnoDB: flush 67, 20, 0
            2023-12-05 14:05:23 0 [Note] InnoDB: flush 67, 23, 0
            2023-12-05 14:05:24 0 [Note] InnoDB: flush 67, 27, 0
            2023-12-05 14:05:25 0 [Note] InnoDB: flush 70, 30, 0
            2023-12-05 14:05:26 0 [Note] InnoDB: flush 67, 33, 0
            …
            2023-12-05 14:05:48 0 [Note] InnoDB: flush 67, 56, 0
            2023-12-05 14:05:49 0 [Note] InnoDB: flush 300, 4, 0
            2023-12-05 14:05:50 0 [Note] InnoDB: flush 300, 6, 0
            2023-12-05 14:05:56 0 [Note] InnoDB: flush 136, 10, 0
            … skip lines where the first number varies between 94 and 177
            2023-12-05 14:06:01 0 [Note] InnoDB: flush 90, 83, 0
            2023-12-05 14:06:01 0 [Note] InnoDB: flush 121, 87, 0
            2023-12-05 14:06:02 0 [Note] InnoDB: flush 143, 90, 0
            2023-12-05 14:06:14 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the output and the fix

            Version: '10.6.17-MariaDB'  socket: '/dev/shm/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-12-05 14:01:22 0 [Note] InnoDB: flush 200, 7, 10
            2023-12-05 14:01:23 0 [Note] InnoDB: flush 200, 11, 10
            2023-12-05 14:01:24 0 [Note] InnoDB: flush 47, 12, 10
            2023-12-05 14:01:25 0 [Note] InnoDB: flush 57, 14, 10
            2023-12-05 14:01:26 0 [Note] InnoDB: flush 67, 18, 10
            2023-12-05 14:01:27 0 [Note] InnoDB: flush 85, 23, 10
            2023-12-05 14:01:28 0 [Note] InnoDB: flush 98, 25, 10
            2023-12-05 14:01:29 0 [Note] InnoDB: flush 119, 28, 10
            2023-12-05 14:01:30 0 [Note] InnoDB: flush 131, 30, 10
            2023-12-05 14:01:31 0 [Note] InnoDB: flush 146, 32, 10
            2023-12-05 14:01:32 0 [Note] InnoDB: flush 184, 34, 10
            2023-12-05 14:01:33 0 [Note] InnoDB: flush 184, 36, 10
            2023-12-05 14:01:34 0 [Note] InnoDB: flush 200, 38, 10
            2023-12-05 14:01:35 0 [Note] InnoDB: flush 216, 41, 10
            2023-12-05 14:01:36 0 [Note] InnoDB: flush 233, 44, 10
            2023-12-05 14:01:37 0 [Note] InnoDB: flush 259, 51, 10
            2023-12-05 14:01:38 0 [Note] InnoDB: flush 277, 55, 10
            … skip lines where the first number steadily grows
            2023-12-05 14:02:07 0 [Note] InnoDB: flush 861, 338, 10
            2023-12-05 14:02:08 0 [Note] InnoDB: flush 884, 355, 10
            2023-12-05 14:02:09 0 [Note] InnoDB: flush 928, 367, 10
            2023-12-05 14:02:12 0 [Note] InnoDB: flush 835, 389, 10
            2023-12-05 14:02:18 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            The buffer pool size was 2GiB of 16KiB pages (131072 pages), so we were nowhere close to writing 10% of the buffer pool per second. This was with the default innodb_io_capacity on a RAM disk, running Sysbench oltp_update_index at 256 connections, with the data set growing to 3GiB during the workload. I changed the instrumentation to dump a bit more information:

            @@ -2486,6 +2487,10 @@ static void buf_flush_page_cleaner()
                   mysql_mutex_lock(&buf_pool.mutex);
                   last_pages= n_flushed= buf_flush_list_holding_mutex(n);
                   page_cleaner.flush_time+= ut_time_ms() - tm;
            +      sql_print_information("InnoDB: flush %zu, %zu, %zu, %g, %zu/%zu",
            +                            n, n_flushed, page_cleaner.flush_time, pct_lwm,
            +                            buf_pool.flush_list.count,
            +                            buf_pool.LRU.count);
                   MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
                                                MONITOR_FLUSH_ADAPTIVE_COUNT,
                                                MONITOR_FLUSH_ADAPTIVE_PAGES,
            

            During the workload, the number of dirty/total pages in the buffer pool steadily grows after an initial dip until almost the end of the 60-second workload:

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the fix and the revised output

            2023-12-05 14:16:18 0 [Note] InnoDB: flush 200, 200, 7, 10, 15125/15313
            2023-12-05 14:16:19 0 [Note] InnoDB: flush 200, 200, 11, 10, 18850/19059
            2023-12-05 14:16:20 0 [Note] InnoDB: flush 46, 46, 17, 10, 22284/22500
            2023-12-05 14:16:21 0 [Note] InnoDB: flush 57, 57, 23, 10, 25597/25817
            2023-12-05 14:16:22 0 [Note] InnoDB: flush 67, 67, 28, 10, 29078/29308
            2023-12-05 14:16:23 0 [Note] InnoDB: flush 85, 85, 30, 10, 32626/32861
            2023-12-05 14:16:24 0 [Note] InnoDB: flush 98, 98, 32, 10, 36415/36657
            2023-12-05 14:16:25 0 [Note] InnoDB: flush 122, 122, 40, 10, 39971/40257
            … skip lines with monotonic growth of the numbers
            2023-12-05 14:16:48 0 [Note] InnoDB: flush 589, 589, 229, 10, 113861/121145
            2023-12-05 14:16:49 0 [Note] InnoDB: flush 882, 882, 26, 10, 116936/124702
            2023-12-05 14:16:50 0 [Note] InnoDB: flush 906, 906, 49, 10, 119658/128206
            2023-12-05 14:16:56 0 [Note] InnoDB: flush 906, 906, 74, 10, 127997/129250
            2023-12-05 14:16:57 0 [Note] InnoDB: flush 857, 857, 84, 10, 126538/128291
            2023-12-05 14:17:14 0 [Note] InnoDB: flush 431, 431, 92, 10, 57742/129792
            2023-12-05 14:17:14 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            We can see that we actually wrote out the recommended number of pages n per second. At all times, something like 90% of the buffer pool was dirty, so the pct_lwm=10 threshold was exceeded and we should write out total_ratio*innodb_io_capacity pages per second, up to innodb_max_io_capacity pages, whose default value computes to 2000 pages per second. The default innodb_io_capacity is 200 pages per second.

            marko Marko Mäkelä added a comment

            Looks good to me

            wlad Vladislav Vaintroub added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: adrake Andrew Drake
              Votes: 0
              Watchers: 4

