MariaDB Server / MDEV-25093

Adaptive flushing fails to kick in even if innodb_adaptive_flushing_lwm is hit. (possible regression)

    Description

      • InnoDB flushing should happen if either of the following conditions is true:

        a. dirty_pct (the percentage of dirty pages in the buffer pool) > innodb_max_dirty_pages_pct_lwm
        b. the innodb_adaptive_flushing_lwm limit is reached (defaults to 10% of the redo-log capacity)

      • Condition (b) represents pressure on the redo log: even if (a) is not reached,
          (b) should cause flushing to start in order to reduce that pressure
          (see the sketch after this list).
      • Based on the investigation so far, condition (b) does not cause adaptive
          flushing to kick in.
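      As a rough sketch of the behaviour described above (illustrative C++ only: the type,
      field, and function names here are invented for the example and are not taken from
      the InnoDB source), the expected decision is an OR of the two conditions:

      // Illustrative sketch only -- names and structure are hypothetical,
      // not the actual buf0flu.cc implementation.
      #include <cstdint>

      struct flush_inputs {
        double   dirty_pct;          // % of buffer pool pages that are dirty
        uint64_t checkpoint_age;     // bytes of redo generated since the last checkpoint
        uint64_t log_capacity;       // usable redo-log size in bytes
        double   max_dirty_pct_lwm;  // innodb_max_dirty_pages_pct_lwm
        double   adaptive_flush_lwm; // innodb_adaptive_flushing_lwm (% of redo log)
      };

      // Background flushing is expected to start when EITHER
      // (a) the dirty-page low-water mark is exceeded, OR
      // (b) the redo log is filled beyond innodb_adaptive_flushing_lwm.
      bool should_start_background_flushing(const flush_inputs &in)
      {
        const double redo_fill_pct =
            100.0 * static_cast<double>(in.checkpoint_age) / in.log_capacity;

        const bool cond_a = in.dirty_pct > in.max_dirty_pct_lwm;   // (a)
        const bool cond_b = redo_fill_pct > in.adaptive_flush_lwm; // (b)

        return cond_a || cond_b;
      }

      The report is that the (b) branch of this OR is effectively not being honoured.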

      ---------------------------------------------------------------------------

      Let's understand this with a quick experiment.

      Say we have a very large buffer pool (~10M pages = 160 GB).
      Also, set innodb_max_dirty_pages_pct_lwm = 70%, which means flushing
      will not start via condition (a) unless adaptive flushing kicks in,
      because with only 69 GB of data the 70% limit can never be hit.

      Adaptive flushing should kick in when pressure builds on the redo log,
      and it is controlled by innodb_adaptive_flushing_lwm (left at the default
      of 10% for the experiment).

      I am running an update-index workload in parallel, and as we can see, despite the
      redo log crossing the 10% (innodb_adaptive_flushing_lwm) limit, flushing fails to
      kick in even though condition (b) is true.

      Ideally, flushing should start once the checkpoint age crosses 10% of the redo-log size
      (20 GB * 10% = 2 GB). Max-checkpoint age is correctly set to roughly 85% of the
      redo-log size (I recall it should be 80-85%).
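      To make the trigger points concrete, here is a small standalone program that plugs in
      the figures of the experiment (back-of-the-envelope only; decimal GB units are assumed
      so that the arithmetic matches 20 GB * 10% = 2 GB above):

      // Back-of-the-envelope numbers for the experiment above.
      // Assumption: "GB" taken as 10^9 bytes to match the 20 GB * 10% = 2 GB arithmetic.
      #include <cstdio>

      int main()
      {
        const double GB            = 1e9;
        const double buffer_pool   = 160 * GB; // innodb_buffer_pool_size
        const double data_set      = 69 * GB;  // total data touched by the workload
        const double redo_log      = 20 * GB;  // redo-log capacity
        const double max_dirty_lwm = 70.0;     // innodb_max_dirty_pages_pct_lwm
        const double adaptive_lwm  = 10.0;     // innodb_adaptive_flushing_lwm

        // (a) Even if every data page were dirty, dirty_pct stays well below the
        //     low-water mark, so condition (a) can never start flushing here.
        std::printf("max possible dirty pct: %.1f%% (lwm = %.0f%%)\n",
                    100.0 * data_set / buffer_pool, max_dirty_lwm);

        // (b) Adaptive flushing is expected once the checkpoint age passes 10%
        //     of the redo-log capacity.
        std::printf("adaptive flushing expected beyond %.1f GB of checkpoint age\n",
                    redo_log * adaptive_lwm / 100.0 / GB);
        return 0;
      }

      So in this setup only condition (b) can ever start background flushing, which is
      exactly the path that does not appear to fire.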

      MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
      +----------------------------------+-------------+
      | Variable_name                    | Value       |
      +----------------------------------+-------------+
      | Innodb_buffer_pool_pages_data    | 4496537     |
      | Innodb_buffer_pool_pages_dirty   | 3100258     |
      | Innodb_buffer_pool_pages_flushed | 0           |
      | Innodb_buffer_pool_pages_free    | 5826663     |
      | ...                              |             |
      | Innodb_checkpoint_age            | 4260770018  |
      | Innodb_checkpoint_max_age        | 17393908102 |
      +----------------------------------+-------------+

      MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
      +----------------------------------+-------------+
      | Variable_name                    | Value       |
      +----------------------------------+-------------+
      | Innodb_buffer_pool_pages_data    | 4523411     |
      | Innodb_buffer_pool_pages_dirty   | 4483055     |
      | Innodb_buffer_pool_pages_flushed | 0           |
      | Innodb_buffer_pool_pages_free    | 5799789     |
      | ...                              |             |
      | Innodb_checkpoint_age            | 15647589898 |
      | Innodb_checkpoint_max_age        | 17393908102 |
      +----------------------------------+-------------+
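      Reading the two snapshots back against the thresholds (again only a sketch; the 20 GB
      redo-log capacity is the figure stated above):

      // Interpret the two SHOW STATUS snapshots above.
      // Assumption: redo-log capacity of 20 GB (2e10 bytes), as stated earlier.
      #include <cstdio>

      int main()
      {
        const double redo_log     = 20e9;          // bytes
        const double adaptive_lwm = 10.0;          // innodb_adaptive_flushing_lwm (%)
        const double max_age      = 17393908102.0; // Innodb_checkpoint_max_age
        const double age[2] = { 4260770018.0, 15647589898.0 }; // Innodb_checkpoint_age

        for (int i = 0; i < 2; i++)
          std::printf("snapshot %d: redo log ~%.0f%% full (adaptive lwm = %.0f%%), "
                      "yet Innodb_buffer_pool_pages_flushed is still 0\n",
                      i + 1, 100.0 * age[i] / redo_log, adaptive_lwm);

        std::printf("snapshot 2 is already at ~%.0f%% of Innodb_checkpoint_max_age\n",
                    100.0 * age[1] / max_age);
        return 0;
      }

      Both snapshots are far past the 10% low-water mark, yet
      Innodb_buffer_pool_pages_flushed never moves off zero.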

      Version tested on: 10.5 (#4498714)

      And of course, a sudden drop in TPS is seen once the redo log hits the max-checkpoint
      age (84K -> 34K):

      [ 255s ] thds: 1024 tps: 84861.92 qps: 84862.12 (r/w/o: 0.00/84862.12/0.00) lat (ms,95%): 12.75 err/s: 0.00 reconn/s: 0.00
      [ 260s ] thds: 1024 tps: 78755.87 qps: 78755.87 (r/w/o: 0.00/78755.87/0.00) lat (ms,95%): 12.30 err/s: 0.00 reconn/s: 0.00
      [ 265s ] thds: 1024 tps: 34419.32 qps: 34419.32 (r/w/o: 0.00/34419.32/0.00) lat (ms,95%): 27.17 err/s: 0.00 reconn/s: 0.00
      [ 270s ] thds: 1024 tps: 53913.70 qps: 53913.70 (r/w/o: 0.00/53913.70/0.00) lat (ms,95%): 13.70 err/s: 0.00 reconn/s: 0.00
      [ 275s ] thds: 1024 tps: 59043.41 qps: 59043.41 (r/w/o: 0.00/59043.41/0.00) lat (ms,95%): 14.73 err/s: 0.00 reconn/s: 0.00
      [ 280s ] thds: 1024 tps: 73390.11 qps: 73390.11 (r/w/o: 0.00/73390.11/0.00) lat (ms,95%): 13.70 err/s: 0.00 reconn/s: 0.00

      ---------------

      This issue looks to be a regression; older versions should be studied to find out when
      it started regressing, but it is likely present from 10.5 onwards only.

People

  Assignee: marko Marko Mäkelä
  Reporter: krunalbauskar Krunal Bauskar