MariaDB Server / MDEV-39824

Adaptive flushing only kicks in when innodb_max_dirty_pages_pct_lwm is exceeded


Details

    • Not for Release Notes

    Description

      We have set:

      • innodb_adaptive_flushing=ON
      • innodb_adaptive_flushing_lwm=10

      However, even though the adaptive flushing LWM is exceeded, adaptive flushing does not kick in. The page flush rate stays at 0 until more than 80% of the redo log capacity is used. At that point other mechanisms kick in to flush pages (but in our case not quickly enough to handle certain surges in write load).

      Our buffer pool is comparatively large, and the share of dirty pages stays far below innodb_max_dirty_pages_pct. Instead, the size of the redo log is our limiting factor.

      When we set innodb_max_dirty_pages_pct_lwm to a small non-zero value (i.e. below the current percentage of dirty pages), both dirty-page flushing and adaptive flushing kick in. We can then reset innodb_max_dirty_pages_pct_lwm=0. This seems to stop the flushing based on dirty pages, but adaptive flushing (based on the LSN/redo log) seems to continue until innodb_adaptive_flushing_lwm is reached. See also the attached plot: the page write rate increases when innodb_max_dirty_pages_pct_lwm is set to a small non-zero value. Resetting it to 0 reduces the rate, but not to 0, presumably because adaptive flushing remains active while the flushing based on the dirty-pages LWM has been deactivated.

      Looking at the source code, it seems to me that the condition for waking the page cleaner thread is the problem. It is woken under these conditions:

      • if for_LRU is true: I'm not entirely sure which situations this covers, but it seems to apply only to a specific case, so it is likely false in the situation described here.
      • if innodb_max_dirty_pages_pct/srv_max_buf_pool_modified_pct is hit, which isn't the case for our large buffer pool.
      • if pct_lwm (innodb_max_dirty_pages_pct_lwm) != 0 and:
        • innodb_max_dirty_pages_pct_lwm is hit
        • or the server is idle.

      I'm not sure how the activity counting/idle detection works, but I suppose that in our situation it never evaluates to true because we have a constant write load. So for us, the page cleaner is only woken when innodb_max_dirty_pages_pct_lwm is hit.
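      A minimal Python model of the wake-up condition as read from the source in the list above may make the gap easier to see. It is a sketch of the reporter's interpretation, not the MariaDB code: the class, field names and example values (5% dirty pages, innodb_max_dirty_pages_pct=90) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class BufferPoolState:
    """Hypothetical snapshot of the inputs to the wake-up decision.

    Field names are illustrative; they are not MariaDB identifiers.
    """
    for_lru: bool                   # wake-up requested for LRU eviction
    dirty_pct: float                # dirty pages as % of the buffer pool
    max_dirty_pages_pct: float      # innodb_max_dirty_pages_pct
    max_dirty_pages_pct_lwm: float  # innodb_max_dirty_pages_pct_lwm (0 = disabled)
    server_idle: bool               # outcome of the activity-count idle check


def page_cleaner_wakes(s: BufferPoolState) -> bool:
    """Model of the wake-up condition as described in this report."""
    if s.for_lru:
        return True
    if s.dirty_pct >= s.max_dirty_pages_pct:
        return True
    # The remaining checks only apply when the dirty-pages LWM is non-zero.
    return s.max_dirty_pages_pct_lwm != 0 and (
        s.dirty_pct >= s.max_dirty_pages_pct_lwm or s.server_idle
    )


# Reported scenario (illustrative numbers): large buffer pool, few dirty
# pages, constant write load (never idle), dirty-pages LWM left at 0.
reported = BufferPoolState(
    for_lru=False,
    dirty_pct=5.0,
    max_dirty_pages_pct=90.0,
    max_dirty_pages_pct_lwm=0.0,
    server_idle=False,
)
print(page_cleaner_wakes(reported))  # False
# Workaround from this report: a small non-zero dirty-pages LWM.
print(page_cleaner_wakes(replace(reported, max_dirty_pages_pct_lwm=1.0)))  # True
```

      None of the inputs in this model reflects how full the redo log is, so under this reading innodb_adaptive_flushing_lwm cannot influence whether the page cleaner is woken.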

      To me it seems wrong that adaptive flushing is coupled to innodb_max_dirty_pages_pct_lwm in this way. I would expect adaptive flushing to become active when innodb_adaptive_flushing_lwm is hit, independently of innodb_max_dirty_pages_pct_lwm.
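      The expected activation rule can be written as a short Python sketch: adaptive flushing is gated only on the share of redo log capacity in use (checkpoint age relative to log capacity, compared against innodb_adaptive_flushing_lwm) and does not consult innodb_max_dirty_pages_pct_lwm at all. The function names and the 1 GiB log capacity are hypothetical, and this models the expectation only, not how InnoDB computes its flush rate.

```python
MiB = 1 << 20
GiB = 1 << 30


def redo_log_fill_pct(checkpoint_age: int, log_capacity: int) -> float:
    """Share of redo log capacity in use, in percent."""
    return 100.0 * checkpoint_age / log_capacity


def adaptive_flushing_expected(
    adaptive_flushing: bool,       # innodb_adaptive_flushing
    adaptive_flushing_lwm: float,  # innodb_adaptive_flushing_lwm (% of redo log capacity)
    checkpoint_age: int,           # current LSN minus last checkpoint LSN, in bytes
    log_capacity: int,             # redo log capacity, in bytes (hypothetical value below)
) -> bool:
    """Expected rule: only the redo log LWM matters, not the dirty-pages LWM."""
    return adaptive_flushing and (
        redo_log_fill_pct(checkpoint_age, log_capacity) >= adaptive_flushing_lwm
    )


# Settings from this report: innodb_adaptive_flushing=ON, innodb_adaptive_flushing_lwm=10.
print(adaptive_flushing_expected(True, 10.0, 300 * MiB, GiB))  # True (about 29% used)
print(adaptive_flushing_expected(True, 10.0, 50 * MiB, GiB))   # False (about 5% used)
```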

      There are a few related tickets:

      • MDEV-25093 describes a very similar issue, but is supposedly fixed since 10.5.10.
      • MDEV-26055 reports that the issue persists in 10.5.11, but is supposedly also fixed in several versions.
      • There is the epic MDEV-26620, but with little information and apparently no subtickets assigned. (I also don't have permission to assign this bug report to the epic.)


            People

              Shipra Jain (shipjain)
              Jan Gosmann (gosmannj)