MariaDB Server
MDEV-31939

Adaptive flush recommendation ignores dirty ratio and checkpoint age

Details

    Description

      The follow-up fix referenced in a comment on MDEV-26827 changed buf_flush_page_cleaner to pass an additional parameter pct_lwm to page_cleaner_flush_pages_recommendation, documented as "innodb_max_dirty_pages_pct_lwm, or 0 to ignore it".

      As far as I am able to tell from the code, there is no code path where pct_lwm can ever be anything but 0. Here is the relevant code as of 10.6.15, stripped down to relevant parts and control flow constructs:

          double pct_lwm= 0.0;
          // ...
          pct_lwm= srv_max_dirty_pages_pct_lwm;
          if (pct_lwm != 0.0)
          {
            // ...
            if (activity_count != last_activity_count)
            {
              // ...
              goto maybe_unemployed;
            }
            else if (buf_pool.page_cleaner_idle() && !os_aio_pending_reads())
            {
              // ...
              goto idle_flush;
            }
            else
            {
            maybe_unemployed:
              const bool below{dirty_pct < pct_lwm};
              pct_lwm= 0.0;
              if (below)
                goto possibly_unemployed;
            }
          }
          else if (dirty_pct < srv_max_buf_pool_modified_pct)
          possibly_unemployed:
            if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
              goto unemployed;
      

      These are the only assignments to pct_lwm, which is then passed to page_cleaner_flush_pages_recommendation shortly after this block.

      It appears the only way for pct_lwm to be non-zero at the end of this block is to:
      1. have srv_max_dirty_pages_pct_lwm set to a non-zero value, and
      2. make it out of the if (pct_lwm != 0.0) block without hitting the pct_lwm= 0.0 line in the else case.

      But #2 appears to be impossible: the inner if case does a goto into the body of the else case (which unconditionally zeros the value), and the else if case jumps to idle_flush, which bypasses the page_cleaner_flush_pages_recommendation call entirely.

      Inside page_cleaner_flush_pages_recommendation we have:

              double total_ratio;
              if (pct_lwm == 0.0 || max_pct == 0.0) {
                      total_ratio = 1;
              } else {
                      total_ratio = std::max(double(pct_for_lsn) / 100,
                                             (dirty_pct / max_pct));
              }
      

      If pct_lwm is always 0, then total_ratio is always 1 and dirty_pct and pct_for_lsn have no effect.

      I had a little bit of trouble tracking all the possible control flow paths in buf_flush_page_cleaner due to all the gotos, so I tested this theory by adding DBUG_ASSERT(pct_lwm == 0.0) immediately before the previous snippet. The debug build with that assert passes:

      ./mtr --mariadbd=--innodb-max-dirty-pages-pct-lwm=0.001 --mariadbd=--innodb-adaptive-flushing-lwm=0.001 --suites=innodb-
      

      which I would expect to trip that assert fairly quickly?

      I started looking into this because I noticed that flushing on our databases (on 10.6.14) didn't seem to be speeding up as the checkpoint age approached max_modified_age_async, leading to periodic spikes of rapid flushing. I went reading through the code to try to understand what was going on and noticed this, but I'm not sure it's the only factor.

      Attachments

        Issue Links

          Activity

            Thank you for the report. I can confirm this logic flaw.

            In this commit I eliminated the local variable bool idle_flush of buf_flush_page_cleaner(), trying to assign pct_lwm=0.0 whenever that variable would have been set.

            I am considering the following fix. I have not tested it yet. I would appreciate your feedback.

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index 8e2b80b9512..1ac8fd15a57 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -2442,10 +2442,11 @@ static void buf_flush_page_cleaner()
                   else
                   {
                   maybe_unemployed:
            -        const bool below{dirty_pct < pct_lwm};
            -        pct_lwm= 0.0;
            -        if (below)
            +        if (dirty_pct < pct_lwm)
            +        {
            +          pct_lwm= 0.0;
                       goto possibly_unemployed;
            +        }
                   }
                 }
                 else if (dirty_pct < srv_max_buf_pool_modified_pct)
            

            Note that there is also another adaptive flushing based on the log checkpoint age, that is, the difference between the latest checkpoint LSN (more precisely, the oldest pending modification in the buffer pool) and the current LSN. The condition related to that would be af_needed_for_redo(oldest_lsn).

            marko Marko Mäkelä added a comment

            wlad, can you please review the above patch? I tested it with some additional output during a 60-second Sysbench oltp_update_index workload:

            @@ -2486,6 +2487,8 @@ static void buf_flush_page_cleaner()
                   mysql_mutex_lock(&buf_pool.mutex);
                   last_pages= n_flushed= buf_flush_list_holding_mutex(n);
                   page_cleaner.flush_time+= ut_time_ms() - tm;
            +      sql_print_information("InnoDB: flush %zu, %zu, %g",
            +                            n, page_cleaner.flush_time, pct_lwm);
                   MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
                                                MONITOR_FLUSH_ADAPTIVE_COUNT,
                                                MONITOR_FLUSH_ADAPTIVE_PAGES,
            

            With the patch, all calls will display pct_lwm=10, which corresponds to the non-default parameter innodb_max_dirty_pages_pct_lwm=10 that I had specified. If I specify innodb_adaptive_flushing=OFF, then there will be no output. If I revert the patch, all output will be with pct_lwm=0, as reported, and the number of pages flushed will be smaller:

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the output

            Version: '10.6.17-MariaDB'  socket: '/dev/shm/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-12-05 14:05:18 0 [Note] InnoDB: flush 200, 3, 0
            2023-12-05 14:05:19 0 [Note] InnoDB: flush 200, 15, 0
            2023-12-05 14:05:20 0 [Note] InnoDB: flush 67, 17, 0
            2023-12-05 14:05:21 0 [Note] InnoDB: flush 67, 19, 0
            2023-12-05 14:05:22 0 [Note] InnoDB: flush 67, 20, 0
            2023-12-05 14:05:23 0 [Note] InnoDB: flush 67, 23, 0
            2023-12-05 14:05:24 0 [Note] InnoDB: flush 67, 27, 0
            2023-12-05 14:05:25 0 [Note] InnoDB: flush 70, 30, 0
            2023-12-05 14:05:26 0 [Note] InnoDB: flush 67, 33, 0
            …
            2023-12-05 14:05:48 0 [Note] InnoDB: flush 67, 56, 0
            2023-12-05 14:05:49 0 [Note] InnoDB: flush 300, 4, 0
            2023-12-05 14:05:50 0 [Note] InnoDB: flush 300, 6, 0
            2023-12-05 14:05:56 0 [Note] InnoDB: flush 136, 10, 0
            … skip lines where the first number varies between 94 and 177
            2023-12-05 14:06:01 0 [Note] InnoDB: flush 90, 83, 0
            2023-12-05 14:06:01 0 [Note] InnoDB: flush 121, 87, 0
            2023-12-05 14:06:02 0 [Note] InnoDB: flush 143, 90, 0
            2023-12-05 14:06:14 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the output and the fix

            Version: '10.6.17-MariaDB'  socket: '/dev/shm/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-12-05 14:01:22 0 [Note] InnoDB: flush 200, 7, 10
            2023-12-05 14:01:23 0 [Note] InnoDB: flush 200, 11, 10
            2023-12-05 14:01:24 0 [Note] InnoDB: flush 47, 12, 10
            2023-12-05 14:01:25 0 [Note] InnoDB: flush 57, 14, 10
            2023-12-05 14:01:26 0 [Note] InnoDB: flush 67, 18, 10
            2023-12-05 14:01:27 0 [Note] InnoDB: flush 85, 23, 10
            2023-12-05 14:01:28 0 [Note] InnoDB: flush 98, 25, 10
            2023-12-05 14:01:29 0 [Note] InnoDB: flush 119, 28, 10
            2023-12-05 14:01:30 0 [Note] InnoDB: flush 131, 30, 10
            2023-12-05 14:01:31 0 [Note] InnoDB: flush 146, 32, 10
            2023-12-05 14:01:32 0 [Note] InnoDB: flush 184, 34, 10
            2023-12-05 14:01:33 0 [Note] InnoDB: flush 184, 36, 10
            2023-12-05 14:01:34 0 [Note] InnoDB: flush 200, 38, 10
            2023-12-05 14:01:35 0 [Note] InnoDB: flush 216, 41, 10
            2023-12-05 14:01:36 0 [Note] InnoDB: flush 233, 44, 10
            2023-12-05 14:01:37 0 [Note] InnoDB: flush 259, 51, 10
            2023-12-05 14:01:38 0 [Note] InnoDB: flush 277, 55, 10
            … skip lines where the first number steadily grows
            2023-12-05 14:02:07 0 [Note] InnoDB: flush 861, 338, 10
            2023-12-05 14:02:08 0 [Note] InnoDB: flush 884, 355, 10
            2023-12-05 14:02:09 0 [Note] InnoDB: flush 928, 367, 10
            2023-12-05 14:02:12 0 [Note] InnoDB: flush 835, 389, 10
            2023-12-05 14:02:18 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            The buffer pool size was 2GiB of 16KiB pages (131072 pages), so we were nowhere close to writing 10% of the buffer pool per second. This was with the default innodb_io_capacity on a RAM disk, running Sysbench oltp_update_index at 256 connections, with the data set growing to 3GiB during the workload. I changed the instrumentation to dump a bit more information:

            @@ -2486,6 +2487,10 @@ static void buf_flush_page_cleaner()
                   mysql_mutex_lock(&buf_pool.mutex);
                   last_pages= n_flushed= buf_flush_list_holding_mutex(n);
                   page_cleaner.flush_time+= ut_time_ms() - tm;
            +      sql_print_information("InnoDB: flush %zu, %zu, %zu, %g, %zu/%zu",
            +                            n, n_flushed, page_cleaner.flush_time, pct_lwm,
            +                            buf_pool.flush_list.count,
            +                            buf_pool.LRU.count);
                   MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
                                                MONITOR_FLUSH_ADAPTIVE_COUNT,
                                                MONITOR_FLUSH_ADAPTIVE_PAGES,
            

            During the workload, the number of dirty/total pages in the buffer pool steadily grows after an initial dip until almost the end of the 60-second workload:

            10.6 f074223ae730cfa12511090f2257d9779e867181 with the fix and the revised output

            2023-12-05 14:16:18 0 [Note] InnoDB: flush 200, 200, 7, 10, 15125/15313
            2023-12-05 14:16:19 0 [Note] InnoDB: flush 200, 200, 11, 10, 18850/19059
            2023-12-05 14:16:20 0 [Note] InnoDB: flush 46, 46, 17, 10, 22284/22500
            2023-12-05 14:16:21 0 [Note] InnoDB: flush 57, 57, 23, 10, 25597/25817
            2023-12-05 14:16:22 0 [Note] InnoDB: flush 67, 67, 28, 10, 29078/29308
            2023-12-05 14:16:23 0 [Note] InnoDB: flush 85, 85, 30, 10, 32626/32861
            2023-12-05 14:16:24 0 [Note] InnoDB: flush 98, 98, 32, 10, 36415/36657
            2023-12-05 14:16:25 0 [Note] InnoDB: flush 122, 122, 40, 10, 39971/40257
            … skip lines with monotonic growth of the numbers
            2023-12-05 14:16:48 0 [Note] InnoDB: flush 589, 589, 229, 10, 113861/121145
            2023-12-05 14:16:49 0 [Note] InnoDB: flush 882, 882, 26, 10, 116936/124702
            2023-12-05 14:16:50 0 [Note] InnoDB: flush 906, 906, 49, 10, 119658/128206
            2023-12-05 14:16:56 0 [Note] InnoDB: flush 906, 906, 74, 10, 127997/129250
            2023-12-05 14:16:57 0 [Note] InnoDB: flush 857, 857, 84, 10, 126538/128291
            2023-12-05 14:17:14 0 [Note] InnoDB: flush 431, 431, 92, 10, 57742/129792
            2023-12-05 14:17:14 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            

            We can see that we actually wrote out the recommended number of pages n per second. At all times, something like 90% of the buffer pool was dirty, so the pct_lwm=10 threshold was exceeded and we should write out total_ratio*innodb_io_capacity pages per second, up to innodb_max_io_capacity pages, whose default value computes to 2000 pages per second. The default innodb_io_capacity is 200 pages per second.

            marko Marko Mäkelä added a comment

            Looks good to me

            wlad Vladislav Vaintroub added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: adrake Andrew Drake
              Votes: 0
              Watchers: 4

