MariaDB Server / MDEV-27942

No background flushing until innodb_adaptive_flushing_lwm is reached

Details

    Description

      There seems to be almost no flushing whatsoever until the checkpoint age reaches innodb_adaptive_flushing_lwm. After that, flushing goes on for several minutes, until the checkpoint age drops to roughly one percentage point below innodb_adaptive_flushing_lwm (as a share of the max checkpoint age). Then flushing stops again and the checkpoint age builds back up to innodb_adaptive_flushing_lwm.

      Expected:
      I would expect adaptive flushing to happen even before the lwm is hit. According to the documentation:

      If set to 1, the default, the server will dynamically adjust the flush rate of dirty pages in the InnoDB buffer pool. This assists to reduce brief bursts of I/O activity. If set to 0, adaptive flushing will only take place when the limit specified by innodb_adaptive_flushing_lwm is reached.

      It seems to me that my instance behaves as if innodb_adaptive_flushing were set to OFF, as adaptive flushing kicks in ONLY when the lwm is reached.

      This is very different from version 10.4.15, which does a LOT of background flushing all the time and does not allow any checkpoint age buildup at all. Neither behavior seems optimal, but since 10.4.15 is long gone, I am opening the issue against 10.4.24. The attached screenshot of 10.4.15 is just to give an idea of what it looks like on the second slave, which has the SAME CONFIG and SAME LOAD. Totally different behavior.

      +---------------------------------------------+------------------------+
      | Variable_name                               | Value                  |
      +---------------------------------------------+------------------------+
      | innodb_adaptive_flushing                    | ON                     |
      | innodb_adaptive_flushing_lwm                | 10.000000              |
      | innodb_adaptive_hash_index                  | OFF                    |
      | innodb_adaptive_hash_index_parts            | 8                      |
      | innodb_flush_log_at_timeout                 | 1                      |
      | innodb_flush_log_at_trx_commit              | 2                      |
      | innodb_flush_method                         | O_DIRECT               |
      | innodb_flush_neighbors                      | 0                      |
      | innodb_flush_sync                           | ON                     |
      | innodb_io_capacity                          | 1024                   |
      | innodb_io_capacity_max                      | 2048                   |
      | innodb_lru_scan_depth                       | 100                    |
      | innodb_max_dirty_pages_pct                  | 75.000000              |
      | innodb_max_dirty_pages_pct_lwm              | 0.000000               |
      +---------------------------------------------+------------------------+
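
      One way to watch the behaviour described above is to compare the current log sequence number with the last checkpoint LSN; their difference is the checkpoint age. A minimal sketch (the INNODB_METRICS counter names are an assumption and may first need to be enabled via innodb_monitor_enable):

      -- The LOG section of SHOW ENGINE INNODB STATUS prints lines such as
      --   Log sequence number ...
      --   Last checkpoint at  ...
      -- and checkpoint age = log sequence number - last checkpoint LSN.
      SHOW ENGINE INNODB STATUS\G

      -- Alternatively, if the log_* counters are enabled in INNODB_METRICS:
      SELECT NAME, COUNT
        FROM information_schema.INNODB_METRICS
       WHERE NAME IN ('log_lsn_current', 'log_lsn_last_checkpoint', 'log_lsn_checkpoint_age');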
      

    Attachments

    Issue Links

    Activity

            shalogo Leonard Sinquer added a comment -

            Seems that lowering the InnoDB log size from an exaggerated 44GB to a reasonable 4GB created much more stable flushing.
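
            For reference, a sketch of the change described above (on older 10.x releases innodb_log_file_size is read-only and has to be set in my.cnf followed by a restart; only recent releases accept it at runtime):

            -- my.cnf equivalent: innodb_log_file_size = 4G
            SET GLOBAL innodb_log_file_size = 4 * 1024 * 1024 * 1024;
            SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';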
            brahma brahma added a comment - edited

            We're struggling with the same issue - innodb_adaptive_flushing_lwm is set to the default value of 10. We are currently running all production systems on MariaDB 5.5 and planning to migrate to 10.6.9. We notice that the checkpoint age grows up to 6GB in 10.6.9 (total redo log allocated: 8GB) vs 1.5GB in 5.5 (total redo log allocated: 8GB), and then sudden flushing happens. We are worried about crash recovery with 6GB of un-checkpointed redo log.

            We tried changing the innodb_adaptive_flushing_lwm value between 0 and 65, with innodb_adaptive_flushing both ON and OFF, but the checkpoint age still grows to 6GB out of the 8GB of redo log within 2 hours.

             
            +--------------------------------+-----------+
            | Variable_name                  | Value     |
            +--------------------------------+-----------+
            | flush                          | OFF       |
            | flush_time                     | 0         |
            | innodb_adaptive_flushing       | ON        |
            | innodb_adaptive_flushing_lwm   | 10.000000 |
            | innodb_flush_log_at_timeout    | 1         |
            | innodb_flush_log_at_trx_commit | 1         |
            | innodb_flush_method            | O_DIRECT  |
            | innodb_flush_neighbors         | 1         |
            | innodb_flush_sync              | ON        |
            | innodb_flushing_avg_loops      | 30        |
            | innodb_lru_flush_size          | 32        |
            +--------------------------------+-----------+
             
            +------------------------+-------+
            | Variable_name          | Value |
            +------------------------+-------+
            | innodb_io_capacity     | 6000  |
            | innodb_io_capacity_max | 12000 |
            +------------------------+-------+
             
            +--------------------------------+-----------+
            | Variable_name                  | Value     |
            +--------------------------------+-----------+
            | innodb_max_dirty_pages_pct     | 80.000000 |
            | innodb_max_dirty_pages_pct_lwm | 0.000000  |
            +--------------------------------+-----------+
            
            

            Is this how it works in newer versions of MariaDB, or are there any other parameters introduced in the newer versions of MariaDB to control the checkpoint age?

            Thanks
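
            One detail visible in the variable dump above is innodb_max_dirty_pages_pct_lwm = 0, which disables the dirty-page low-water-mark pre-flushing heuristic. A sketch of settings one might experiment with while watching the checkpoint age (the values are examples only, and Innodb_checkpoint_age is assumed to be available as a status variable on the build in question):

            -- all of these are dynamic and can be reverted at runtime
            SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- start background flushing earlier
            SET GLOBAL innodb_adaptive_flushing_lwm   = 10;  -- redo-log fill ratio that forces adaptive flushing
            SET GLOBAL innodb_io_capacity             = 6000;
            SET GLOBAL innodb_io_capacity_max         = 12000;

            -- then observe whether the checkpoint age still climbs towards the 6GB mark
            SHOW GLOBAL STATUS LIKE 'Innodb_checkpoint_age';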

            brahma brahma added a comment -

            Checkpoint age in MariaDB 5.5

            Checkpoint age in MariaDB 10.6.9
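
            Graphs like the attached ones can be reproduced by sampling the checkpoint age periodically; a minimal sketch, assuming the Innodb_checkpoint_age status variable is present (otherwise the value can be derived from SHOW ENGINE INNODB STATUS):

            -- run e.g. every 10 seconds from a monitoring agent and plot the result
            SELECT NOW() AS ts, VARIABLE_VALUE AS checkpoint_age_bytes
              FROM information_schema.GLOBAL_STATUS
             WHERE VARIABLE_NAME = 'Innodb_checkpoint_age';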


            marostegui Manuel Arostegui added a comment -

            We are having a similar issue with 10.6.17 - 10.4 (almost EOL) still looks good though.

            marostegui Manuel Arostegui added a comment -

            This is an interesting graph where we can see the behaviour when a host gets migrated from 10.4 to 10.6.
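
            When comparing a 10.4 host with a 10.6 host as in the graph above, dumping the flush-related settings on both sides and diffing them can help rule out configuration drift; a simple sketch:

            -- run on both hosts and diff the output
            SELECT VARIABLE_NAME, VARIABLE_VALUE
              FROM information_schema.GLOBAL_VARIABLES
             WHERE VARIABLE_NAME LIKE 'innodb%flush%'
                OR VARIABLE_NAME LIKE 'innodb_io_capacity%'
                OR VARIABLE_NAME LIKE 'innodb%dirty%'
             ORDER BY VARIABLE_NAME;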

    People

      Assignee: Marko Mäkelä (marko)
      Votes: 2
      Watchers: 5
