Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33894

MariaDB does unexpected storage read IO for the redo log

Details

    Description

      The full writeup is here. The summary is that while running sysbench with a cached database I see reads from storage during some of the write-heavy microbenchmark steps and that IO is being done for the redo log.

      I saw this long ago with web-scale MySQL. The issue is that writing the first 512 bytes to a 4kb filesystem page must read the page into the OS page cache when using buffered IO.

      I don't see this with upstream MySQL 5.7.44 or 8.0.36. While 8.0.36 has innodb_log_write_ahead_size, that isn't required to avoid it because I also don't see this with 5.7.44.

      I know that MariaDB has changed the InnoDB redo log architecture somewhere between MariaDB 10.4 and 11.4 because it now uses only one large redo log file while MySQL uses many (configurable number) smaller ones. But I have yet to figure out why this shows up in 10.11 and not in 10.6.

      I used strace to see how the redo log file was opened and the pwrite64 calls. I didn't see any obvious differences between MariaDB and MySQL other than the change (one large redo log) listed above.

      I can avoid this by using a small redo log, but is bad for performance because it increases write back activity.

      Attachments

        Issue Links

          Activity

            I am now done with the review.

            Dynamic Configuration Patch
            I had shared my comments earlier. I understand we would like to defer this patch for now.

            Read Only Configuration Patch
            Please see my comments. I agree with the patch. Please check my comments.

            debarun Debarun Banerjee added a comment - I am now done with the review. Dynamic Configuration Patch I had shared my comments earlier. I understand we would like to defer this patch for now. Read Only Configuration Patch Please see my comments. I agree with the patch. Please check my comments.

            The read-only parameter innodb_log_write_ahead_size with the default value 512 and the allowed values 512, 1024, 2048, 4096 was introduced. Up to MariaDB Server 10.6, additional allowed values were 8192, 16384, and the default value was 8192. On Linux and Microsoft Windows, the default or the specified innodb_log_write_ahead_size will be automatically adjusted to not be less than the physical block size (if it can be determined).

            The more ambitious fix (to make the parameter settable at runtime and to allow larger values) in https://github.com/MariaDB/server/pull/3327 might be revisited later.

            marko Marko Mäkelä added a comment - The read-only parameter innodb_log_write_ahead_size with the default value 512 and the allowed values 512, 1024, 2048, 4096 was introduced. Up to MariaDB Server 10.6, additional allowed values were 8192, 16384, and the default value was 8192. On Linux and Microsoft Windows, the default or the specified innodb_log_write_ahead_size will be automatically adjusted to not be less than the physical block size (if it can be determined). The more ambitious fix (to make the parameter settable at runtime and to allow larger values) in https://github.com/MariaDB/server/pull/3327 might be revisited later.

            The MDEV-33894 fix makes a big difference. I was able to show that MariaDB is ~10% faster than MySQL on a medium server.
            https://smalldatum.blogspot.com/2024/07/sysbench-on-medium-server-mariadb-is.html

            mdcallag Mark Callaghan added a comment - The MDEV-33894 fix makes a big difference. I was able to show that MariaDB is ~10% faster than MySQL on a medium server. https://smalldatum.blogspot.com/2024/07/sysbench-on-medium-server-mariadb-is.html
            mdcallag Mark Callaghan added a comment - - edited

            This is still a problem for me. Or perhaps I don't understand what was fixed.

            I think that I got the numbers listed below (512 vs 8192) backwards. See my next comment.
            When I don't set innodb_log_write_ahead_size in etc/my.cnf the value of innodb_log_write_ahead_size is:

            • 512 n MariaDB versions 10.4.33, 10.4.34, 10.5.25, 10.5.26, 10.6.18, 10.6.19, 10.11.7, 10.11.9
            • 8192 in MariaDB 11.x

            And those default values (512 prior to 11.x, 8192 in 11.x) are documented
            https://mariadb.com/kb/en/innodb-system-variables/#innodb_log_write_ahead_size

            When I repeat sysbench with a cached database using the latest point release versions then I still see a regression in throughput for update-index (see here). The numbers are throughput relative to 10.2.44 and the relative throughput is 1.08 in column 4 (10.6.19) and drops to 0.90 in column 5 (10.11.9).

            From results for more versions, focus on columns 8, 9 and 10 which have ...

            • column 8 -> MariaDB 10.11.8 with default for innodb_log_write_ahead_size
            • column 9 -> MariaDB 10.11.9 with default value for innodb_log_write_ahead_size (8192)
            • column 10 -> MariaDB 10.11.9 with innodb_log_write_ahead_size=4096 in etc/my.cnf

            Results are here and the performance improves in column 10. A similar pattern occurs for MariaDB 11.4, 11.5 and 11.6. Entries with "z11a_lwas4k_c8r32" set innodb_log_write_ahead_size to 4096, otherwise I don't set it.

            mdcallag Mark Callaghan added a comment - - edited This is still a problem for me. Or perhaps I don't understand what was fixed. I think that I got the numbers listed below (512 vs 8192) backwards. See my next comment. When I don't set innodb_log_write_ahead_size in etc/my.cnf the value of innodb_log_write_ahead_size is: 512 n MariaDB versions 10.4.33, 10.4.34, 10.5.25, 10.5.26, 10.6.18, 10.6.19, 10.11.7, 10.11.9 8192 in MariaDB 11.x And those default values (512 prior to 11.x, 8192 in 11.x) are documented https://mariadb.com/kb/en/innodb-system-variables/#innodb_log_write_ahead_size When I repeat sysbench with a cached database using the latest point release versions then I still see a regression in throughput for update-index ( see here ). The numbers are throughput relative to 10.2.44 and the relative throughput is 1.08 in column 4 (10.6.19) and drops to 0.90 in column 5 (10.11.9). From results for more versions, focus on columns 8, 9 and 10 which have ... column 8 -> MariaDB 10.11.8 with default for innodb_log_write_ahead_size column 9 -> MariaDB 10.11.9 with default value for innodb_log_write_ahead_size (8192) column 10 -> MariaDB 10.11.9 with innodb_log_write_ahead_size=4096 in etc/my.cnf Results are here and the performance improves in column 10. A similar pattern occurs for MariaDB 11.4, 11.5 and 11.6. Entries with "z11a_lwas4k_c8r32" set innodb_log_write_ahead_size to 4096, otherwise I don't set it.

            When I don't set innodb_log_write_ahead_size in my.cnf, it is 8192 for MariaDB 10.6.20 vs 512 for MariaDB 10.11.10, 11.4.4, 11.5.2,11.6.2 and 11.7.1.

            For 10.6, 10.11, 11.4, 11.5, 11.6 and 11.7 I repeated sysbench with two my.cnf files:

            • my.cnf.cz11a_c8r32 or my.cnf.cz11b_c8r32 - these do not set innodb_log_write_ahead_size
            • my.cnf.cz11a_lwas4k_c8r32 or my.cnf.cz11b_lwas4k_c8r32 - these set innodb_log_write_ahead_size = 4096

            And with sysbench I see a big improvement in QPS for 10.11, 11.4, 11.5, 11.6 and 11.7 when I set innodb_log_write_ahead_size on the write-heavy benchmark steps. The largest improvement might be on the update-index microbenchmark (see here). The numbers are the relative QPS (rQPS) which is (QPS for my version) / (QPS for MariaDB 10.5.27) and on update-index the rQPS in MariaDB 11.7.1 is 0.76 when I don't set innodb_log_write_ahead_size vs 1.08 when I set it to 4096. Similar results occur for versions 10.11.10, 11.4.4, 11.5.2 and 11.6.2.

            mdcallag Mark Callaghan added a comment - When I don't set innodb_log_write_ahead_size in my.cnf, it is 8192 for MariaDB 10.6.20 vs 512 for MariaDB 10.11.10, 11.4.4, 11.5.2,11.6.2 and 11.7.1. For 10.6, 10.11, 11.4, 11.5, 11.6 and 11.7 I repeated sysbench with two my.cnf files: my.cnf.cz11a_c8r32 or my.cnf.cz11b_c8r32 - these do not set innodb_log_write_ahead_size my.cnf.cz11a_lwas4k_c8r32 or my.cnf.cz11b_lwas4k_c8r32 - these set innodb_log_write_ahead_size = 4096 And with sysbench I see a big improvement in QPS for 10.11, 11.4, 11.5, 11.6 and 11.7 when I set innodb_log_write_ahead_size on the write-heavy benchmark steps. The largest improvement might be on the update-index microbenchmark ( see here ). The numbers are the relative QPS (rQPS) which is (QPS for my version) / (QPS for MariaDB 10.5.27) and on update-index the rQPS in MariaDB 11.7.1 is 0.76 when I don't set innodb_log_write_ahead_size vs 1.08 when I set it to 4096. Similar results occur for versions 10.11.10, 11.4.4, 11.5.2 and 11.6.2.

            People

              marko Marko Mäkelä
              mdcallag Mark Callaghan
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.