MariaDB Server / MDEV-27772

Performance regression with default configuration in 10.6

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6
    • Fix Version/s: None
    • Component/s: Server
    • Labels: None
    • Environment: SUSE Enterprise Linux 15 SP4, openSUSE Tumbleweed

    Description

      MariaDB 10.6 uses O_DIRECT as the default value of innodb_flush_method instead of fsync. This leads to a performance regression in the default configuration, regardless of the filesystem used.
      To benchmark the performance, I used the sysbench 0.4.8 oltp test in read-only mode on btrfs (the most affected filesystem), ext4, and xfs.

      Filesystem  Threads  innodb_flush_method=fsync  innodb_flush_method=O_DIRECT
      ext4        1        6.5s                       7.0s
      ext4        16       0.97s                      1.15s
      btrfs       1        6.5s                       77s
      btrfs       16       0.8s                       3.1s
      xfs         1        6.4s                       7.2s
      xfs         16       0.8s                       1.1s

      I used a mount point under /abuild and the following configuration in /etc/my.cnf.d/sysbench_testing.cnf:

      [mysqld]
      port = 3306
      datadir=/abuild/mysql-sysbench
      socket=/abuild/mysql-sysbench/sysbench-mysql.sock
      #innodb_flush_method=fsync
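
      To double-check which flush method and buffer pool size the running server actually uses, the current values can be queried directly (assuming the mysql client can reach the socket configured above):

      $ mysql --socket=/abuild/mysql-sysbench/sysbench-mysql.sock -u root \
             -e "SELECT @@innodb_flush_method, @@innodb_buffer_pool_size"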
      

      To reproduce, run the following commands:

      $ # prepare the db
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 prepare
      $ # run the benchmark
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 \
             --num-threads=$NTHREADS run
      $ # clean up the db
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 cleanup >/dev/null 2>&1
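
      The timings in the table above were collected at 1 and 16 threads; a small loop over the run step (a sketch, using the same options, after the prepare step has been run once) covers both:

      $ for NTHREADS in 1 16; do \
             sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 \
             --num-threads=$NTHREADS run; \
             done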
      

      I am also aware that testing by a colleague showed a similar regression on xfs (353.5476s in 10.6 vs 16.9560s in 10.5).

    Activity

            wlad Vladislav Vaintroub added a comment -

            But if you benchmark a table with 1000 rows instead of 1 million, it will probably not result in a performance regression, right?

            marko Marko Mäkelä added a comment -

            If this is with the default innodb_buffer_pool_size=128M and innodb_log_file_size=96M, then this is as expected. Due to O_DIRECT, any page reads caused by the (by today’s standards) tiny buffer pool cannot be served from the Linux file system cache.

            The InnoDB buffer pool itself is supposed to be the data file cache. A tiny buffer pool will also cause more frequent page writes and page checksum calculations on both reads and writes.

            I expect MDEV-19895 to address this.
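
            For reference, whether the working set fits in the buffer pool can be seen from the server's own counters: Innodb_buffer_pool_reads counts logical reads that missed the buffer pool and went to disk, while Innodb_buffer_pool_read_requests counts all logical read requests. For example:

            $ mysql --socket=/abuild/mysql-sysbench/sysbench-mysql.sock -u root \
                 -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"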

            danyspin97 Danilo Spinella added a comment -

            Yeah, with 1000 rows there is almost no difference (just tried with btrfs).

            wlad Vladislav Vaintroub added a comment (edited) -

            OK, for danyspin97, once again the explanation:

            • if innodb_flush_method=fsync, the filesystem cache serves as a not very efficient level-2 cache for the database
            • if innodb_flush_method=O_DIRECT, there is no level-2 cache, so to speak, but you can make your level-1 cache much bigger with innodb_buffer_pool_size=<size>

            There is a certain appeal in innodb_flush_method=fsync: you can misconfigure your database instance and it will still perform OK-ish, though not ideally. Unlike SQL Server, InnoDB unfortunately does not yet have a buffer pool that automatically grows or shrinks depending on the load.

            As marko says, the 128M default is tiny; in fact, it would be SQL Server's minimum. Maybe 1G should be the new default.
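
            For illustration, enlarging the level-1 cache as described above would mean a setting like the following in the server configuration (a sketch; 1G is the figure suggested here, not a shipped default):

            [mysqld]
            innodb_buffer_pool_size = 1G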


            danyspin97 Danilo Spinella added a comment -

            Thank you for the detailed explanation.

            However, in the default configuration, fsync outperforms O_DIRECT (20 times faster in some cases). Would it make sense to keep fsync as the default until MDEV-19895 has been fixed? That way, O_DIRECT would be used by people who also tweak innodb_buffer_pool_size according to their needs, and there would be no performance regression in the default configuration. I see that the auto-shrink feature will be added in 10.9, which, from my understanding, won't come out any time soon.

            wlad Vladislav Vaintroub added a comment (edited) -

            "In the default configuration, on your benchmark, whose data does not fit into the buffer pool, fsync outperforms O_DIRECT" would be a more accurate statement, as you confirmed that the 20x difference does not happen when the benchmark fits into the buffer pool.

            There is a possibility to check whether the buffer pool size was not set, and in that case choose fsync over O_DIRECT, perhaps writing a
            [Note] Using fsync for innodb_flush_method, because innodb_buffer_pool_size was not configured
            marko?

            In other news, I would not like InnoDB memory being auto-set to 80% of my RAM as per MDEV-19895, coming from the previous default of 128MB.
            I am more for intelligent, SQL Server-style buffer pool auto-allocation, growing and shrinking as needed, taking into account read and write rates, the amount of RAM available, and the maximum buffer pool size, if configured.


            marko Marko Mäkelä added a comment -

            wlad, I think that someone is always guaranteed to be surprised or disappointed, whatever we do.

            danyspin97, we try to follow a policy that the default values of parameters are only changed before a server is released as generally available. Sometimes we do deviate from it when there are strong reasons (see MDEV-27734 for a recent example). I don’t think that this is a strong enough reason to do anything about innodb_flush_method. The current default setting O_DIRECT should work in properly configured systems: if the working set fits in the default-size buffer pool, or if you have configured the buffer pool size to match the available RAM and the workload, it works well.

            If MDEV-19895 made the default innodb_buffer_pool_size 80% of the available RAM, and the default innodb_log_file_size the same as innodb_buffer_pool_size, someone could be upset that their tiny database is hogging all RAM for no reason, or their mostly read-only database is suddenly using a huge redo log file. Or they never cared about poor write performance when using the current default innodb_log_file_size=96MiB.

            I do not think that there can ever be a default configuration that really works for everyone in all cases. It is hard to say what is a ‘typical’ or ‘common’ use case.

            When it comes to the innodb_buffer_pool_size, there are some ideas for improving it in MDEV-24670. I do not know how much of that could be implemented on other operating systems than Linux.

            Theoretically, we might also resize the log file based on workload and some limits configured by the DBA.


            danyspin97 Danilo Spinella added a comment -

            I think being 20 times slower on some basic read queries in the default configuration is a strong enough reason to revert this change, especially since the benchmark doesn't show any difference for small datasets (i.e. when innodb_buffer_pool_size is big enough).

            Saying that the default configuration requires the user to configure the system is totally counterintuitive. If I am configuring the system properly, I can set O_DIRECT manually. Indeed, the default configuration could never work for everyone in all cases, but I can't understand why fsync should not be the default, since in the default configuration it works for more use cases than O_DIRECT does.

            wlad Vladislav Vaintroub added a comment (edited) -

            Note that while I do support fsync (if innodb_buffer_pool_size is not configured), I think you're pushing this 20x a little too much. First, why would anyone consider btrfs at all? It is not the default, and its read speed, as shown by your benchmark, is questionable. So for a database, I'd probably stay away from btrfs.

            Second, asking how much buffer pool to dedicate to the database might be considered the installer's or, if you want, the packager's responsibility. MariaDB's Windows MSI proposes 12.5% of RAM by default, and also lets the user override this, in both UI and silent mode. The MSI has been doing this for 10 years, so maybe zypper can also consider asking the user that one question.


            otto Otto Kekäläinen added a comment -

            Can't we make MariaDB/InnoDB emit a warning or suggestion to users if it detects that the following is not the case?

            If the working set fits in the default-size buffer pool, or if you have configured the buffer pool size to match the available RAM and the workload, it works well.

            We can't rely on deb/rpm/msi packaging here. Many systems run installs and upgrades automatically and unattended, so there is no human to inform, let alone to ask for a decision. In any case, workloads may change long after the installation was done. The best place to decide on good defaults is MariaDB itself, and the best place to tell users that something is wrong or suboptimal is the MariaDB error log at runtime.


            danyspin97 Danilo Spinella added a comment -

            First, why would anyone consider btrfs at all?

            The 20x difference is on xfs. My benchmark shows a regression on btrfs as well. I don't think making assumptions about the default filesystem is a good idea either.

            its read speed, as shown by your benchmark, is questionable

            Why is it questionable? Could you please explain?


            marko Marko Mäkelä added a comment -

            There exist monitoring tools that can build on existing instrumentation in the server to highlight problems. I do not think that such logic should be duplicated in the database server itself. The server can provide raw data. It cannot guess the intentions of the user.

            My intuition suggests that any copy-on-write file system, such as xfs or btrfs, could be a bad match for a write-heavy InnoDB setup, or for any database that is based on B-trees and a circular write-ahead log.
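
            On btrfs specifically, a common mitigation for database files is to disable copy-on-write on the data directory before any files are created in it, using the standard No_COW file attribute (it only affects newly created files):

            $ # run on the still-empty datadir, e.g. the one from the description
            $ chattr +C /abuild/mysql-sysbench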

            timojyrinki Timo Jyrinki added a comment -

            For a future MariaDB LTS release, it could be beneficial to have fallback scenarios in case MDEV-19895 does not get implemented by then. The default pool size could be bumped to the mentioned 1GB, or the server could revert to fsync when innodb_buffer_pool_size is not configured, maybe with an informational message suggesting to set the value. This would likely improve the out-of-the-box experience for many.

            With O_DIRECT we generally see good results given a 1GB (or larger) pool size. The plea for a warning message comes only from the fact that the default does not seem like an optimal compromise, and that the new behavior needs manual intervention compared to the more "forgiving" fsync.

            All in all, learning about these settings is an acceptable learning curve for MariaDB 10.6, and good documentation helps. O_DIRECT has been the correct default choice in general. Good (or at least OK) defaults would always be useful too, though.


    People

      Assignee: Unassigned
      Reporter: danyspin97 Danilo Spinella