MariaDB Server / MDEV-27772

Performance regression with default configuration in 10.6

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6
    • Fix Version/s: None
    • Component/s: Server
    • Labels: None
    • Environment: SUSE Enterprise Linux 15 SP4, openSUSE Tumbleweed

    Description

      MariaDB 10.6 uses O_DIRECT as the default value of innodb_flush_method instead of fsync. This leads to a performance regression in the default configuration, regardless of the filesystem used.
      To benchmark the performance, I used the sysbench 0.4.8 oltp test in read-only mode on btrfs (the most affected filesystem), ext4, and xfs.

      Filesystem  Threads  innodb_flush_method=fsync  innodb_flush_method=O_DIRECT
      ext4        1        6.5s                       7.0s
      ext4        16       0.97s                      1.15s
      btrfs       1        6.5s                       77s
      btrfs       16       0.8s                       3.1s
      xfs         1        6.4s                       7.2s
      xfs         16       0.8s                       1.1s

      I used a mount point under /abuild and the following configuration in /etc/my.cnf.d/sysbench_testing.cnf:

      [mysqld]
      port = 3306
      datadir=/abuild/mysql-sysbench
      socket=/abuild/mysql-sysbench/sysbench-mysql.sock
      #innodb_flush_method=fsync
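
      To double-check which flush method and buffer pool size the running server actually uses, the current values can be queried directly (assuming the mysql client can reach the socket configured above):

      $ mysql --socket=/abuild/mysql-sysbench/sysbench-mysql.sock -u root \
             -e "SELECT @@innodb_flush_method, @@innodb_buffer_pool_size"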
      

      To reproduce, run the following commands:

      $ # prepare the db
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 prepare
      $ # run the benchmark
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 \
             --num-threads=$NTHREADS run
      $ # clean up the db
      $ sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 cleanup >/dev/null 2>&1
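
      The timings in the table above were collected at 1 and 16 threads; a small loop over the run step (a sketch, using the same options, after the prepare step has been run once) covers both:

      $ for NTHREADS in 1 16; do \
             sysbench --test=oltp --mysql-user=root \
             --mysql-socket=/abuild/mysql-sysbench/sysbench-mysql.sock \
             --mysql-db=test --mysql-port=3306 \
             --oltp-dist-type=uniform --oltp-read-only --oltp-order-ranges=0 \
             --oltp-table-size=1000000 --oltp-range-size=1000 \
             --num-threads=$NTHREADS run; \
             done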
      

      I am also aware that testing by a colleague showed a similar regression on xfs (353.5476s in 10.6 vs 16.9560s in 10.5).

    Activity

            wlad Vladislav Vaintroub added a comment -

            But if you benchmark a table with 1000 rows instead of 1 million, it will probably not result in a performance regression, right?

            marko Marko Mäkelä added a comment -

            If this is with the default innodb_buffer_pool_size=128M and innodb_log_file_size=96M, then this is as expected. Due to O_DIRECT, any page reads caused by the (by today’s standards) tiny buffer pool cannot be served from the Linux file system cache.

            The InnoDB buffer pool itself is supposed to be the data file cache. A tiny buffer pool will also cause more frequent page writes and page checksum calculations on both reads and writes.

            I expect MDEV-19895 to address this.
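
            For reference, whether the working set fits in the buffer pool can be seen from the server's own counters: Innodb_buffer_pool_reads counts logical reads that missed the buffer pool and went to disk, while Innodb_buffer_pool_read_requests counts all logical read requests. For example:

            $ mysql --socket=/abuild/mysql-sysbench/sysbench-mysql.sock -u root \
                 -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"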

            danyspin97 Danilo Spinella added a comment -

            Yeah, with 1000 rows there is almost no difference (just tried with btrfs).

            wlad Vladislav Vaintroub added a comment (edited) -

            OK, for danyspin97, once again the explanation:

            • if innodb_flush_method=fsync, the filesystem cache serves as a not very efficient level-2 cache for the database
            • if innodb_flush_method=O_DIRECT, there is no level-2 cache, so to speak, but you can make your level-1 cache much bigger with innodb_buffer_pool_size=<size>

            There is a certain appeal in innodb_flush_method=fsync: you can misconfigure your database instance and it will still perform OK-ish, though not ideally. Unlike SQL Server, InnoDB unfortunately does not yet have a buffer pool that automatically grows or shrinks depending on the load.

            As marko says, the 128M default is tiny; in fact, it would be SQL Server's minimum. Maybe 1G should be the new default.
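
            For illustration, enlarging the level-1 cache as described above would mean a setting like the following in the server configuration (a sketch; 1G is the figure suggested here, not a shipped default):

            [mysqld]
            innodb_buffer_pool_size = 1G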


            danyspin97 Danilo Spinella added a comment -

            Thank you for the detailed explanation.

            However, in the default configuration, fsync outperforms O_DIRECT (20 times faster in some cases). Would it make sense to keep fsync as the default until MDEV-19895 has been fixed? That way, O_DIRECT would be used by people who also tweak innodb_buffer_pool_size according to their needs, and there would be no performance regression in the default configuration. I see that the auto-shrink feature will be added in 10.9, which, from my understanding, won't come out any time soon.

            wlad Vladislav Vaintroub added a comment (edited) -

            "In the default configuration, on your benchmark, whose data does not fit into the buffer pool, fsync outperforms O_DIRECT" would be a more accurate statement, as you confirmed that the 20x difference does not happen when the benchmark fits into the buffer pool.

            There is a possibility to check whether the buffer pool size was not set, and in that case choose fsync over O_DIRECT, perhaps writing a
            [Note] Using fsync for innodb_flush_method, because innodb_buffer_pool_size was not configured
            marko?

            In other news, I would not like InnoDB memory being auto-set to 80% of my RAM as per MDEV-19895, coming from the previous default of 128MB.
            I am more for intelligent, SQL Server-style buffer pool auto-allocation, growing and shrinking as needed, taking into account read and write rates, the amount of RAM available, and the maximum buffer pool size, if configured.


            marko Marko Mäkelä added a comment -

            wlad, I think that someone is always guaranteed to be surprised or disappointed, whatever we do.

            danyspin97, we try to follow a policy that the default values of parameters are only changed before a server is released as generally available. Sometimes we do deviate from it when there are strong reasons (see MDEV-27734 for a recent example). I don’t think that this is a strong enough reason to do anything about innodb_flush_method. The current default setting O_DIRECT should work in properly configured systems: if the working set fits in the default-size buffer pool, or if you have configured the buffer pool size to match the available RAM and the workload, it works well.

            If MDEV-19895 made the default innodb_buffer_pool_size 80% of the available RAM, and the default innodb_log_file_size the same as innodb_buffer_pool_size, someone could be upset that their tiny database is hogging all RAM for no reason, or their mostly read-only database is suddenly using a huge redo log file. Or they never cared about poor write performance when using the current default innodb_log_file_size=96MiB.

            I do not think that there can ever be a default configuration that really works for everyone in all cases. It is hard to say what is a ‘typical’ or ‘common’ use case.

            When it comes to the innodb_buffer_pool_size, there are some ideas for improving it in MDEV-24670. I do not know how much of that could be implemented on other operating systems than Linux.

            Theoretically, we might also resize the log file based on workload and some limits configured by the DBA.


            danyspin97 Danilo Spinella added a comment -

            I think being 20 times slower on some basic read queries in the default configuration is a strong enough reason to revert this change, especially since the benchmark doesn't show any difference for small datasets (i.e. when innodb_buffer_pool_size is big enough).

            Saying that the default configuration requires the user to configure the system is totally counterintuitive. If I am configuring the system properly, I can set O_DIRECT manually. Indeed, the default configuration could never work for everyone in all cases, but I can't understand why fsync should not be the default, since in the default configuration it works for more use cases than O_DIRECT does.

            wlad Vladislav Vaintroub added a comment (edited) -

            Note that while I do support fsync (if innodb_buffer_pool_size is not configured), I think you're pushing this 20x a little too much. First, why would anyone consider btrfs at all? It is not the default, and its read speed, as shown by your benchmark, is questionable. So for a database, I'd probably stay away from btrfs.

            Second, asking how much buffer pool to dedicate to the database might be considered the installer's or, if you want, the packager's responsibility. MariaDB's Windows MSI proposes 12.5% of RAM by default, and also lets the user override this, in both UI and silent mode. The MSI has been doing this for 10 years, so maybe zypper can also consider asking the user that one question.


            otto Otto Kekäläinen added a comment -

            Can't we make MariaDB/InnoDB emit a warning or suggestion to users if it detects that the following is not the case?

            If the working set fits in the default-size buffer pool, or if you have configured the buffer pool size to match the available RAM and the workload, it works well.

            We can't rely on deb/rpm/msi packaging here. Many systems run installs and upgrades automatically and unattended, so there is no human to inform, let alone to ask for a decision. In any case, workloads may change long after the installation was done. The best place to decide on good defaults is MariaDB itself, and the best place to tell users that something is wrong or suboptimal is the MariaDB error log at runtime.


            danyspin97 Danilo Spinella added a comment -

            First, why would anyone consider btrfs at all?

            The 20x difference is on xfs. My benchmark shows a regression on btrfs as well. I don't think making assumptions about the default filesystem is a good idea either.

            its read speed, as shown by your benchmark, is questionable

            Why is it questionable? Could you please explain?


            marko Marko Mäkelä added a comment -

            There exist monitoring tools that can build on existing instrumentation in the server to highlight problems. I do not think that such logic should be duplicated in the database server itself. The server can provide raw data. It cannot guess the intentions of the user.

            My intuition suggests that any copy-on-write file system, such as xfs or btrfs, could be a bad match for a write-heavy InnoDB setup, or for any database that is based on B-trees and a circular write-ahead log.
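
            On btrfs specifically, a common mitigation for database files is to disable copy-on-write on the data directory before any files are created in it, using the standard No_COW file attribute (it only affects newly created files):

            $ # run on the still-empty datadir, e.g. the one from the description
            $ chattr +C /abuild/mysql-sysbench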

            timojyrinki Timo Jyrinki added a comment -

            For a future MariaDB LTS release, it could be beneficial to have fallback scenarios in case MDEV-19895 does not get implemented by then. The default pool size could be bumped to the mentioned 1GB, or the server could revert to fsync when innodb_buffer_pool_size is not configured, maybe with an informational message suggesting to set the value. This would likely improve the out-of-the-box experience for many.

            With O_DIRECT we generally see good results given a 1GB (or larger) pool size. The plea for a warning message comes only from the fact that the default does not seem like an optimal compromise, and that the new behavior needs manual intervention compared to the more "forgiving" fsync.

            All in all, learning about these settings is an acceptable learning curve for MariaDB 10.6, and good documentation helps. O_DIRECT has been the correct default choice in general. Good (or at least OK) defaults would always be useful too, though.


    People

      Assignee: Unassigned
      Reporter: danyspin97 Danilo Spinella