[MDEV-27772] Performance regression with default configuration in 10.6 Created: 2022-02-08 Updated: 2023-08-15 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Danilo Spinella | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SUSE Enterprise Linux 15 SP4, openSUSE Tumbleweed |
||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
MariaDB 10.6 has O_DIRECT value as default for innodb_flush_method instead of fsync. This lead to a performance regression in the default configuration, regardless of the filesystem used.
I have used a mount point in /abuild and the following configuration in /etc/my.cnf.d/sysbench_testing.cnf:
To reproduce, run the following commands:
I am also aware that some testing by some colleague lead to similar regression in xfs (353.5476s in 10.6 vs 16.9560s in 10.5). |
| Comments |
| Comment by Vladislav Vaintroub [ 2022-02-08 ] |
|
but if you benchmark a table with 1000 rows, instead of 1 million, it will probably not result in a performance regression, right? |
| Comment by Marko Mäkelä [ 2022-02-08 ] |
|
If this is with the default innodb_buffer_pool_size=128M and innodb_log_file_size=96M, then this is as expected. Due to O_DIRECT, any page reads due to the (by today’s standards) tiny buffer pool cannot be served from the Linux file system cache. The InnoDB buffer pool itself is supposed to be the data file cache. A tiny buffer pool will also cause more frequent page writes and page checksum calculations on both reads and writes. I expect MDEV-19895 to address this. |
| Comment by Danilo Spinella [ 2022-02-08 ] |
|
Yea, with 1000 rows there is almost no difference (just tried with brtfs). |
| Comment by Vladislav Vaintroub [ 2022-02-09 ] |
|
ok , for danyspin97 once again the explanation.
There is a certain appeal in innodb_flush_method = fsync, that is, you can misconfigure your database instance, and it will still perform OK-ish, not ideal of course. Unlike SQLServer, Innodb unfortunately does not yet have a buffer pool that would automatically grow or shrink depending on the load. As marko says, 128M default is tiny, in fact this would be SQLServer minimum. Maybe 1G should be the new default. |
| Comment by Danilo Spinella [ 2022-02-09 ] |
|
Thank you for the detailed explanation. However in the default configuration, fsync outperforms (20 times faster in some cases) O_DIRECT. Would it make sense to have fsync as default until MDEV-19895 has been fixed? This way, O_DIRECT would be used by people who also increase and tweak innodb_buffer_pool_size according to their needs and wouldn't lead to any performance regression on the default configuration. I see that the auto_shrink feature will be added in 10.9, which from my understanding it won't come out any time soon. |
| Comment by Vladislav Vaintroub [ 2022-02-09 ] |
|
In the default configuration, on your benchmark, which does not fit into bufferpool, fsync outperforms. This would be a more accurate statement, as you confirmed that if you benchmark fits into bufferpool, 20x faster does not happen. There is a possibility to check if buffer pool size was not set, and then maybe then choose fsync over O_DIRECT, perhaps writing a In other news, I would not like autosetting of Innodb memory to 80% of my RAM as per MDEV-19895 , going from previous default of 128MB. |
| Comment by Marko Mäkelä [ 2022-02-09 ] |
|
wlad, I think that someone is always guaranteed to be surprised or disappointed, whatever we do. danyspin97, we try to follow a policy that the default values of parameters are only changed before a server is released as generally available. Sometimes we do deviate from it when there are strong reasons (see If MDEV-19895 made the default innodb_buffer_pool_size 80% of the available RAM, and the default innodb_log_file_size the same as innodb_buffer_pool_size, someone could be upset that their tiny database is hogging all RAM for no reason, or their mostly read-only database is suddenly using a huge redo log file. Or they never cared about poor write performance when using the current default innodb_log_file_size=96MiB. I do not think that there can ever be a default configuration that really works for everyone in all cases. It is hard to say what is a ‘typical’ or ‘common’ use case. When it comes to the innodb_buffer_pool_size, there are some ideas for improving it in Theoretically, we might also resize the log file based on workload and some limits configured by the DBA. |
| Comment by Danilo Spinella [ 2022-02-09 ] |
|
I think 20x times slower in some basic read queries on the default configuration is strong enough to revert this change. Especially since the benchmark doesn't show any difference for small queries (i.e. when innodb_buffer_pool_size is big enough). Saying that that the default configuration requires the user to configure the system it's totally counter intuitive. If I am configuring the system properly, I can set O_DIRECT manually. Indeed, the default configuration could never work for everyone in all cases, but I can't understand why fsync should not be the default since, in the default configuration, it worked for more use cases than O_DIRECT is. |
| Comment by Vladislav Vaintroub [ 2022-02-09 ] |
|
Note, that while I do support fsync (if innodb_buffer_pool_size is not configured), I think you're pushing this 20x a little bit too much. First, why would anyone consider btrfs at all - that's not the default, and its read speed as shown by your benchmark, is questionable. So for a database, maybe I'd stay away from btrfs pretty much. Second, asking about how much buffer pool size to dedicate to database, might be considered the installer's or if you want, packager's responsibility . MariaDB's Windows MSI by default asks about 12.5% of RAM, and also lets user overwrite this, in both UI and silent mode. MSI has been doing it since 10 years, so maybe zypper also can consider asking user that one question. |
| Comment by Otto Kekäläinen [ 2022-02-09 ] |
|
Can't we make MariaDB/InnoDB emit a warning or suggestion to users if it detects that the below is not the case?
We can't rely on deb/rpm/msi packaging here. Many systems run installs/upgrades in an automatic and unattended way, and there is no human to tell things to and even less to ask them to decide on something. And anyway workloads might change over time long after the installation was done. Best place to decide on good defaults is MariaDB itself and best place to tell users something is wrong or suboptimal is by emitting warnings in the error logs of MariaDB at runtime. |
| Comment by Danilo Spinella [ 2022-02-10 ] |
The 20x difference is on xfs. My benchmark shows regression on brtfs as well. I don't think making assumptions on the filesystem used by default is good idea either.
Why is it questionable? Can you please explain better? |
| Comment by Marko Mäkelä [ 2022-02-11 ] |
|
There exist monitoring tools that can build on existing instrumentation in the server to highlight problems. I do not think that such logic should be duplicated in the database server itself. The server can provide raw data. It cannot guess the intentions of the user. My intuition suggests that any copy-on-write file system, such as xfs or btrfs, could be a bad match for a write-heavy InnoDB setup, or for any database that is based on B-trees and a circular write-ahead log. |
| Comment by Timo Jyrinki [ 2022-05-27 ] |
|
For a future MariaDB LTS release, it could be beneficial to have fallback scenarios in case MDEV-19895 does not get implemented by then. The pool size default could be bumped to the mentioned 1GB, or reverting to fsync if innodb_buffer_pool_size not configured, maybe with an informational message suggesting to set the value. This would likely improve the out-of-the-box experience for many. With O_DIRECT we generally see good results given the 1GB pool size (or higher). The plea for a warning message only comes from the fact that the default does not seem like an optimal compromise, and that the new behavior needs manual intervention compared to more "forgiving" fsync. All in all, learning about these is an acceptable learning curve about MariaDB 10.6 and good documentation helps. O_DIRECT has been the correct default choice in general. Good (or at least ok) defaults would be always useful too, though. |