[MDEV-33203] storage/innobase/os/os0file.cc doesn't compile on Solaris Created: 2024-01-09  Updated: 2024-01-19  Resolved: 2024-01-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1

Type: Bug Priority: Critical
Reporter: Rainer Orth Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression
Environment:

Solaris 11.4/amd64


Attachments: File MDEV-TBD-O_DIRECT.patch    
Issue Links:
Blocks
is blocked by MDEV-33095 innodb_flush_method=O_DIRECT creates ... Closed
Problem/Incident
is caused by MDEV-30136 Map innodb_flush_method to new settab... Closed

 Description   

To provide patches for the Solaris build issues I've reported before against MariaDB 10.11.6, I've tried a build on 11.4 instead. Unfortunately, there's a new build failure there:
```
/vol/src/mariadb/mariadb-server/storage/innobase/os/os0file.cc: In function ‘pfs_os_file_t os_file_create_func(const char*, ulint, ulint, ulint, bool, bool*)’:
/vol/src/mariadb/mariadb-server/storage/innobase/os/os0file.cc:1233:49: error: ‘mode_str’ was not declared in this scope; did you mean ‘mode_t’?
1233 | os_file_set_nocache(file, name, mode_str);

^~~~~~~~
mode_t
```
While all other uses of `mode_str` are wrapped in `#ifdef O_DIRECT`, this one is not, which breaks the Solaris build which lacks `O_DIRECT`.

Fixed by wrapping that case in `#ifdef O_DIRECT`, too.



 Comments   
Comment by Marko Mäkelä [ 2024-01-09 ]

The function os_file_set_nocache() should be defined on all other platforms than Microsoft Windows. Solaris at least used to define a preprocessor symbol DIRECTIO_ON and a system call directio(), which the implementation of that function should be using. Could you please follow the code contribution instructions and file a pull request?

Comment by Rainer Orth [ 2024-01-09 ]

My current intent is just to unbreak the build, not implement directio on Solaris.

You mean a pull request for the current patch? So far, Sergei always told me to just attach patches to the issue.

Comment by Marko Mäkelä [ 2024-01-09 ]

Originally, Solaris directio() was implemented for InnoDB in MySQL 5.1.18 if not earlier. Unfortunately, we no longer have any Solaris or its derivatives on the CI systems, so some breakage is inevitable. I think that the correct preprocessor directive around the local variable mode_str would be the following:

#if (defined __sun__ && defined DIRECTIO_ON) || defined O_DIRECT

As far as I understand, this should have been broken in MDEV-30136 (MariaDB Server 11.0) and needs to be fixed there.

Comment by Rainer Orth [ 2024-01-10 ]

For one, please don't mix the build-breaking use of undefined `mode_str` with direct I/O on Solaris. If we really want to tackle the latter, this should be a separate issue.

That said, the Solaris code would have to be completely different: while `O_DIRECT` is an `open(2)` flag, Solaris `DIRECTIO_ON` etc. are options to a completely different `directio(3)` function. More importantly, the `directio(3C)` manpage states:
```
The directio() function is supported for the NFS and UFS file system
types (see fstyp(8)).
```
I've checked the OpenSolaris code and this is still true.

I doubt many people a using MariaDB with storage on NFS these days, and UFS is considered completely obsolete with the advent of ZFS. In fact, it's more than 15 years that I've last used UFS on Solaris, so any effort to implement `directio` support would be almost completely wasted.

Comment by Marko Mäkelä [ 2024-01-10 ]

rorth, thank you. What you say makes complete sense. (By the way, Jira uses its own proprietary markup language, not Markdown.)

I do not have much experience with copy-on-write file systems, or in fact anything else than journal based ones (Linux ext4 and Microsoft NTFS). I understood that ZFS is also compressing data on the fly. That kind of stuff can actually only work if the data is being copied between user and kernel address space, that is, there is no innodb_flush_method=O_DIRECT or innodb_log_file_buffering=OFF or innodb_data_file_buffering=OFF.

Are you suggesting that we should simply remove the directio() code on Solaris and (among others) treat innodb_flush_method=O_DIRECT as if innodb_flush_method=fsync had been specified? If I remember correctly, the directio() change was added by one colleague and not double-checked by anyone else. It looks like the documentation was misunderstood already back then.

Comment by Marko Mäkelä [ 2024-01-12 ]

I just came across https://blogs.oracle.com/solaris/post/zfs-and-directio from 2006. If ZFS never overwrites live data on Solaris, I guess that innodb_doublewrite=OFF would be a safe default. Not only on a copy-on-write file system, but also on a journaling file system, the doublewrite buffer should be redundant, in theory. In practice, implementation flaws may exist: https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/

With OpenZFS on Linux I would have some doubts regarding the atomicity of writes. I don’t know if the userspace to kernel space data copying has been fixed in Linux meanwhile. In the past, I observed that any writes (not only to the file system, but also to an RS-232 serial line) may be truncated at multiples of 4096 bytes when merely the process is being killed.

I’ve heard that on Linux you may safely disable the doublewrite buffer when using O_DIRECT on the ext4 file system and an SSD that supports atomic writes of innodb_page_size, provided that the file system allocation block size as well as the alignment of the partition are compatible with innodb_page_size.

Comment by Rainer Orth [ 2024-01-15 ]

Just for correctness: ZFS *can* compress data on the fly, but that's off by default. It can be enabled on a per-dataset/per-filesystem basis as need be, though.

As for removing directio(3C) code: while I cannot say for certain if there are any users of MariaDB on NFS storage (where it could benefit), I'm quite certain that UFS isn't used with current versions of MariaDB these days. So unless NFS storage support is deemed worthwhile, I'd say that removing directio() support would just simplify the codebase for no significant loss in functionality.

Instead of treating innodb_flush_method=O_DIRECT, I'd argue that mariadbd should error out if this is selected explicitly on targets that cannot support it. In our case, we hadn't set the variable at all and got tons of warnings. In this case, I believe the daemon should default to innodb_flush_method=fsync instead, which we had to do explicitly to get rid of the warnings.

Comment by Marko Mäkelä [ 2024-01-16 ]

Before innodb_flush_method was changed to an enumeration in MDEV-12218, it was a string parameter. The validation before and after that change accepted all values, no matter if they were actually implemented on the current platform. Only for Microsoft Windows, the additional values normal, unbuffered, and async_unbuffered had been allowed.

I did implement a prototype that would restrict the choice of allowed values, but in the end I would avoid that. We have to keep in mind that the parameter innodb_flush_method was deprecated in MDEV-30136. It could make sense to conditionally define the Boolean parameters innodb_log_file_buffering and innodb_data_file_buffering, or to issue errors or warnings when they have no effect.

Comment by Marko Mäkelä [ 2024-01-19 ]

I think that this was fixed by MDEV-33095, by removing any support for O_DIRECT on Solaris.

Generated at Thu Feb 08 10:37:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.