[MDEV-30054] debug-no-sync doesn't fully disable sync calls Created: 2022-11-21  Updated: 2023-01-30  Resolved: 2023-01-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: 11.0.0

Type: Bug Priority: Minor
Reporter: Andrii Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: beginner-friendly
Environment:

Should be reproducible on any Linux system where sync really flushes changes to disk (i.e. is not faked).


Attachments: File fdatasync1.log     File fdatasync2.log    
Issue Links:
Relates
relates to MDEV-30136 Map innodb_flush_method to new settab... Closed

 Description   

According to the documentation, debug-no-sync "disables system sync calls".

However, it does not disable the calls in all places, which distorts benchmarks and test timings where disk syncs must be excluded from scope.

There are two ways to demonstrate it.
1. Running mysql_install_db with eatmydata is about twice as fast as with --debug-no-sync:

> rm -rf dt ; mkdir dt; time ( mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync >& /dev/null )
 
real	0m1.232s
user	0m0.175s
sys	0m0.082s
> rm -rf dt ; mkdir dt; time ( eatmydata mysql_install_db --no-defaults --data=$PWD/dt >& /dev/null )
 
real	0m0.676s
user	0m0.148s
sys	0m0.082s

2. Capturing stack traces, e.g. during mysql_install_db, shows threads blocked in calls to fdatasync().

terminal1 (will show stack traces):

while :; do gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mariadbd) 2>&1 | grep -A15 fdatasync ; done

terminal2 (run server, e.g. mysql_install_db):

rm -rf dt ; mkdir dt; time ( mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync >& /dev/null )

See the attached logs for the stack traces.
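
A third check, sketched here as an assumption and not part of the original report, would be to count the sync system calls directly with strace instead of sampling stack traces. The helper names trace_sync_calls and count_sync_calls and the log file name are hypothetical; only the strace options -f, -e trace= and -o are relied upon.

```shell
# Hypothetical helpers (not part of MariaDB): record and count sync calls.

# Run a command under strace, logging only fsync()/fdatasync() to the file "$1".
# The subshell body keeps the "log" variable from leaking into the caller.
trace_sync_calls() (
    log=$1
    shift
    strace -f -e trace=fsync,fdatasync -o "$log" "$@"
)

# Count the calls in such a log. Matching "name(" counts each call once,
# because strace's "<... fdatasync resumed>" continuation lines lack the "(".
count_sync_calls() {
    grep -cE '(^|[^[:alnum:]_])(fsync|fdatasync)\(' "$1"
}
```

For example, running trace_sync_calls sync.trace mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync and then count_sync_calls sync.trace would be expected to print 0 if the option really suppressed every sync call.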



 Comments   
Comment by Marko Mäkelä [ 2022-11-21 ]

It looks like both fdatasync1.log and fdatasync2.log have been generated on an executable that is missing debug symbols. I was not even aware of such an option.

If you want to reduce the amount of fsync() or fdatasync() operations inside InnoDB, the crash-safe way to do that is to change innodb_flush_log_at_trx_commit to the value 0 or 2. The default value (1) will ensure that each user transaction commit is durable. Even if you are fine with losing a few last transactions, crash recovery would be totally broken if there was no fdatasync() executed as part of InnoDB log checkpoints. There is no other way to force a certain ordering of writes (write barriers). It would be nice to have a system call interface for that.
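
As a rough illustration of the setting mentioned above, the following sketch writes an extra option file that relaxes redo log flushing. The file name and location are assumptions made for the example, and the comments summarize the documented meaning of each value.

```shell
# Sketch: write an extra option file that relaxes redo log flushing.
# The file name is a hypothetical choice.
cnf="${TMPDIR:-/tmp}/relaxed-flush.cnf"
cat > "$cnf" <<'EOF'
[mysqld]
# 1 (default): write and flush the log at every commit
# 2: write at every commit, flush about once per second
# 0: write and flush about once per second
innodb_flush_log_at_trx_commit = 2
EOF
```

The server could then be started with --defaults-extra-file="$cnf" as its first option. The variable is also dynamic, so SET GLOBAL innodb_flush_log_at_trx_commit=2 should have the same effect on a running server.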

Starting with MDEV-14425 in MariaDB Server 10.8, if the InnoDB ib_logfile0 resides in /dev/shm or in persistent memory, then the persistence of the log will be guaranteed without any system calls. The system calls will still be needed around data page writes, to ensure that log checkpoints are written correctly.

Last, if you do not care about data integrity at all (for example, when bulk loading data), you should be able to disable all fsync() or fdatasync() calls by using libeatmydata.so.
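
A minimal sketch of how that might be wrapped, assuming either the eatmydata wrapper script is installed or the dynamic linker can resolve libeatmydata.so by name (the install location varies between distributions). The function name run_without_sync is hypothetical.

```shell
# Hypothetical wrapper: run a command with fsync()-family calls made no-ops.
run_without_sync() {
    if command -v eatmydata >/dev/null 2>&1; then
        # Preferred: the wrapper script shipped with libeatmydata
        eatmydata "$@"
    else
        # Assumption: libeatmydata.so is on the dynamic linker's search path.
        # Any existing LD_PRELOAD entries are kept after it.
        LD_PRELOAD="libeatmydata.so${LD_PRELOAD:+ $LD_PRELOAD}" "$@"
    fi
}
```

For example, run_without_sync mysql_install_db --no-defaults --data=$PWD/dt. This is only suitable for throwaway data, since nothing is guaranteed to reach the disk.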

Comment by Andrii [ 2022-11-21 ]

I don't think that debug symbols are relevant to this issue: the point wasn't to show exact places, the point was to demonstrate the problem.

I don't need a workaround here; I am just pointing out that the documentation doesn't match the behavior of the server.

And since the server can take part in complex scenarios, I think it is not fair to ask users to use external tools or juggle multiple options when the documentation claims that the behavior can be achieved with a single parameter.

Comment by Marko Mäkelä [ 2022-11-28 ]

As far as I can tell, the function my_sync() will call fdatasync() or fsync() or similar functions. On Microsoft Windows, it never tries to call the fdatasync() equivalent NtFlushBuffersFileEx(), but always the more expensive FlushFileBuffers(). It looks like some or all of the InnoDB os_file_flush_func() should be merged with my_sync().

Comment by Andrii [ 2022-11-28 ]

> It looks like some or all of the InnoDB os_file_flush_func() should be merged with my_sync().

Yes, that was my understanding as well, but it should also be done for the other storage engines (in particular, syncs from Aria at least affect the timing of mysql_install_db).

Comment by Marko Mäkelä [ 2023-01-30 ]

anikitin1, does MariaDB 11.0 (after MDEV-30136) work for you? There, I decided to map innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_flush_method=O_DIRECT in the same way (that is, ignore the settings, and use defaults). The only option for disabling InnoDB fsync() or fdatasync() would then be the option debug-no-sync.

Comment by Andrii [ 2023-01-30 ]

I've tried the 11.0.0 tarball and the problem is indeed fixed for the described steps, thank you!
Assuming that it will not regress, feel free to close the issue as 'fixed in 11.0.0'.

Comment by Marko Mäkelä [ 2023-01-30 ]

anikitin1, thank you. This change was part of the 11.0.0 preview, and it was also applied to the 11.0.1 release separately.

Comment by Andrii [ 2023-01-30 ]

On second thought, I am not sure I like the idea of making O_DIRECT_NO_FSYNC obsolete (or did I get it wrong?).
In my understanding, --debug-no-sync is much more dangerous than the effect of O_DIRECT_NO_FSYNC.

Comment by Marko Mäkelä [ 2023-01-30 ]

anikitin1, you are correct about the degree of danger.

I believe that when using the ext4 file system on Linux, innodb_flush_method=O_DIRECT_NO_FSYNC is almost equivalent to innodb_flush_method=O_DIRECT. In our performance tests, we did not notice a significant difference between them. Let me quote part of my comment from MDEV-24854:

I found a plausible claim regarding when fdatasync() is needed after an O_DIRECT write:

  • If the file is being extended as part of the write.
  • If this is the first write after the space had been allocated with fallocate().

These are rather rare cases, so the overhead of a no-op fdatasync() call should be relatively small.

The risky scenario (assuming the Linux ext4 file system) would be that an InnoDB data file was extended by fallocate(), a log checkpoint was executed, and the operating system crashed and was restarted. In this case, we could fail to recover some newly extended pages in the file. I do not think that the almost immeasurable performance gain of using innodb_flush_method=O_DIRECT_NO_FSYNC instead of innodb_flush_method=O_DIRECT is worth the trouble. Therefore, I do not think that losing innodb_flush_method=O_DIRECT_NO_FSYNC in MDEV-30136 was a big deal.
