MDEV-30054: debug-no-sync doesn't fully disable sync calls

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 11.0.0
    • Should be reproducible on any Linux system where sync really flushes changes to disk (i.e. is not faked).

    Description

      According to the documentation, debug-no-sync "Disables system sync calls".

      However, it does not disable the calls everywhere, which affects benchmarks and test timings where disk sync must be excluded from scope.

      There are two ways to demonstrate this.
      1. Running mysql_install_db with eatmydata is almost twice as fast as with --debug-no-sync:

      > rm -rf dt ; mkdir dt; time ( mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync >& /dev/null )
       
      real	0m1.232s
      user	0m0.175s
      sys	0m0.082s
      > rm -rf dt ; mkdir dt; time ( eatmydata mysql_install_db --no-defaults --data=$PWD/dt >& /dev/null )
       
      real	0m0.676s
      user	0m0.148s
      sys	0m0.082s
      

      2. Capturing stack traces, e.g. during mysql_install_db, shows calls blocked in fdatasync().

      terminal1 (will show stack traces):

      while :; do gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mariadbd) 2>&1 | grep -A15 fdatasync ; done
      

      terminal2 (run server, e.g. mysql_install_db):

      rm -rf dt ; mkdir dt; time ( mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync >& /dev/null )
      

      See the attached logs for the full stack traces.
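
Both checks in this description can be made a little more quantitative. The sketch below assumes strace, awk, sort and uniq are installed; the helper names summarize_sync_calls and fdatasync_callers are hypothetical and not part of MariaDB. The first helper reads the summary table written by `strace -c` and prints how many fsync/fdatasync system calls were actually issued. The second reads the gdb output captured in terminal1 and counts which function was the immediate caller of fdatasync() in each backtrace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a sketch, not an official MariaDB tool.

# Print "<syscall> <calls>" for fsync and fdatasync from an `strace -c`
# summary file (columns: % time, seconds, usecs/call, calls, errors, syscall).
summarize_sync_calls() {
    awk '$NF == "fsync" || $NF == "fdatasync" { print $NF, $4 }' "$1"
}

# Count the immediate callers of fdatasync() in saved gdb backtraces.
# Assumes the usual gdb frame layout: "#N  [0xADDR in ]function (args) ...".
fdatasync_callers() {
    awk '
        want_caller && /^#[0-9]+/ {
            line = $0
            sub(/^#[0-9]+ +/, "", line)           # drop the frame number
            sub(/^0x[0-9a-fA-F]+ in /, "", line)  # drop the optional address
            sub(/[ (].*$/, "", line)              # keep only the function name
            print line
            want_caller = 0
            next
        }
        /[ _]fdatasync \(/ { want_caller = 1 }    # the next frame is the caller
    ' "$@" | sort | uniq -c | sort -rn
}

# Example use (not executed here; sync.summary and bt.log are assumed names):
#   rm -rf dt; mkdir dt
#   strace -f -c -e trace=fsync,fdatasync -o sync.summary \
#       mysql_install_db --no-defaults --data=$PWD/dt --debug-no-sync >/dev/null 2>&1
#   summarize_sync_calls sync.summary
#   fdatasync_callers bt.log    # bt.log: output saved from the terminal1 loop
```

If debug-no-sync disabled every sync call, summarize_sync_calls would be expected to print nothing for such a run. Any fdatasync line in its output, together with the caller names from fdatasync_callers, points at the code paths that bypass the option.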

      Attachments

        1. fdatasync1.log
          12 kB
          Andrii
        2. fdatasync2.log
          9 kB
          Andrii

          Activity

            marko Marko Mäkelä added a comment -

            anikitin1, does MariaDB 11.0 (after MDEV-30136) work for you? There, I decided to map innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_flush_method=O_DIRECT in the same way (that is, ignore the settings and use the defaults). The only option for disabling InnoDB fsync() or fdatasync() would then be debug-no-sync.

            anikitin1 Andrii added a comment -

            I've tried the 11.0.0 tarball and the problem is indeed fixed for the described steps, thank you!
            Assuming that it will not regress, feel free to close the issue as 'fixed in 11.0.0'.

            marko Marko Mäkelä added a comment -

            anikitin1, thank you. This change was part of the 11.0.0 preview, and it was also applied to the 11.0.1 release separately.

            anikitin1 Andrii added a comment - edited

            On second thought, I am not sure I like the idea of making O_DIRECT_NO_FSYNC obsolete (or did I get it wrong?).
            In my understanding, --debug-no-sync is much more dangerous than O_DIRECT_NO_FSYNC.

            marko Marko Mäkelä added a comment -

            anikitin1, you are correct about the degree of danger.

            I believe that when using the ext4 file system on Linux, innodb_flush_method=O_DIRECT_NO_FSYNC is almost equivalent to innodb_flush_method=O_DIRECT. In our performance tests, we did not notice a significant difference between them. Let me quote part of my comment from MDEV-24854:

            I found a plausible claim regarding when fdatasync() is needed after an O_DIRECT write:

            • If the file is being extended as part of the write.
            • If this is the first write after the space had been allocated with fallocate().

            These are rather rare cases, so the overhead of a no-op fdatasync() call should be relatively small.

            The risky scenario (assuming the Linux ext4 file system) would be that an InnoDB data file was extended by fallocate(), a log checkpoint was executed, and the operating system crashed and was restarted. In this case, we could fail to recover some newly extended pages in the file. I do not think that the almost immeasurable performance gain of using innodb_flush_method=O_DIRECT_NO_FSYNC instead of innodb_flush_method=O_DIRECT is worth the trouble. Therefore, I do not think that losing innodb_flush_method=O_DIRECT_NO_FSYNC in MDEV-30136 was a big deal.

            People

              marko Marko Mäkelä
              anikitin1 Andrii