  1. MariaDB Server
  2. MDEV-33545

Perf regression from removing innodb_flush_method=O_DIRECT_NO_FSYNC

Details

    Description

      Back when I did web-scale InnoDB I always set innodb_flush_method to O_DIRECT_NO_FSYNC. At one point we had a bug in the FB patch for MySQL similar to the bug described in this comment, but we fixed that and upstream MySQL has a correct implementation of it.

      MariaDB docs for innodb_flush_method now include this claim, and the claim is news to me. Do you have more detail on why O_DIRECT_NO_FSYNC isn't good with XFS?
      "Not suitable for XFS filesystems."

      Finally, this is an example of a performance regression from not having O_DIRECT_NO_FSYNC. I ran a subset of the insert benchmark and then timed how long it took for each server to shut down.

      The server in this case is a 32-core (yes, real cores) AMD with 128G RAM, Ubuntu 22.04, XFS and SW RAID 10 across 2 NVMe devices. The my.cnf files are here for MariaDB 10.11, for MariaDB 11.4 and for MySQL 8.0.36

      From the results below, shutdown is much faster with O_DIRECT_NO_FSYNC for the DBMS versions that support it (MariaDB 10.11, MySQL 8.0.36).

      configuration:

      • a - innodb_flush_method=O_DIRECT_NO_FSYNC
      • b - innodb_flush_method=O_DIRECT (or equivalent)
      • c - innodb_flush_method=fsync (or equivalent)

      dbms:

      • 10.11.7, 11.4.1 - MariaDB with InnoDB
      • 8.0.36 - MySQL with InnoDB

      Numbers are seconds for shutdown, by configuration. There is no result for 11.4.1 with configuration a.

      24 clients, 20M rows/table, 1 table/client (480M rows total)

      dbms        a      b      c
      10.11.7   117    454   1247
      11.4.1      -    529   1259
      8.0.36     76   2667   3390

      24 clients, 10M rows/table, 1 table/client (240M rows total)

      dbms        a      b      c
      10.11.7    87    254    732
      11.4.1      -    258    730
      8.0.36     37   1972   2478
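
      To make the size of the gap easier to see, the shutdown times can be turned into ratios relative to configuration a. The following is a minimal sketch, not part of the original report: the dictionary layout and the function names (slowdown_vs_a, report) are made up for illustration, and it assumes that the two values on the 11.4.1 rows belong to configurations b and c, since 11.4 does not offer O_DIRECT_NO_FSYNC.

```python
# Shutdown times in seconds, copied from the results in this report.
# Config keys: a = O_DIRECT_NO_FSYNC, b = O_DIRECT, c = fsync.
# Assumption: the 11.4.1 rows hold values for b and c only, so "a" is absent.
SHUTDOWN_SECONDS = {
    "480M rows": {
        "10.11.7": {"a": 117, "b": 454, "c": 1247},
        "11.4.1": {"b": 529, "c": 1259},
        "8.0.36": {"a": 76, "b": 2667, "c": 3390},
    },
    "240M rows": {
        "10.11.7": {"a": 87, "b": 254, "c": 732},
        "11.4.1": {"b": 258, "c": 730},
        "8.0.36": {"a": 37, "b": 1972, "c": 2478},
    },
}


def slowdown_vs_a(times):
    """Return {config: seconds / seconds_for_a}, or None if there is no 'a' result."""
    base = times.get("a")
    if base is None:
        return None
    return {cfg: round(secs / base, 1) for cfg, secs in times.items() if cfg != "a"}


def report():
    """Build one summary line per (workload, dbms) pair."""
    lines = []
    for workload, by_dbms in SHUTDOWN_SECONDS.items():
        for dbms, times in by_dbms.items():
            ratios = slowdown_vs_a(times)
            if ratios is None:
                lines.append(f"{workload} {dbms}: no O_DIRECT_NO_FSYNC result")
            else:
                detail = ", ".join(f"{cfg} takes {r}x as long as a" for cfg, r in ratios.items())
                lines.append(f"{workload} {dbms}: {detail}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

      The ratios are rounded to one decimal place; they only restate the numbers above and say nothing about where the time goes.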

      If I look at PMP call stacks during the shutdown, this is a common stack with MariaDB 11.4.1 showing that things are frequently waiting on the fsync for the doublewrite buffer:
      __GI_fdatasync,os_file_flush_func(int),fil_space_t::flush_low(),fil_flush_file_spaces(),buf_dblwr_t::write_completed(),buf_page_write_complete(IORequest,IORequest::write_complete(int),write_io_callback(void*),tpool::task_group::execute(tpool::task*),tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone3

      When I look at stack traces for configs a and b (a=O_DIRECT_NO_FSYNC, b=O_DIRECT) for MySQL 8.0.36, I also see many stack traces where things are waiting on the doublewrite buffer fsync.

      _GI_fsync,os_file_fsync_posix,os_file_flush_func,pfs_os_file_flush_func,Fil_shard::space_flush,Fil_shard::flush_file_spaces,Fil_system::flush_file_spaces,Double_write::write_complete,buf_page_io_complete,fil_aio_wait,io_handler_thread,std::invoke_impl<void,,std::invoke<void,std::_Bind<void,std::_Bind<void,Detached_thread::operator()<void,std::invoke_impl<void,,std::_invoke<Detached_thread,,std::thread::_Invoker<std::tuple<Detached_thread,,std::thread::_Invoker<std::tuple<Detached_thread,,std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread,,??,start_thread,clone3
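
      PMP-style output collapses each thread's stack into one comma-separated line, so checking how many samples are stuck in the doublewrite fsync path is a matter of looking at the first frame and searching the rest. The following is a rough sketch of that kind of tally, assuming one collapsed stack per input string. The frame names are taken from the two stacks quoted above; the category labels and the function names (classify, summarize) are made up for illustration.

```python
from collections import Counter

# Leaf frames that mean the thread is blocked in a file sync call.
# Both "__GI_fsync" and "_GI_fsync" are listed because the stacks quoted
# in this report show both spellings.
SYNC_FRAMES = ("__GI_fdatasync", "__GI_fsync", "_GI_fsync")

# Frames on the doublewrite completion path:
# MariaDB 11.4.1 and MySQL 8.0.36 respectively, as quoted in this report.
DBLWR_FRAMES = ("buf_dblwr_t::write_completed", "Double_write::write_complete")


def classify(stack):
    """Label one collapsed stack (leaf frame first, frames separated by commas)."""
    frames = stack.strip().split(",")
    in_sync = frames[0] in SYNC_FRAMES
    # str.startswith accepts a tuple, which tolerates a trailing "()" on a frame.
    via_dblwr = any(frame.startswith(DBLWR_FRAMES) for frame in frames)
    if in_sync and via_dblwr:
        return "dblwr_sync"
    if in_sync:
        return "other_sync"
    return "other"


def summarize(stacks):
    """Count how many collapsed stacks fall into each category."""
    return Counter(classify(stack) for stack in stacks)


if __name__ == "__main__":
    # Abbreviated versions of the two stacks quoted in this report.
    samples = [
        "__GI_fdatasync,os_file_flush_func(int),fil_space_t::flush_low(),"
        "fil_flush_file_spaces(),buf_dblwr_t::write_completed(),start_thread,clone3",
        "_GI_fsync,os_file_fsync_posix,os_file_flush_func,"
        "Double_write::write_complete,buf_page_io_complete,start_thread,clone3",
    ]
    for category, count in summarize(samples).items():
        print(category, count)
```

      Splitting on commas also splits argument lists and template parameters inside a frame, which is harmless here because only frame prefixes are matched.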

          Activity

            wlad Vladislav Vaintroub added a comment:

            I think Mark's results need a dedicated MDEV, because they do not really fit into the O_DIRECT_NO_FSYNC category.

            As for the benchmark results, my guess would be that the insert workload suffers from MDEV-12288, in that inserts cause purge work, whereas in MySQL they do not and never did. Given that, and an IO-bound scenario, a lot of the buffer pool would be filled with undo pages, and there would be much more read IO going on.

            If that guess is correct, and there is a way to make the MDEV-12288 changes optional, it would be great.

            mdcallag Mark Callaghan added a comment:

            Marko - it is nice to know there is, or will be, a fast path for loading that will help some workloads. But I assume the more common case will be to not use the fast path, so I want crash safety during l.i0.

            WRT performance vs correctness, I assume the bugs that you reference are limited to MariaDB and that change buffering in MySQL's InnoDB doesn't have these problems. Alas, I don't have enough production experience with modern MySQL+InnoDB to have a strong opinion on that.

            mdcallag Mark Callaghan added a comment:

            Also, at the end of the day "perf vs correctness" is your issue. Users are just going to see that some write-heavy workloads are ~5X faster on InnoDB when using upstream MySQL instead of using MariaDB.

            mdcallag Mark Callaghan added a comment:

            But even in the case where I compare MariaDB without change buffering to MySQL without change buffering, MariaDB is still much slower than MySQL. Perhaps my attempts to explain the performance problems will start there.

            marko Marko Mäkelä added a comment:

            mdcallag, I am fully sure that the change buffer related bugs MDEV-32132 and MDEV-24448/MDEV-24449 affect MySQL as well, and I would expect MDEV-32898 to be the same.

            When I worked for Oracle, I visited a customer who encountered the mystery bug https://bugs.mysql.com/bug.php?id=61104. It could not be reproduced back then. One possible explanation could be MDEV-24448 in xtrabackup. Before I left Oracle for MariaDB, I remember that there were 2 or 3 open bugs with symptoms similar to MDEV-9663. Back then, there was no https://rr-project.org, so we were unable to analyze those failures. I am also rather sure that MySQL 5.7 suffers from MDEV-15326.

            To disable the extra writes caused by MDEV-12288, one could comment out the body of the function row_purge_reset_trx_id().

            People

              marko Marko Mäkelä
              mdcallag Mark Callaghan