  1. MariaDB Server
  2. MDEV-33545

Perf regression from removing innodb_flush_method=O_DIRECT_NO_FSYNC

Details

    Description

      Back when I did web-scale InnoDB I always set innodb_flush_method to O_DIRECT_NO_FSYNC. At one point we had a bug in the FB patch for MySQL similar to the bug described in this comment, but we fixed that and upstream MySQL has a correct implementation of it.

      The MariaDB docs for innodb_flush_method now include this claim, and this claim is news to me. Do you have more detail as to why O_DIRECT_NO_FSYNC isn't good with XFS?
      "Not suitable for XFS filesystems."

      Finally, this is an example of a performance regression from not having O_DIRECT_NO_FSYNC. I ran a subset of the insert benchmark and then timed how long it took for the DBMS to shut down.

      The server in this case is a 32-core (yes, real cores) AMD with 128G RAM, Ubuntu 22.04, XFS and SW RAID 10 across 2 NVMe devices. The my.cnf files are here for MariaDB 10.11, for MariaDB 11.4 and for MySQL 8.0.36

      From the results below, the shutdown is much faster with O_DIRECT_NO_FSYNC for the dbms that support it (MariaDB 10.11, MySQL 8.0.36).

      {{configuration:

      • a - innodb_flush_method=O_DIRECT_NO_FSYNC
      • b - innodb_flush_method=O_DIRECT (or equivalent)
      • c - innodb_flush_method=fsync (or equivalent)

      dbms:

      • 10.11.7, 11.4.1 - MariaDB with InnoDB
      • 8.0.36 - MySQL with InnoDB

      Numbers are seconds for shutdown. Columns a, b, c are the configurations above; "-" means no result (11.4.1 has none for configuration a).

      This is from 24 clients, 20M rows/table, table/client == 480M rows
      dbms         a      b      c
      10.11.7    117    454   1247
      11.4.1       -    529   1259
      8.0.36      76   2667   3390

      This is from 24 clients, 10M rows/table, table/client == 240M rows
      dbms         a      b      c
      10.11.7     87    254    732
      11.4.1       -    258    730
      8.0.36      37   1972   2478}}
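
To make the size of the gap easier to read, the shutdown times can be turned into ratios against configuration a. This is a minimal sketch that only restates the numbers from the two result sets above. The names `SHUTDOWN_SECONDS`, `slowdown_vs_a` and `report` are made up for illustration, and `None` stands for the missing O_DIRECT_NO_FSYNC result for 11.4.1.

```python
# Shutdown times in seconds, copied from the two result sets in this report.
# None marks the missing O_DIRECT_NO_FSYNC (configuration a) result for 11.4.1.
SHUTDOWN_SECONDS = {
    "480M rows": {
        "10.11.7": {"a": 117, "b": 454, "c": 1247},
        "11.4.1": {"a": None, "b": 529, "c": 1259},
        "8.0.36": {"a": 76, "b": 2667, "c": 3390},
    },
    "240M rows": {
        "10.11.7": {"a": 87, "b": 254, "c": 732},
        "11.4.1": {"a": None, "b": 258, "c": 730},
        "8.0.36": {"a": 37, "b": 1972, "c": 2478},
    },
}


def slowdown_vs_a(times):
    """Return {config: time / time_a} for b and c, or None if a is missing."""
    base = times["a"]
    if base is None:
        return None
    return {cfg: round(times[cfg] / base, 1) for cfg in ("b", "c")}


def report(data=SHUTDOWN_SECONDS):
    """Build one summary line per (workload, dbms) pair."""
    lines = []
    for workload, by_dbms in data.items():
        for dbms, times in by_dbms.items():
            ratios = slowdown_vs_a(times)
            if ratios is None:
                lines.append(f"{workload} {dbms}: no O_DIRECT_NO_FSYNC baseline")
            else:
                lines.append(
                    f"{workload} {dbms}: b is {ratios['b']}x, "
                    f"c is {ratios['c']}x the shutdown time of a"
                )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

For the 480M-row run this gives roughly 3.9x (b) and 10.7x (c) for MariaDB 10.11.7, and roughly 35.1x (b) and 44.6x (c) for MySQL 8.0.36.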

      If I look at PMP (poor man's profiler) call stacks during the shutdown, a common stack with MariaDB 11.4.1 shows threads frequently waiting on the fsync for the doublewrite buffer:
      __GI_fdatasync,os_file_flush_func(int),fil_space_t::flush_low(),fil_flush_file_spaces(),buf_dblwr_t::write_completed(),buf_page_write_complete(IORequest,IORequest::write_complete(int),write_io_callback(void*),tpool::task_group::execute(tpool::task*),tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone3

      When I look at stack traces for configs a and b (a=O_DIRECT_NO_FSYNC, b=O_DIRECT) for MySQL 8.0.36, I also see many stack traces where threads are waiting on the doublewrite buffer fsync.

      _GI_fsync,os_file_fsync_posix,os_file_flush_func,pfs_os_file_flush_func,Fil_shard::space_flush,Fil_shard::flush_file_spaces,Fil_system::flush_file_spaces,Double_write::write_complete,buf_page_io_complete,fil_aio_wait,io_handler_thread,std::invoke_impl<void,,std::invoke<void,std::_Bind<void,std::_Bind<void,Detached_thread::operator()<void,std::invoke_impl<void,,std::_invoke<Detached_thread,,std::thread::_Invoker<std::tuple<Detached_thread,,std::thread::_Invoker<std::tuple<Detached_thread,,std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread,,??,start_thread,clone3
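
One way to quantify "frequently" is to count how many sampled stacks end in fsync/fdatasync while passing through a doublewrite completion frame. The sketch below assumes the aggregated PMP output has one `<count> <comma-separated stack>` entry per line, which is an assumption about the input format and not something shown in this report. The frame names come from the two stacks quoted above, and the function names are hypothetical.

```python
# Frame names taken from the two stacks quoted in this report.
SYNC_FRAMES = ("fsync", "fdatasync")
DBLWR_FRAMES = ("buf_dblwr_t::write_completed", "Double_write::write_complete")


def is_doublewrite_sync_wait(stack):
    """True if the innermost frame is an fsync/fdatasync call and some frame
    in the stack is a doublewrite completion function."""
    frames = stack.strip().split(",")
    leaf = frames[0]
    in_sync = any(name in leaf for name in SYNC_FRAMES)
    in_dblwr = any(f.startswith(d) for f in frames for d in DBLWR_FRAMES)
    return in_sync and in_dblwr


def doublewrite_sync_share(lines):
    """Fraction of sampled stacks that are doublewrite sync waits.

    Each line is assumed to look like '<count> <collapsed stack>'.
    """
    total = 0
    matched = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        count_str, stack = line.split(None, 1)
        count = int(count_str)
        total += count
        if is_doublewrite_sync_wait(stack):
            matched += count
    return matched / total if total else 0.0
```

Running `doublewrite_sync_share` over the PMP output for each configuration would give a per-configuration share of samples stuck in the doublewrite sync. Matching on substrings is deliberately loose, because the collapsed stacks split C++ argument lists on commas.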

            People

              marko Marko Mäkelä
              mdcallag Mark Callaghan