MariaDB Server / MDEV-31095

Create separate tpool thread for async aio

Details

    Description

      Linux mutexes do not guarantee that a mutex is granted in FIFO order; instead, 'fast threads' are allowed to steal the mutex, which causes other threads to starve.

      This is a problem for tpool: if there is a burst of aio reads, it will create 'nproc*2' threads to handle the requests even if one thread could do the job (for example, when the blocks are on a very fast device or on a memory device). If the file is on a hard disk, things may be even worse.

      This can be seen by simply starting the server on a data directory with a large 'ib_buffer_pool' file. In this case the startup code creates 72 threads on my machine to fill the buffer pool, which is not a good idea for most systems (especially desktops), as the memory used may not be released back to the operating system.

      In addition the current code does not honor the variables srv_n_read_io_threads or srv_n_write_io_threads.

      The suggested fix is to use a separate tpool just for async I/O, with a limited number of threads. Here is some suggested code to use as a base:

      --- b/storage/innobase/srv/srv0srv.cc
      +++ b/storage/innobase/srv/srv0srv.cc
      @@ -580,12 +580,13 @@ static void thread_pool_thread_end()
       
       void srv_thread_pool_init()
       {
      +  uint max_threads= srv_n_read_io_threads + srv_n_write_io_threads;
         DBUG_ASSERT(!srv_thread_pool);
       
       #if defined (_WIN32)
      -  srv_thread_pool= tpool::create_thread_pool_win();
      +  srv_thread_pool= tpool::create_thread_pool_win(1, max_threads);
       #else
      -  srv_thread_pool= tpool::create_thread_pool_generic();
      +  srv_thread_pool= tpool::create_thread_pool_generic(1, max_threads);
       #endif
         srv_thread_pool->set_thread_callbacks(thread_pool_thread_init,
                                               thread_pool_thread_end);
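
      As a hedged sketch of the alternative spelled out above (a dedicated pool used only for async I/O, capped by the I/O thread settings), the pool could be created next to the existing general-purpose one. The names srv_aio_thread_pool and srv_aio_thread_pool_init are hypothetical and only serve as an illustration; this is not part of the patch above:

      /* Sketch only: a dedicated tpool instance for asynchronous I/O,
         capped by srv_n_read_io_threads + srv_n_write_io_threads.
         srv_aio_thread_pool and srv_aio_thread_pool_init are hypothetical
         names introduced for this illustration. */
      static tpool::thread_pool *srv_aio_thread_pool;

      void srv_aio_thread_pool_init()
      {
        uint max_threads= srv_n_read_io_threads + srv_n_write_io_threads;
        DBUG_ASSERT(!srv_aio_thread_pool);
      #if defined (_WIN32)
        srv_aio_thread_pool= tpool::create_thread_pool_win(1, max_threads);
      #else
        srv_aio_thread_pool= tpool::create_thread_pool_generic(1, max_threads);
      #endif
        srv_aio_thread_pool->set_thread_callbacks(thread_pool_thread_init,
                                                  thread_pool_thread_end);
      }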
      

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment:

            wlad created a branch with 3 commits related to this.

            Curiously, this change to make preloading the buffer pool use a single thread would seem to conceptually revert MDEV-26547 and possibly impact MDEV-25417.

            monty, can you please test these changes, one by one? Based on recent experience from MDEV-31343, I expect that the results may vary depending on the GNU libc version or the thread scheduler in the Linux kernel.


            wlad Vladislav Vaintroub added a comment:

            Apparently, the performance did not suffer after using a single thread, and was actually slightly improved on the 10GB preload I tried. I did perform a performance test, so the claim that it would be better is founded. Conceptually, it is not as single-threaded as it used to be: there is a thread that submits async IO, there are threads that handle IO, and there are 1024 async IOs in flight.


            marko Marko Mäkelä added a comment:

            wlad, right, there would be 2 threads involved. Related to this, perhaps you could comment on MDEV-11378. An idea is that for reading pages for applying logs or warming up the buffer pool, it might be better to allocate N contiguous page frame addresses from the buffer pool, and then submit a smaller number of normal or scatter-gather read requests for multiple pages at once.
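
            (As an illustration of the scatter-gather idea, and not of any actual InnoDB code: a single preadv() call can fill several page frames at once. The 4-page batch, the 16KiB page size and the function name read_page_batch are assumptions made for this sketch.)

            #include <sys/types.h>
            #include <sys/uio.h>   /* preadv() */

            static const size_t PAGE_SIZE= 16384;  /* assumed innodb_page_size */
            static const int    N_PAGES= 4;        /* assumed batch size */

            /* Read N_PAGES consecutive pages starting at offset with one
               scatter-gather system call instead of N_PAGES separate reads.
               frames[] stands in for buffer pool page frame addresses. */
            static bool read_page_batch(int fd, off_t offset, void *frames[N_PAGES])
            {
              struct iovec iov[N_PAGES];
              for (int i= 0; i < N_PAGES; i++)
              {
                iov[i].iov_base= frames[i];
                iov[i].iov_len= PAGE_SIZE;
              }
              return preadv(fd, iov, N_PAGES, offset) ==
                     (ssize_t) (N_PAGES * PAGE_SIZE);
            }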


            marko Marko Mäkelä added a comment:

            In my limited test, comparing the latest 10.6 with a merge of the branch, I did not observe any noticeable performance impact. I did not test the loading of a buffer pool.

            wlad Vladislav Vaintroub added a comment (edited):

            I'm assigning this to me, as tpool is something I'm maintaining.
            Some feedback on the bug report:

            • Many threads are the result of InnoDB batch processing: rapid async IO during buffer pool load, or rapid task submission during recovery.
            • This batch processing is unique to startup and does not happen during regular database operation.
            • Therefore there is no need to maintain multiple thread pools, as proposed by Monty. It makes sense to make the thread pool less eager to create threads exactly when this batch processing is running. Multiple thread pools are what we want to get rid of, and we do not really want to deal with cross-thread-pool deadlocks or the like.

            wlad Vladislav Vaintroub added a comment:

            Fixed "too many threads" by making tpool less eager to create additional threads during InnoDB batches of asynchronous work at startup. Loading a 10GB buffer pool now requires 6-7 threads overall in tpool, down from the previous ncpus*2 = 32 (on my machine).

            InnoDB currently stress-tests tpool during startup and recovery with a load that otherwise would not happen. It did not happen previously either, when recovery and buffer pool load were less multithreaded.
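
            (The general idea of a "less eager" pool, sketched here outside of the actual tpool implementation: only add a worker when queued work has already waited for a while, so that a short burst of async submissions is drained by the existing workers instead of spawning one thread per request. The structure, the names and the 4 ms threshold below are assumptions, not the real tpool heuristic.)

            #include <chrono>

            /* Hypothetical bookkeeping a pool might keep about its queue. */
            struct pool_state
            {
              std::chrono::steady_clock::time_point oldest_enqueue_time;
              unsigned queue_len;
              unsigned n_threads;
              unsigned max_threads;
            };

            /* Spawn an extra worker only if there is queued work, we are below
               the thread cap, and the oldest queued task has already waited
               longer than an assumed threshold. */
            static bool should_add_worker(const pool_state &s)
            {
              using namespace std::chrono;
              if (s.queue_len == 0 || s.n_threads >= s.max_threads)
                return false;
              return steady_clock::now() - s.oldest_enqueue_time > milliseconds(4);
            }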


            People

              Assignee: wlad Vladislav Vaintroub
              Reporter: monty Michael Widenius
              Votes: 0
              Watchers: 6

