InnoDB creates a large number of threads, each specialized for a single task. This makes debugging hard, because core dumps contain stack traces for a large number of threads. It also causes unnecessary thread stack allocation and makes thread scheduling more complex. Many of these threads wake up periodically to poll for work; for those, we can introduce a timer task (for example, OS timers would submit work to the common pool). A lot of CPU time and context switching is currently spent on "coordinator" threads (purge, page cleaner).
We should make InnoDB use a pool of threads, and scale the size of this pool based on the workload. There should be a common work queue for all the threads.
All of the following background threads would be replaced by the common thread pool, listed in roughly descending order of impact/difficulty ratio:
- buf_flush_page_cleaner_worker, buf_flush_page_cleaner_coordinator (only one after MDEV-15058)
- recv_writer_thread (a special "page cleaner" during redo log apply; triggered by buffer pool LRU)
- fil_crypt_thread (needs to be rewritten to use a queue of tablespaces that need key rotation)
- buf_dump_thread (triggered by SET GLOBAL innodb_buffer_pool_(dump|load)_(abort|now))
- srv_purge_coordinator_thread, srv_worker_thread (see also MDEV-16260; work added by transaction commit)
- trx_rollback_all_recovered (any work is submitted at InnoDB startup)
- log_scrub_thread (can probably be removed in MDEV-14425)
- dict_stats_thread (work submitted by dict_stats_update_if_needed() and for defragmentation, btr_page_split_and_insert())
- btr_defragment_thread (work submitted by btr_defragment_add_index() in OPTIMIZE TABLE)
- buf_resize_thread (work initiated by SET GLOBAL innodb_buffer_pool_size)
- fts_optimize_thread (work initiated by fts_optimize_add_table() on DDL or when loading table definition)
- fts_parallel_tokenization, fts_parallel_merge (should be generalized to allow parallel execution of multiple ADD INDEX for any ALTER TABLE; work added by ALTER TABLE)
Some of the following might still need dedicated threads:
We should implement native asynchronous I/O on BSD systems using kevent(), and remove the support for simulated asynchronous I/O threads.
Pending read requests can be directly waited for by buf_page_get_gen(). If read-ahead is desired, that can be implemented by adding a read completion request when handling the I/O completion.
Threadpool is capable of
- submitting tasks (a task is a void function with a void * parameter)
- submitting asynchronous I/O on files and executing callbacks on I/O completion
- timers (executing a callback in the future)
- create_background_thd() to create a true background THD, which is not counted, cannot be seen in SHOW PROCESSLIST, and would not make the server hang in close_connections() if it is not freed. These background THDs are to be used by purge tasks.
- a "preshutdown" method in handler, to be called after connections are gone, but before plugins are shut down.
This is used by InnoDB for things that were previously done in thd_destructor_thread (stopping purge and FTS optimize).
The "ticker" threads (srv_master_thread, lock_wait_timeout_thread, srv_error_monitor_thread, srv_monitor_thread) are mapped to periodic timers.
I/O handler threads are gone, replaced by thread_pool::submit_io(), which takes a callback to be executed on completion.
However, the innodb_read_io_threads and innodb_write_io_threads parameters are still used to limit the concurrency of
I/O inside the thread pool. In addition, these parameters are used to calculate the io_setup() parameter on Linux and to size the I/O control block caches.
All other threads, with the exception of buf_flush_page_cleaner_coordinator, recv_writer_thread, fil_crypt_thread and log_scrub_thread, are gone and replaced by tasks, timers or, as in the case of the purge threads, a combination of tasks and timers. The purge coordinator has an idle state in which it sleeps briefly and rechecks whether there is still work to do; a timer is used for that.
Purge preallocates and caches background THDs; purge tasks attach these THDs when they start and detach them when they finish.
Some threads did fork/join type of work (fts_parallel..., purge), where one task waits for others to complete; special "waitable" tasks are used for that.
Apart from AIO, there were no big changes to the existing logic. Some things can be improved and simplified later. The limits for different kinds of tasks are still in place; for example, innodb_purge_threads is still there, but it now limits the concurrency of a specific task.