[MDEV-16403] Incorrect synchronisation on srv_running Created: 2018-06-05  Updated: 2020-03-08  Resolved: 2020-03-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-16264 Implement a common work queue for Inn... Closed

 Description   

In innodb_init() there is a wait for thd_destructor_thread to start up:

                mysql_thread_create(thd_destructor_thread_key,
                                    &thd_destructor_thread,
                                    NULL, thd_destructor_proxy, NULL);
                while (!my_atomic_loadptr_explicit(reinterpret_cast<void**>
                                                   (&srv_running),
                                                   MY_MEMORY_ORDER_RELAXED))
                        os_thread_sleep(20);

However, if thd_destructor_thread exits before the atomic load observes a non-NULL srv_running, this becomes an infinite loop. A server shutdown during InnoDB initialisation might be one way to trigger this.
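One way to make such a startup wait always terminate is for the thread to publish a terminal "exited" state as well as a "running" state, and for the waiter to block on a condition variable instead of polling. The sketch below assumes standard C++ std::mutex/std::condition_variable rather than MariaDB's mysql_mutex/mysql_cond wrappers. StartupHandshake, WorkerState and demo are hypothetical names, not MariaDB code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker lifecycle states; they only ever move forward.
enum class WorkerState { kStarting, kRunning, kExited };

struct StartupHandshake {
  std::mutex mtx;
  std::condition_variable cv;
  WorkerState state = WorkerState::kStarting;

  // Called by the worker to report that it is running or has already exited.
  void set(WorkerState s) {
    assert(s != WorkerState::kStarting);  // never move back to "starting"
    {
      std::lock_guard<std::mutex> lk(mtx);
      state = s;
    }
    cv.notify_all();
  }

  // Blocks until the worker has left the "starting" state, whichever way.
  // Returns true if the worker was running when the waiter checked.
  bool wait_started() {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [this] { return state != WorkerState::kStarting; });
    return state == WorkerState::kRunning;
  }
};

// Starts a worker that either reaches "running" or gives up immediately
// (e.g. because shutdown began during initialisation) and reports what the
// waiter observed. The wait returns in both cases instead of spinning.
bool demo(bool worker_starts) {
  StartupHandshake hs;
  std::thread worker([&hs, worker_starts] {
    hs.set(worker_starts ? WorkerState::kRunning : WorkerState::kExited);
  });
  bool running = hs.wait_started();
  worker.join();  // hs must outlive the worker's notify_all()
  return running;
}
```

The key design choice is that "the thread died early" is an observable state, so the waiter's exit condition covers it; a relaxed atomic poll on a single pointer cannot distinguish "not started yet" from "started and already gone".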

The second problem is in innobase_end(); in fact, I occasionally get crashes here:

                st_my_thread_var* running = reinterpret_cast<st_my_thread_var*>(
                        my_atomic_loadptr_explicit(
                        reinterpret_cast<void**>(&srv_running),
                        MY_MEMORY_ORDER_RELAXED));
                if (!abort_loop && running) {
                        // may be UNINSTALL PLUGIN statement
                        running->abort = 1;
                        mysql_cond_broadcast(running->current_cond);
                }

If thd_destructor_thread exits between the atomic load and mysql_cond_broadcast(), innobase_end() writes running->abort and broadcasts on a destroyed condition variable through the freed "running" pointer.
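A common generic way to close this kind of window is to do the pointer check and the signal under the same mutex that the thread holds when it clears the pointer, before its state is freed. The sketch below uses standard C++ types; WorkerCtl, registry_mtx, registered and request_abort are hypothetical stand-ins for st_my_thread_var, srv_running and the innobase_end() path. It is an illustration of the pattern, not the fix MariaDB adopted (the resolution comment states that MDEV-16264 added handlerton::pre_shutdown() instead).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical per-thread control block (stand-in for st_my_thread_var).
struct WorkerCtl {
  std::condition_variable cv;
  bool abort = false;
};

std::mutex registry_mtx;          // guards `registered`
WorkerCtl* registered = nullptr;  // stand-in for srv_running

// Shutdown path (stand-in for innobase_end()). Holding registry_mtx across
// the check, the write and the notify means the worker cannot deregister
// and destroy its WorkerCtl in between.
bool request_abort() {
  std::lock_guard<std::mutex> lk(registry_mtx);
  if (registered == nullptr) return false;  // worker already gone
  registered->abort = true;
  registered->cv.notify_all();
  return true;
}

void worker_body() {
  WorkerCtl ctl;  // lives on this thread's stack
  std::unique_lock<std::mutex> lk(registry_mtx);
  assert(registered == nullptr);  // this sketch supports a single worker
  registered = &ctl;
  // cv.wait() releases registry_mtx while sleeping and reacquires it on wake.
  ctl.cv.wait(lk, [&ctl] { return ctl.abort; });
  registered = nullptr;  // deregister while still holding the mutex
}  // lk unlocks first, then ctl is destroyed (reverse declaration order)

// Demo only: retries until the worker has registered. A real design would
// pair this with a proper startup handshake instead of yielding in a loop.
bool demo_abort_running_worker() {
  std::thread t(worker_body);
  while (!request_abort()) std::this_thread::yield();
  t.join();
  return !request_abort();  // after exit, nothing is left to signal
}
```

The cost is that the shutdown path and the thread's exit are serialised on one mutex, which is cheap compared with the alternative of signalling through a pointer whose lifetime nobody guarantees.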

The problem is much broader, though: synchronisation of background thread creation and destruction is in a sorry state, so we need a universally applicable framework for it.



 Comments   
Comment by Sergey Vojtovich [ 2020-03-08 ]

This particular problem was solved in MDEV-16264 by adding the handlerton::pre_shutdown() method.
The broader problem should no longer apply, as it is hopefully resolved by turning background threads into background tasks (also within the scope of MDEV-16264).
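As a generic illustration of why background tasks simplify lifecycle synchronisation, here is a minimal fixed-size work queue in C++: all worker threads are created in one constructor and joined in one destructor, so individual subsystems submit work instead of owning threads and never race with thread start-up or exit. TaskQueue and demo_count are hypothetical names; this is a sketch under those assumptions, not the MDEV-16264 implementation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal work queue: a fixed pool of threads runs submitted
// tasks. Thread lifetime is managed in exactly one place.
class TaskQueue {
 public:
  explicit TaskQueue(std::size_t n_workers) {
    assert(n_workers > 0);  // with no workers, tasks would never run
    for (std::size_t i = 0; i < n_workers; ++i)
      workers_.emplace_back([this] { run(); });
  }

  // Signals shutdown, lets workers drain queued tasks, then joins them.
  ~TaskQueue() {
    {
      std::lock_guard<std::mutex> lk(mtx_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  TaskQueue(const TaskQueue&) = delete;
  TaskQueue& operator=(const TaskQueue&) = delete;

  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mtx_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to do
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // run the task outside the lock
    }
  }

  std::mutex mtx_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Runs `n` increment tasks on `workers` threads and returns the final count.
int demo_count(int n, std::size_t workers) {
  std::atomic<int> count{0};
  {
    TaskQueue q(workers);
    for (int i = 0; i < n; ++i) q.submit([&count] { count.fetch_add(1); });
  }  // ~TaskQueue drains the queue and joins every worker
  return count.load();
}
```

Because the destructor joins every worker before any member is destroyed, there is no window in which a caller can observe a half-dead thread, which is the class of bug described in this report.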

However, five types of background threads remain: buf_flush_page_cleaner, fil_crypt_thread, recv_writer_thread, trx_rollback_all_recovered and log_scrub_thread.
I expect they will eventually be turned into background tasks as well.

Generated at Thu Feb 08 08:28:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.