[MDEV-24612] innodb hangs if its initialization is broken before encryption threads are started Created: 2021-01-18  Updated: 2021-01-28  Resolved: 2021-01-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: 10.5.9, 10.6.0

Type: Bug Priority: Major
Reporter: Vladislav Lesin Assignee: Vladislav Lesin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-18976 Implement a CHECKSUM redo log record ... Closed

 Description   

If InnoDB cannot be initialized for some reason, innodb_init() calls one of two functions, and both of them call fil_crypt_set_thread_cnt():
- srv_shutdown_threads() (through srv_start() and srv_init_abort_low()), which sets srv_shutdown_state = SRV_SHUTDOWN_EXIT_THREADS;
- innodb_preshutdown(), which sets srv_shutdown_state = SRV_SHUTDOWN_INITIATED.

The caller hierarchy for 10.5 is the following:

▾ fil_crypt_threads_init                                                        
  ▾ fil_crypt_set_thread_cnt                                                    
    ▾ srv_shutdown_threads                                                      
      ▾ srv_init_abort_low                                                      
        ▾ srv_start                                                             
         ▸ innodb_init                                                          
    ▾ innodb_preshutdown                                                        
      • innodb_init     

If the encryption threads have not been initialized yet, fil_crypt_set_thread_cnt() invokes fil_crypt_threads_init(). That function in turn invokes fil_crypt_set_thread_cnt(srv_n_fil_crypt_threads) again to start srv_n_fil_crypt_threads threads. Encryption threads terminate as soon as srv_shutdown_state != SRV_SHUTDOWN_NONE (see fil_crypt_thread()). As a result, the inner fil_crypt_set_thread_cnt() call waits forever for them to start:

#0  0x00007ffff6e13065 in futex_abstimed_wait_cancelable (
    private=<optimized out>, abstime=0x7fffffff7f40, expected=0, 
    futex_word=0x5555586e6f80)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7fffffff7f40, mutex=0x5555586e6f30, 
    cond=0x5555586e6f58) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x5555586e6f58, mutex=0x5555586e6f30, 
    abstime=0x7fffffff7f40) at pthread_cond_wait.c:667
#3  0x000055555673326a in os_event::timed_wait (this=0x5555586e6f18, 
    abstime=0x7fffffff7f40)
    at ./storage/innobase/os/os0event.cc:275
#4  0x000055555673352a in os_event::wait_time_low (this=0x5555586e6f18, 
    time_in_usec=100000, reset_sig_count=9)
    at ./storage/innobase/os/os0event.cc:385
#5  0x000055555673371e in os_event_wait_time_low (event=0x5555586e6f18, 
    time_in_usec=100000, reset_sig_count=0)
    at ./storage/innobase/os/os0event.cc:485
#6  0x00005555569d91de in fil_crypt_set_thread_cnt (new_cnt=4)
    at ./storage/innobase/fil/fil0crypt.cc:2242
#7  0x00005555569d9604 in fil_crypt_threads_init ()
    at ./storage/innobase/fil/fil0crypt.cc:2362
#8  0x00005555569d9003 in fil_crypt_set_thread_cnt (new_cnt=0)
    at ./storage/innobase/fil/fil0crypt.cc:2219
#9  0x0000555556857009 in srv_shutdown_threads ()
    at ./storage/innobase/srv/srv0start.cc:839
#10 0x000055555685724d in srv_init_abort_low (create_new_db=false, 
    file=0x555556ff09b0 "./storage/innobase/srv/srv0start.cc", line=1495, err=DB_CORRUPTION)
    at ./storage/innobase/srv/srv0start.cc:887
#11 0x00005555568590cf in srv_start (create_new_db=false)
    at ./storage/innobase/srv/srv0start.cc:1495
#12 0x000055555661f491 in innodb_init (p=0x555558542918)
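The hang can be reproduced in miniature. The following C++ sketch is a simplified, hypothetical model. The class `CryptPool`, its members, and the `max_polls` parameter are illustrative and are not InnoDB code.

- `set_thread_cnt()` starts workers and then polls in 100 ms slices until all of them report that they have started. This mirrors the `time_in_usec=100000` wait in frame #5 above.
- As a simplification, a worker that sees a shutdown in progress exits without registering. The awaited count is therefore never reached.
- `max_polls` bounds the loop so that the model returns `false` instead of spinning forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the srv_shutdown_state values (not InnoDB code).
enum class ShutdownState { NONE, INITIATED, EXIT_THREADS };

// Simplified model of the encryption thread pool; all names are illustrative.
class CryptPool {
 public:
  explicit CryptPool(ShutdownState state) : state_(state) {}

  // Join every worker before the pool is destroyed.
  ~CryptPool() {
    for (std::thread& t : workers_) t.join();
  }

  // Models the wait in fil_crypt_set_thread_cnt(): start new_cnt workers,
  // then poll in 100 ms slices until all of them have registered.
  // max_polls bounds the loop so the model returns instead of hanging.
  bool set_thread_cnt(unsigned new_cnt, unsigned max_polls) {
    for (unsigned i = 0; i < new_cnt; ++i)
      workers_.emplace_back(&CryptPool::worker, this);

    std::unique_lock<std::mutex> lock(mtx_);
    for (unsigned poll = 0; poll < max_polls; ++poll) {
      if (started_ == new_cnt) return true;
      event_.wait_for(lock, std::chrono::milliseconds(100));
    }
    return started_ == new_cnt;
  }

 private:
  void worker() {
    std::lock_guard<std::mutex> lock(mtx_);
    // Simplification: a worker that sees a shutdown in progress exits
    // before registering, so started_ never reaches the requested count.
    if (state_ != ShutdownState::NONE) return;
    ++started_;
    event_.notify_all();
  }

  const ShutdownState state_;
  std::mutex mtx_;
  std::condition_variable event_;  // stands in for the wakeup event
  unsigned started_ = 0;           // workers that have registered
  std::vector<std::thread> workers_;
};
```

With `ShutdownState::NONE`, the call returns `true` almost immediately. With `ShutdownState::EXIT_THREADS`, it returns `false` after `max_polls` timeouts, which corresponds to the unbounded loop in the backtrace.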

How to repeat:
Make srv_start() fail so that it calls srv_init_abort_low(). In the backtrace above, startup fails with DB_CORRUPTION.

How to fix:
Do not init encryption threads if shutdown is in progress:

--- a/storage/innobase/fil/fil0crypt.cc
+++ b/storage/innobase/fil/fil0crypt.cc
@@ -2216,6 +2216,8 @@ fil_crypt_set_thread_cnt(
        const uint      new_cnt)
 {
        if (!fil_crypt_threads_inited) {
+               if (srv_shutdown_state != SRV_SHUTDOWN_NONE)
+                       return;
                fil_crypt_threads_init();
        }
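The control flow of the patch can be modeled without threads. In this hypothetical sketch:

- `Model`, `would_hang`, and `apply_fix` are illustrative names.
- `shutdown_state`, `threads_inited`, and `configured_threads` stand in for `srv_shutdown_state`, `fil_crypt_threads_inited`, and `srv_n_fil_crypt_threads`.

The model replays the re-entrant chain `set_thread_cnt(0)` → `threads_init()` → `set_thread_cnt(4)`. It records whether the inner call would wait forever, and shows that the added guard returns before that call is made.

```cpp
#include <cassert>

// Hypothetical stand-in for the srv_shutdown_state values (not InnoDB code).
enum class ShutdownState { NONE, INITIATED, EXIT_THREADS };

// Single-threaded model of the re-entrant call chain; names are illustrative.
struct Model {
  ShutdownState shutdown_state = ShutdownState::NONE;  // srv_shutdown_state
  bool threads_inited = false;      // fil_crypt_threads_inited
  unsigned configured_threads = 4;  // srv_n_fil_crypt_threads
  unsigned requested = 0;           // last count passed to set_thread_cnt()
  bool would_hang = false;          // set when a call would wait forever
  bool apply_fix = true;            // false reproduces the pre-patch behavior

  // Models fil_crypt_threads_init(): mark the pool initialized, then
  // re-enter set_thread_cnt() with the configured thread count.
  void threads_init() {
    threads_inited = true;  // set first, otherwise the model recurses forever
    set_thread_cnt(configured_threads);
  }

  // Models fil_crypt_set_thread_cnt() with and without the patch.
  void set_thread_cnt(unsigned new_cnt) {
    if (!threads_inited) {
      if (apply_fix && shutdown_state != ShutdownState::NONE)
        return;  // the guard added by the patch
      threads_init();
    }
    requested = new_cnt;
    // Workers exit at once during shutdown, so waiting for new_cnt > 0
    // of them to start would never finish.
    if (new_cnt > 0 && shutdown_state != ShutdownState::NONE)
      would_hang = true;
  }
};
```

The guard only fires while the pool is still uninitialized, so a normal start (`ShutdownState::NONE`) is unaffected. During an aborted start, `set_thread_cnt(0)` simply leaves the pool uninitialized, because there are no threads to stop.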



 Comments   
Comment by Marko Mäkelä [ 2021-01-19 ]

Thank you, this is probably the simplest fix to the problem.
