[MDEV-31256] fil_node_open_file() releases fil_system.mutex allowing other thread to open its file node Created: 2023-05-12  Updated: 2023-11-10  Resolved: 2023-05-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2

Type: Bug Priority: Major
Reporter: Vladislav Lesin Assignee: Vladislav Lesin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-31595 Using MariaDB with Nextcloud on docke... Open

 Description   

There is room between mutex_exit(&fil_system.mutex) and mutex_enter(&fil_system.mutex) calls in the following function.

static bool fil_node_open_file(fil_node_t *node)                                                                                  
{                                                                               
  ...                                                                           
  for (ulint count= 0; fil_system.n_open >= srv_max_n_open_files; count++)      
  {                                                                             
    if (fil_space_t::try_to_close(count > 1))                                   
      count= 0;                                                                 
    else if (count >= 2)                                                        
    {                                                                           
    ...                                                                         
    }                                                                           
    else                                                                        
    {                                                                           
      mutex_exit(&fil_system.mutex);                                            
      os_thread_sleep(20000);                                                   
      /* Flush tablespaces so that we can close modified files. */              
      fil_flush_file_spaces();                                                  
      mutex_enter(&fil_system.mutex);                                           
    }                                                                           
  }                                                                             
                                                                                
  return fil_node_open_file_low(node);                                          
}

During this room another thread can open the node, and ut_ad(!node->is_open()) assertion in fil_node_open_file_low() can fail.



 Comments   
Comment by Marko Mäkelä [ 2023-05-12 ]

Related to this, it could be a good idea to make the fil_node_t creation and file handle assignment atomic with the fil_space_t object creation. MDEV-31080 in 10.6+ took a step toward that. There is also some code for checking if fil_space_t::chain is empty. That could be removed if we made everything atomic within the critical section of fil_system.mutex. When initially opening the .ibd file, we could open the file handle upfront, then acquire fil_system.mutex and create the objects.

For historical reasons (poorly chosen abstraction that was fixed in MDEV-12266, and clumsy usage of Windows asynchronous I/O that was fixed in MariaDB a long time ago), InnoDB data files are being opened and closed twice. The Datafile and Tablespace classes should be removed.

Comment by Matthias Leich [ 2023-05-12 ]

I hit in some RQG test using innodb_open_files : 10

mysqld: /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc:365: bool fil_node_open_file_low(fil_node_t*): Assertion `!node->is_open()' failed.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f28cd89e859 in __GI_abort () at abort.c:79
#2  0x00007f28cd89e729 in __assert_fail_base (fmt=0x7f28cda34588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561f1cae6d20 "!node->is_open()", 
    file=0x561f1cae6720 "/data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc", line=365, function=<optimized out>) at assert.c:92
#3  0x00007f28cd8aff36 in __GI___assert_fail (assertion=0x561f1cae6d20 "!node->is_open()", file=0x561f1cae6720 "/data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc", line=365, 
    function=0x561f1cae6d60 "bool fil_node_open_file_low(fil_node_t*)") at assert.c:101
#4  0x0000561f1b9c9026 in fil_node_open_file_low (node=0x607000001910) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc:365
#5  0x0000561f1b9c9f9c in fil_node_open_file (node=0x607000001910) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc:460
#6  0x0000561f1b9cc2da in fil_space_t::prepare (this=0x61500000c880, have_mutex=false) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc:674
#7  0x0000561f1b9d3a05 in fil_space_t::get (id=18) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/fil/fil0fil.cc:1485
#8  0x0000561f1b8eda5c in buf_load () at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/buf/buf0dump.cc:656
#9  0x0000561f1b8ee08c in buf_dump_load_func () at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/storage/innobase/buf/buf0dump.cc:766
#10 0x0000561f1bb225db in tpool::task::execute (this=0x561f1e5c7be0 <buf_dump_load_task>) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/tpool/task.cc:52
#11 0x0000561f1bb0e827 in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001fd80) at /data/Server/10.5-enterprise-10.5.15-10-MDEV-28709/tpool/tpool_generic.cc:546
#12 0x0000561f1bb216d1 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
    @0x604000002028: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x561f1bb0e644 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, 
    __t=@0x604000002020: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73
#13 0x0000561f1bb2147c in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
    @0x604000002028: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x561f1bb0e644 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>)
    at /usr/include/c++/9/bits/invoke.h:95
#14 0x0000561f1bb2130d in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x604000002018)
    at /usr/include/c++/9/thread:244
#15 0x0000561f1bb21257 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x604000002018)
    at /usr/include/c++/9/thread:251
#16 0x0000561f1bb211bc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (
    this=0x604000002010) at /usr/include/c++/9/thread:195
#17 0x00007f28cdcadde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007f28cddc8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x00007f28cd99b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
Error pattern in RQG:
[ 'TBR-1899', 'mysqld: .{1,250}fil0fil.cc.{1,10}: bool fil_node_open_file_low.{1,100}: Assertion \`\!node->is_open\(\)\' failed.+\[ERROR\] mysqld got signal 6.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],

Per Vladislav this error is caused by the current MDEV.

Generated at Thu Feb 08 10:22:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.