[MDEV-31049] fil_delete_tablespace() returns wrong file handle if tablespace was closed by parallel thread Created: 2023-04-13  Updated: 2023-08-04  Resolved: 2023-04-14

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.11.3, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: Vladislav Lesin Assignee: Vladislav Lesin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-30829 InnoDB: Cannot close file <tablename>... Closed

 Description   

trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:

void trx_t::commit(std::vector<pfs_os_file_t> &deleted)                                                                           
{                                                                               
...                                                                             
    for (const auto &p : mod_tables)                                            
    {                                                                           
      if (p.second.is_dropped())                                                
      {                                                                         
      ...                                                                       
        if (const auto id= space ? space->id : 0)                               
        {                                                                       
          pfs_os_file_t d= fil_delete_tablespace(id);                           
          if (d != OS_FILE_CLOSED)                                              
            deleted.emplace_back(d);                                            
        }                                                                       
      }                                                                         
    }                                                                           
...                                                                             
}

and collects the file handles in the "deleted" vector. ha_innobase::delete_table() then closes the file handles in "deleted":

int ha_innobase::delete_table(const char *name)                                                                        
{                                                                               
...                                                                                                      
  std::vector<pfs_os_file_t> deleted;                                           
  trx->commit(deleted);                                                         
...                                                                             
  row_mysql_unlock_data_dictionary(trx);                                        
  for (pfs_os_file_t d : deleted)                                               
    os_file_close(d);                                                           
...                                                                             
}

Consider the fil_delete_tablespace() function, which returns the file handle that is stored in the "deleted" vector:

pfs_os_file_t fil_delete_tablespace(ulint id)                                   
{                                                                               
  ut_ad(!is_system_tablespace(id));                                             
  pfs_os_file_t handle= OS_FILE_CLOSED;                                         
  if (fil_space_t *space= fil_space_t::check_pending_operations(id))            
  {                                                                             
    /* Before deleting the file(s), persistently write a log record. */         
    mtr_t mtr;                                                                  
    mtr.start();                                                                
    mtr.log_file_op(FILE_DELETE, id, space->chain.start->name);                 
    handle= space->chain.start->handle;                                        
    mtr.commit_file(*space, nullptr);                                           
                                                                                
    fil_space_free_low(space);                                                  
  }                                                                             
                                                                                
  ibuf_delete_for_discarded_space(id);                                          
  return handle;                                                                
}

fil_system_t::detach() is invoked from mtr_t::commit_file(). However, while fil_delete_tablespace() is executing, buf_do_LRU_batch() can close the tablespace file between the lines "handle= space->chain.start->handle;" and "return handle;". It can do so via the following stack:

#1  0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452
#2  0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453
#3  0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false)          
    at ./storage/innobase/fil/fil0fil.cc:124                                    
#4  0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422
#5  0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0)   
    at ./storage/innobase/fil/fil0fil.cc:656                                    
#6  0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482
#7  0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186
#8  0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0)
    at ./storage/innobase/buf/buf0flu.cc:1293                                   
#9  0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0)
    at ./storage/innobase/buf/buf0flu.cc:1362                                   
#10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708
#11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310

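So "space->chain.start->handle" can be set to -1 by a parallel thread, but fil_delete_tablespace() still returns the old value saved in its local "handle" variable. ha_innobase::delete_table() then calls os_file_close() on a handle that has already been closed.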
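The interleaving described above can be replayed deterministically in a small stand-alone model. This is only an illustrative sketch: ToyOs, Node, close_node() and stale_handle_interleaving() are hypothetical names invented here, not InnoDB code, and the three steps run sequentially in one thread to mimic the order in which the two real threads would act.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; not the InnoDB types.
using file_handle = int;
constexpr file_handle FILE_CLOSED = -1;

struct Node { file_handle handle; };

// Toy "OS" that tracks which descriptors are currently open.
struct ToyOs {
  std::vector<bool> open_fds = std::vector<bool>(32, false);

  file_handle open_file(file_handle fd) { open_fds[fd] = true; return fd; }

  // Returns false if fd is not open, i.e. the close is a double close.
  bool close_file(file_handle fd) {
    if (fd < 0 || !open_fds[fd]) return false;
    open_fds[fd] = false;
    return true;
  }
};

// Models what the page cleaner path does to the node:
// close the descriptor and mark the node as closed.
void close_node(ToyOs &os, Node &node) {
  os.close_file(node.handle);
  node.handle = FILE_CLOSED;
}

// Replays the problematic order of events. Returns true if the deferred
// close operates on a descriptor that was already closed.
bool stale_handle_interleaving() {
  ToyOs os;
  Node node{os.open_file(15)};

  file_handle saved = node.handle;  // step 1: deleter snapshots the handle
  close_node(os, node);             // step 2: another thread closes the file
  assert(node.handle == FILE_CLOSED);
  assert(saved != FILE_CLOSED);     // the snapshot still looks "open"

  return !os.close_file(saved);     // step 3: deferred close of stale handle
}
```

In this model stale_handle_interleaving() returns true: the saved value is no longer FILE_CLOSED-checked against the node, so the later close targets a descriptor that is already gone.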

fil_space_t::try_to_close() is executed under fil_system.mutex, and mtr_t::commit_file() acquires the same mutex for its fil_system_t::detach() call. fil_system_t::detach() returns the detached file handle if its detach_handle argument is true. The fix is to let mtr_t::commit_file() pass that detached file handle back to fil_delete_tablespace().
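The idea behind the fix can be sketched as follows: the handle is read and handed over inside the same critical section that the closing thread uses, so the deleter sees either the still-open handle or the closed marker, never a stale copy. This is a simplified model under stated assumptions, not the actual patch: Registry, its members, and the two helper functions are hypothetical, and a plain std::mutex stands in for fil_system.mutex.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration only; not the InnoDB types.
using file_handle = int;
constexpr file_handle FILE_CLOSED = -1;

struct Registry {
  std::mutex mutex;                  // plays the role of fil_system.mutex
  file_handle handle = FILE_CLOSED;  // current handle of the data file

  // Closer path: marks the file as closed while holding the mutex.
  // (A real implementation would also close the descriptor here.)
  void try_to_close() {
    std::lock_guard<std::mutex> guard(mutex);
    handle = FILE_CLOSED;
  }

  // Detach path: under the same mutex, take ownership of whatever the
  // handle is right now and return it to the caller.
  file_handle detach() {
    std::lock_guard<std::mutex> guard(mutex);
    file_handle detached = handle;
    handle = FILE_CLOSED;
    return detached;
  }
};

// The closer ran first: the deleter must get FILE_CLOSED, not the old fd.
file_handle delete_after_parallel_close() {
  Registry r;
  r.handle = 15;
  r.try_to_close();
  return r.detach();
}

// No interference: the deleter receives the open handle and may close it.
file_handle delete_without_interference() {
  Registry r;
  r.handle = 15;
  return r.detach();
}
```

With this arrangement the caller only queues a handle for a later close when detach() actually returned an open one, which matches the existing "d != OS_FILE_CLOSED" check in trx_t::commit().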

