Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6
-
None
Description
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
void trx_t::commit(std::vector<pfs_os_file_t> &deleted) |
{
|
...
|
for (const auto &p : mod_tables) |
{
|
if (p.second.is_dropped()) |
{
|
...
|
if (const auto id= space ? space->id : 0) |
{
|
pfs_os_file_t d= fil_delete_tablespace(id);
|
if (d != OS_FILE_CLOSED) |
deleted.emplace_back(d);
|
}
|
}
|
}
|
...
|
}
|
and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array:
int ha_innobase::delete_table(const char *name) |
{
|
...
|
std::vector<pfs_os_file_t> deleted;
|
trx->commit(deleted);
|
...
|
row_mysql_unlock_data_dictionary(trx);
|
for (pfs_os_file_t d : deleted) |
os_file_close(d);
|
...
|
}
|
Consider fil_delete_tablespace() function, which returns file handles for "delete" array:
pfs_os_file_t fil_delete_tablespace(ulint id)
|
{
|
ut_ad(!is_system_tablespace(id));
|
pfs_os_file_t handle= OS_FILE_CLOSED;
|
if (fil_space_t *space= fil_space_t::check_pending_operations(id)) |
{
|
/* Before deleting the file(s), persistently write a log record. */ |
mtr_t mtr;
|
mtr.start();
|
mtr.log_file_op(FILE_DELETE, id, space->chain.start->name);
|
handle= space->chain.start->handle;
|
mtr.commit_file(*space, nullptr);
|
|
fil_space_free_low(space);
|
}
|
|
ibuf_delete_for_discarded_space(id);
|
return handle; |
}
|
fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack:
#1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452
|
#2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453
|
#3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false)
|
at ./storage/innobase/fil/fil0fil.cc:124
|
#4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422
|
#5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0)
|
at ./storage/innobase/fil/fil0fil.cc:656
|
#6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482
|
#7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186
|
#8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0)
|
at ./storage/innobase/buf/buf0flu.cc:1293
|
#9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0)
|
at ./storage/innobase/buf/buf0flu.cc:1362
|
#10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708
|
#11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310
|
So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable.
fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. fil_system_t::detach() returns detached file handle if its argument detach_handle is true. The fix is to let mtr_t::commit_file() to pass that detached file handle to fil_delete_tablespace().
Attachments
Issue Links
- relates to
-
MDEV-30829 InnoDB: Cannot close file <tablename>.ibd because of pending fsync
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Summary | fil_delete_tablespace() deallocates space, but does not detach it from fil_system.* lists | fil_delete_tablespace() returns wrong file handle if tablespace was closed by parrallel thread |
Description |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. fil_space_free_low() deallocates tablespace, but does not remove it from fil_system.space_list. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} but deallocated spaces with closed file handles still stay in fil_system.space_list. I think the fix must be in invoking fil_system_t::detach() before fil_space_free_low(space) call in fil_delete_tablespace(), or in mtr_t::commit_file(fil_space_t &space, const char *name) if name != nullptr. |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. Proposed fix: |
Description |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. Proposed fix: |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. Proposed fix: {code:java} --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -1669,21 +1669,20 @@ void fil_close_tablespace(ulint id) pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); - pfs_os_file_t handle= OS_FILE_CLOSED; - if (fil_space_t *space= fil_space_t::check_pending_operations(id)) + fil_space_t *space= fil_space_t::check_pending_operations(id); + if (space) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); - handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); - return handle; + return space ? space->chain.start->handle : pfs_os_file_t(OS_FILE_CLOSED); } /*******************************************************************//** {code} |
Description |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. Proposed fix: {code:java} --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -1669,21 +1669,20 @@ void fil_close_tablespace(ulint id) pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); - pfs_os_file_t handle= OS_FILE_CLOSED; - if (fil_space_t *space= fil_space_t::check_pending_operations(id)) + fil_space_t *space= fil_space_t::check_pending_operations(id); + if (space) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); - handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); - return handle; + return space ? space->chain.start->handle : pfs_os_file_t(OS_FILE_CLOSED); } /*******************************************************************//** {code} |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. After mtr_t::commit_file() call the tablespace must be detached, i.e. it must not present in any lists, and another thread can't change space->chain.start->handle . So it's safe to return space->chain.start->handle after mtr.commit_file() call in fil_delete_tablespace() . Proposed fix: {code:java} --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -1669,21 +1669,20 @@ void fil_close_tablespace(ulint id) pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); - pfs_os_file_t handle= OS_FILE_CLOSED; - if (fil_space_t *space= fil_space_t::check_pending_operations(id)) + fil_space_t *space= fil_space_t::check_pending_operations(id); + if (space) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); - handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); - return handle; + return space ? space->chain.start->handle : pfs_os_file_t(OS_FILE_CLOSED); } /*******************************************************************//** {code} |
Description |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. After mtr_t::commit_file() call the tablespace must be detached, i.e. it must not present in any lists, and another thread can't change space->chain.start->handle . So it's safe to return space->chain.start->handle after mtr.commit_file() call in fil_delete_tablespace() . Proposed fix: {code:java} --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -1669,21 +1669,20 @@ void fil_close_tablespace(ulint id) pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); - pfs_os_file_t handle= OS_FILE_CLOSED; - if (fil_space_t *space= fil_space_t::check_pending_operations(id)) + fil_space_t *space= fil_space_t::check_pending_operations(id); + if (space) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); - handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); - return handle; + return space ? space->chain.start->handle : pfs_os_file_t(OS_FILE_CLOSED); } /*******************************************************************//** {code} |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. After mtr_t::commit_file() call the tablespace must be detached, i.e. it must not present in any lists, and another thread can't change space->chain.start->handle . So it's safe to return space->chain.start->handle after mtr.commit_file() call in fil_delete_tablespace() . |
Description |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. After mtr_t::commit_file() call the tablespace must be detached, i.e. it must not present in any lists, and another thread can't change space->chain.start->handle . So it's safe to return space->chain.start->handle after mtr.commit_file() call in fil_delete_tablespace() . |
trx_t::commit(std::vector<pfs_os_file_t> &deleted) invokes fil_delete_tablespace()->fil_space_free_low() for each modified space:
{code:java} void trx_t::commit(std::vector<pfs_os_file_t> &deleted) { ... for (const auto &p : mod_tables) { if (p.second.is_dropped()) { ... if (const auto id= space ? space->id : 0) { pfs_os_file_t d= fil_delete_tablespace(id); if (d != OS_FILE_CLOSED) deleted.emplace_back(d); } } } ... } {code} and collects file handles in "deleted " array. Then ha_innobase::delete_table() closes files handles in "deleted" array: {code:java} int ha_innobase::delete_table(const char *name) { ... std::vector<pfs_os_file_t> deleted; trx->commit(deleted); ... row_mysql_unlock_data_dictionary(trx); for (pfs_os_file_t d : deleted) os_file_close(d); ... } {code} Consider fil_delete_tablespace() function, which returns file handles for "delete" array: {code:java} pfs_os_file_t fil_delete_tablespace(ulint id) { ut_ad(!is_system_tablespace(id)); pfs_os_file_t handle= OS_FILE_CLOSED; if (fil_space_t *space= fil_space_t::check_pending_operations(id)) { /* Before deleting the file(s), persistently write a log record. */ mtr_t mtr; mtr.start(); mtr.log_file_op(FILE_DELETE, id, space->chain.start->name); handle= space->chain.start->handle; mtr.commit_file(*space, nullptr); fil_space_free_low(space); } ibuf_delete_for_discarded_space(id); return handle; } {code} fil_system_t::detach() is invoked from mtr_t::commit_file(). But during fil_delete_tablespace() execution buf_do_LRU_batch() can close the tablespace between "handle= space->chain.start->handle;" and "return handle;" lines. It can do this with the following stack: {noformat} #1 0x000055fd4a0caf59 in os_file_close_func (file=15) at ./storage/innobase/os/os0file.cc:1452 #2 0x000055fd4a319d0d in fil_node_t::close (this=0x5c8e34110e60) at ./storage/innobase/fil/fil0fil.cc:453 #3 0x000055fd4a318e20 in fil_space_t::try_to_close (print_info=false) at ./storage/innobase/fil/fil0fil.cc:124 #4 0x000055fd4a319bf9 in fil_node_open_file (node=0x5c8e34037e30) at ./storage/innobase/fil/fil0fil.cc:422 #5 0x000055fd4a31adb2 in fil_space_t::prepare_acquired (this=0x5c8e34037cf0) at ./storage/innobase/fil/fil0fil.cc:656 #6 0x000055fd4a31e907 in fil_space_t::get (id=67) at ./storage/innobase/fil/fil0fil.cc:1482 #7 0x000055fd4a2af3a4 in buf_flush_space (id=67) at ./storage/innobase/buf/buf0flu.cc:1186 #8 0x000055fd4a2afb72 in buf_flush_LRU_list_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1293 #9 0x000055fd4a2b0002 in buf_do_LRU_batch (max=2000, evict=false, n=0x326e0a09cbf0) at ./storage/innobase/buf/buf0flu.cc:1362 #10 0x000055fd4a2b1597 in buf_flush_LRU (max_n=2000, evict=false) at ./storage/innobase/buf/buf0flu.cc:1708 #11 0x000055fd4a2b441f in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2310 {noformat} So "space->chain.start->handle" can be set to -1 in parallel thread, but fil_delete_tablespace() returns the old value, saved in local "handle" variable. fil_space_t::try_to_close() is executed under fil_system.mutex. And mtr_t::commit_file() locks it for fil_system_t::detach() call. fil_system_t::detach() returns detached file handle if its argument detach_handle is true. The fix is to let mtr_t::commit_file() to pass that detached file handle to fil_delete_tablespace(). |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Summary | fil_delete_tablespace() returns wrong file handle if tablespace was closed by parrallel thread | fil_delete_tablespace() returns wrong file handle if tablespace was closed by parallel thread |
Fix Version/s | 10.7 [ 24805 ] |
Fix Version/s | 10.6.13 [ 28514 ] | |
Fix Version/s | 10.9.6 [ 28520 ] | |
Fix Version/s | 10.10.4 [ 28522 ] | |
Fix Version/s | 10.11.3 [ 28524 ] | |
Fix Version/s | 10.8.8 [ 28518 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue relates to |