[MDEV-25214] Server crash in fil_space_t::try_to_close or Assertion `!node->is_open()' failed in fil_node_open_file_low Created: 2021-03-21  Updated: 2022-03-15  Resolved: 2022-03-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.9.0, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Problem/Incident
is caused by MDEV-23855 InnoDB log checkpointing causes regre... Closed

 Description   

10.6 c22e6eed debug

mysqld: /data/src/10.6/storage/innobase/fil/fil0fil.cc:353: bool fil_node_open_file_low(fil_node_t*): Assertion `!node->is_open()' failed.
210322  2:11:40 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fbd82ab9f36 in __GI___assert_fail (assertion=0x56418b0b4b55 "!node->is_open()", file=0x56418b0b4958 "/data/src/10.6/storage/innobase/fil/fil0fil.cc", line=353, function=0x56418b0b4b68 "bool fil_node_open_file_low(fil_node_t*)") at assert.c:101
#8  0x000056418aa52ebf in fil_node_open_file_low (node=0x7fbd286d9de8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:353
#9  0x000056418aa537cd in fil_node_open_file (node=0x7fbd286d9de8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:445
#10 0x000056418aa54934 in fil_space_t::prepare (this=0x7fbd28128fe8, have_mutex=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:654
#11 0x000056418aa6e192 in fil_space_t::next (space=0x7fbd28128fe8, recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1513
#12 0x000056418aa6af47 in fil_crypt_find_space_to_rotate (key_state=0x7fbd4f7fdd54, state=0x7fbd4f7fdd60, recheck=0x7fbd4f7fdd53) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1586
#13 0x000056418aa6c738 in fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2093
#14 0x000056418a9f2447 in std::__invoke_impl<void, void (*)()> (__f=@0x56418da145c8: 0x56418aa6c658 <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60
#15 0x000056418a9f23f3 in std::__invoke<void (*)()> (__fn=@0x56418da145c8: 0x56418aa6c658 <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95
#16 0x000056418a9f2394 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56418da145c8) at /usr/include/c++/9/thread:244
#17 0x000056418a9f2366 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56418da145c8) at /usr/include/c++/9/thread:251
#18 0x000056418a9f2346 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56418da145c0) at /usr/include/c++/9/thread:195
#19 0x00007fbd82eb7d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007fbd82fd1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007fbd82ba5293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6 c22e6eed non-debug

#3  <signal handler called>
#4  fil_space_t::try_to_close (print_info=false) at /data/src/10.6/storage/innobase/include/os0file.h:103
#5  0x000055efe8dd8b8f in fil_node_open_file (node=0x7faba42c2138) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:426
#6  0x000055efe860c524 in fil_space_t::prepare (this=this@entry=0x7faba42bfdf8, have_mutex=have_mutex@entry=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:658
#7  0x000055efe860f2c1 in fil_space_t::next (encrypt=true, recheck=<optimized out>, space=0x7faba42bfdf8) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1556
#8  fil_crypt_find_space_to_rotate (recheck=<synthetic pointer>, state=0x7fabbfffe950, key_state=<synthetic pointer>) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1586
#9  fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2093
#10 0x00007fabf27b2d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007fabf28ce609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fabf24bd293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To reproduce:

git clone https://github.com/MariaDB/randgen --branch mdev25214 rqg-25214
cd rqg-25214
. ./mdev25214.cmd <basedir>

Can't reproduce on 10.5.
Can't reproduce with rr.
Coredump is available.



 Comments   
Comment by Elena Stepanova [ 2021-06-28 ]

The assertion has changed, but the failure is still reproducible on current 10.6:

10.6 891a927e

mysqld: /data/src/10.6/storage/innobase/fil/fil0fil.cc:89: static bool fil_space_t::try_to_close(bool): Assertion `node' failed.
210628 17:50:02 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f8b0b0f5f36 in __GI___assert_fail (assertion=0x56367d72c64c "node", file=0x56367d72c5f0 "/data/src/10.6/storage/innobase/fil/fil0fil.cc", line=89, function=0x56367d72c620 "static bool fil_space_t::try_to_close(bool)") at assert.c:101
#8  0x000056367d0c30b0 in fil_space_t::try_to_close (print_info=false) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:89
#9  0x000056367d0c423e in fil_node_open_file (node=0x7f8ab01b3ea8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:423
#10 0x000056367d0c54ec in fil_space_t::prepare (this=0x7f8ab01b3d38, have_mutex=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:651
#11 0x000056367d0dd645 in fil_space_t::acquire_if_not_stopped (this=0x7f8ab01b3d38) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1406
#12 0x000056367d0dd981 in fil_system_t::default_encrypt_next (this=0x56367e7a2cc0 <fil_system>, space=0x7f8ab01b3d38, recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1473
#13 0x000056367d0d9ff9 in fil_space_t::next (space=..., recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1499
#14 0x000056367d0da51b in fil_crypt_find_space_to_rotate (key_state=0x7f8ae57fbcd4, state=0x7f8ae57fbce0, recheck=0x7f8ae57fbcc7) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1598
#15 0x000056367d0dbe2a in fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2101
#16 0x000056367d064017 in std::__invoke_impl<void, void (*)()> (__f=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60
#17 0x000056367d063fc3 in std::__invoke<void (*)()> (__fn=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95
#18 0x000056367d063f64 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56367f9d49a8) at /usr/include/c++/9/thread:244
#19 0x000056367d063f36 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56367f9d49a8) at /usr/include/c++/9/thread:251
#20 0x000056367d063f16 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56367f9d49a0) at /usr/include/c++/9/thread:195

Comment by Elena Stepanova [ 2021-06-28 ]

Setting tentatively to a regression, as the assertions exist in 10.5 code, but the failures don't seem to be reproducible on 10.5.

Comment by Marko Mäkelä [ 2022-03-15 ]

I can repeat both assertions in 10.5 with the MDEV-27985 fix, by repeatedly running the test innodb.table_definition_cache_debug. The more frequent assertion is fixed by the following:

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 39049f237e1..97ca8c2f395 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -454,6 +454,8 @@ static bool fil_node_open_file(fil_node_t *node)
       /* Flush tablespaces so that we can close modified files. */
       fil_flush_file_spaces();
       mutex_enter(&fil_system.mutex);
+      if (node->is_open())
+        return true;
     }
   }
 

This one is much harder to hit:

#7  0x00007f0dbd9b3222 in __GI___assert_fail (assertion=0x5589702bdab4 "node", file=0x5589702be0f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=92, function=0x5589702be580 "static bool fil_space_t::try_to_close(bool)") at assert.c:101
#8  0x000055896fef7576 in fil_space_t::try_to_close (print_info=print_info@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:92
#9  0x000055896fefbd4b in fil_node_open_file (node=node@entry=0x7f0d606d7ac0) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:440
#10 0x000055896fefbf2a in fil_space_t::prepare (this=this@entry=0x7f0d6076b290, have_mutex=have_mutex@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:676
#11 0x000055896fefc3cc in fil_space_t::get (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1487
#12 0x000055896feb3a06 in buf_flush_space (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1231
#13 0x000055896feb44f2 in buf_do_flush_list_batch (max_n=max_n@entry=1080, lsn=lsn@entry=18446744073709551615) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1482

I think that this trouble was caused by the replacement of the fil_system.LRU list in MDEV-23855.

Comment by Marko Mäkelä [ 2022-03-15 ]

Finally, I got an rr replay stack trace for the ut_ad(node) assertion. The tablespace is only being created at that point of time.

#0  0x000055869d488151 in TTASEventMutex<GenericPolicy>::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:428
#1  0x000055869d489520 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:564
#2  0x000055869d6b4143 in fil_space_t::create (name=<optimized out>, name@entry=0x7f962c314450 "test/t_571", id=<optimized out>, id@entry=435, flags=flags@entry=21, purpose=purpose@entry=FIL_TYPE_TABLESPACE, 
    crypt_data=crypt_data@entry=0x0, mode=mode@entry=FIL_ENCRYPTION_DEFAULT) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1039
#3  0x000055869d6b48a4 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=flags@entry=21, size=size@entry=4, 
    mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2398

At the time of the crash, this thread is trying to add the data file to the tablespace object:

#4  0x000055869d489712 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=this@entry=0x55869e09fa20 <fil_system+32>, n_spins=30, n_delay=4, name=name@entry=0x55869da790f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=line@entry=344) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:588
#5  0x000055869d6b2e18 in fil_space_t::add (this=this@entry=0x7f962c31baf0, name=0x7f962c31c1c0 "./test/t_571.ibd", handle=<optimized out>, handle@entry={m_file = 37}, size=size@entry=4, is_raw=is_raw@entry=false, atomic_write=atomic_write@entry=true, max_pages=4294967295) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:344
#6  0x000055869d6b48e2 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=<optimized out>, flags@entry=21, size=size@entry=4, mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2402

Generated at Thu Feb 08 09:36:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.