Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25214

Server crash in fil_space_t::try_to_close or Assertion `!node->is_open()' failed in fil_node_open_file_low

Details

    Description

      10.6 c22e6eed debug

      mysqld: /data/src/10.6/storage/innobase/fil/fil0fil.cc:353: bool fil_node_open_file_low(fil_node_t*): Assertion `!node->is_open()' failed.
      210322  2:11:40 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007fbd82ab9f36 in __GI___assert_fail (assertion=0x56418b0b4b55 "!node->is_open()", file=0x56418b0b4958 "/data/src/10.6/storage/innobase/fil/fil0fil.cc", line=353, function=0x56418b0b4b68 "bool fil_node_open_file_low(fil_node_t*)") at assert.c:101
      #8  0x000056418aa52ebf in fil_node_open_file_low (node=0x7fbd286d9de8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:353
      #9  0x000056418aa537cd in fil_node_open_file (node=0x7fbd286d9de8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:445
      #10 0x000056418aa54934 in fil_space_t::prepare (this=0x7fbd28128fe8, have_mutex=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:654
      #11 0x000056418aa6e192 in fil_space_t::next (space=0x7fbd28128fe8, recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1513
      #12 0x000056418aa6af47 in fil_crypt_find_space_to_rotate (key_state=0x7fbd4f7fdd54, state=0x7fbd4f7fdd60, recheck=0x7fbd4f7fdd53) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1586
      #13 0x000056418aa6c738 in fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2093
      #14 0x000056418a9f2447 in std::__invoke_impl<void, void (*)()> (__f=@0x56418da145c8: 0x56418aa6c658 <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60
      #15 0x000056418a9f23f3 in std::__invoke<void (*)()> (__fn=@0x56418da145c8: 0x56418aa6c658 <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95
      #16 0x000056418a9f2394 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56418da145c8) at /usr/include/c++/9/thread:244
      #17 0x000056418a9f2366 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56418da145c8) at /usr/include/c++/9/thread:251
      #18 0x000056418a9f2346 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56418da145c0) at /usr/include/c++/9/thread:195
      #19 0x00007fbd82eb7d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #20 0x00007fbd82fd1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #21 0x00007fbd82ba5293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.6 c22e6eed non-debug

      #3  <signal handler called>
      #4  fil_space_t::try_to_close (print_info=false) at /data/src/10.6/storage/innobase/include/os0file.h:103
      #5  0x000055efe8dd8b8f in fil_node_open_file (node=0x7faba42c2138) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:426
      #6  0x000055efe860c524 in fil_space_t::prepare (this=this@entry=0x7faba42bfdf8, have_mutex=have_mutex@entry=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:658
      #7  0x000055efe860f2c1 in fil_space_t::next (encrypt=true, recheck=<optimized out>, space=0x7faba42bfdf8) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1556
      #8  fil_crypt_find_space_to_rotate (recheck=<synthetic pointer>, state=0x7fabbfffe950, key_state=<synthetic pointer>) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1586
      #9  fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2093
      #10 0x00007fabf27b2d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #11 0x00007fabf28ce609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #12 0x00007fabf24bd293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      To reproduce:

      git clone https://github.com/MariaDB/randgen --branch mdev25214 rqg-25214
      cd rqg-25214
      . ./mdev25214.cmd <basedir>
      

      Can't reproduce on 10.5.
      Can't reproduce with rr.
      Coredump is available.

      Attachments

        Issue Links

          Activity

            The assertion has changed, but the failure is still reproducible on current 10.6:

            10.6 891a927e

            mysqld: /data/src/10.6/storage/innobase/fil/fil0fil.cc:89: static bool fil_space_t::try_to_close(bool): Assertion `node' failed.
            210628 17:50:02 [ERROR] mysqld got signal 6 ;
             
            #7  0x00007f8b0b0f5f36 in __GI___assert_fail (assertion=0x56367d72c64c "node", file=0x56367d72c5f0 "/data/src/10.6/storage/innobase/fil/fil0fil.cc", line=89, function=0x56367d72c620 "static bool fil_space_t::try_to_close(bool)") at assert.c:101
            #8  0x000056367d0c30b0 in fil_space_t::try_to_close (print_info=false) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:89
            #9  0x000056367d0c423e in fil_node_open_file (node=0x7f8ab01b3ea8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:423
            #10 0x000056367d0c54ec in fil_space_t::prepare (this=0x7f8ab01b3d38, have_mutex=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:651
            #11 0x000056367d0dd645 in fil_space_t::acquire_if_not_stopped (this=0x7f8ab01b3d38) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1406
            #12 0x000056367d0dd981 in fil_system_t::default_encrypt_next (this=0x56367e7a2cc0 <fil_system>, space=0x7f8ab01b3d38, recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1473
            #13 0x000056367d0d9ff9 in fil_space_t::next (space=..., recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1499
            #14 0x000056367d0da51b in fil_crypt_find_space_to_rotate (key_state=0x7f8ae57fbcd4, state=0x7f8ae57fbce0, recheck=0x7f8ae57fbcc7) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1598
            #15 0x000056367d0dbe2a in fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2101
            #16 0x000056367d064017 in std::__invoke_impl<void, void (*)()> (__f=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60
            #17 0x000056367d063fc3 in std::__invoke<void (*)()> (__fn=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95
            #18 0x000056367d063f64 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56367f9d49a8) at /usr/include/c++/9/thread:244
            #19 0x000056367d063f36 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56367f9d49a8) at /usr/include/c++/9/thread:251
            #20 0x000056367d063f16 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56367f9d49a0) at /usr/include/c++/9/thread:195
            

            elenst Elena Stepanova added a comment - The assertion has changed, but the failure is still reproducible on current 10.6: 10.6 891a927e mysqld: /data/src/10.6/storage/innobase/fil/fil0fil.cc:89: static bool fil_space_t::try_to_close(bool): Assertion `node' failed. 210628 17:50:02 [ERROR] mysqld got signal 6 ;   #7 0x00007f8b0b0f5f36 in __GI___assert_fail (assertion=0x56367d72c64c "node", file=0x56367d72c5f0 "/data/src/10.6/storage/innobase/fil/fil0fil.cc", line=89, function=0x56367d72c620 "static bool fil_space_t::try_to_close(bool)") at assert.c:101 #8 0x000056367d0c30b0 in fil_space_t::try_to_close (print_info=false) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:89 #9 0x000056367d0c423e in fil_node_open_file (node=0x7f8ab01b3ea8) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:423 #10 0x000056367d0c54ec in fil_space_t::prepare (this=0x7f8ab01b3d38, have_mutex=true) at /data/src/10.6/storage/innobase/fil/fil0fil.cc:651 #11 0x000056367d0dd645 in fil_space_t::acquire_if_not_stopped (this=0x7f8ab01b3d38) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1406 #12 0x000056367d0dd981 in fil_system_t::default_encrypt_next (this=0x56367e7a2cc0 <fil_system>, space=0x7f8ab01b3d38, recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1473 #13 0x000056367d0d9ff9 in fil_space_t::next (space=..., recheck=false, encrypt=true) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1499 #14 0x000056367d0da51b in fil_crypt_find_space_to_rotate (key_state=0x7f8ae57fbcd4, state=0x7f8ae57fbce0, recheck=0x7f8ae57fbcc7) at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:1598 #15 0x000056367d0dbe2a in fil_crypt_thread () at /data/src/10.6/storage/innobase/fil/fil0crypt.cc:2101 #16 0x000056367d064017 in std::__invoke_impl<void, void (*)()> (__f=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60 #17 0x000056367d063fc3 in std::__invoke<void (*)()> (__fn=@0x56367f9d49a8: 0x56367d0dbd3b <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95 #18 0x000056367d063f64 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56367f9d49a8) at /usr/include/c++/9/thread:244 #19 0x000056367d063f36 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56367f9d49a8) at /usr/include/c++/9/thread:251 #20 0x000056367d063f16 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56367f9d49a0) at /usr/include/c++/9/thread:195

            Setting tentatively to a regression, as the assertions exist in 10.5 code, but the failures don't seem to be reproducible on 10.5.

            elenst Elena Stepanova added a comment - Setting tentatively to a regression, as the assertions exist in 10.5 code, but the failures don't seem to be reproducible on 10.5.

            I can repeat both assertions in 10.5 with the MDEV-27985 fix, by repeatedly running the test innodb.table_definition_cache_debug. The more frequent assertion is fixed by the following:

            diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
            index 39049f237e1..97ca8c2f395 100644
            --- a/storage/innobase/fil/fil0fil.cc
            +++ b/storage/innobase/fil/fil0fil.cc
            @@ -454,6 +454,8 @@ static bool fil_node_open_file(fil_node_t *node)
                   /* Flush tablespaces so that we can close modified files. */
                   fil_flush_file_spaces();
                   mutex_enter(&fil_system.mutex);
            +      if (node->is_open())
            +        return true;
                 }
               }
             
            

            This one is much harder to hit:

            #7  0x00007f0dbd9b3222 in __GI___assert_fail (assertion=0x5589702bdab4 "node", file=0x5589702be0f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=92, function=0x5589702be580 "static bool fil_space_t::try_to_close(bool)") at assert.c:101
            #8  0x000055896fef7576 in fil_space_t::try_to_close (print_info=print_info@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:92
            #9  0x000055896fefbd4b in fil_node_open_file (node=node@entry=0x7f0d606d7ac0) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:440
            #10 0x000055896fefbf2a in fil_space_t::prepare (this=this@entry=0x7f0d6076b290, have_mutex=have_mutex@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:676
            #11 0x000055896fefc3cc in fil_space_t::get (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1487
            #12 0x000055896feb3a06 in buf_flush_space (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1231
            #13 0x000055896feb44f2 in buf_do_flush_list_batch (max_n=max_n@entry=1080, lsn=lsn@entry=18446744073709551615) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1482
            

            I think that this trouble was caused by the replacement of the fil_system.LRU list in MDEV-23855.

            marko Marko Mäkelä added a comment - I can repeat both assertions in 10.5 with the MDEV-27985 fix, by repeatedly running the test innodb.table_definition_cache_debug . The more frequent assertion is fixed by the following: diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc index 39049f237e1..97ca8c2f395 100644 --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -454,6 +454,8 @@ static bool fil_node_open_file(fil_node_t *node) /* Flush tablespaces so that we can close modified files. */ fil_flush_file_spaces(); mutex_enter(&fil_system.mutex); + if (node->is_open()) + return true; } } This one is much harder to hit: #7 0x00007f0dbd9b3222 in __GI___assert_fail (assertion=0x5589702bdab4 "node", file=0x5589702be0f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=92, function=0x5589702be580 "static bool fil_space_t::try_to_close(bool)") at assert.c:101 #8 0x000055896fef7576 in fil_space_t::try_to_close (print_info=print_info@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:92 #9 0x000055896fefbd4b in fil_node_open_file (node=node@entry=0x7f0d606d7ac0) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:440 #10 0x000055896fefbf2a in fil_space_t::prepare (this=this@entry=0x7f0d6076b290, have_mutex=have_mutex@entry=false) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:676 #11 0x000055896fefc3cc in fil_space_t::get (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1487 #12 0x000055896feb3a06 in buf_flush_space (id=id@entry=52662) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1231 #13 0x000055896feb44f2 in buf_do_flush_list_batch (max_n=max_n@entry=1080, lsn=lsn@entry=18446744073709551615) at /mariadb/10.5m/storage/innobase/buf/buf0flu.cc:1482 I think that this trouble was caused by the replacement of the fil_system.LRU list in MDEV-23855 .

            Finally, I got an rr replay stack trace for the ut_ad(node) assertion. The tablespace is only being created at that point of time.

            #0  0x000055869d488151 in TTASEventMutex<GenericPolicy>::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:428
            #1  0x000055869d489520 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:564
            #2  0x000055869d6b4143 in fil_space_t::create (name=<optimized out>, name@entry=0x7f962c314450 "test/t_571", id=<optimized out>, id@entry=435, flags=flags@entry=21, purpose=purpose@entry=FIL_TYPE_TABLESPACE, 
                crypt_data=crypt_data@entry=0x0, mode=mode@entry=FIL_ENCRYPTION_DEFAULT) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1039
            #3  0x000055869d6b48a4 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=flags@entry=21, size=size@entry=4, 
                mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2398
            

            At the time of the crash, this thread is trying to add the data file to the tablespace object:

            #4  0x000055869d489712 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=this@entry=0x55869e09fa20 <fil_system+32>, n_spins=30, n_delay=4, name=name@entry=0x55869da790f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=line@entry=344) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:588
            #5  0x000055869d6b2e18 in fil_space_t::add (this=this@entry=0x7f962c31baf0, name=0x7f962c31c1c0 "./test/t_571.ibd", handle=<optimized out>, handle@entry={m_file = 37}, size=size@entry=4, is_raw=is_raw@entry=false, atomic_write=atomic_write@entry=true, max_pages=4294967295) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:344
            #6  0x000055869d6b48e2 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=<optimized out>, flags@entry=21, size=size@entry=4, mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2402
            

            marko Marko Mäkelä added a comment - Finally, I got an rr replay stack trace for the ut_ad(node) assertion. The tablespace is only being created at that point of time. #0 0x000055869d488151 in TTASEventMutex<GenericPolicy>::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:428 #1 0x000055869d489520 in PolicyMutex<TTASEventMutex<GenericPolicy> >::exit (this=this@entry=0x55869e09fa20 <fil_system+32>) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:564 #2 0x000055869d6b4143 in fil_space_t::create (name=<optimized out>, name@entry=0x7f962c314450 "test/t_571", id=<optimized out>, id@entry=435, flags=flags@entry=21, purpose=purpose@entry=FIL_TYPE_TABLESPACE, crypt_data=crypt_data@entry=0x0, mode=mode@entry=FIL_ENCRYPTION_DEFAULT) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:1039 #3 0x000055869d6b48a4 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=flags@entry=21, size=size@entry=4, mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2398 At the time of the crash, this thread is trying to add the data file to the tablespace object: #4 0x000055869d489712 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=this@entry=0x55869e09fa20 <fil_system+32>, n_spins=30, n_delay=4, name=name@entry=0x55869da790f0 "/mariadb/10.5m/storage/innobase/fil/fil0fil.cc", line=line@entry=344) at /mariadb/10.5m/storage/innobase/include/ib0mutex.h:588 #5 0x000055869d6b2e18 in fil_space_t::add (this=this@entry=0x7f962c31baf0, name=0x7f962c31c1c0 "./test/t_571.ibd", handle=<optimized out>, handle@entry={m_file = 37}, size=size@entry=4, is_raw=is_raw@entry=false, atomic_write=atomic_write@entry=true, max_pages=4294967295) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:344 #6 0x000055869d6b48e2 in fil_ibd_create (space_id=435, name=0x7f962c314450 "test/t_571", path=<optimized out>, path@entry=0x7f962c31c1c0 "./test/t_571.ibd", flags=<optimized out>, flags@entry=21, size=size@entry=4, mode=FIL_ENCRYPTION_DEFAULT, key_id=1, err=0x1f8639cd15d4) at /mariadb/10.5m/storage/innobase/fil/fil0fil.cc:2402

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.