MDEV-27404

SIGSEGV in rbt_free_node, hang in 'closing tables' state on DROP TABLE

Details

    Description

      Sporadic issue: the SIGSEGV typically requires about 5 to 20 executions of the testcase to reproduce, and the hang potentially more (fewer than 1000). The SIGSEGV happens on both optimized and debug builds. The hangs may be limited to debug builds only, though it is difficult to say, as optimized builds hit the SIGSEGV first/earlier. The issue seems to be Spider-triggered, but the stack is in InnoDB. The DROP/CREATE/USE test statements (and INSTALL PLUGIN) are only there to ensure a clean state on each loop iteration; they are not otherwise required.

      DROP DATABASE test;
      CREATE DATABASE test;
      USE test;
      INSTALL PLUGIN spider SONAME 'ha_spider.so';
      CREATE TABLE t1 (f VARBINARY(1)) ENGINE=SPIDER;
      CREATE TABLE t2 (c INT KEY,c2 INT) ENGINE=SPIDER;
      INSERT INTO t2 VALUES (0);
      ALTER TABLE t1 CHANGE COLUMN a a INT;
      ALTER TABLE t2 CHANGE COLUMN a a INT;
      INSERT INTO t1 VALUES (0);
      SELECT (SELECT * FROM t2) AS c FROM t2;
      ALTER TABLE t2 CHANGE COLUMN a a INT;
      CREATE TABLE t3 (a CHAR(1),FULLTEXT (a)) ENGINE=InnoDB;
      INSERT INTO t2 VALUES (0);
      DROP TABLE t3;
      

      Leads to:

      10.8.0 ccdf5711a8fff0cd610a91fdcf37c8ff1182878c (Optimized)

      Core was generated by `/test/MD121221-mariadb-10.8.0-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x0000557c430ae382 in rbt_free_node (node=0x0, nil=0x150f8c0bc738)
          at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:731
      731			rbt_free_node(node->left, nil);
      [Current thread is 1 (Thread 0x151040477700 (LWP 3396074))]
      (gdb) bt
      #0  0x0000557c430ae382 in rbt_free_node (node=0x0, nil=0x150f8c0bc738) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:731
      #1  0x0000557c430af822 in rbt_free_node (nil=0x150f8c0bc738, node=0x150f8c0bc700) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:731
      #2  rbt_free_node (nil=0x150f8c0bc738, node=<optimized out>) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:732
      #3  rbt_free_node (nil=0x150f8c0bc738, node=<optimized out>) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:732
      #4  rbt_free_node (nil=0x150f8c0bc738, node=<optimized out>) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:732
      #5  rbt_free_node (nil=0x150f8c0bc738, node=0x150f8c0bdcb8) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:731
      #6  rbt_free_node (nil=0x150f8c0bc738, node=0x150f8c0bc788) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:731
      #7  rbt_free (tree=0x150f8c0bd4d8) at /test/10.8_opt/storage/innobase/ut/ut0rbt.cc:745
      #8  0x0000557c4314e634 in fts_cache_destroy (cache=0x150f8c09be80) at /test/10.8_opt/storage/innobase/fts/fts0fts.cc:289
      #9  0x0000557c43152132 in fts_t::~fts_t (this=0x150f8c032eb0, __in_chrg=<optimized out>) at /test/10.8_opt/storage/innobase/fts/fts0fts.cc:5198
      #10 fts_free (table=table@entry=0x150f8c03f9f0) at /test/10.8_opt/storage/innobase/fts/fts0fts.cc:5235
      #11 0x0000557c4310a4fc in dict_sys_t::remove (this=0x557c43b42440 <dict_sys>, table=table@entry=0x150f8c03f9f0, lru=lru@entry=false, keep=keep@entry=false) at /test/10.8_opt/storage/innobase/dict/dict0dict.cc:1898
      #12 0x0000557c43128256 in trx_t::commit (this=this@entry=0x151041510230, deleted=std::vector of length 1, capacity 1 = {{m_file = 60, m_psi = 0x0}}) at /test/10.8_opt/storage/innobase/dict/drop.cc:249
      #13 0x0000557c42f9f74c in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /test/10.8_opt/storage/innobase/handler/ha_innodb.cc:13610
      #14 0x0000557c42ca5745 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /test/10.8_opt/sql/handler.cc:576
      #15 0x0000557c42cabf1c in ha_delete_table (thd=thd@entry=0x150f8c000c58, hton=hton@entry=0x557c44b39b58, path=path@entry=0x1510404759c0 "./test/t3", db=db@entry=0x1510404757b0, alias=alias@entry=0x1510404757c0, generate_warning=generate_warning@entry=true) at /test/10.8_opt/sql/handler.cc:3097
      #16 0x0000557c42b01f16 in mysql_rm_table_no_locks (thd=0x150f8c000c58, tables=<optimized out>, current_db=<optimized out>, ddl_log_state=<optimized out>, if_exists=<optimized out>, drop_temporary=<optimized out>, drop_view=<optimized out>, drop_sequence=<optimized out>, dont_log_query=<optimized out>, dont_free_locks=<optimized out>) at /test/10.8_opt/sql/sql_table.cc:1572
      #17 0x0000557c42b0355e in mysql_rm_table (thd=thd@entry=0x150f8c000c58, tables=tables@entry=0x150f8c010960, if_exists=<optimized out>, drop_temporary=<optimized out>, drop_sequence=<optimized out>, dont_log_query=dont_log_query@entry=false) at /test/10.8_opt/sql/sql_table.cc:1147
      #18 0x0000557c42a60289 in mysql_execute_command (thd=0x150f8c000c58, is_called_from_prepared_stmt=<optimized out>) at /test/10.8_opt/sql/structs.h:568
      #19 0x0000557c42a4f986 in mysql_parse (thd=0x150f8c000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/10.8_opt/sql/sql_parse.cc:8028
      #20 0x0000557c42a5bb35 in dispatch_command (command=COM_QUERY, thd=0x150f8c000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.8_opt/sql/sql_class.h:1360
      #21 0x0000557c42a5dd27 in do_command (thd=0x150f8c000c58, blocking=blocking@entry=true) at /test/10.8_opt/sql/sql_parse.cc:1402
      #22 0x0000557c42b7c2e7 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /test/10.8_opt/sql/sql_connect.cc:1418
      #23 0x0000557c42b7c62d in handle_one_connection (arg=arg@entry=0x557c45030a58) at /test/10.8_opt/sql/sql_connect.cc:1312
      #24 0x0000557c42eea5d8 in pfs_spawn_thread (arg=0x557c44fe81f8) at /test/10.8_opt/storage/perfschema/pfs.cc:2201
      #25 0x0000151044309609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #26 0x0000151043ef7293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
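
      For context on the failing frame: rbt_free_node() frees the tree recursively and uses a
      dedicated nil sentinel node, not NULL, as its recursion terminator. The following is a
      simplified reconstruction of that routine, not the verbatim ut0rbt.cc source; it shows why
      a child pointer that has been corrupted to NULL, rather than pointing at the nil sentinel,
      gets dereferenced on the next recursion step, matching node=0x0 with nil=0x150f8c0bc738 in
      frame #0.

      #include <cstdlib>

      // Simplified reconstruction of the red-black tree node and the recursive
      // free routine in storage/innobase/ut/ut0rbt.cc; field names are
      // illustrative, not the exact InnoDB definitions.
      struct ib_rbt_node_t {
        ib_rbt_node_t* left;
        ib_rbt_node_t* right;
      };

      // Frees the subtree rooted at `node`. Leaves are represented by the shared
      // `nil` sentinel, so the recursion is meant to stop at `nil`, never at NULL.
      static void rbt_free_node(ib_rbt_node_t* node, ib_rbt_node_t* nil)
      {
        if (node != nil) {
          // If corruption left a child pointer as NULL instead of `nil`, the
          // recursive call receives node == 0x0; since 0x0 != nil, this level
          // dereferences node->left and crashes, as in frame #0 above.
          rbt_free_node(node->left, nil);
          rbt_free_node(node->right, nil);
          free(node);
        }
      }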
      

      10.8.0 ccdf5711a8fff0cd610a91fdcf37c8ff1182878c (Debug)

      MariaDB [test]> show processlist;
      +----+-------------+-----------+------+---------+------+----------------------------------------------------+------------------+----------+
      | Id | User        | Host      | db   | Command | Time | State                                              | Info             | Progress |
      +----+-------------+-----------+------+---------+------+----------------------------------------------------+------------------+----------+
      |  4 | root        | localhost | test | Query   |  724 | closing tables                                     | DROP TABLE t3    |    0.000 |
      |  5 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      |  6 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      |  7 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      |  8 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 11 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 13 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 14 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 15 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 17 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 18 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 19 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 20 | system user |           | NULL | Sleep   |  740 | Reset for next command                             | NULL             |    0.000 |
      | 16 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 12 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      |  9 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 23 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 22 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 10 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 21 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL             |    0.000 |
      | 24 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL             |    0.000 |
      | 26 | root        | localhost | test | Query   |    0 | starting                                           | show processlist |    0.000 |
      +----+-------------+-----------+------+---------+------+----------------------------------------------------+------------------+----------+
      22 rows in set (0.000 sec)
      

      Bug confirmed present in:
      MariaDB: 10.5.14 (dbg), 10.5.14 (opt), 10.6.6 (dbg), 10.6.6 (opt), 10.7.2 (dbg), 10.7.2 (opt), 10.8.0 (dbg), 10.8.0 (opt)

      Bug (or feature/syntax) confirmed not present in:
      MariaDB: 10.2.42 (dbg), 10.2.42 (opt), 10.3.33 (dbg), 10.3.33 (opt), 10.4.23 (dbg), 10.4.23 (opt)
      MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.36 (dbg), 5.7.36 (opt), 8.0.27 (dbg), 8.0.27 (opt)

      Activity

            Roel Van de Paar added a comment:

            Also observed, using the same testcase:

            10.5.14 88b339805d7a9ddebc3fd61e9dee83270dbf474d (Debug)

            Core was generated by `/test/MD121221-mariadb-10.5.14-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
            Program terminated with signal SIGSEGV, Segmentation fault.
            #0  PolicyMutex<TTASEventMutex<GenericPolicy> >::pfs_begin_lock (line=74, 
                name=0x5644980f8ab0 "/test/10.5_dbg/storage/innobase/ut/ut0wqueue.cc", 
                state=0x7ffe9437b620, this=0x0)
                at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:700
            700		PSI_mutex_locker* pfs_begin_lock(
            [Current thread is 1 (Thread 0x1487c3cb0800 (LWP 3393804))]
            (gdb) bt
            #0  PolicyMutex<TTASEventMutex<GenericPolicy> >::pfs_begin_lock (line=74, name=0x5644980f8ab0 "/test/10.5_dbg/storage/innobase/ut/ut0wqueue.cc", state=0x7ffe9437b620, this=0x0) at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:700
            #1  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x0, n_spins=30, n_delay=4, name=0x5644980f8ab0 "/test/10.5_dbg/storage/innobase/ut/ut0wqueue.cc", line=74) at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:585
            #2  0x0000564497a79bc2 in ib_wqueue_add (wq=0x0, item=0x56449b103e88, heap=0x56449b103e08, wq_locked=wq_locked@entry=false) at /test/10.5_dbg/storage/innobase/ut/ut0wqueue.cc:74
            #3  0x0000564497be05a7 in add_msg (msg=<optimized out>, wq_locked=wq_locked@entry=false) at /test/10.5_dbg/storage/innobase/fts/fts0opt.cc:2535
            #4  0x0000564497be4d53 in fts_optimize_shutdown () at /test/10.5_dbg/storage/innobase/fts/fts0opt.cc:3029
            #5  0x0000564497a0eaa6 in srv_shutdown_bg_undo_sources () at /test/10.5_dbg/storage/innobase/srv/srv0start.cc:1968
            #6  0x0000564497a0eb8f in innodb_preshutdown () at /test/10.5_dbg/storage/innobase/srv/srv0start.cc:2001
            #7  0x00005644973f6e09 in plugin_pre_shutdown (plugin=<optimized out>) at /test/10.5_dbg/sql/handler.cc:981
            #8  0x000056449713a401 in plugin_foreach_with_mask (thd=thd@entry=0x0, func=func@entry=0x5644973f6dec <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=10, arg=arg@entry=0x0) at /test/10.5_dbg/sql/sql_plugin.cc:2505
            #9  0x00005644973fba58 in ha_pre_shutdown () at /test/10.5_dbg/sql/handler.cc:988
            #10 0x0000564497017f2d in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.5_dbg/sql/mysqld.cc:5682
            #11 0x0000564497008b16 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.5_dbg/sql/main.cc:25
            

            Let me know if this is considered a separate issue.
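
            For context on this trace: frame #2 shows ib_wqueue_add() being entered with wq=0x0,
            i.e. the FTS optimize work queue was already freed, and its pointer cleared, before
            shutdown tried to post a message to it. A minimal sketch of that failure pattern
            follows; the names (work_queue, queue_add, post_stop_message) are hypothetical, and
            this is not the actual fts0opt.cc code.

            #include <mutex>
            #include <queue>

            struct WorkQueue {
              std::mutex      mutex;
              std::queue<int> items;
            };

            static WorkQueue* work_queue = nullptr;  // stands in for fts_optimize_wq

            static void queue_add(WorkQueue* wq, int item)
            {
              // Analogue of ib_wqueue_add(): the first thing it does is lock the
              // queue mutex. If wq is NULL because the queue was already torn
              // down, this dereferences a null pointer, matching frames #0-#2.
              std::lock_guard<std::mutex> guard(wq->mutex);
              wq->items.push(item);
            }

            static void early_teardown()  // runs too early, e.g. on a premature stop message
            {
              delete work_queue;
              work_queue = nullptr;
            }

            static void post_stop_message()  // later, during the real shutdown
            {
              // A guard such as `if (!work_queue) return;` is missing here.
              queue_add(work_queue, /*stop message*/ 0);  // crash: wq == NULL
            }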


            Thirunarayanan Balathandayuthapani added a comment:

            This problem is fixed in the latest 10.5. The following patch fixes the issue:

            commit 2ecd39c983a232289b863ba2c7bf8f70ebaa261f (HEAD)
            Author: Nayuta Yanagisawa <nayuta.yanagisawa@hey.com>
            Date:   Tue Jan 11 21:57:11 2022 +0900
             
                MDEV-27240 SIGSEGV in ha_spider::store_lock on LOCK TABLE
                
                The commit e954d9de gave different lifetime to wide_share and
                partition_handler_share. This introduced the possibility that
                partition_handler_share could be accessed even after it was freed.
                
                We stop sharing partition_handler_share and make it belong to
                a single wide_handler to fix the problem.
            

            The concern is that the InnoDB FTS optimize thread received FTS_STOP_MSG even before
            shutdown. That leads to freeing all FTS-related slots (fts_optimize_wq) and destroying
            the background optimize thread. Later, shutdown or DROP TABLE tries to access
            fts_optimize_wq, which leads to a hang or crash. Mostly, this happens due to memory
            corruption.
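
            To illustrate the hang side of this (the DROP TABLE t3 session stuck in the
            'closing tables' state in the processlist above): a schematic sketch, with
            hypothetical names, of a drop path that posts a message to the FTS optimize thread
            and then waits for an acknowledgement the already-destroyed thread can never deliver.
            This is an assumption about the mechanism, not the actual InnoDB code.

            #include <condition_variable>
            #include <mutex>

            static std::mutex              mtx;
            static std::condition_variable ack_event;
            static bool                    acked = false;

            static void drop_table_fts_cleanup()
            {
              // In this sketch the "remove table" message has been posted, but the
              // background optimize thread that would process it and set `acked`
              // was already destroyed by the premature stop message.
              std::unique_lock<std::mutex> lk(mtx);
              // Waits forever: no thread is left to signal the event, so the
              // connection stays in the 'closing tables' state.
              ack_event.wait(lk, [] { return acked; });
            }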


            People

              Thirunarayanan Balathandayuthapani (thiru)
              Roel Van de Paar (Roel)