  1. MariaDB Server
  2. MDEV-21954

mysqld got signal 11 Fatal signal 6 while backtracing on parallel show global status

Details

    Description

      After upgrading from 10.3.22 to 10.4.12 I have encountered a strange issue: if two or more 'show global status where ...' statements are issued in parallel, mysqld crashes.

      Something like:

      #!/bin/bash
      # Run the three status queries concurrently, then wait for all of them.
      mysql -e 'show global status where Variable_name="Com_delete"' &
      mysql -e 'show global status where Variable_name="Com_insert"' &
      mysql -e 'show global status where Variable_name="Com_select"' &
      wait
      

      crashes the server reliably:

      200316 20:09:03 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.4.12-MariaDB-log
      key_buffer_size=209715200
      read_buffer_size=4194304
      max_used_connections=3
      max_threads=10002
      thread_count=20
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 82388318 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f7dba213dc8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f7dba69f528 thread_stack 0x49000
      *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      Fatal signal 6 while backtracing
      

      If you execute each query sequentially there are no problems.

      The coredump looks like:

      Core was generated by `/usr/sbin/mysqld'.
      Program terminated with signal SIGABRT, Aborted.
      #0  0x00007f8dbf36d160 in raise () from /lib64/libc.so.6
       
      (gdb) bt
      #0  0x00007f8dbf36d160 in raise () from /lib64/libc.so.6
      #1  0x00007f8dbf36e81f in abort () from /lib64/libc.so.6
      #2  0x00007f8dbf3b04a7 in __libc_message () from /lib64/libc.so.6
      #3  0x00007f8dbf440f6e in __fortify_fail_abort () from /lib64/libc.so.6
      #4  0x00007f8dbf440fa1 in __fortify_fail () from /lib64/libc.so.6
      #5  0x00007f8dbf43ee50 in __chk_fail () from /lib64/libc.so.6
      #6  0x00007f8dbf440eaa in __fdelt_warn () from /lib64/libc.so.6
      #7  0x000055f6b4164475 in my_addr_resolve (ptr=<optimized out>, loc=loc@entry=0x7f7dba699900) at /usr/src/debug/MariaDB-10.4.12/src_0/mysys/my_addr_resolve.c:234
      #8  0x000055f6b4149ba3 in print_with_addr_resolve (n=<optimized out>, addrs=0x7f7dba699920) at /usr/src/debug/MariaDB-10.4.12/src_0/mysys/stacktrace.c:254
      #9  my_print_stacktrace (stack_bottom=<optimized out>, thread_stack=<optimized out>, silent=silent@entry=0 '\000') at /usr/src/debug/MariaDB-10.4.12/src_0/mysys/stacktrace.c:273
      #10 0x000055f6b3bbd745 in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/signal_handler.cc:206
      #11 <signal handler called>
      #12 0x00007f8d9e89b55a in read_partitioned_counter () from /usr/lib64/mysql/plugin/ha_tokudb.so
      #13 0x00007f8d9e83fc2d in ?? () from /usr/lib64/mysql/plugin/ha_tokudb.so
      #14 0x000055f6b3a16752 in show_status_array (thd=thd@entry=0x7f7dba213dc8, wild=wild@entry=0x0, variables=0x7f8dbe0684d8, scope=scope@entry=SHOW_OPT_GLOBAL,
          status_var=status_var@entry=0x7f7dba69b260, prefix=prefix@entry=0x55f6b43291b5 "", table=0x7f7dba25e020, ucase_names=false, cond=0x7f7dba247e00)
          at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_show.cc:3804
      #15 0x000055f6b3a199b3 in fill_status (thd=0x7f7dba213dc8, tables=0x7f7dba2482f8, cond=<optimized out>) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_show.cc:7946
      #16 0x000055f6b3a1f072 in get_schema_tables_result (join=join@entry=0x7f7dba248db0, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC)
          at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_show.cc:8914
      #17 0x000055f6b3a052cd in JOIN::exec_inner (this=this@entry=0x7f7dba248db0) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_select.cc:4409
      #18 0x000055f6b3a058a3 in JOIN::exec (this=this@entry=0x7f7dba248db0) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_select.cc:4234
      #19 0x000055f6b3a03c61 in mysql_select (thd=0x7f7dba213dc8, tables=0x7f7dba2482f8, wild_num=0, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0,
          proc_param=0x0, select_options=2684619520, result=0x7f7dba248d88, unit=0x7f7dba217b30, select_lex=0x7f7dba218328) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_select.cc:4666
      #20 0x000055f6b3a04611 in handle_select (thd=0x7f7dba213dc8, lex=0x7f7dba217a70, result=0x7f7dba248d88, setup_tables_done_option=0)
          at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_select.cc:408
      #21 0x000055f6b39a0c21 in execute_sqlcom_select (thd=0x7f7dba213dc8, all_tables=0x7f7dba2482f8) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_parse.cc:6360
      #22 0x000055f6b39ae27c in mysql_execute_command (thd=0x7f7dba213dc8) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_parse.cc:6394
      #23 0x000055f6b39b07da in mysql_parse (thd=0x7f7dba213dc8, rawbuf=<optimized out>, length=51, parser_state=0x7f7dba69e850, is_com_multi=<optimized out>, is_next_command=<optimized out>)
          at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_parse.cc:7901
      #24 0x000055f6b39b2d71 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f7dba213dc8,
          packet=packet@entry=0x7f7dba231f89 "show global status where Variable_name=\"Com_delete\"", packet_length=packet_length@entry=51, is_com_multi=is_com_multi@entry=false,
          is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_parse.cc:1841
      #25 0x000055f6b39b4648 in do_command (thd=0x7f7dba213dc8) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_parse.cc:1359
      #26 0x000055f6b3a907fe in do_handle_one_connection (connect=connect@entry=0x7f8dbe044c68) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_connect.cc:1412
      #27 0x000055f6b3a908bd in handle_one_connection (arg=0x7f8dbe044c68) at /usr/src/debug/MariaDB-10.4.12/src_0/sql/sql_connect.cc:1316
      #28 0x00007f8dc18ab569 in start_thread () from /lib64/libpthread.so.0
      #29 0x00007f8dbf42f9ef in clone () from /lib64/libc.so.6
      

      (full backtrace in attachment)

      Attachments

        1. bt.txt
          19 kB
        2. bt2.txt
          5 kB

        Activity

          Roze Reinis Rozitis added a comment - - edited

          It looks somewhat similar to MDEV-19310


          Roze Reinis Rozitis added a comment -

          Initially I thought/wrote that the issue doesn't manifest on an empty server, but just increasing the parallel query count (for example to 6) makes it crash there as well; at least mysqld now generates a stacktrace (bt2.txt).

          alice Alice Sherepa added a comment -

          Thanks! I reproduced it on 10.4 when the TokuDB engine is installed; I could not reproduce it on 5.5-10.3.

          source include/have_tokudb.inc;
           
          connect (conn1,localhost,root,,);
          --send
          show global status where Variable_name="Com_delete";
           
          connection default;
          show global status where Variable_name="Com_select";
          

          10.4 517f659e6d5eeb7e01bf19

          ==11464==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fa88f5162e9 bp 0x7fa87b249ae0 sp 0x7fa87b249a40 T25)
              #0 0x7fa88f5162e8 in read_partitioned_counter /10.4/storage/tokudb/PerconaFT/util/partitioned_counter.cc:389
              #1 0x7fa88f3028d4 in show_tokudb_vars /10.4/storage/tokudb/hatoku_hton.cc:1961
              #2 0xb6d60f in show_status_array /10.4/sql/sql_show.cc:3805
              #3 0xb9d828 in fill_status(THD*, TABLE_LIST*, Item*) /10.4/sql/sql_show.cc:7946
              #4 0xba5fb2 in get_schema_tables_result(JOIN*, enum_schema_table_state) /10.4/sql/sql_show.cc:8914
              #5 0xa90533 in JOIN::exec_inner() /10.4/sql/sql_select.cc:4412
              #6 0xa8e5e5 in JOIN::exec() /10.4/sql/sql_select.cc:4237
              #7 0xa9224c in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /10.4/sql/sql_select.cc:4669
              #8 0xa6745c in handle_select(THD*, LEX*, select_result*, unsigned long) /10.4/sql/sql_select.cc:422
              #9 0x9e67c4 in execute_sqlcom_select /10.4/sql/sql_parse.cc:6359
              #10 0x9e6f64 in execute_show_status /10.4/sql/sql_parse.cc:6393
              #11 0x9d3dd7 in mysql_execute_command(THD*) /10.4/sql/sql_parse.cc:3816
              #12 0x9ef112 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /10.4/sql/sql_parse.cc:7900
              #13 0x9c7d28 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /10.4/sql/sql_parse.cc:1842
              #14 0x9c4b58 in do_command(THD*) /10.4/sql/sql_parse.cc:1360
              #15 0xd5ba8c in do_handle_one_connection(CONNECT*) /10.4/sql/sql_connect.cc:1412
              #16 0xd5b432 in handle_one_connection /10.4/sql/sql_connect.cc:1316
              #17 0x2218db6 in pfs_spawn_thread /10.4/storage/perfschema/pfs.cc:1869
              #18 0x7fa89891b6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
              #19 0x7fa897bac41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
           
          AddressSanitizer can not provide additional info.
          SUMMARY: AddressSanitizer: SEGV /10.4/storage/tokudb/PerconaFT/util/partitioned_counter.cc:389 read_partitioned_counter
          Thread T25 created by T0 here:
              #0 0x7fa8997eb253 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x36253)
              #1 0x22191a3 in spawn_thread_v1 /10.4/storage/perfschema/pfs.cc:1919
              #2 0x70e830 in inline_mysql_thread_create /10.4/include/mysql/psi/mysql_thread.h:1275
              #3 0x7238f7 in create_thread_to_handle_connection(CONNECT*) /10.4/sql/mysqld.cc:6242
              #4 0x72401b in create_new_thread(CONNECT*) /10.4/sql/mysqld.cc:6312
              #5 0x7243c3 in handle_accepted_socket(st_mysql_socket, st_mysql_socket) /10.4/sql/mysqld.cc:6410
              #6 0x725040 in handle_connections_sockets() /10.4/sql/mysqld.cc:6568
              #7 0x723108 in mysqld_main(int, char**) /10.4/sql/mysqld.cc:5900
              #8 0x70c625 in main /10.4/sql/main.cc:25
              #9 0x7fa897ac582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
          

          Roze Reinis Rozitis added a comment - - edited

          Any status or ETA on this? It still affects 10.4.14 but not the 10.3.x branch.

          The only way to upgrade is to turn off the monitoring system and hope nobody runs 'show global status' too often.


          serg Sergei Golubchik added a comment -

          Does it happen without TokuDB? TokuDB is not supported even by its owner, so there's little chance we'll be able to fix a bug if it's inside TokuDB.

          Roze Reinis Rozitis added a comment -

          Without TokuDB there is no crash, but it doesn't happen on the 10.3.x branch with TokuDB either. Isn't the source base for the engine generally the same, so the difference must be in how 10.4 fetches the engine stats?

          Also, I assumed that TokuDB would be somewhat supported at least until the EOL of 10.4. At the moment I have to downgrade back to 10.3, because it's somewhat scary to have such an easy way to invoke a segfault.

          serg Sergei Golubchik added a comment -

          This is a TokuDB bug after all. In show_tokudb_vars() (storage/tokudb/hatoku_hton.cc:1961), TokuDB reads the status into a shared global status array and then freely modifies it, with the comment:

          static TOKU_ENGINE_STATUS_ROW_S* toku_global_status_rows = NULL;
          ...
          static int show_tokudb_vars(TOKUDB_UNUSED(THD* thd),
                                      SHOW_VAR* var,
                                      TOKUDB_UNUSED(char* buff)) {
          ...
              error = db_env->get_engine_status(
                  db_env,
                  toku_global_status_rows,
                  toku_global_status_max_rows,
                  &num_rows,
                  &redzone_state,
                  &panic,
                  panic_string,
                  panic_string_len,
                  TOKU_GLOBAL_STATUS);
          ...
                      TOKU_ENGINE_STATUS_ROW_S &status_row = toku_global_status_rows[row];
          ...
                          // Reuse the memory in status_row. (It belongs to us).

          See: toku_global_status_rows is a shared global, the status is read into it, and later a status row is modified under the assumption that "it belongs to us".

          When two threads call this function concurrently, the global array of status rows gets corrupted.

          In 10.3 this did not cause a crash because the server protected access to status variables with a mutex. In 10.4 MDEV-15135 replaced it with an rwlock, which lets several readers run show_tokudb_vars() at the same time.
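To illustrate the point about the lock change, here is a minimal sketch. It is not the TokuDB code and not the actual fix. All names in it (StatusRow, status_lock, shared_rows, read_status_exclusive, read_status_local) are made up for the example, and it assumes a C++17 compiler for std::shared_mutex. It contrasts the two ways a reader can stay safe: either keep writing into the shared global array but hold the lock exclusively (roughly the 10.3 situation), or hold only a shared lock and fill a buffer owned by the call.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an engine status row; not the TokuDB struct.
struct StatusRow {
    std::string name;
    long value;
};

// Stands in for the server-side lock around status variables.
static std::shared_mutex status_lock;

// Process-wide scratch array, analogous in spirit to a shared global buffer.
static std::vector<StatusRow> shared_rows(3);

// Overwrites every row in the given buffer (illustrative data only).
static void fill_rows(std::vector<StatusRow>& rows, long base) {
    static const char* const names[] = {"Com_delete", "Com_insert", "Com_select"};
    assert(rows.size() <= 3);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        rows[i].name = names[i];
        rows[i].value = base + static_cast<long>(i);
    }
}

// Returns the value of the named row, or -1 if it is not present.
static long find_value(const std::vector<StatusRow>& rows, const std::string& wanted) {
    for (const StatusRow& row : rows)
        if (row.name == wanted) return row.value;
    return -1;
}

// Writes into the shared array, so it must hold the lock exclusively.
// Taking only a shared lock here would let two callers overwrite
// shared_rows at the same time.
long read_status_exclusive(const std::string& wanted, long base) {
    std::unique_lock<std::shared_mutex> guard(status_lock);
    fill_rows(shared_rows, base);
    return find_value(shared_rows, wanted);
}

// Holds only a shared lock, which is safe because the buffer belongs
// to this call and no other thread can see it.
long read_status_local(const std::string& wanted, long base) {
    std::shared_lock<std::shared_mutex> guard(status_lock);
    std::vector<StatusRow> rows(3);
    fill_rows(rows, base);
    return find_value(rows, wanted);
}
```

For example, read_status_local("Com_select", 10) looks up the third row in a private buffer, and any number of threads can do that concurrently. The trade-off is an allocation per call in exchange for not serializing readers.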

          People

            serg Sergei Golubchik
            Roze Reinis Rozitis