Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16618

Executing stored procedure crashes mysqld

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.1.24
    • 10.1.29, 10.2.11, 10.3.3
    • Server
    • None
    • CentOS 7.3.

    Description

      we recently upgraded a host from 5.5.50 to 10.1.24; and now get the following error when a certain stored procedure is executed. It does not happen every time and immediately after restart - the stored procedure call finishes without issue. I dropped and recreated all the stored procedures on the host in case it was some sort of corruption, but after a couple days - the server crashed again reporting the following in the error log:

      180628  7:01:35 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.1.24-MariaDB
      key_buffer_size=5368709120
      read_buffer_size=270336
      max_used_connections=251
      max_threads=1252
      thread_count=236
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8163227 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f270138f008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f23da2410b0 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f2c6dacc33e]
      /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f2c6d5f20a5]
      /lib64/libpthread.so.0(+0xf370)[0x7f2c6cc0a370]
      /usr/sbin/mysqld(_ZN4JOIN4execEv+0x11)[0x7f2c6d4c86e1]
      /usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x112)[0x7f2c6d4c4dc2]
      mysys/stacktrace.c:268(my_print_stacktrace)[0x7f2c6d44a0bb]
      sql/sql_analyze_stmt.h:97(Time_and_counter_tracker::incr_loops())[0x7f2c6d44b3d4]
      sql/sql_derived.cc:942(mysql_derived_fill(THD*, LEX*, TABLE_LIST*))[0x7f2c6d4a53df]
      sql/sql_select.cc:21464(create_sort_index(THD*, JOIN*, st_order*, unsigned long long, unsigned long long, bool))[0x7f2c6d4be698]
      sql/sql_select.cc:3186(JOIN::exec_inner())[0x7f2c6d4c6637]
      sql/sql_select.cc:2512(JOIN::exec())[0x7f2c6d4c8724]
      sql/sql_select.cc:3447(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x7f2c6d4c4dc2]
      sql/sql_select.cc:384(handle_select(THD*, LEX*, select_result*, unsigned long))[0x7f2c6d4c58a5]
      sql/sql_parse.cc:5923(execute_sqlcom_select(THD*, TABLE_LIST*))[0x7f2c6d464971]
      sql/sql_parse.cc:2979(mysql_execute_command(THD*))[0x7f2c6d470fbb]
      sql/sp_head.cc:3213(sp_instr_stmt::exec_core(THD*, unsigned int*))[0x7f2c6d700d86]
      sql/sp_head.cc:2978(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x7f2c6d707853]
      sql/sp_head.cc:3129(sp_instr_stmt::execute(THD*, unsigned int*))[0x7f2c6d707e05]
      sql/sp_head.cc:1318(sp_head::execute(THD*, bool))[0x7f2c6d703d46]
      sql/sp_head.cc:2103(sp_head::execute_procedure(THD*, List<Item>*))[0x7f2c6d705397]
      sql/sql_parse.cc:2415(mysql_execute_command(THD*))[0x7f2c6d4714c8]
      sql/sql_parse.cc:7340(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x7f2c6d474592]
      sql/sql_parse.cc:1490(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7f2c6d477a6a]
      sql/sql_parse.cc:1111(do_command(THD*))[0x7f2c6d4782ca]
      sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x7f2c6d54039a]
      sql/sql_connect.cc:1263(handle_one_connection)[0x7f2c6d540540]
      /lib64/libpthread.so.0(+0x7dc5)[0x7f2c6cc02dc5]
      /lib64/libc.so.6(clone+0x6d)[0x7f2c6b02176d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f2402f8b3c8): is an invalid pointer
      Connection ID (thread ID): 19291
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
       
      We think the query pointer is invalid, but we will try to print it anyway.
      Query: select *
      from
      ((select
      r.trade_date, r.user_id, r.context,
      cast(@runningNet := @runningNet + r.net_pnl as decimal(12,2)) as running_net, r.net_pnl,
      cast(@runningGross := @runningGross + r.gross_pnl as decimal(12,2)) as running_gross, r.gross_pnl,
      cast(@runningCosts := @runningCosts + r.costs as decimal(12,2)) as running_costs, r.costs,
      cast(r.round_turns as signed) as round_turns,
      cast((@runningGross - @runningCosts) / (@count := @count + 1) as decimal(12,2)) as avg_net,
      cast(@runningCosts * 100 / @runningGross as decimal(6,2)) as cost_ratio
      from
      ((select
      trade_date, user_id, context,
      sum(net_pnl) as net_pnl,
      sum(gross_pnl) as gross_pnl,
      sum(costs) as costs,
      ((sum(buys) + sum(sells)) / 2) as round_turns
      from daily where context =  NAME_CONST('ctxt',_latin1'Tahoe' COLLATE 'latin1_swedish_ci') group by trade_date order by trade_date asc) as r)) as a) order by trade_date desc
       
      180628 07:01:36 mysqld_safe Number of processes running now: 0
      180628 07:01:36 mysqld_safe mysqld restarted
      

      Attachments

        Issue Links

          Activity

            lpatrick Lori Patrick added a comment -

            Thanks Elena! I hear what you're saying. I have 30+ servers and this was one of the last two that would get me off 5x. It was my understanding that 10.1.24 was the last version that didn't require me to compile glibc for CentOS 6.5 and that is why I stuck with it as I still have some CentOS 6.5 hosts.. Part of my test cycle was to get to these hosts last as they are the highest risk - so a new version even 10.1.x would set me back quite a bit. Unfortunate I tripped into this problem as I was nearly done getting everything to 10 and I am pretty surprised I found such a serious bug this late in the upgrade cycle.

            I will pull together all the structures involved and add this to the ticket. Thanks!

            lpatrick Lori Patrick added a comment - Thanks Elena! I hear what you're saying. I have 30+ servers and this was one of the last two that would get me off 5x. It was my understanding that 10.1.24 was the last version that didn't require me to compile glibc for CentOS 6.5 and that is why I stuck with it as I still have some CentOS 6.5 hosts.. Part of my test cycle was to get to these hosts last as they are the highest risk - so a new version even 10.1.x would set me back quite a bit. Unfortunate I tripped into this problem as I was nearly done getting everything to 10 and I am pretty surprised I found such a serious bug this late in the upgrade cycle. I will pull together all the structures involved and add this to the ticket. Thanks!
            lpatrick Lori Patrick added a comment -

            I have not been able to reproduce it consistently. I have tried bringing down the size of innodb_buffer_pool to 100MB and open tables down to 100. Running several restores of mysql dumps files concurrently and looping thru the stored procedures that cause the crash at the same time to try to reproduce. It seems like it happens once something need to recycle (as in a dirty but unused cache that is now needed and needs to recycle), but I can't seem to configure the server to be able to reproduce at will. Until I can do that - testing a new version fixes the problem is a bit difficult.

            Any ideas what config I can run to be a minimal as possible and potentially fill what ever cache or buffer needs to recycle that trips me into this crash?

            lpatrick Lori Patrick added a comment - I have not been able to reproduce it consistently. I have tried bringing down the size of innodb_buffer_pool to 100MB and open tables down to 100. Running several restores of mysql dumps files concurrently and looping thru the stored procedures that cause the crash at the same time to try to reproduce. It seems like it happens once something need to recycle (as in a dirty but unused cache that is now needed and needs to recycle), but I can't seem to configure the server to be able to reproduce at will. Until I can do that - testing a new version fixes the problem is a bit difficult. Any ideas what config I can run to be a minimal as possible and potentially fill what ever cache or buffer needs to recycle that trips me into this crash?
            lpatrick Lori Patrick added a comment - - edited

            Hi. I have been trying to reproduce this error to no avail. I have tried:

            • looping thru stored procedures involved from both windows and linux 10,000 times -> no crash
            • running above concurrently -> no crash
            • brought down buffer pool and was able to get innodb to throw the following:
              InnoDB: Warning: difficult to find free blocks in
              InnoDB: the buffer pool (339 search iterations)!
              InnoDB: 0 failed attempts to flush a page! Consider
              InnoDB: increasing the buffer pool size.
              InnoDB: It is also possible that in your Unix version
              InnoDB: fsync is very slow, or completely frozen inside
              InnoDB: the OS kernel. Then upgrading to a newer version
              InnoDB: of your operating system may help. Look at the
              InnoDB: number of fsyncs in diagnostic info below.
              InnoDB: Pending flushes (fsync) log: 0; buffer pool: 1
              InnoDB: 523993 OS file reads, 2186607 OS file writes, 266724 OS fsyncs
              InnoDB: Starting InnoDB Monitor to print further
              InnoDB: diagnostics to the standard output.
              but not crash
            • have used dumps that came from 5.5.34 and restored and it wouldn't crash
            • run tests against 10.1.24 replicated copy and wouldn't crash

            If I can't reproduce the crash, I can't confirm a newer version fixes and upgrade accordingly. Based on the error on crash - can someone provide any guidance as to what area of the server I need to adjust or stress to reproduce?

            lpatrick Lori Patrick added a comment - - edited Hi. I have been trying to reproduce this error to no avail. I have tried: looping thru stored procedures involved from both windows and linux 10,000 times -> no crash running above concurrently -> no crash brought down buffer pool and was able to get innodb to throw the following: InnoDB: Warning: difficult to find free blocks in InnoDB: the buffer pool (339 search iterations)! InnoDB: 0 failed attempts to flush a page! Consider InnoDB: increasing the buffer pool size. InnoDB: It is also possible that in your Unix version InnoDB: fsync is very slow, or completely frozen inside InnoDB: the OS kernel. Then upgrading to a newer version InnoDB: of your operating system may help. Look at the InnoDB: number of fsyncs in diagnostic info below. InnoDB: Pending flushes (fsync) log: 0; buffer pool: 1 InnoDB: 523993 OS file reads, 2186607 OS file writes, 266724 OS fsyncs InnoDB: Starting InnoDB Monitor to print further InnoDB: diagnostics to the standard output. but not crash have used dumps that came from 5.5.34 and restored and it wouldn't crash run tests against 10.1.24 replicated copy and wouldn't crash If I can't reproduce the crash, I can't confirm a newer version fixes and upgrade accordingly. Based on the error on crash - can someone provide any guidance as to what area of the server I need to adjust or stress to reproduce?
            lpatrick Lori Patrick added a comment -

            This is issue fixed by MDEV-13936. It looks like I tripped into a known bug. I can reproduce my crash on demand at this point. This issue can be closed.

            lpatrick Lori Patrick added a comment - This is issue fixed by MDEV-13936 . It looks like I tripped into a known bug. I can reproduce my crash on demand at this point. This issue can be closed.

            Thanks for checking.

            elenst Elena Stepanova added a comment - Thanks for checking.

            People

              Unassigned Unassigned
              lpatrick Lori Patrick
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.