Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8370

mysqld got signal 11 with aggressive parallel replication

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.5
    • 10.1.6
    • OTHER
    • None

    Description

      Hi,

      I have a "mysqld got signal 11" in the logs when running aggressive parallel replication with high number of threads. When running with single threaded replication, I have no problem: I ran many times without problems with 5, 10, 20, 40, 80 and 120 thread; but when I run with 320, 240 and 160 threads, sometimes it works, and sometimes mysqld is killed. Stack trace below.

      Already reported similar problems in the following bugs for mysql_admin_table and execute_sqlcom_select:

      but I think this might be a different issue as tis one is on mysql_insert.

      Also, the number of Open Tables as shown by SHOW GLOBAL STATUS is much bigger with aggressive parallel replication than with single threaded replication (see image in attachment where the 1st run is with single threaded replication, and the 3 other runs are with aggressive parallel replication with number of threads of 120, 240 and 320 respectively).

      2015-06-21 23:46:45 140496568968960 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
      2015-06-21 23:46:45 140496568968960 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.000010' at position 929208383; GTID position '0-187225014-740885929', GTID event skip 1
      2015-06-21 23:52:21 140496568968960 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
      2015-06-21 23:52:21 140496568968960 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.000011' at position 3634315; GTID position '0-187225014-740980660', GTID event skip 205
      150621 23:52:26 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.1.5-MariaDB-log
      key_buffer_size=268435456
      read_buffer_size=4194304
      max_used_connections=7
      max_threads=3002
      thread_count=2
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15693505 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7fa644812008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fd2b194ac80 thread_stack 0x48000
      mysys/stacktrace.c:247(my_print_stacktrace)[0x7fd2eeda683b]
      sql/signal_handler.cc:153(handle_fatal_signal)[0x7fd2ee9408a5]
      /lib64/libpthread.so.0(+0xf710)[0x7fd2edf4d710]
      sql/sql_plist.h:181(I_P_List_iterator)[0x7fd2ee8f2ab3]
      mysys/lf_hash.c:125(lfind)[0x7fd2eedafee0]
      sql/table_cache.cc:1084(tdc_iterate(THD*, char (*)(void*, void*), void*, bool))[0x7fd2ee8f4093]
      sql/table_cache.cc:267(tc_add_table(THD*, TABLE*))[0x7fd2ee8f43b8]
      sql/sql_base.cc:2572(open_table(THD*, TABLE_LIST*, Open_table_context*))[0x7fd2ee787e82]
      sql/sql_base.cc:4016(open_and_process_table)[0x7fd2ee78bcc0]
      sql/sql_base.cc:5184(open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x7fd2ee78bf84]
      sql/sql_insert.cc:719(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool))[0x7fd2ee7b871b]
      sql/sql_parse.cc:3866(mysql_execute_command(THD*))[0x7fd2ee7cd687]
      sql/sql_parse.cc:7173(mysql_parse)[0x7fd2ee7d374d]
      sql/log_event.cc:4274(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x7fd2eea0092c]
      sql/log_event.h:1348(Log_event::apply_event(rpl_group_info*))[0x7fd2ee760711]
      sql/slave.cc:3291(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*))[0x7fd2ee756b1b]
      sql/rpl_parallel.cc:49(rpt_handle_event)[0x7fd2ee8db6e2]
      perfschema/pfs.cc:1863(pfs_spawn_thread)[0x7fd2eec9435d]
      /lib64/libpthread.so.0(+0x79d1)[0x7fd2edf459d1]
      /lib64/libc.so.6(clone+0x6d)[0x7fd2ec42b86d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fa687a5580f): is an invalid pointer
      Connection ID (thread ID): 3719
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,index_merge_sort_intersection=off,engin
      e_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=o
      n,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_c
      ache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,exte
      nded_keys=on,exists_to_in=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      150621 23:52:36 mysqld_safe Number of processes running now: 0
      150621 23:52:36 mysqld_safe mysqld restarted

      JFG

      Attachments

        Issue Links

          Activity

            MDEV-8333 provides stack traces both for DELETE and SELECT, as far as I remember, it might fail on INSERT as well, depending on luck and build; so I don't think there is a difference here. I will assign it to svoj to confirm.

            elenst Elena Stepanova added a comment - MDEV-8333 provides stack traces both for DELETE and SELECT, as far as I remember, it might fail on INSERT as well, depending on luck and build; so I don't think there is a difference here. I will assign it to svoj to confirm.

            I was surprised by the Open Tables difference between a run without parallel replication and with different number of threads (maybe normal, just asking).

            Note: all those tests start by restoring a backup and running the same binlogs, so exactly the same workload run in all tests, hence the difference in Open Tables puzzling me.

            jgagne Jean-François Gagné added a comment - I was surprised by the Open Tables difference between a run without parallel replication and with different number of threads (maybe normal, just asking). Note: all those tests start by restoring a backup and running the same binlogs, so exactly the same workload run in all tests, hence the difference in Open Tables puzzling me.

            I suspect high number of tables should be expected with parallel replication, same as with high number of connecting clients.

            svoj Sergey Vojtovich added a comment - I suspect high number of tables should be expected with parallel replication, same as with high number of connecting clients.

            Tests in MDEV-8333 / MDEV-8339 are essentially concurrent. I brought the number of threads down to 2 to make debugging simpler, but generally the more threads, the easier to reproduce. Parallel replication apparently introduces the same concurrent pattern, only on the slave.

            elenst Elena Stepanova added a comment - Tests in MDEV-8333 / MDEV-8339 are essentially concurrent. I brought the number of threads down to 2 to make debugging simpler, but generally the more threads, the easier to reproduce. Parallel replication apparently introduces the same concurrent pattern, only on the slave.

            Fixed in scope of MDEV-8339.

            svoj Sergey Vojtovich added a comment - Fixed in scope of MDEV-8339 .

            People

              svoj Sergey Vojtovich
              jgagne Jean-François Gagné
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.