MariaDB Server / MDEV-18729

Same server crash in different nodes

Details

    Description

      We are running a clustered environment with 44 MariaDB 10.2.10 nodes. For the last two months we have been seeing unexpected restarts of random instances. They happen regularly (once or twice a day), but not under the same load or at the same time of day. Before our last update we ran MariaDB 10.2.7, which had no problems.

      Note that our databases consist mostly of MyISAM tables.

      We currently can't reproduce the problem, but we have found the same backtrace in every crash, on different nodes:

      Server version: 10.2.10-MariaDB-log
      key_buffer_size=25769803776
      read_buffer_size=131072
      max_used_connections=210
      max_threads=2050
      thread_count=92
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 29670037 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f8a3c165c88
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f9016f96d30 thread_stack 0x49000
      *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      ======= Backtrace: =========
      /lib64/libc.so.6(__fortify_fail+0x37)[0x7f96ad6919e7]
      /lib64/libc.so.6(+0x115b62)[0x7f96ad68fb62]
      /lib64/libc.so.6(+0x117947)[0x7f96ad691947]
      /usr/sbin/mysqld(my_addr_resolve+0x48)[0x55b96d33c128]
      /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55b96d325bc2]
      /usr/sbin/mysqld(handle_fatal_signal+0x30d)[0x55b96cda2b6d]
      /lib64/libpthread.so.0(+0xf5d0)[0x7f96af2dc5d0]
      /usr/sbin/mysqld(_ZN10Item_ident5printEP6String15enum_query_type+0x50)[0x55b96cdb84d0]
      /usr/sbin/mysqld(+0x544295)[0x55b96cc7e295]
      /usr/sbin/mysqld(+0x53211a)[0x55b96cc6c11a]
      /usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x7ba)[0x55b96cc7f39a]
      /usr/sbin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x266)[0x55b96cc80916]
      /usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x7cd)[0x55b96cc675dd]
      /usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x55b96cc67a63]
      /usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x11a)[0x55b96cc67bba]
      /usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x254)[0x55b96cc68714]
      /usr/sbin/mysqld(+0x415783)[0x55b96cb4f783]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x6688)[0x55b96cc18418]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x2de)[0x55b96cc1afde]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x202d)[0x55b96cc1df7d]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x149)[0x55b96cc1eb79]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1aa)[0x55b96cce2b7a]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55b96cce2c9d]
      /lib64/libpthread.so.0(+0x7dd5)[0x7f96af2d4dd5]
      /lib64/libc.so.6(clone+0x6d)[0x7f96ad677ead]
      

      The thread pointers differ, but the mysqld function calls are the same every time.
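The addresses in square brackets usually change from one run to the next (the 0x55... addresses suggest a position-independent mysqld loaded at a randomized base). Comparing crashes is therefore easier after dropping the addresses and keeping only the object and symbol+offset of each frame. The following is a minimal Python sketch. It assumes each node's backtrace has been saved as plain text in the `object(symbol+offset)[address]` format shown above. `frame_signature` and `same_call_chain` are hypothetical helper names.

```python
import re

# Matches glibc backtrace lines such as
#   /usr/sbin/mysqld(_Z10do_commandP3THD+0x149)[0x55b96cc1eb79]
# capturing the object file and symbol+offset; the address is matched but discarded.
FRAME_RE = re.compile(r"^\s*(?P<obj>[^\s(]+)\((?P<sym>[^)]*)\)\[0x[0-9a-fA-F]+\]\s*$")


def frame_signature(backtrace_text):
    """Return a list of (object, symbol+offset) tuples, ignoring raw addresses.

    Lines that do not look like a frame (headers, blank lines) are skipped.
    """
    frames = []
    for line in backtrace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((m.group("obj"), m.group("sym")))
    return frames


def same_call_chain(trace_a, trace_b):
    """True if both backtraces contain the same frames in the same order."""
    return frame_signature(trace_a) == frame_signature(trace_b)


if __name__ == "__main__":
    node_a = """
/usr/sbin/mysqld(_ZN10Item_ident5printEP6String15enum_query_type+0x50)[0x55b96cdb84d0]
/usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x7ba)[0x55b96cc7f39a]
"""
    # Same frames at a different load address (hypothetical second node).
    node_b = """
/usr/sbin/mysqld(_ZN10Item_ident5printEP6String15enum_query_type+0x50)[0x5612aa0b84d0]
/usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x7ba)[0x5612a9f7f39a]
"""
    print(same_call_chain(node_a, node_b))  # True
```

You can decode the mangled C++ names with binutils' `c++filt`. For example, `_Z14get_all_tablesP3THDP10TABLE_LISTP4Item` becomes `get_all_tables(THD*, TABLE_LIST*, Item*)`. Frames that have no symbol, such as `mysqld(+0x544295)`, only match across nodes when every node runs the identical binary.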

      The my.cnf file is identical on all instances:

      [mysqld]
      datadir                        = /var/lib/data/mysql/3301
      default_storage_engine         = MyISAM
      default_tmp_storage_engine     = MyISAM
      expire_logs_days               = 7
      extra_port                     = 3391
      key_buffer_size                = 24G
      key_cache_file_hash_size       = 8192
      key_cache_segments             = 16
      host_cache_size                = 256
      lock_wait_timeout              = 1800
      log_error                      = /var/log/mysql/3301-error.log
      log_warnings                   = 1
      long_query_time                = 10
      max_allowed_packet             = 128M
      max_connections                = 2048
      max_heap_table_size            = 128M
      max_statement_time             = 1800
      net_read_timeout               = 600
      net_write_timeout              = 600
      open_files_limit               = 250000
      port                           = 3301
      pid_file                       = /var/run/mysql/3301.pid
      query_cache_size               = 0
      server_id                      = 1
      skip_name_resolve              = 1
      slow_query_log                 = ON
      slow_query_log_file            = /var/log/mysql/3301-slow.log
      server_id                      = 1
      socket                         = /var/run/mysql/3301.sock
      sql_mode                       = "STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
      table_definition_cache         = 32768
      table_open_cache               = 32768
      tmp_table_size                 = 128M
      tmpdir                         = /var/lib/data/tmp/mysql/3301
      tmp_disk_table_size            = 32G
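Rather than diffing the files byte by byte, you can parse the `[mysqld]` sections from all nodes and compare them option by option. The following is a minimal sketch using Python's standard `configparser`. It assumes each node's my.cnf has been copied locally and contains no `!include`/`!includedir` directives, which `configparser` does not understand. `load_mysqld_options` and `diff_options` are hypothetical helpers. `server_id` is set twice in the file above, so the parser is created with `strict=False`; a strict parser would raise `DuplicateOptionError`. Option names are also normalized so that `-` and `_` compare equal, as they do in MariaDB option files.

```python
import configparser

MISSING = "<missing>"  # marks an option present on one node but not the other


def load_mysqld_options(text):
    """Parse the [mysqld] section of a my.cnf-style string into a dict.

    strict=False tolerates repeated keys (the last value wins);
    allow_no_value=True accepts bare flags (value None);
    interpolation=None keeps '%' characters in values literal.
    """
    parser = configparser.ConfigParser(strict=False, allow_no_value=True,
                                       interpolation=None)
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return {}
    # configparser already lower-cases option names; also treat '-' as '_'.
    return {key.replace("-", "_"): value
            for key, value in parser.items("mysqld")}


def diff_options(reference, other):
    """Return {option: (reference_value, other_value)} for every mismatch."""
    keys = set(reference) | set(other)
    return {k: (reference.get(k, MISSING), other.get(k, MISSING))
            for k in sorted(keys)
            if reference.get(k, MISSING) != other.get(k, MISSING)}


if __name__ == "__main__":
    node1 = "[mysqld]\nkey_buffer_size = 24G\nserver_id = 1\nserver_id = 1\n"
    node2 = "[mysqld]\nkey-buffer-size = 24G\nserver_id = 1\n"
    print(diff_options(load_mysqld_options(node1), load_mysqld_options(node2)))  # {}
```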
      

      Hardware information for each node (all are configured identically):

       
      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                56
      On-line CPU(s) list:   0-55
      Thread(s) per core:    2
      Core(s) per socket:    14
      Socket(s):             2
      NUMA node(s):          2
      Vendor ID:             GenuineIntel
      CPU family:            6
      Model:                 79
      Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
      Stepping:              1
      CPU MHz:               2600.000
      CPU max MHz:           2600.0000
      CPU min MHz:           1200.0000
      BogoMIPS:              5188.04
      Virtualization:        VT-x
      L1d cache:             32K
      L1i cache:             32K
      L2 cache:              256K
      L3 cache:              35840K
      NUMA node0 CPU(s):     0-13,28-41
      NUMA node1 CPU(s):     14-27,42-55
      Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
      
      

      Memory information (/proc/meminfo):

      MemTotal:       263835372 kB
      MemFree:         3513080 kB
      MemAvailable:   168224948 kB
      Buffers:             148 kB
      Cached:         166370332 kB
      SwapCached:       215452 kB
      Active:         164600840 kB
      Inactive:       89645376 kB
      Active(anon):   83715024 kB
      Inactive(anon):  7981272 kB
      Active(file):   80885816 kB
      Inactive(file): 81664104 kB
      Unevictable:          32 kB
      Mlocked:              32 kB
      SwapTotal:       8388604 kB
      SwapFree:        3715836 kB
      Dirty:                92 kB
      Writeback:             0 kB
      AnonPages:      87662636 kB
      Mapped:           112884 kB
      Shmem:           3820560 kB
      Slab:            3517440 kB
      SReclaimable:    3046128 kB
      SUnreclaim:       471312 kB
      KernelStack:       30560 kB
      PageTables:       220692 kB
      NFS_Unstable:          0 kB
      Bounce:                0 kB
      WritebackTmp:          0 kB
      CommitLimit:    140306288 kB
      Committed_AS:   97590032 kB
      VmallocTotal:   34359738367 kB
      VmallocUsed:      723408 kB
      VmallocChunk:   34224732156 kB
      HardwareCorrupted:     0 kB
      AnonHugePages:  21215232 kB
      CmaTotal:              0 kB
      CmaFree:               0 kB
      HugePages_Total:       0
      HugePages_Free:        0
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      DirectMap4k:      377340 kB
      DirectMap2M:    14170112 kB
      DirectMap1G:    255852544 kB
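The crash log above prints the server's own worst-case memory estimate, labelled `key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads`. The following sketch reproduces that arithmetic from the logged values for comparison with MemTotal. `sort_buffer_size` is neither printed in the log nor set in my.cnf, so the sketch assumes the server default of 2 MiB (2097152 bytes). The server printed 29670037 K, slightly more than the labelled formula yields. This is probably because the server adds per-connection overhead that is not shown in the label, so treat the result as approximate. `worst_case_estimate_kib` is a hypothetical helper.

```python
KIB = 1024


def worst_case_estimate_kib(key_buffer_size, read_buffer_size,
                            sort_buffer_size, max_threads):
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads, in KiB."""
    total_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
    return total_bytes // KIB


if __name__ == "__main__":
    estimate = worst_case_estimate_kib(
        key_buffer_size=25769803776,  # from the crash log (24G)
        read_buffer_size=131072,      # from the crash log
        sort_buffer_size=2097152,     # ASSUMED 2 MiB default; not in the log
        max_threads=2050,             # from the crash log
    )
    mem_total_kib = 263835372         # MemTotal from /proc/meminfo above
    print(estimate)                                  # 29626624
    print(round(100 * estimate / mem_total_kib, 1))  # 11.2 (percent of MemTotal)
```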
      

          People

            Assignee: Unassigned
            Reporter: Brais Chao (kurome)
            Votes: 0
            Watchers: 2
