MariaDB Server / MDEV-11595

MariaDB server cluster regularly stops with an error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.1.20
    • Fix Version/s: N/A
    • Component/s: Server, wsrep
    • Labels:
    • Environment:
      Distributor ID: Ubuntu
      Description: Ubuntu 14.04.5 LTS
      Release: 14.04
      Codename: trusty

      Linux db03 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

      Description

      I run a 3-node MariaDB 10.1 Galera cluster.
      On a regular basis, one of the database servers stops with the following stack trace:

      Dec 18 06:49:01 db03 mysqld: 161218  6:49:01 [ERROR] mysqld got signal 11 ;
      Dec 18 06:49:01 db03 mysqld: This could be because you hit a bug. It is also possible that this binary
      Dec 18 06:49:01 db03 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
      Dec 18 06:49:01 db03 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: We will try our best to scrape up some info that will hopefully help
      Dec 18 06:49:01 db03 mysqld: diagnose the problem, but since we have already crashed, 
      Dec 18 06:49:01 db03 mysqld: something is definitely wrong and this may fail.
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: Server version: 10.1.20-MariaDB-1~trusty
      Dec 18 06:49:01 db03 mysqld: key_buffer_size=33554432
      Dec 18 06:49:01 db03 mysqld: read_buffer_size=2097152
      Dec 18 06:49:01 db03 mysqld: max_used_connections=18
      Dec 18 06:49:01 db03 mysqld: max_threads=502
      Dec 18 06:49:01 db03 mysqld: thread_count=3
      Dec 18 06:49:01 db03 mysqld: It is possible that mysqld could use up to 
      Dec 18 06:49:01 db03 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3127346 K  bytes of memory
      Dec 18 06:49:01 db03 mysqld: Hope that's ok; if not, decrease some variables in the equation.
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: Thread pointer: 0x0x7f1282107008
      Dec 18 06:49:01 db03 mysqld: Attempting backtrace. You can use the following information to find out
      Dec 18 06:49:01 db03 mysqld: where mysqld died. If you see no messages after this, something went
      Dec 18 06:49:01 db03 mysqld: terribly wrong...
      Dec 18 06:49:01 db03 mysqld: stack_bottom = 0x7f213f8611f0 thread_stack 0x48400
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f2142a44c2e]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f2142567a95]
      Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f2140ab8330]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f2142629aad]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f2142629b41]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f21424cff10]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1314)[0x7f21423e2cd4]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x331)[0x7f21423eb2a1]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f21423ebac9]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e2b)[0x7f21423edfcb]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f21423eedc9]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f21424b5efa]
      Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f21424b60d0]
      Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f2140ab0184]
      Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f213ffcf37d]
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: Trying to get some variables.
      Dec 18 06:49:01 db03 mysqld: Some pointers may be invalid and cause the dump to abort.
      Dec 18 06:49:01 db03 mysqld: Query (0x7f127bc20020): is an invalid pointer
      Dec 18 06:49:01 db03 mysqld: Connection ID (thread ID): 349413
      Dec 18 06:49:01 db03 mysqld: Status: NOT_KILLED
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      Dec 18 06:49:01 db03 mysqld: information that should help you find out what is causing the crash.
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 06:49:01 db03 mysqld: We think the query pointer is invalid, but we will try to print it anyway. 
      Dec 18 06:49:01 db03 mysqld: Query: flush logs
      Dec 18 06:49:01 db03 mysqld: 
      Dec 18 07:13:58 db03 mysqld_safe: Number of processes running now: 0
      Dec 18 07:13:58 db03 mysqld_safe: WSREP: not restarting wsrep node automatically
      Dec 18 07:13:58 db03 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
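
      The memory estimate printed in the log can be checked by hand. The sketch below evaluates the formula exactly as the log prints it, using key_buffer_size, read_buffer_size and max_threads from the crash output and assuming that sort_buffer_size = 4M from my.cnf is the effective value (the log does not print it). The function name is illustrative only.

```python
# Values taken from the crash log above.
KEY_BUFFER_SIZE = 33554432        # key_buffer_size (32M)
READ_BUFFER_SIZE = 2097152        # read_buffer_size (2M)
MAX_THREADS = 502                 # max_threads
LOGGED_KIB = 3127346              # figure printed by mysqld

# Assumption: sort_buffer_size = 4M from my.cnf is in effect;
# the crash output itself does not show this variable.
SORT_BUFFER_SIZE = 4 * 1024 * 1024


def estimate_kib(key_buffer_size, read_buffer_size, sort_buffer_size, max_threads):
    """Evaluate the formula as printed in the log and return the result in K."""
    total_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
    return total_bytes // 1024


printed_formula_kib = estimate_kib(
    KEY_BUFFER_SIZE, READ_BUFFER_SIZE, SORT_BUFFER_SIZE, MAX_THREADS
)
print("printed formula:", printed_formula_kib, "K")
print("logged figure:  ", LOGGED_KIB, "K")
print("difference:     ", LOGGED_KIB - printed_formula_kib, "K")
```

      Under that assumption the printed formula gives 3117056 K, about 10 MB less than the logged 3127346 K. This suggests the server's internal calculation includes a further term that the message does not show. Either way, the estimate is around 3 GB on a host with 64 GB of RAM, so this figure alone does not point to memory exhaustion.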
      

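      The frames in the backtrace are mangled C++ symbols, which makes the call path hard to read. As a rough aid, the sketch below pulls the frames out of syslog-style lines and decodes only the name part of each symbol (argument types are ignored). It is a partial decoder written for this illustration, not a replacement for a real demangler such as c++filt; the names FRAME_RE, qualified_name and parse_frames are hypothetical.

```python
import re

# Matches backtrace frames such as
#   /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]
FRAME_RE = re.compile(
    r"(?P<binary>/\S+?)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<addr>0x[0-9a-f]+)\]"
)


def qualified_name(symbol):
    """Decode only the name of an Itanium-ABI mangled symbol.

    Handles `_Z<len><name>` and `_ZN<len><name>...E`. Anything else
    (plain C names, empty symbols) is returned unchanged.
    """
    if not symbol.startswith("_Z"):
        return symbol
    nested = symbol.startswith("_ZN")
    pos = 3 if nested else 2
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
        if not nested:
            break  # what follows an unnested name is the argument list
    return "::".join(parts) if parts else symbol


def parse_frames(log_text):
    """Return (binary, readable name, offset) for every frame line found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("binary"),
                 qualified_name(match.group("symbol")),
                 match.group("offset"))
            )
    return frames


# A few lines copied from the crash log above.
SAMPLE_LOG = """\
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f2142629aad]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f21424cff10]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f21423ebac9]
Dec 18 06:49:01 db03 mysqld: Query: flush logs
"""

frames = parse_frames(SAMPLE_LOG)
for binary, name, offset in frames:
    print(binary, name or "<no symbol>", offset)
```

      Read this way, the trace shows the crash inside MYSQL_BIN_LOG::do_checkpoint_request, reached from rotate_and_purge via reload_acl_and_cache, which matches the "flush logs" query reported at the end of the log.
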
      This usually occurs just after the weekly cron jobs run, but not every time.
      I only have 2 standard jobs there:

      • apt-xapian-index
      • man-db

      CPUinfo:

      processor       : 47
      vendor_id       : GenuineIntel
      cpu family      : 6
      model           : 63
      model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      stepping        : 2
      microcode       : 0x1f
      cpu MHz         : 1200.000
      cache size      : 30720 KB
      physical id     : 1
      siblings        : 24
      core id         : 13
      cpu cores       : 12
      apicid          : 59
      initial apicid  : 59
      fpu             : yes
      fpu_exception   : yes
      cpuid level     : 15
      wp              : yes
      flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
      bogomips        : 5001.48
      clflush size    : 64
      cache_alignment : 64
      address sizes   : 46 bits physical, 48 bits virtual
      

      Meminfo:

      MemTotal:       65855904 kB
      MemFree:        50075004 kB
      Buffers:          289528 kB
      Cached:          9494524 kB
      SwapCached:         2400 kB
      Active:         10941028 kB
      Inactive:        3798096 kB
      Active(anon):    4951372 kB
      Inactive(anon):     3756 kB
      Active(file):    5989656 kB
      Inactive(file):  3794340 kB
      Unevictable:           0 kB
      Mlocked:               0 kB
      SwapTotal:      68358140 kB
      SwapFree:       68351796 kB
      Dirty:               136 kB
      Writeback:             0 kB
      AnonPages:       4956920 kB
      Mapped:            20340 kB
      Shmem:                56 kB
      Slab:             512752 kB
      SReclaimable:     414892 kB
      SUnreclaim:        97860 kB
      KernelStack:        5048 kB
      PageTables:        12212 kB
      NFS_Unstable:          0 kB
      Bounce:                0 kB
      WritebackTmp:          0 kB
      CommitLimit:    101286092 kB
      Committed_AS:   62324824 kB
      VmallocTotal:   34359738367 kB
      VmallocUsed:      412748 kB
      VmallocChunk:   34325775492 kB
      HardwareCorrupted:     0 kB
      AnonHugePages:   4866048 kB
      HugePages_Total:       0
      HugePages_Free:        0
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      DirectMap4k:      139304 kB
      DirectMap2M:     4984832 kB
      DirectMap1G:    63963136 kB
      

      my.cnf:

      [client]
      port            = 3306
      socket          = /var/run/mysqld/mysqld.sock
      [mysqld_safe]
      socket          = /var/run/mysqld/mysqld.sock
      nice            = 0
      [mysqld]
      user            = mysql
      pid-file        = /var/run/mysqld/mysqld.pid
      socket          = /var/run/mysqld/mysqld.sock
      port            = 3306
      basedir         = /usr
      datadir         = /var/lib/mysql
      tmpdir          = /tmp
      lc_messages_dir = /usr/share/mysql
      lc_messages     = en_US
      skip-external-locking
      bind-address            = 127.0.0.1
      max_connections         = 100
      connect_timeout         = 5
      wait_timeout            = 600
      max_allowed_packet      = 16M
      thread_cache_size       = 128
      sort_buffer_size        = 4M
      bulk_insert_buffer_size = 16M
      tmp_table_size          = 32M
      max_heap_table_size     = 32M
      myisam_recover_options = BACKUP
      key_buffer_size         = 128M
      table_open_cache        = 400
      myisam_sort_buffer_size = 512M
      concurrent_insert       = 2
      read_buffer_size        = 2M
      read_rnd_buffer_size    = 1M
      query_cache_limit               = 128K
      query_cache_size                = 64M
      log_warnings            = 2
      slow_query_log_file     = /var/log/mysql/mariadb-slow.log
      long_query_time = 10
      log_slow_verbosity      = query_plan
      log_bin                 = /var/log/mysql/mariadb-bin
      log_bin_index           = /var/log/mysql/mariadb-bin.index
      expire_logs_days        = 10
      max_binlog_size         = 100M
      default_storage_engine  = InnoDB
      innodb_buffer_pool_size = 256M
      innodb_log_buffer_size  = 8M
      innodb_file_per_table   = 1
      innodb_open_files       = 400
      innodb_io_capacity      = 400
      innodb_flush_method     = O_DIRECT
      [galera]
      [mysqldump]
      quick
      quote-names
      max_allowed_packet      = 16M
      [mysql]
      [isamchk]
      key_buffer              = 16M
      !includedir /etc/mysql/conf.d/
      

      /etc/mysql/conf.d/mysqld_safe_syslog.cnf:

      [mysqld_safe]
      skip_log_error
      syslog
      

      /etc/mysql/conf.d/extra1.cnf:

      [mysqld]
      default-storage-engine         = InnoDB
      key-buffer-size                = 32M
      myisam-recover                 = FORCE,BACKUP
      max-allowed-packet             = 16M
      max-connect-errors             = 1000000
      sysdate-is-now                 = 1
      innodb                         = FORCE
      tmp-table-size                 = 32M
      max-heap-table-size            = 32M
      query-cache-type               = 0  
      query-cache-size               = 0  
      max-connections                = 500
      thread-cache-size              = 50 
      open-files-limit               = 65535
      table-definition-cache         = 4096 
      table-open-cache               = 4096 
      innodb-flush-method            = O_DIRECT
      innodb-log-files-in-group      = 2
      innodb-log-file-size           = 512M
      innodb-flush-log-at-trx-commit = 2   
      innodb-file-per-table          = 1   
      innodb-buffer-pool-size        = 54G 
      log-error                      = /var/lib/mysql/mysql-error.log
      log-queries-not-using-indexes  = 1
      slow-query-log                 = 1
      slow-query-log-file            = /var/lib/mysql/mysql-slow.log
      max-binlog-size                = 10G
      max-relay-log-size             = 0 
      relay-log-space-limit          = 20G 
      tmpdir                         = /var/lib/mysql/tmp
      skip-name-resolve
      

      /etc/mysql/conf.d/galera.cnf:

      [mysqld]
      query_cache_size=0
      binlog_format=ROW
      default-storage-engine=innodb
      innodb_autoinc_lock_mode=2
      query_cache_type=0
      bind-address=0.0.0.0
      server_id=103
      gtid_strict_mode=ON
      wsrep_gtid_domain_id=1
      gtid_domain_id=1
      wsrep_gtid_mode=ON
      wsrep_on=ON
      wsrep_provider=/usr/lib/galera/libgalera_smm.so
      wsrep_cluster_name="xxx_db"
      wsrep_cluster_address="gcomm://10.100.11.1,10.100.11.2,10.100.11.3"
      wsrep_sst_method=rsync
      wsrep_sst_auth=root:XXXXX
      wsrep_sync_wait=1
      wsrep_node_address="10.100.11.3"
      wsrep_node_name="db03"
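
      Several options appear in more than one of the files above with different values (for example max_connections, key_buffer_size, innodb_buffer_pool_size and bind-address). The sketch below shows how the effective [mysqld] values can be worked out, assuming that a later occurrence overrides an earlier one, that dashes and underscores in option names are interchangeable, and that the files in conf.d are read after the main part of my.cnf in the order given here. The parser is deliberately simplified (no option prefixes, no other groups) and the excerpts contain only a few of the lines shown above.

```python
def effective_options(files):
    """Merge [mysqld] options from option-file texts given in read order.

    Simplified illustration: the last value seen wins, and '-' in option
    names is treated the same as '_'.
    """
    merged = {}
    for text in files:
        section = None
        for raw in text.splitlines():
            line = raw.strip()
            # Skip blanks, comments and directives such as !includedir.
            if not line or line.startswith(("#", ";", "!")):
                continue
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]
                continue
            if section != "mysqld":
                continue
            name, _, value = line.partition("=")
            merged[name.strip().replace("-", "_")] = value.strip()
    return merged


# Excerpts of the files shown above (only a few options each).
MY_CNF = """
[mysqld]
bind-address            = 127.0.0.1
max_connections         = 100
sort_buffer_size        = 4M
key_buffer_size         = 128M
innodb_buffer_pool_size = 256M
"""

EXTRA1_CNF = """
[mysqld]
key-buffer-size                = 32M
max-connections                = 500
innodb-buffer-pool-size        = 54G
"""

GALERA_CNF = """
[mysqld]
bind-address=0.0.0.0
"""

# Assumed read order: my.cnf first, then the files from conf.d.
options = effective_options([MY_CNF, EXTRA1_CNF, GALERA_CNF])
for name in sorted(options):
    print(f"{name} = {options[name]}")
```

      The crash log is consistent with this reading: it reports key_buffer_size=33554432 (32M, the value from extra1.cnf rather than the 128M in my.cnf) and max_threads=502, which fits max-connections = 500 rather than 100.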
      

              People

              Assignee:
              sachin.setiya.007 Sachin Setiya
              Reporter:
              belgarath Paul Ryszka
              Votes:
              0
              Watchers:
              3