[MDEV-11595] MariaDB server cluster regularly stops with an error Created: 2016-12-18  Updated: 2018-03-08  Resolved: 2018-03-08

Status: Closed
Project: MariaDB Server
Component/s: Server, wsrep
Affects Version/s: 10.1.20
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Paul Ryszka
Assignee: Sachin Setiya (Inactive)
Resolution: Incomplete
Votes: 0
Labels: need_feedback
Environment:

Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

Linux db03 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Relates
relates to MDEV-9510 Segmentation fault in binlog thread c... Closed

 Description   

I run a 3-node MariaDB 10.1 Galera cluster.
On a regular basis, one of the database servers stops with the following stack trace:

Dec 18 06:49:01 db03 mysqld: 161218  6:49:01 [ERROR] mysqld got signal 11 ;
Dec 18 06:49:01 db03 mysqld: This could be because you hit a bug. It is also possible that this binary
Dec 18 06:49:01 db03 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Dec 18 06:49:01 db03 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: We will try our best to scrape up some info that will hopefully help
Dec 18 06:49:01 db03 mysqld: diagnose the problem, but since we have already crashed, 
Dec 18 06:49:01 db03 mysqld: something is definitely wrong and this may fail.
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: Server version: 10.1.20-MariaDB-1~trusty
Dec 18 06:49:01 db03 mysqld: key_buffer_size=33554432
Dec 18 06:49:01 db03 mysqld: read_buffer_size=2097152
Dec 18 06:49:01 db03 mysqld: max_used_connections=18
Dec 18 06:49:01 db03 mysqld: max_threads=502
Dec 18 06:49:01 db03 mysqld: thread_count=3
Dec 18 06:49:01 db03 mysqld: It is possible that mysqld could use up to 
Dec 18 06:49:01 db03 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3127346 K  bytes of memory
Dec 18 06:49:01 db03 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: Thread pointer: 0x0x7f1282107008
Dec 18 06:49:01 db03 mysqld: Attempting backtrace. You can use the following information to find out
Dec 18 06:49:01 db03 mysqld: where mysqld died. If you see no messages after this, something went
Dec 18 06:49:01 db03 mysqld: terribly wrong...
Dec 18 06:49:01 db03 mysqld: stack_bottom = 0x7f213f8611f0 thread_stack 0x48400
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f2142a44c2e]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f2142567a95]
Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f2140ab8330]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f2142629aad]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f2142629b41]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f21424cff10]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1314)[0x7f21423e2cd4]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x331)[0x7f21423eb2a1]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f21423ebac9]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e2b)[0x7f21423edfcb]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f21423eedc9]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f21424b5efa]
Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f21424b60d0]
Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f2140ab0184]
Dec 18 06:49:01 db03 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f213ffcf37d]
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: Trying to get some variables.
Dec 18 06:49:01 db03 mysqld: Some pointers may be invalid and cause the dump to abort.
Dec 18 06:49:01 db03 mysqld: Query (0x7f127bc20020): is an invalid pointer
Dec 18 06:49:01 db03 mysqld: Connection ID (thread ID): 349413
Dec 18 06:49:01 db03 mysqld: Status: NOT_KILLED
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Dec 18 06:49:01 db03 mysqld: information that should help you find out what is causing the crash.
Dec 18 06:49:01 db03 mysqld: 
Dec 18 06:49:01 db03 mysqld: We think the query pointer is invalid, but we will try to print it anyway. 
Dec 18 06:49:01 db03 mysqld: Query: flush logs
Dec 18 06:49:01 db03 mysqld: 
Dec 18 07:13:58 db03 mysqld_safe: Number of processes running now: 0
Dec 18 07:13:58 db03 mysqld_safe: WSREP: not restarting wsrep node automatically
Dec 18 07:13:58 db03 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
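
The frames in the trace above are easier to compare against other reports (for example MDEV-9510) once the syslog prefix is stripped and each frame is split into its parts. The following is a minimal sketch, assuming the frame format shown in this log (`binary(symbol+offset)[address]`); the names `FRAME_RE` and `parse_frames` are illustrative and are not part of any MariaDB tool.

```python
import re

# Matches glibc-style backtrace frames as they appear in the log above, e.g.
#   /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]
# The symbol may be empty, as in "/usr/sbin/mysqld(+0x439ac9)[0x7f21423ebac9]".
FRAME_RE = re.compile(
    r"(?P<binary>/\S+?)"
    r"\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<address>0x[0-9a-f]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.groupdict())
    return frames


if __name__ == "__main__":
    sample = (
        "Dec 18 06:49:01 db03 mysqld: "
        "/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f214262c1b2]\n"
        "Dec 18 06:49:01 db03 mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f21423ebac9]\n"
        "Dec 18 06:49:01 db03 mysqld: Query: flush logs\n"
    )
    for frame in parse_frames(sample):
        print(frame["binary"], frame["symbol"] or "<no symbol>", frame["offset"])
```

The mangled C++ names can then be passed to a demangler such as `c++filt` from binutils. In this trace the crashing path runs from `reload_acl_and_cache` through `rotate_and_purge` and `checkpoint_and_purge` to `do_checkpoint_request`, which matches the `flush logs` query printed at the end of the dump.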

This usually happens just after the weekly cron jobs run, but not every time.
I only have 2 standard jobs there:

  • apt-xapian-index
  • man-db

CPUinfo:

processor       : 47
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping        : 2
microcode       : 0x1f
cpu MHz         : 1200.000
cache size      : 30720 KB
physical id     : 1
siblings        : 24
core id         : 13
cpu cores       : 12
apicid          : 59
initial apicid  : 59
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips        : 5001.48
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual

Meminfo:

MemTotal:       65855904 kB
MemFree:        50075004 kB
Buffers:          289528 kB
Cached:          9494524 kB
SwapCached:         2400 kB
Active:         10941028 kB
Inactive:        3798096 kB
Active(anon):    4951372 kB
Inactive(anon):     3756 kB
Active(file):    5989656 kB
Inactive(file):  3794340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      68358140 kB
SwapFree:       68351796 kB
Dirty:               136 kB
Writeback:             0 kB
AnonPages:       4956920 kB
Mapped:            20340 kB
Shmem:                56 kB
Slab:             512752 kB
SReclaimable:     414892 kB
SUnreclaim:        97860 kB
KernelStack:        5048 kB
PageTables:        12212 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    101286092 kB
Committed_AS:   62324824 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      412748 kB
VmallocChunk:   34325775492 kB
HardwareCorrupted:     0 kB
AnonHugePages:   4866048 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      139304 kB
DirectMap2M:     4984832 kB
DirectMap1G:    63963136 kB

my.cnf:

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking
bind-address            = 127.0.0.1
max_connections         = 100
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 16M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 32M
max_heap_table_size     = 32M
myisam_recover_options = BACKUP
key_buffer_size         = 128M
table_open_cache        = 400
myisam_sort_buffer_size = 512M
concurrent_insert       = 2
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M
query_cache_limit               = 128K
query_cache_size                = 64M
log_warnings            = 2
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 10
log_slow_verbosity      = query_plan
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
expire_logs_days        = 10
max_binlog_size         = 100M
default_storage_engine  = InnoDB
innodb_buffer_pool_size = 256M
innodb_log_buffer_size  = 8M
innodb_file_per_table   = 1
innodb_open_files       = 400
innodb_io_capacity      = 400
innodb_flush_method     = O_DIRECT
[galera]
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M
[mysql]
[isamchk]
key_buffer              = 16M
!includedir /etc/mysql/conf.d/

/etc/mysql/conf.d/mysqld_safe_syslog.cnf:

[mysqld_safe]
skip_log_error
syslog

/etc/mysql/conf.d/extra1.cnf:

[mysqld]
default-storage-engine         = InnoDB
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP
max-allowed-packet             = 16M
max-connect-errors             = 1000000
sysdate-is-now                 = 1
innodb                         = FORCE
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0  
query-cache-size               = 0  
max-connections                = 500
thread-cache-size              = 50 
open-files-limit               = 65535
table-definition-cache         = 4096 
table-open-cache               = 4096 
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 2   
innodb-file-per-table          = 1   
innodb-buffer-pool-size        = 54G 
log-error                      = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /var/lib/mysql/mysql-slow.log
max-binlog-size                = 10G
max-relay-log-size             = 0 
relay-log-space-limit          = 20G 
tmpdir                         = /var/lib/mysql/tmp
skip-name-resolve

/etc/mysql/conf.d/galera.cnf:

[mysqld]
query_cache_size=0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
bind-address=0.0.0.0
server_id=103
gtid_strict_mode=ON
wsrep_gtid_domain_id=1
gtid_domain_id=1
wsrep_gtid_mode=ON
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="xxx_db"
wsrep_cluster_address="gcomm://10.100.11.1,10.100.11.2,10.100.11.3"
wsrep_sst_method=rsync
wsrep_sst_auth=root:XXXXX
wsrep_sync_wait=1
wsrep_node_address="10.100.11.3"
wsrep_node_name="db03"



 Comments   
Comment by Daniel Black [ 2018-01-22 ]

The backtrace here looks like MDEV-9510, which is addressed in 10.1.30 and 10.2.12. Can you make sure you are running at least one of those versions? That will hopefully solve this problem.
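
To check a node against the versions named in the comment above, the version string from the crash log (`10.1.20-MariaDB-1~trusty`) can be compared numerically rather than as text. This is a small sketch; the `FIXED_IN` table contains only the two releases mentioned in the comment, and the helper names are illustrative.

```python
import re

# Minimum releases per series, taken from the comment above.
FIXED_IN = {(10, 1): (10, 1, 30), (10, 2): (10, 2, 12)}


def parse_version(version_string):
    """Extract (major, minor, patch) from strings like '10.1.20-MariaDB-1~trusty'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version_string):
    """Return True/False for a known series, or None if the series is not listed."""
    version = parse_version(version_string)
    minimum = FIXED_IN.get(version[:2])
    if minimum is None:
        return None
    return version >= minimum


if __name__ == "__main__":
    print(has_fix("10.1.20-MariaDB-1~trusty"))
```

Tuple comparison avoids the pitfall of string comparison, where "10.1.9" would sort after "10.1.30".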
