[MDEV-28322] Galera node crash
Created: 2022-04-15  Updated: 2022-05-17  Resolved: 2022-05-17

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.6.5
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Rick Tuk
Assignee: Unassigned
Resolution: Incomplete
Votes: 0
Labels: crash, galera
Environment: Ubuntu 20.04 LTS



 Description   

We are running a two-node Galera cluster plus an arbitrator. HAProxy sits in front of the cluster so that reads and writes can be split over different ports.
Node 1 is the dedicated write node, with node 2 as the backup write node.
Both nodes are available as read nodes.

Node 2 crashed with the following log output:

Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: This could be because you hit a bug. It is also possible that this binary
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: or one of the libraries it was linked against is corrupt, improperly built,
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: We will try our best to scrape up some info that will hopefully help
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: diagnose the problem, but since we have already crashed,
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: something is definitely wrong and this may fail.
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: Server version: 10.6.5-MariaDB-1:10.6.5+maria~focal-log
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: key_buffer_size=131072
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: read_buffer_size=131072
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: max_used_connections=37
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: max_threads=258
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: thread_count=47
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: It is possible that mysqld could use up to
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 568228 K  bytes of memory
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: Hope that's ok; if not, decrease some variables in the equation.
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: Thread pointer: 0x7fb180000c58
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: Attempting backtrace. You can use the following information to find out
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: where mysqld died. If you see no messages after this, something went
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: terribly wrong...
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: stack_bottom = 0x7fb2d01dcdc8 thread_stack 0x49000
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: Printing to addr2line failed
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x562d30809442]
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x562d302bc6a5]
Apr 15 04:02:44 node02.mariadb01 mariadbd[2360093]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7fb33a5603c0]
Apr 15 04:02:44 node02.mariadb01 mariadbd[2360093]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fb33a06403b]
Apr 15 04:02:45 node02.mariadb01 mariadbd[2360093]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fb33a043859]
Apr 15 04:02:45 node02.mariadb01 mariadbd[2360093]: /lib/x86_64-linux-gnu/libc.so.6(+0x22729)[0x7fb33a043729]
Apr 15 04:02:46 node02.mariadb01 mariadbd[2360093]: /lib/x86_64-linux-gnu/libc.so.6(+0x34006)[0x7fb33a055006]

The node needs a full resync before it can rejoin the cluster.



 Comments   
Comment by Daniel Black [ 2022-04-19 ]

rtuk, thanks for the bug report. I'd like a few more details to try to identify this crash more completely.

Was a core dump saved on this crash (maybe in apport, if that is configured in /proc/sys/kernel/core_pattern)? If so, can you install the MariaDB debuginfo packages and use apport-retrace --gdb to get a full backtrace?

Was there a SQL query in your logs? If so, can you include it, potentially with the SHOW CREATE TABLE for the table affected?

Are you using any non-default Galera settings?
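
For the core dump question above, a minimal Python sketch of how one might check where the kernel sends core dumps. It only reads /proc/sys/kernel/core_pattern and classifies the value; the function name describe_core_pattern is hypothetical, and the sketch does not verify that apport actually kept a dump for this crash.

```python
from pathlib import Path

# Kernel setting mentioned in the comment above.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Classify a core_pattern value (hypothetical helper for illustration)."""
    pattern = pattern.strip()
    if not pattern:
        return "empty pattern"
    if pattern.startswith("|"):
        # A leading pipe means the kernel hands the core to a helper program.
        parts = pattern[1:].split()
        helper = parts[0] if parts else ""
        if "apport" in helper:
            return f"piped to apport ({helper})"
        return f"piped to helper ({helper})"
    return f"written to file pattern {pattern}"


if __name__ == "__main__":
    if CORE_PATTERN_PATH.exists():
        print(describe_core_pattern(CORE_PATTERN_PATH.read_text()))
    else:
        print("core_pattern is not available on this system")
```

If the result says the core is piped to apport, the apport-retrace --gdb step mentioned above is the next thing to try.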

Comment by Rick Tuk [ 2022-04-19 ]

@danblack, unfortunately I do not have a core dump. Should this happen again, I will make sure to look for one before restoring the node.
No query was logged, but I did find an additional line in the log that I missed when creating this report:

Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: mariadbd: /home/buildbot/buildbot/build/mariadb-10.6.5/wsrep-lib/include/wsrep/client_state.hpp:679: int wsrep::client_state::total_order_bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.
Apr 15 04:02:43 node02.mariadb01 mariadbd[2360093]: 220415  4:02:43 [ERROR] mysqld got signal 6 ;
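
Because the assertion line was easy to miss among the surrounding output, here is a rough Python sketch for pulling the failed assertion and the signal number out of log lines like the two above. The regular expressions and the summarize_crash name are assumptions based only on the format of these two lines, not on any documented MariaDB log format.

```python
import re

# Matches glibc-style assertion output: "<file>:<line>: <function>: Assertion `<expr>' failed."
ASSERT_RE = re.compile(
    r"(?P<file>\S+):(?P<line>\d+): (?P<func>.+?): "
    r"Assertion `(?P<expr>.+)' failed\."
)
# Matches the "mysqld got signal N" line written by the crash handler.
SIGNAL_RE = re.compile(r"got signal (?P<signum>\d+)")


def summarize_crash(log_lines):
    """Return the first assertion and the first signal found in the given lines."""
    summary = {"assertion": None, "signal": None}
    for line in log_lines:
        assertion = ASSERT_RE.search(line)
        if assertion and summary["assertion"] is None:
            summary["assertion"] = assertion.groupdict()
        signal = SIGNAL_RE.search(line)
        if signal and summary["signal"] is None:
            summary["signal"] = int(signal.group("signum"))
    return summary
```

The extracted function name and expression are what make it possible to search the bug tracker for an existing report of the same assertion.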

I don't believe I have any non-default Galera settings. To be sure, these are the settings I am using:

[galera]
wsrep_on = ON
wsrep_cluster_name =mariadb01
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = "gcomm://node01.mariadb01,node02.mariadb01,arbitrator.mariadb01"
wsrep_sst_method = mariabackup
wsrep_sst_auth = mariabackup:<password>
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
bind_address = 10.96.37.12
 
wsrep_slave_threads = 8
innodb_flush_log_at_trx_commit = 0
 
wsrep_provider_options = "gcs.fc_limit=40;gcs.fc_factor=0.8;gcache.size=1G;socket.ssl=1;socket.ssl_ca=/etc/mysql/ssl/root.rsa.crt;socket.ssl_cert=/etc/mysql/ssl/server.rsa.crt;socket.ssl_key=/etc/mysql/ssl/server.rsa.key;socket.ssl_cipher=TLS-AES-256-GCM-SHA384:TLS-CHACHA20-POLY1305-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305;"
 
[sst]
encrypt = 4
tca = /etc/mysql/ssl/root.rsa.crt
tcert = /etc/mysql/ssl/server.rsa.crt
tkey = /etc/mysql/ssl/server.rsa.key
 
[mysqld]
skip_name_resolve
skip_external_locking
 
proxy_protocol_networks = 10.96.36.4,10.96.36.5,::1,localhost
 
character_set_server = utf8mb4
collation_server     = utf8mb4_unicode_ci
init_connect         = 'SET NAMES utf8mb4'
 
bind_address = 10.96.37.12
server_id    = 1
 
aria_log_dir_path   = /var/log/mysql
 
lower_case_table_names = 1
 
general_log     = 0
long_query_time = 2
slow_query_log  = 1
log_queries_not_using_indexes = 1
min_examined_row_limit        = 100
 
alter_algorithm = COPY
transaction_isolation = READ-COMMITTED
 
innodb_flush_method            = O_DIRECT
innodb_file_per_table          = 1
innodb_data_home_dir           = /var/lib/mysql
innodb_data_file_path          = ibdata1:10M:autoextend
innodb_log_group_home_dir      = /var/lib/mysql
innodb_buffer_pool_size        = 3G
innodb_buffer_pool_instances   = 3
innodb_log_file_size           = 1G
innodb_log_files_in_group      = 2
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout       = 30
 
performance_schema  = OFF
tmp_table_size      = 64M
max_heap_table_size = 64M
query_cache_size    = 0
thread_cache_size   = 256
wait_timeout        = 300
interactive_timeout = 300
 
table_open_cache            = 65535
table_open_cache_instances  = 8
 
table_definition_cache      = 33167
 
max_allowed_packet      = 64M
bulk_insert_buffer_size = 64M
 
max_connections = 256
 
log_bin             = /var/log/mysql/mysql-bin
log_bin_index       = /var/log/mysql/mariadb-bin.index
binlog_format       = row
expire_logs_days    = 7
#sync_binlog        = 1
max_binlog_size     = 1G
 
#relay_log           = /var/log/mysql/relay-bin
#relay_log_index     = /var/log/mysql/relay-bin.index
#relay_log_info_file = /var/log/mysql/relay-bin.info
#log_slave_updates
#read_only
 
key_buffer_size         = 128K
myisam_sort_buffer_size = 4M
 
ssl-ca   = /etc/mysql/ssl/root.rsa.crt
ssl-cert = /etc/mysql/ssl/server.rsa.crt
ssl-key  = /etc/mysql/ssl/server.rsa.key
tls_version = TLSv1.2,TLSv1.3
 
default_time_zone = +00:00
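
The wsrep_provider_options value in the [galera] section above is one long semicolon-separated string, which makes it hard to see which provider options are set explicitly. A minimal Python sketch that splits such a string into key/value pairs follows. The parse_provider_options name is hypothetical, and the sketch does not know Galera's defaults, so deciding which values are non-default still means comparing against the Galera documentation.

```python
def parse_provider_options(value: str) -> dict:
    """Split a wsrep_provider_options string into a dict of option -> value."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            # Skip the empty piece left by a trailing semicolon.
            continue
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


if __name__ == "__main__":
    # Shortened example using the first four options from the configuration above.
    sample = "gcs.fc_limit=40;gcs.fc_factor=0.8;gcache.size=1G;socket.ssl=1;"
    for name, setting in parse_provider_options(sample).items():
        print(f"{name} = {setting}")
```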

Comment by Daniel Black [ 2022-04-19 ]

This is the same assertion as MDEV-26803, so 10.6.7 might be the fix for you.
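
To check which nodes are still below the suggested release, a small Python sketch can compare the "Server version" string from the crash log against 10.6.7. The helper names are hypothetical, and the comparison only says whether a node is older than 10.6.7; whether that release actually resolves this crash is not confirmed in this report.

```python
import re

# Release suggested in the comment above.
SUGGESTED_VERSION = (10, 6, 7)


def parse_server_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '10.6.5-MariaDB-...'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_older_than_suggested(version: str) -> bool:
    """True if the server version is below the suggested release."""
    return parse_server_version(version) < SUGGESTED_VERSION


if __name__ == "__main__":
    # Version string taken from the crash log in the description.
    print(is_older_than_suggested("10.6.5-MariaDB-1:10.6.5+maria~focal-log"))
```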

Comment by Rick Tuk [ 2022-04-19 ]

Thanks, we will try upgrading.

Generated at Thu Feb 08 09:59:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.