[MDEV-33349] Crash when a new node attempts to join the Galera cluster Created: 2024-02-01  Updated: 2024-02-05

Status: Open
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 11.2.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Martin Montes Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: galera

Attachments:
  • Screenshot from 2024-02-05 11-07-25.png
  • mariadb-galera-0-no-k8s-probes.log
  • mariadb-galera-1-donor.log
  • mariadb-galera-1-no-k8s-probes.log
  • mariadb-galera-2-joiner.log
  • mariadb-galera-2-no-k8s-probes.log
  • mariadb-galera-2-recovered.log
  • mariadb-galera-2.log

 Description   

The Kubernetes operator for MariaDB provisions MariaDB clusters by creating containers one by one, waiting until the `wsrep_ready` status variable is ON before starting the next one. This ensures that only one node is attempting to join the cluster at any given time. A sketch of that readiness gate is shown below.
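For illustration only, here is a minimal Python sketch of such a readiness gate, assuming PyMySQL and hypothetical credentials; the actual operator implementation differs, this just shows the polling logic:

import time
import pymysql  # assumption: PyMySQL client available

def wait_for_wsrep_ready(host, user, password, timeout=300, interval=5):
    """Poll wsrep_ready until it is ON, so only one node joins at a time."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            conn = pymysql.connect(host=host, user=user, password=password)
            try:
                with conn.cursor() as cur:
                    cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_ready'")
                    row = cur.fetchone()  # e.g. ('wsrep_ready', 'ON')
                if row and row[1] == "ON":
                    return True
            finally:
                conn.close()
        except pymysql.MySQLError:
            pass  # node not accepting connections yet (e.g. SST in progress)
        time.sleep(interval)
    return False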

When a new node joins the cluster, Galera performs an SST: it chooses an existing node as donor and transfers the state to the new node in order to initialize it. We've seen this process fail occasionally, both when bootstrapping the cluster and when a node goes down. In that case the unhealthy container gets restarted by Kubernetes and a new one is created, which means the SST is retried. This repeats until the container reaches a healthy state and the node is part of the cluster.

Here are the configuration files for each node:

mariadb-galera-0

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local"
wsrep_node_name="mariadb-galera-0"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

mariadb-galera-1

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local"
wsrep_node_name="mariadb-galera-1"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

mariadb-galera-2

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local,mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local"
wsrep_node_name="mariadb-galera-2"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

It is important to note that we are using DNS names in the cluster address. These resolve to the IPs of the containers, but every time Kubernetes restarts a container it gets a new IP assigned. The snippet below illustrates this.
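A quick standard-library Python check makes the churn visible: resolving the gcomm:// hostnames before and after a Pod restart yields different IPs (hostnames taken from the configs above):

import socket

HOSTS = [
    "mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local",
    "mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local",
    "mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local",
]

# Run this before and after a Pod restart: the same hostname maps to a
# different IP each time Kubernetes recreates the container.
for host in HOSTS:
    try:
        print(host, "->", socket.gethostbyname(host))
    except socket.gaierror as exc:
        print(host, "-> unresolved:", exc)  # Pod down or DNS not ready yet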

The log files attached to this Jira show the crash happening after a node went down and kept attempting to rejoin the cluster for a while. After ~30 minutes or so, it finally managed to join.

I have also tried to use rsync instead of mariabackup, but it didn't help.



 Comments   
Comment by Marko Mäkelä [ 2024-02-01 ]

I was a bit curious about the InnoDB part of this and initially thought that there was something strange. But that part looks fine after all.

In mariadb-galera-2-recovered.log we can see that wsrep_sst_method=mariabackup did invoke mariadb-backup --prepare, because InnoDB starts up with a dummy ib_logfile0:

mariadb-11.2.2

2024-01-31 10:29:35 0 [Note] InnoDB: End of log at LSN=1623012
2024-01-31 10:29:35 0 [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=1623012

Starting with MDEV-14425, the normal log file size should be a multiple of 4096 bytes. Here, the file size seems to be 12288+16=12304 bytes. Starting with MDEV-27199, a log file is mandatory. Such dummy 12304-byte files are created at the end of mariadb-backup --prepare in order to pass a start LSN to the server.
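To make the arithmetic explicit (this only restates the sizes from the comment above):

# The dummy ib_logfile0 left behind by mariadb-backup --prepare:
size = 12288 + 16
print(size)                   # 12304 bytes
print(size % 4096)            # 16 -> not a multiple of 4096, unlike a normal
                              #       log file size (MDEV-14425)
print(round(size / 1024, 3))  # 12.016 -> matches "12.016KiB" in the log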

Comment by Martin Montes [ 2024-02-02 ]

Thanks for taking a look, Marko. This might be an issue related to Galera rather than InnoDB; let me know if you need more debug info or if you think it is worth digging deeper on the InnoDB side.

This morning I managed to reproduce the problem on MariaDB 10.6. I have attached the donor and joiner logs.

mariadb-galera-2-joiner.log
mariadb-galera-1-donor.log

Comment by Martin Montes [ 2024-02-05 ]

I've tried with a different test scenario:

  • Provision a Galera cluster in Kubernetes without liveness/readiness probes, so that Kubernetes does not interfere with the container (no restarts and no traffic cut off when the container is considered unhealthy). The only way for the container to stop is to exit with a non-zero status code.
  • After provisioning, I created some data with sysbench to warm up the database
  • Stopped the sysbench tests
  • Deleted the node marked as primary by MaxScale (mariadb-galera-2)
  • The failover to another node (mariadb-galera-0) is performed successfully
  • mariadb-galera-2 comes back online and requests an SST from mariadb-galera-1
  • The crash starts happening again:

WSREP_SST: [ERROR] previous SST script still running. (20240205 09:57:30.706)
2024-02-05  9:57:30 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_mariabackup --role 'joiner' --address 'mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local' --datadir '/var/lib/mysql/' --parent 1 --progress 0
	Read: '(null)'
2024-02-05  9:57:30 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address 'mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local' --datadir '/var/lib/mysql/' --parent 1 --progress 0: 114 (Operation already in progress)
2024-02-05  9:57:30 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
2024-02-05  9:57:30 2 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
2024-02-05  9:57:30 2 [Note] WSREP: ReplicatorSMM::abort()
2024-02-05  9:57:30 2 [Note] WSREP: Closing send monitor...
2024-02-05  9:57:30 2 [Note] WSREP: Closed send monitor.
2024-02-05  9:57:30 2 [Note] WSREP: gcomm: terminating thread
2024-02-05  9:57:30 2 [Note] WSREP: gcomm: joining thread
2024-02-05  9:57:30 2 [Note] WSREP: gcomm: closing backend
2024-02-05  9:57:30 2 [Note] WSREP: view(view_id(NON_PRIM,ae347157-9c57,13) memb {
	f3b5e06d-a33c,0
} joined {
} left {
} partitioned {
	ae347157-9c57,0
	b1ca18b3-8a25,0
})
2024-02-05  9:57:30 2 [Note] WSREP: PC protocol downgrade 1 -> 0
2024-02-05  9:57:30 2 [Note] WSREP: view((empty))
2024-02-05  9:57:30 2 [Note] WSREP: gcomm: closed
2024-02-05  9:57:30 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2024-02-05  9:57:30 0 [Note] WSREP: Flow-control interval: [16, 16]
2024-02-05  9:57:30 0 [Note] WSREP: Received NON-PRIMARY.
2024-02-05  9:57:30 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 122675)
2024-02-05  9:57:30 0 [Note] WSREP: New SELF-LEAVE.
2024-02-05  9:57:30 0 [Note] WSREP: Flow-control interval: [0, 0]
2024-02-05  9:57:30 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2024-02-05  9:57:30 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 122675)
2024-02-05  9:57:30 0 [Note] WSREP: RECV thread exiting 0: Success
2024-02-05  9:57:30 2 [Note] WSREP: recv_thread() joined.
2024-02-05  9:57:30 2 [Note] WSREP: Closing replication queue.
2024-02-05  9:57:30 2 [Note] WSREP: Closing slave action queue.
2024-02-05  9:57:30 2 [Note] WSREP: mariadbd: Terminated.
240205  9:57:30 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.
 
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 11.2.2-MariaDB-1:11.2.2+maria~ubu2204 source revision: 929532a9426d085111c24c63de9c23cc54382259
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337017 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f8758000c68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f877c25ac68 thread_stack 0x49000
Printing to addr2line failed
mariadbd(my_print_stacktrace+0x32)[0x55d207c2e032]
mariadbd(handle_fatal_signal+0x478)[0x55d207702158]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f877ea24520]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7f877ea0a898]
/usr/lib/galera/libgalera_smm.so(+0x157602)[0x7f877e4a6602]
/usr/lib/galera/libgalera_smm.so(+0x700e1)[0x7f877e3bf0e1]
/usr/lib/galera/libgalera_smm.so(+0x6cc94)[0x7f877e3bbc94]
/usr/lib/galera/libgalera_smm.so(+0x8b311)[0x7f877e3da311]
/usr/lib/galera/libgalera_smm.so(+0x604a0)[0x7f877e3af4a0]
/usr/lib/galera/libgalera_smm.so(+0x48261)[0x7f877e397261]
mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x55d207cecb02]
mariadbd(+0xd7ff31)[0x55d2079bff31]
mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x55d2079adcfb]
mariadbd(+0xcf24c6)[0x55d2079324c6]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f877ea76ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f877eb08850]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 2
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    0                    bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files     
Max locked memory         8388608              8388608              bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       126293               126293               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
 
Kernel version: Linux version 6.5.0-15-generic (buildd@bos03-amd64-040) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2

See the node status in MaxScale after the crash in the attached screenshot (Screenshot from 2024-02-05 11-07-25.png).

See the full logs here:
mariadb-galera-0-no-k8s-probes.log mariadb-galera-1-no-k8s-probes.log mariadb-galera-2-no-k8s-probes.log
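As a side note, the "Operation already in progress" failure (exit code 114) at the top of the log indicates a leftover joiner script. Below is a hedged sketch of how one could check for a still-running SST script before restarting mariadbd; the process pattern is an assumption derived from the command line in the log:

import subprocess

def sst_script_running():
    # pgrep -f matches against the full command line; the pattern is based
    # on the joiner invocation logged above and may need adjusting.
    result = subprocess.run(
        ["pgrep", "-f", "wsrep_sst_mariabackup.*joiner"],
        capture_output=True, text=True,
    )
    return result.returncode == 0  # 0 means at least one matching process

if sst_script_running():
    print("previous SST still in progress; do not restart mariadbd yet")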

Comment by Martin Montes [ 2024-02-05 ]

I've tested the same scenario, but instead of providing DNS names in the Galera config file, the operator now resolves the DNS names beforehand and provides the IPs:

mariadb-galera-0

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.244.0.157,10.244.0.48,10.244.0.153"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="10.244.0.157"
wsrep_node_name="mariadb-galera-0"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

mariadb-galera-1

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.244.0.36,10.244.0.48,10.244.0.31"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="10.244.0.48"
wsrep_node_name="mariadb-galera-1"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

mariadb-galera-2

[mariadb]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
 
# Cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.244.0.157,10.244.0.48,10.244.0.159"
wsrep_cluster_name=mariadb-operator
wsrep_slave_threads=1
 
# Node configuration
wsrep_node_address="10.244.0.159"
wsrep_node_name="mariadb-galera-2"
wsrep_sst_method="mariabackup"
wsrep_sst_auth="<user>:<password>"

After killing the primary multiple times, even with sysbench tests running, I didn't manage to reproduce the issue. So the issue has to do with DNS resolution, either inside the MariaDB container or in the Galera library. A sketch of this workaround follows.
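For reference, a minimal Python sketch of what resolving the DNS in the operator amounts to (hostnames from the earlier configs; the actual operator code differs):

import socket

hosts = [
    "mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local",
    "mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local",
    "mariadb-galera-2.mariadb-galera-internal.default.svc.cluster.local",
]

# Resolve each Pod hostname once and render the cluster address with the
# resulting IPs, as in the configs above.
ips = [socket.gethostbyname(h) for h in hosts]
print('wsrep_cluster_address="gcomm://%s"' % ",".join(ips))
# e.g. wsrep_cluster_address="gcomm://10.244.0.157,10.244.0.48,10.244.0.153"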
