[MDEV-30237] Unable to bootstrap cluster from crashed last survivor - Jira

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.11
Fix Version/s: None
Component/s: Galera
Labels:
- crash
Environment:
Ubuntu 20.04

Description

Hello,
I am running two MariaDB server nodes and a Galera arbitrator node. Today I wanted to simulate a crash, so I powered off one of the nodes, intranet-test1.
The database kept running fine on the other node, intranet-test2, but when I started intranet-test1 again, the working node crashed and all MariaDB server instances were down.

I am sure the last survivor was intranet-test2, so I edited grastate.dat, set safe_to_bootstrap: 1, and ran galera_new_cluster, but this ended with the following error:

{{2022-12-15 18:05:21 0 [Note] WSREP: Loading provider /usr/lib/libgalera_smm.so initial position: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d:80532
2022-12-15 18:05:21 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2022-12-15 18:05:21 0 [Note] WSREP: wsrep_load(): Galera 26.4.13(rfe497aeb) by Codership Oy <info@codership.com> loaded successfully.
2022-12-15 18:05:21 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2022-12-15 18:05:21 0 [Note] WSREP: SSL cipher list set to 'AES128-SHA256'
2022-12-15 18:05:21 0 [Note] WSREP: Found saved state: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d:-1, safe_to_bootstrap: 1
2022-12-15 18:05:21 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d
Seqno: -1 - -1
Offset: -1
Synced: 0
2022-12-15 18:05:21 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d, offset: -1
2022-12-15 18:05:21 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2022-12-15 18:05:21 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-12-15 18:05:21 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 15762582740729856-15762582740729856
2022-12-15 18:05:21 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/33310984 bytes) complete.
2022-12-15 18:05:21 0 [Note] WSREP: Recovering GCache ring buffer: found 0/1 locked buffers
2022-12-15 18:05:21 0 [Note] WSREP: Recovering GCache ring buffer: free space: 100906744/134217728
2022-12-15 18:05:21 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (33310984/33310984 bytes) complete.
2022-12-15 18:05:21 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.2.55; base_port = 4567; cert.log_conflicts = ON; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0;
2022-12-15 18:05:21 0 [Note] WSREP: SSL cipher list set to 'AES128-SHA256'
2022-12-15 18:05:21 0 [Note] WSREP: Service thread queue flushed.
2022-12-15 18:05:21 0 [Note] WSREP: ####### Assign initial position for certification: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d:80532, protocol version: -1
munmap_chunk(): invalid pointer
221215 18:05:21 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.6.11-MariaDB-1:10.6.11+maria~ubu2004
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336894 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
Printing to addr2line failed
/usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5612742148b2]
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x561273cd21a5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fec95407420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fec94f0b00b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fec94eea859]
/lib/x86_64-linux-gnu/libc.so.6(+0x8d26e)[0x7fec94f5526e]
/lib/x86_64-linux-gnu/libc.so.6(+0x952fc)[0x7fec94f5d2fc]
/lib/x86_64-linux-gnu/libc.so.6(+0x9554c)[0x7fec94f5d54c]
/usr/lib/libgalera_smm.so(+0x1cdaec)[0x7fec94abaaec]
/usr/lib/libgalera_smm.so(+0x1ce12e)[0x7fec94abb12e]
/usr/lib/libgalera_smm.so(+0x1b300a)[0x7fec94aa000a]
/usr/lib/libgalera_smm.so(+0x821c8)[0x7fec9496f1c8]
/usr/lib/libgalera_smm.so(+0x51c52)[0x7fec9493ec52]
/usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v26C1ERNS_12server_stateERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESA_RKNS_8provider8servicesE+0x1ec)[0x5612742b26cc]
/usr/sbin/mariadbd(_ZN5wsrep8provider13make_providerERNS_12server_stateERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESA_RKNS0_8servicesE+0x54)[0x5612742af494]
/usr/sbin/mariadbd(_ZN5wsrep12server_state13load_providerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_RKNS_8provider8servicesE+0x1f3)[0x56127429a753]
/usr/sbin/mariadbd(_Z10wsrep_initv+0x193)[0x561273f8f643]
/usr/sbin/mariadbd(_Z18wsrep_init_startupb+0x16)[0x561273f8fcf6]
/usr/sbin/mariadbd(+0x6d5eb1)[0x5612739c1eb1]
/usr/sbin/mariadbd(_Z11mysqld_mainiPPc+0x3fd)[0x5612739c6bbd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fec94eec083]
/usr/sbin/mariadbd(_start+0x2e)[0x5612739bb91e]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             7803                 7803                 processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       7803                 7803                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

Kernel version: Linux version 5.4.0-125-generic (buildd@lcy02-amd64-083) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022}}
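
The log above reports several positions that are easy to lose in the noise. A minimal Python sketch for pulling them out of the startup log follows. The helper name `extract_positions` and the regular expressions are assumptions made for illustration; they only match the message wording seen in this log and are not part of MariaDB or Galera.

```python
import re

# Hypothetical patterns, written against the exact wording of the log above.
INITIAL_RE = re.compile(r"initial position: ([0-9a-f-]{36}):(-?\d+)")
SAVED_RE = re.compile(
    r"Found saved state: ([0-9a-f-]{36}):(-?\d+), safe_to_bootstrap: (\d)"
)
GAPLESS_RE = re.compile(r"found gapless sequence (\d+)-(\d+)")


def extract_positions(log_text):
    """Return the positions reported in a WSREP startup log as a dict."""
    result = {}
    m = INITIAL_RE.search(log_text)
    if m:
        # Position the provider was loaded with (UUID, seqno).
        result["initial"] = (m.group(1), int(m.group(2)))
    m = SAVED_RE.search(log_text)
    if m:
        # State the provider found saved on disk.
        result["saved"] = (m.group(1), int(m.group(2)))
        result["safe_to_bootstrap"] = int(m.group(3))
    m = GAPLESS_RE.search(log_text)
    if m:
        # Seqno range found while recovering the GCache ring buffer.
        result["gcache_range"] = (int(m.group(1)), int(m.group(2)))
    return result


# Three lines copied from the log above.
sample = """\
2022-12-15 18:05:21 0 [Note] WSREP: Loading provider /usr/lib/libgalera_smm.so initial position: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d:80532
2022-12-15 18:05:21 0 [Note] WSREP: Found saved state: a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d:-1, safe_to_bootstrap: 1
2022-12-15 18:05:21 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 15762582740729856-15762582740729856
"""

positions = extract_positions(sample)
for key, value in positions.items():
    print(key, value)
```

On these lines the sketch yields an initial seqno of 80532, a saved seqno of -1, and a GCache sequence of 15762582740729856, which is far larger than the initial seqno. Whether that is related to the abort is not established by the log itself.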

Why can't I bootstrap from this node?
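
For reference, the manual grastate.dat change described in this report can be sketched in Python as below. This is only an illustration: the function name `mark_safe_to_bootstrap` is hypothetical, and the sample file contents are an assumption based on the usual grastate.dat layout, filled in with the UUID and seqno shown in the log. The real file would presumably live under /var/lib/mysql/, the base_dir reported in the log.

```python
import re


def mark_safe_to_bootstrap(grastate_text):
    """Return grastate.dat contents with safe_to_bootstrap set to 1."""
    new_text, count = re.subn(
        r"^safe_to_bootstrap:[ \t]*\d+[ \t]*$",
        "safe_to_bootstrap: 1",
        grastate_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        # Refuse to guess if the field is missing or duplicated.
        raise ValueError("expected exactly one safe_to_bootstrap line")
    return new_text


# Assumed sample contents, not copied from the affected node.
sample = """\
# GALERA saved state
version: 2.1
uuid:    a6d885cf-23bd-11ed-a7b5-fb2da02f3a3d
seqno:   -1
safe_to_bootstrap: 0
"""

updated = mark_safe_to_bootstrap(sample)
print(updated)
```

Only the safe_to_bootstrap line is rewritten; the uuid and seqno lines are left exactly as found. In the report, the bootstrap itself was then started with galera_new_cluster.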
