[MDEV-29375] Galera server crashes after 10.3 > 10.4 upgrade Created: 2022-08-25  Updated: 2023-03-28  Resolved: 2022-10-12

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.36, 10.4.26
Fix Version/s: 10.3.37, 10.4.27

Type: Bug Priority: Blocker
Reporter: Ramesh Sivaraman Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: regression

Attachments: File gcache.tar.gz     File logs.tar.gz     File node1.err     File node2.err    

 Description   

Test case

Install 10.3.36 on Vagrant boxes (2 nodes)
Shut down node2
Remove the 10.3.36 packages from node2
Install the 10.4.26 packages after updating the repo
   > Server startup fails after package installation

Error info

2022-08-24 14:48:04 0 [Note] WSREP: Service thread queue flushed.
2022-08-24 14:48:04 0 [Note] WSREP: ####### Assign initial position for certification: 5c410a36-23a8-11ed-a44c-f6f37823dd10:3, protocol version: -1
2022-08-24 14:48:04 0 [ERROR] WSREP: Corrupt buffer header: addr: 0x7f722bd5b530, seqno: 7019267256999739392, size: 825111097, ctx: 0x559652a28678, flags: 14391. store: 46, type: 49
220824 14:48:04 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,

GDB stack

(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x000055906acb5508 in handle_fatal_signal ()
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f6b47065859 in __GI_abort () at abort.c:79
#5  0x00007f6b470d026e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f6b471fa298 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007f6b470d82fc in malloc_printerr (str=str@entry=0x7f6b471f84c1 "free(): invalid pointer") at malloc.c:5347
#7  0x00007f6b470d9b2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#8  0x00007f6b4654069c in gcache::MemStore::discard (bh=0x7f6afffff528, this=0x55906e177620) at ./gcache/src/gcache_mem_store.hpp:136
#9  gcache::GCache::discard_buffer (this=0x55906e1774f0, bh=0x7f6afffff528, ptr=<optimized out>) at ./gcache/src/GCache_memops.cpp:18
#10 0x00007f6b46540cde in gcache::GCache::discard_tail (this=this@entry=0x55906e1774f0, seqno=seqno@entry=3)
    at ./gcache/src/GCache_memops.cpp:161
#11 0x00007f6b465265da in gcache::GCache::seqno_reset (this=this@entry=0x55906e1774f0, gtid=...) at ./gcache/src/GCache_seqno.cpp:31
#12 0x00007f6b463f5408 in galera::ReplicatorSMM::ReplicatorSMM (this=0x55906e177040, args=<optimized out>)
    at ./galerautils/src/gu_uuid.hpp:203
#13 0x00007f6b463c4f52 in galera_init (gh=0x55906e142ef0, args=0x7fff984bad20) at ./galera/src/wsrep_provider.cpp:48
#14 0x000055906b2b0cbc in wsrep::wsrep_provider_v26::wsrep_provider_v26(wsrep::server_state&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, wsrep::provider::services const&) ()
#15 0x000055906b2ada84 in wsrep::provider::make_provider(wsrep::server_state&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, wsrep::provider::services const&) ()
#16 0x000055906b298d43 in wsrep::server_state::load_provider(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, wsrep::provider::services const&) ()
#17 0x000055906af40db4 in wsrep_init() ()
#18 0x000055906af41416 in wsrep_init_startup(bool) ()
#19 0x000055906a9e2679 in ?? ()
#20 0x000055906a9e7666 in mysqld_main(int, char**) ()
#21 0x00007f6b47067083 in __libc_start_main (main=0x55906a9c2d30 <main>, argc=2, argv=0x7fff984bb908, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff984bb8f8) at ../csu/libc-start.c:308
#22 0x000055906a9db6be in _start ()
(gdb)



 Comments   
Comment by Ramesh Sivaraman [ 2022-08-25 ]

Did not see this issue in 10.4.25 > 10.4.26 rolling upgrade.

Comment by Jan Lindström (Inactive) [ 2022-09-14 ]

Workaround: Delete the gcache file. Note that this forces SST.
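
The workaround can be sketched as a small helper. This is an assumption-laden illustration, not an official procedure: it assumes the gcache file is named galera.cache and sits in the data directory (the helper name and the example path /var/lib/mysql are hypothetical), and it renames the file instead of deleting it so the old cache can still be attached to a bug report. It should only be run while the server on that node is stopped.

```python
import time
from pathlib import Path

def set_aside_gcache(datadir, name="galera.cache"):
    """Rename the gcache file so the node starts without its old cache.

    Returns the new path, or None if no gcache file was found.
    Assumes the server on this node is stopped; the node will then
    need a full state transfer (SST) when it rejoins the cluster.
    """
    src = Path(datadir) / name
    if not src.exists():
        return None
    # Timestamp suffix keeps repeated runs from overwriting earlier copies.
    dst = src.with_name(f"{name}.bak-{int(time.time())}")
    src.rename(dst)
    return dst

if __name__ == "__main__":
    # Hypothetical data directory; adjust to the actual datadir.
    print(set_aside_gcache("/var/lib/mysql"))
```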

Comment by Alexey [ 2022-10-09 ]

jplindst,
Codership's Galera 4.11 does not have this bug. This is what it shows when recovering the provided galera.cache file:

2022-10-08 22:49:41 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (         0/1073741848 bytes) complete.
2022-10-08 22:49:44 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (1073741848/1073741848 bytes) complete.
2022-10-08 22:49:44 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.

This means that the previous contents of the file are ignored.
MariaDB's Galera version shows:

2022-10-09 11:46:07 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (         0/1073741848 bytes) complete.
2022-10-09 11:46:09 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (1073741848/1073741848 bytes) complete.
2022-10-09 11:46:09 0 [Note] WSREP: Recovering GCache ring buffer: didn't recover any events.

This means that the file is taken as is and its structures were found valid. Since the gcache buffer header format differs between 3.x and 4.x, this inevitably leads to a crash.
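
To tell the two behaviours apart on a given node, the error log can be scanned for the recovery messages quoted above. The following is a minimal sketch: the function name and the "reset" / "kept" / "unknown" labels are made up for illustration, and the reading of each message follows the interpretation given in this comment rather than the Galera sources.

```python
# Message fragments copied from the log excerpts in this comment.
RESET_MARKER = "Recovery failed, need to do full reset"
KEPT_MARKER = "didn't recover any events"

def gcache_recovery_outcome(log_text):
    """Classify the first GCache ring buffer recovery message in a log.

    Returns "reset" if the old file contents were discarded, "kept" if
    the file was accepted as is, and "unknown" if neither message is found.
    """
    for line in log_text.splitlines():
        if "Recovering GCache ring buffer" not in line:
            continue
        if RESET_MARKER in line:
            return "reset"
        if KEPT_MARKER in line:
            return "kept"
    return "unknown"

if __name__ == "__main__":
    sample = (
        "2022-10-09 11:46:09 0 [Note] WSREP: Recovering GCache ring buffer: "
        "didn't recover any events."
    )
    print(gcache_recovery_outcome(sample))
```

A "kept" result on a node that was just upgraded from a 3.x library is the case described here, where the old file is reused.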

The diff between Codership's and MariaDB's gcache sources (only that part) is 3K lines, meaning that at least that part of the Galera library shipped with MariaDB is terribly outdated. I suspect that is likewise the case with the rest of the Galera library and suggest that the MariaDB Galera repo be thoroughly updated.

Comment by Jan Lindström (Inactive) [ 2022-10-10 ]

Yurchenko, are you comparing the correct branches? In MariaDB you should use the mariadb-4.x branch. The only real difference I could find in the gcache sources is the following:

diff --git a/gcache/src/gcache_rb_store.cpp b/gcache/src/gcache_rb_store.cpp
index ff9db4fb..bc68fa4e 100644
--- a/gcache/src/gcache_rb_store.cpp
+++ b/gcache/src/gcache_rb_store.cpp
@@ -1243,7 +1243,7 @@ namespace gcache
         size_t chain_count[] = { 0, 0, 0, 0 };
 
         chain_t chain(NONE);
-        const uint8_t* chain_start(start_);
+        const uint8_t* chain_start;
         size_t count;
 
         bool next(false);

This was done to silence a compiler warning.

Comment by Jan Lindström (Inactive) [ 2022-10-11 ]

Problem was not reproducible with commit 1eac5b64

Comment by Jan Lindström (Inactive) [ 2022-10-11 ]

ramesh, can you test with the latest MariaDB 4.x library whether this is still reproducible? If it is, we would need the gcache file. Please also use debug builds with wsrep-debug=1.

Comment by Ramesh Sivaraman [ 2022-10-12 ]

jplindst, the Debian package upgrade with Galera (using the latest galera 4.x branch) looks good.

Comment by Jan Lindström (Inactive) [ 2022-10-12 ]

Fixed in MariaDB Galera library 26.4.13

Comment by Vasyl Saienko [ 2023-03-28 ]

Hello MariaDB team,

We have faced the same issue with 1:10.6.12+maria~ubu2004. The scenario is a bit different: we rebooted the nodes in the cluster one by one, and one of the nodes failed to do IST. Configs and logs from all nodes are attached in the archive. Could you please help resolve it? Thank you.

 [^logs.tar.gz] 
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 2023-03-28  6:29:56 2 [ERROR] WSREP: Corrupt buffer header: addr: 0x7f617bfff518, seqno: 3543537071533338624, size: 808463664, ctx: 0x558bd59ab298, flags: 12337. store: 45, type: 48
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 230328  6:29:56 [ERROR] mysqld got signal 6 ;
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - This could be because you hit a bug. It is also possible that this binary
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - or one of the libraries it was linked against is corrupt, improperly built,
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - or misconfigured. This error can also be caused by malfunctioning hardware.
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - To report this bug, see https://mariadb.com/kb/en/reporting-bugs
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - We will try our best to scrape up some info that will hopefully help
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - diagnose the problem, but since we have already crashed, 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - something is definitely wrong and this may fail.
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - Server version: 10.6.12-MariaDB-1:10.6.12+maria~ubu2004 source revision: 4c79e15cc3716f69c044d4287ad2160da8101cdc
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - key_buffer_size=0
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - read_buffer_size=131072
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - max_used_connections=0
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - max_threads=8194
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - thread_count=3
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - It is possible that mysqld could use up to 
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18043994 K  bytes of memory
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Hope that's ok; if not, decrease some variables in the equation.
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO -
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Thread pointer: 0x7f6164000c58
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Attempting backtrace. You can use the following information to find out
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - where mysqld died. If you see no messages after this, something went
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - terribly wrong...
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - stack_bottom = 0x7f61c139ad88 thread_stack 0x49000
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - Printing to addr2line failed
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - mysqld(my_print_stacktrace+0x32)[0x558bd39d90c2]
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - mysqld(handle_fatal_signal+0x485)[0x558bd34a0c55]
2023-03-28 06:29:56,842 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f61c3f70420]
2023-03-28 06:29:56,842 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f61c3a7400b]
2023-03-28 06:29:56,843 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f61c3a53859]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x3ebc7)[0x7f61c3572bc7]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x1cc3fe)[0x7f61c37003fe]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x1b12da)[0x7f61c36e52da]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x91ea0)[0x7f61c35c5ea0]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7d9f7)[0x7f61c35b19f7]
2023-03-28 06:29:56,846 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7e42f)[0x7f61c35b242f]
2023-03-28 06:29:56,846 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7ea7d)[0x7f61c35b2a7d]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0xb019b)[0x7f61c35e419b]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0xb0682)[0x7f61c35e4682]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7cef0)[0x7f61c35b0ef0]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x504a1)[0x7f61c35844a1]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x558bd3a763b2]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - mysqld(+0xcbe651)[0x558bd3771651]
2023-03-28 06:29:56,849 - OpenStack-Helm Mariadb - INFO - mysqld(_Z15start_wsrep_THDPv+0x26b)[0x558bd37603bb]
2023-03-28 06:29:56,850 - OpenStack-Helm Mariadb - INFO - mysqld(+0xc36766)[0x558bd36e9766]
2023-03-28 06:29:56,851 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f61c3f64609]
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f61c3b50133]
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - Trying to get some variables.
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Some pointers may be invalid and cause the dump to abort.
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Query (0x0): (null)
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Connection ID (thread ID): 2
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Status: NOT_KILLED
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
2023-03-28 06:29:56,856 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,856 - OpenStack-Helm Mariadb - INFO -

mysql@mariadb-server-0:/$ dpkg -l |grep -e mariadb -e  galera
ii  galera-4                   26.4.14-ubu2004                   amd64        Replication framework for transactional applications
ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
ii  libmariadb3:amd64          1:10.6.12+maria~ubu2004           amd64        MariaDB database client library
ii  mariadb-backup             1:10.6.12+maria~ubu2004           amd64        Backup tool for MariaDB server
ii  mariadb-client-10.6        1:10.6.12+maria~ubu2004           amd64        MariaDB database client binaries
ii  mariadb-client-core-10.6   1:10.6.12+maria~ubu2004           amd64        MariaDB database core client binaries
ii  mariadb-common             1:10.6.12+maria~ubu2004           all          MariaDB common configuration files
ii  mariadb-server             1:10.6.12+maria~ubu2004           all          MariaDB database server (metapackage depending on the latest version)
ii  mariadb-server-10.6        1:10.6.12+maria~ubu2004           amd64        MariaDB database server binaries
ii  mariadb-server-core-10.6   1:10.6.12+maria~ubu2004           amd64        MariaDB database core server files

Generated at Thu Feb 08 10:08:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.