[MDEV-30947] Galera server crashes during IST 1:10.6.12+maria~ubu2004 Created: 2023-03-28  Updated: 2023-05-09  Resolved: 2023-05-09

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.36, 10.4.26
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Vasyl Saienko Assignee: Alexey
Resolution: Incomplete Votes: 1
Labels: regression

Attachments: File logs.tar.gz    
Issue Links:
Relates
relates to MDEV-30988 Galera server crashes during IST 1:10... Open

 Description   

This bug is the same as https://jira.mariadb.org/browse/MDEV-29375, but it was reproduced by powering a node running MariaDB off and on again.
One of the nodes failed to complete IST. Configs and logs from all nodes are attached in the archive. Could you please help resolve it? Thank you.

2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 2023-03-28  6:29:56 2 [ERROR] WSREP: Corrupt buffer header: addr: 0x7f617bfff518, seqno: 3543537071533338624, size: 808463664, ctx: 0x558bd59ab298, flags: 12337. store: 45, type: 48
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 230328  6:29:56 [ERROR] mysqld got signal 6 ;
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - This could be because you hit a bug. It is also possible that this binary
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - or one of the libraries it was linked against is corrupt, improperly built,
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - or misconfigured. This error can also be caused by malfunctioning hardware.
2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - To report this bug, see https://mariadb.com/kb/en/reporting-bugs
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - We will try our best to scrape up some info that will hopefully help
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - diagnose the problem, but since we have already crashed, 
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - something is definitely wrong and this may fail.
2023-03-28 06:29:56,838 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - Server version: 10.6.12-MariaDB-1:10.6.12+maria~ubu2004 source revision: 4c79e15cc3716f69c044d4287ad2160da8101cdc
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - key_buffer_size=0
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - read_buffer_size=131072
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - max_used_connections=0
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - max_threads=8194
2023-03-28 06:29:56,839 - OpenStack-Helm Mariadb - INFO - thread_count=3
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - It is possible that mysqld could use up to 
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18043994 K  bytes of memory
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Hope that's ok; if not, decrease some variables in the equation.
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO -
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Thread pointer: 0x7f6164000c58
2023-03-28 06:29:56,840 - OpenStack-Helm Mariadb - INFO - Attempting backtrace. You can use the following information to find out
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - where mysqld died. If you see no messages after this, something went
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - terribly wrong...
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - stack_bottom = 0x7f61c139ad88 thread_stack 0x49000
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - Printing to addr2line failed
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - mysqld(my_print_stacktrace+0x32)[0x558bd39d90c2]
2023-03-28 06:29:56,841 - OpenStack-Helm Mariadb - INFO - mysqld(handle_fatal_signal+0x485)[0x558bd34a0c55]
2023-03-28 06:29:56,842 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f61c3f70420]
2023-03-28 06:29:56,842 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f61c3a7400b]
2023-03-28 06:29:56,843 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f61c3a53859]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x3ebc7)[0x7f61c3572bc7]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x1cc3fe)[0x7f61c37003fe]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x1b12da)[0x7f61c36e52da]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x91ea0)[0x7f61c35c5ea0]
2023-03-28 06:29:56,844 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7d9f7)[0x7f61c35b19f7]
2023-03-28 06:29:56,846 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7e42f)[0x7f61c35b242f]
2023-03-28 06:29:56,846 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7ea7d)[0x7f61c35b2a7d]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0xb019b)[0x7f61c35e419b]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0xb0682)[0x7f61c35e4682]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x7cef0)[0x7f61c35b0ef0]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - /usr/lib/galera/libgalera_smm.so(+0x504a1)[0x7f61c35844a1]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x558bd3a763b2]
2023-03-28 06:29:56,847 - OpenStack-Helm Mariadb - INFO - mysqld(+0xcbe651)[0x558bd3771651]
2023-03-28 06:29:56,849 - OpenStack-Helm Mariadb - INFO - mysqld(_Z15start_wsrep_THDPv+0x26b)[0x558bd37603bb]
2023-03-28 06:29:56,850 - OpenStack-Helm Mariadb - INFO - mysqld(+0xc36766)[0x558bd36e9766]
2023-03-28 06:29:56,851 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f61c3f64609]
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f61c3b50133]
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,852 - OpenStack-Helm Mariadb - INFO - Trying to get some variables.
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Some pointers may be invalid and cause the dump to abort.
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Query (0x0): (null)
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Connection ID (thread ID): 2
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Status: NOT_KILLED
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,853 - OpenStack-Helm Mariadb - INFO - Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
2023-03-28 06:29:56,856 - OpenStack-Helm Mariadb - INFO - 
2023-03-28 06:29:56,856 - OpenStack-Helm Mariadb - INFO -

mysql@mariadb-server-0:/$ dpkg -l |grep -e mariadb -e  galera
ii  galera-4                   26.4.14-ubu2004                   amd64        Replication framework for transactional applications
ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
ii  libmariadb3:amd64          1:10.6.12+maria~ubu2004           amd64        MariaDB database client library
ii  mariadb-backup             1:10.6.12+maria~ubu2004           amd64        Backup tool for MariaDB server
ii  mariadb-client-10.6        1:10.6.12+maria~ubu2004           amd64        MariaDB database client binaries
ii  mariadb-client-core-10.6   1:10.6.12+maria~ubu2004           amd64        MariaDB database core client binaries
ii  mariadb-common             1:10.6.12+maria~ubu2004           all          MariaDB common configuration files
ii  mariadb-server             1:10.6.12+maria~ubu2004           all          MariaDB database server (metapackage depending on the latest version)
ii  mariadb-server-10.6        1:10.6.12+maria~ubu2004           amd64        MariaDB database server binaries
ii  mariadb-server-core-10.6   1:10.6.12+maria~ubu2004           amd64



 Comments   
Comment by Alexey [ 2023-03-29 ]

Hello, thanks for the logs. This is a very curious situation.
First we have:

2023-03-28 06:29:54,657 - OpenStack-Helm Mariadb - INFO - 2023-03-28  6:29:54 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 3543537071533338624-3543537071533338624

3543537071533338624 is clearly a bogus seqno, but that is what was read from the GCache file.

Then we start receiving the normal certification index preamble:

2023-03-28 06:29:56,836 - OpenStack-Helm Mariadb - INFO - 2023-03-28  6:29:56 2 [Note] WSREP: Prepared IST receiver for 0-134969, listening at: tcp://192.168.201.108:4568

and get the same bogus seqno:

2023-03-28 06:29:56,837 - OpenStack-Helm Mariadb - INFO - 2023-03-28  6:29:56 2 [ERROR] WSREP: Corrupt buffer header: addr: 0x7f617bfff518, seqno: 3543537071533338624, size: 808463664, ctx: 0x558bd59ab298, flags: 12337. store: 45, type: 48

as if we are reading the same stale data from the GCache mmapped file, i.e. it was not overwritten by the data received from IST.

So while this requires Galera code inspection, it would also be instructive to know:
1) whether there is any custom OS kernel configuration on the node;
2) what file system is used for the data directory;
3) immediately before this incident there were several other attempts to join this node to the cluster and they all failed, but the error log for this node starts only from the most recent restart. Why did the previous join attempts fail? Are there any logs from then?

Kind regards,
Alex
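
The comparison described in the comment above (the seqno in the corrupt buffer header versus the range the IST receiver was prepared for) can be sketched in a few lines of Python. This is only an illustrative heuristic for triaging logs, not how Galera itself validates GCache buffers. The helper names `ist_range`, `header_seqno` and `seqno_is_plausible` are hypothetical, and the two log lines are copied from this report.

```python
import re

# Log lines copied from this report (the logger prefix is omitted).
IST_LINE = ("2023-03-28  6:29:56 2 [Note] WSREP: Prepared IST receiver for "
            "0-134969, listening at: tcp://192.168.201.108:4568")
CORRUPT_LINE = ("2023-03-28  6:29:56 2 [ERROR] WSREP: Corrupt buffer header: "
                "addr: 0x7f617bfff518, seqno: 3543537071533338624, "
                "size: 808463664, ctx: 0x558bd59ab298, flags: 12337. "
                "store: 45, type: 48")


def ist_range(line):
    """Return (first, last) seqno from a 'Prepared IST receiver' line, or None."""
    m = re.search(r"Prepared IST receiver for (\d+)-(\d+)", line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def header_seqno(line):
    """Return the seqno from a 'Corrupt buffer header' line, or None."""
    m = re.search(r"Corrupt buffer header:.*?seqno: (-?\d+)", line)
    return int(m.group(1)) if m else None


def seqno_is_plausible(seqno, seqno_range):
    """Heuristic (an assumption of this sketch): a seqno is plausible
    if it lies inside the range the IST receiver was prepared for."""
    first, last = seqno_range
    return first <= seqno <= last


expected = ist_range(IST_LINE)
seqno = header_seqno(CORRUPT_LINE)
plausible = seqno_is_plausible(seqno, expected)
print(f"IST range {expected}, header seqno {seqno}, plausible: {plausible}")
```

Run against the lines from this report, the check flags the header seqno as lying far outside the prepared IST range, which matches the observation that the value is bogus.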

Comment by Vasyl Saienko [ 2023-04-22 ]

Hello Alexey,

1) We are using a pretty standard kernel config from Ubuntu Focal. The kernel version is:

Linux mariadb-server-0 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

2) The data directory is on an ext4 filesystem.

3) We are running MariaDB in Kubernetes; the data directory is mounted from the host as an ext4 filesystem. The pod was restarted several times, and unfortunately I don't have logs from the previous restart attempts. Please note that we didn't see this issue when running an older version of MariaDB/Galera, so this looks like a regression in some component.

root@51b3aeef9f10:/# dpkg -l |grep -e galera -e mariadb
ii  galera-4                   26.4.11-focal                     amd64        Replication framework for transactional applications
ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
ii  libmariadb3:amd64          1:10.6.7+maria~focal              amd64        MariaDB database client library
ii  mariadb-backup             1:10.6.7+maria~focal              amd64        Backup tool for MariaDB server
ii  mariadb-client-10.6        1:10.6.7+maria~focal              amd64        MariaDB database client binaries
ii  mariadb-client-core-10.6   1:10.6.7+maria~focal              amd64        MariaDB database core client binaries
ii  mariadb-common             1:10.6.7+maria~focal              all          MariaDB common configuration files
ii  mariadb-server             1:10.6.7+maria~focal              all          MariaDB database server (metapackage depending on the latest version)
ii  mariadb-server-10.6        1:10.6.7+maria~focal              amd64        MariaDB database server binaries
ii  mariadb-server-core-10.6   1:10.6.7+maria~focal              amd64        MariaDB database core server files
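
To narrow down which component changed between the two setups, the two `dpkg -l` listings in this report can be compared mechanically. The following is a minimal sketch, assuming the listing in this comment is the older setup that did not show the issue. The function names `parse_dpkg` and `changed_packages` are hypothetical, and only three rows from each listing are reproduced to keep the example short.

```python
# Rows copied from the two `dpkg -l` listings in this report (abridged).
OLD_LISTING = """\
ii  galera-4                   26.4.11-focal                     amd64        Replication framework for transactional applications
ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
ii  mariadb-server-10.6        1:10.6.7+maria~focal              amd64        MariaDB database server binaries
"""
NEW_LISTING = """\
ii  galera-4                   26.4.14-ubu2004                   amd64        Replication framework for transactional applications
ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
ii  mariadb-server-10.6        1:10.6.12+maria~ubu2004           amd64        MariaDB database server binaries
"""


def parse_dpkg(listing):
    """Map package name -> version for installed ('ii') rows of `dpkg -l` output."""
    versions = {}
    for row in listing.splitlines():
        fields = row.split()
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def changed_packages(old, new):
    """Return {name: (old_version, new_version)} for packages whose version differs."""
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


changed = changed_packages(parse_dpkg(OLD_LISTING), parse_dpkg(NEW_LISTING))
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

For these rows, both the Galera provider package and the server package differ between the two setups, so the listings alone do not show which of the two introduced the change.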

Generated at Thu Feb 08 10:20:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.