[MDEV-27122] MariaDB 10.4.15 + Galera - buffer overflow detected crash Created: 2021-11-25  Updated: 2023-02-06  Resolved: 2023-02-06

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.15
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: David Pivonka
Assignee: Julius Goryavsky
Resolution: Incomplete
Votes: 1
Labels: crash
Environment:

On-premise server with Ubuntu 20.04 and 10.4.15-MariaDB-1:10.4.15+maria~focal.


Attachments: mariadb-galera-crash.txt, mysqld.err.2022-03-17-00-00-01.gz

 Description   

Our MariaDB Galera cluster crashed once on one of its three nodes. After the crash, the node restarted automatically and rejoined the cluster without any issues.

2021-11-18  8:08:22 0 [Note] WSREP: Created page /vol/data/mysql/gcache.page.000000 of size 134217728 bytes
2021-11-18  8:08:25 0 [ERROR] WSREP: Corrupt buffer header: addr: 0x7eed3c1148c0, seqno: 816452028672646535, size: 2332593895, ctx: 0x4a74b7c4e451547, flags: 31383. store: 65, type: 126
211118  8:08:25 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.4.15-MariaDB-1:10.4.15+maria~focal
key_buffer_size=134217728
read_buffer_size=1048576
max_used_connections=1265
max_threads=5129
thread_count=1022
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 26518309 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
*** buffer overflow detected ***: terminated
2021-11-18  8:17:30 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: ab2876f2-a340-11eb-bd2f-b6d12d17fb57:673606716
2021-11-18  8:17:30 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2021-11-18  8:17:30 0 [Note] WSREP: wsrep_load(): Galera 26.4.5(rb3764ab6) by Codership Oy <info@codership.com> loaded successfully.
2021-11-18  8:17:30 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2021-11-18  8:17:30 0 [Note] WSREP: Found saved state: ab2876f2-a340-11eb-bd2f-b6d12d17fb57:-1, safe_to_bootstrap: 0
2021-11-18  8:17:30 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: ab2876f2-a340-11eb-bd2f-b6d12d17fb57
Seqno: -1 - -1
Offset: -1
Synced: 0
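
For triage, the fields of the "Corrupt buffer header" line above can be extracted and sanity-checked. The following is a minimal Python sketch and is not part of Galera: the regular expression is derived only from the log line quoted in this report, and `parse_corrupt_header` and `exceeds_page_size` are hypothetical helper names. Comparing the reported size with the gcache page size from the preceding log line is an assumed heuristic, not Galera's own validation logic.

```python
import re

# Field layout copied from the "Corrupt buffer header" line in this report.
HEADER_RE = re.compile(
    r"Corrupt buffer header: addr: (?P<addr>0x[0-9a-f]+), "
    r"seqno: (?P<seqno>-?\d+), size: (?P<size>\d+), "
    r"ctx: (?P<ctx>0x[0-9a-f]+), flags: (?P<flags>\d+)\. "
    r"store: (?P<store>\d+), type: (?P<type>\d+)"
)

HEX_FIELDS = ("addr", "ctx")


def parse_corrupt_header(line):
    """Return the header fields as integers, or None if the line does not match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        name: int(value, 16) if name in HEX_FIELDS else int(value)
        for name, value in match.groupdict().items()
    }


def exceeds_page_size(header, page_size):
    """Assumed heuristic: a buffer cannot be larger than the page that holds it."""
    return header["size"] > page_size


SAMPLE_LINE = (
    "2021-11-18  8:08:25 0 [ERROR] WSREP: Corrupt buffer header: "
    "addr: 0x7eed3c1148c0, seqno: 816452028672646535, size: 2332593895, "
    "ctx: 0x4a74b7c4e451547, flags: 31383. store: 65, type: 126"
)
```

Applied to `SAMPLE_LINE`, the parsed size (2332593895) is larger than the 134217728-byte gcache page created three seconds earlier, which suggests the header bytes are garbage rather than a real buffer description.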



 Comments   
Comment by David Pivonka [ 2022-03-22 ]

Hello, the same thing happened today. The attached log file contains the error; the relevant section begins at the line that starts with "2022-03-16 7:55:43".
mysqld.err.2022-03-17-00-00-01.gz

Comment by Gaurav Tomar [ 2022-04-12 ]

We are also hitting the same issue with MariaDB Enterprise Server 10.5.15-10 on Ubuntu 20.04.

Steps to reproduce the issue on 10.5.15-10:

1. Set up a Galera cluster of 3 nodes.
2. Comment out wsrep_node_address on one node and restart the MariaDB daemon on that node.
3. Uncomment wsrep_node_address and restart the MariaDB daemon again.
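
Steps 2 and 3 can be scripted for repeated attempts. The following is a minimal Python sketch that only toggles the comment marker on the option inside the configuration text. `comment_option` and `uncomment_option` are hypothetical helper names, and the sample address in the usage note is a placeholder. The location of the configuration file and the restart command are not given in this report, so reading and writing the file and restarting the daemon between steps are left to the operator.

```python
import re

OPTION = "wsrep_node_address"


def comment_option(config_text, option=OPTION):
    """Step 2: prefix every active `option=...` line with '#'."""
    pattern = re.compile(
        rf"^([ \t]*)({re.escape(option)}[ \t]*=.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1#\2", config_text)


def uncomment_option(config_text, option=OPTION):
    """Step 3: remove the '#' in front of every commented `option=...` line."""
    pattern = re.compile(
        rf"^([ \t]*)#[ \t]*({re.escape(option)}[ \t]*=.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1\2", config_text)
```

For example, `comment_option("wsrep_node_address=192.0.2.10\n")` yields `"#wsrep_node_address=192.0.2.10\n"`, and `uncomment_option` restores the original text. Only the underscore spelling of the option is handled.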

Error

2022-04-12 12:55:14 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 145, STRv: 3
2022-04-12 12:55:14 2 [Note] WSREP: IST receiver addr using tcp://prd-galeragandalf011.phonepe.nm5:4568
2022-04-12 12:55:14 2 [Note] WSREP: Prepared IST receiver for 0-145, listening at: tcp://10.22.27.170:4568
2022-04-12 12:55:14 0 [Note] WSREP: Member 2.0 (prd-galeragandalf011) requested state transfer from 'prd-galeragandalf013'. Selected 0.0 (prd-galeragandalf013)(SYNCED) as donor.
2022-04-12 12:55:14 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 145)
2022-04-12 12:55:14 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-04-12 12:55:14 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2022-04-12 12:55:14 2 [ERROR] WSREP: Corrupt buffer header: addr: 0x7f2001758518, seqno: 3185219421952815104, size: 845242925, ctx: 0x561083182888, flags: 11577. store: 49, type: 49
220412 12:55:14 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.5.15-10-MariaDB-enterprise-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=0
max_threads=10002
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61839401 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f1ff0000c58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f240c10ed98 thread_stack 0x49000
Printing to addr2line failed
/usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5610814e5f82]
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x561080f1fe05]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f240e83c3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f240e34003b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f240e31f859]
/usr/lib/libgalera_smm.so(+0x3fb0f)[0x7f240c662b0f]
/usr/lib/libgalera_smm.so(+0x1ceece)[0x7f240c7f1ece]
/usr/lib/libgalera_smm.so(+0x1b47ca)[0x7f240c7d77ca]
/usr/lib/libgalera_smm.so(+0x93be0)[0x7f240c6b6be0]
/usr/lib/libgalera_smm.so(+0x7f6e7)[0x7f240c6a26e7]
/usr/lib/libgalera_smm.so(+0x8011f)[0x7f240c6a311f]
/usr/lib/libgalera_smm.so(+0x8076d)[0x7f240c6a376d]
/usr/lib/libgalera_smm.so(+0xb2d5b)[0x7f240c6d5d5b]
/usr/lib/libgalera_smm.so(+0xb3242)[0x7f240c6d6242]
/usr/lib/libgalera_smm.so(+0x7ebe0)[0x7f240c6a1be0]
/usr/lib/libgalera_smm.so(+0x51f81)[0x7f240c674f81]
/usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x561081584012]
/usr/sbin/mariadbd(+0xc86ff7)[0x56108120bff7]
/usr/sbin/mariadbd(_Z15start_wsrep_THDPv+0x283)[0x5610811fb783]
/usr/sbin/mariadbd(+0xbfd97f)[0x56108118297f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f240e830609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f240e41c163]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 2
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway.
Query:
 
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             2061693              2061693              processes
Max open files            500000               500000               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       2061693              2061693              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/share/apport/apport %p %s %c %d %P %E

Comment by Jan Lindström (Inactive) [ 2023-01-02 ]

davidpivonka Can you try a more recent version of MariaDB and the Galera library? If you can still reproduce the crash, we would need the full error log, instructions to reproduce it, and a proper stack trace.
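
As a starting point for a better stack trace, the frames in the 10.5.15-10 backtrace that show only a module and an offset (for example the libgalera_smm.so frames) can be listed and turned into `addr2line` invocations. The following is a minimal Python sketch. `unresolved_frames` and `addr2line_commands` are hypothetical helper names, the frame pattern is derived only from the backtrace lines in this report, and useful output from `addr2line` assumes that debug symbols matching the installed library are available.

```python
import re

# Matches frames such as:
#   /usr/lib/libgalera_smm.so(+0x3fb0f)[0x7f240c662b0f]
#   /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f240e31f859]
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<addr>0x[0-9a-f]+)\]$"
)


def unresolved_frames(lines):
    """Return (module, offset) pairs for frames that carry no symbol name."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match and not match.group("symbol"):
            frames.append((match.group("module"), match.group("offset")))
    return frames


def addr2line_commands(frames):
    """Build one addr2line command line per (module, offset) pair."""
    return [
        f"addr2line -f -C -e {module} {offset}" for module, offset in frames
    ]
```

Feeding the backtrace lines from the error log into `unresolved_frames` and then into `addr2line_commands` gives commands such as `addr2line -f -C -e /usr/lib/libgalera_smm.so 0x3fb0f`, which can be run on the affected node.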
