[MDEV-31110] Mariadb 10.11.2 crashes sporadically with Fatal glibc error: malloc.c:3617 (_mid_memalign): assertion failed: !p || chunk_is_mmapped (mem2chunk (p)) || ar_ptr == arena_for_chunk (mem2chunk (p))
Created: 2023-04-21
Updated: 2023-11-13
Resolved: 2023-11-13

Status: Closed
Project: MariaDB Server
Component/s: Platform SUSE
Affects Version/s: 10.11.2
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Artem Russakovskii
Assignee: Sergei Golubchik
Resolution: Incomplete
Votes: 0
Labels: crash
Environment:

OpenSUSE 15.4
glibc 2.37.9000.233.g16439f419b-lp154.3626.1



 Description   

Hi there,

After years of running MariaDB successfully, we started experiencing almost daily crashes sometime after the last round of updates from 10.9 to 10.10 and 10.11. The details are below.

It's possible this is related to glibc and not MariaDB, but I don't have enough expertise to determine that. We also updated glibc to the latest version, but the crashes keep happening.

https://forums.factorio.com/viewtopic.php?f=7&p=583557 looks suspiciously similar, and one of the comments says a glibc fix was released, but I am not sure in which version, or whether that's true at all.

Fatal glibc error: malloc.c:3617 (_mid_memalign): assertion failed: !p || chunk_is_mmapped (mem2chunk (p)) || ar_ptr == arena_for_chunk (mem2chunk (p))
230421  5:00:20 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.11.2-MariaDB-log source revision: cafba8761af55ae16cc69c9b53a341340a845b36
key_buffer_size=536870912
read_buffer_size=262144
max_used_connections=304
max_threads=2050
thread_count=195
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5301674 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f76c001acb8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f76f8968d28 thread_stack 0x49000
??:0(my_print_stacktrace)[0x55f5c6895bdd]
??:0(handle_fatal_signal)[0x55f5c6394065]
??:0(__restore_rt)[0x7f76f96570a0]
??:0(__pthread_kill_implementation)[0x7f76f96a7cec]
??:0(__GI_raise)[0x7f76f9656fe2]
??:0(__GI_abort)[0x7f76f963f34f]
??:0(_IO_peekc_locked.cold)[0x7f76f96400c9]
??:0(__libc_assert_fail)[0x7f76f964f183]
??:0(_mid_memalign.isra.0)[0x7f76f96b72a2]
??:0(std::unique_lock<std::mutex>::unlock())[0x55f5c669f5b9]
??:0(std::thread::thread<void (&)()>(void (&)()))[0x55f5c67dc011]
??:0(std::thread::thread<void (&)()>(void (&)()))[0x55f5c67ae679]
??:0(std::unique_lock<std::mutex>::unlock())[0x55f5c66c7076]
??:0(std::unique_lock<std::mutex>::unlock())[0x55f5c66f6aad]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55f5c664e840]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55f5c664eb7b]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55f5c664f8a0]
??:0(Sql_cmd_truncate_table::handler_truncate(THD*, TABLE_LIST*, bool))[0x55f5c628377a]
??:0(Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*))[0x55f5c6284360]
??:0(Sql_cmd_truncate_table::execute(THD*))[0x55f5c62845de]
??:0(mysql_execute_command(THD*, bool))[0x55f5c615bc41]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55f5c6162fae]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x55f5c6157fb5]
??:0(do_command(THD*, bool))[0x55f5c6156a67]
??:0(do_handle_one_connection(CONNECT*, bool))[0x55f5c6267a0f]
??:0(handle_one_connection)[0x55f5c6267da4]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55f5c659697d]
??:0(start_thread)[0x7f76f96a5eb4]
??:0(__clone3)[0x7f76f972d1c8]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f76c000c7c0): truncate table wp_wfKnownFileList
 
Connection ID (thread ID): 4669297
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             128319               128319               processes
Max open files            1048576              1048576              files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       128319               128319               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/bin/false
 
Kernel version: Linux version 6.0.10-x86_64-linode158 (maker@build.linode.com) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP PREEMPT_DYNAMIC Thu Dec 1 13:16:43 EST 2022



 Comments   
Comment by Artem Russakovskii [ 2023-05-18 ]

Any updates? All our servers continue to crash every few days.

Comment by Sergei Golubchik [ 2023-05-27 ]

There's not enough information for us to fix it yet. Try to install debug symbols to get a proper stack trace and/or produce a core after a crash.

Comment by Artem Russakovskii [ 2023-06-01 ]

Sergei, thanks, got it.

I installed debug symbols and observed another crash, but it oddly contained almost no stack trace. Here it is:

Fatal glibc error: malloc.c:3617 (_mid_memalign): assertion failed: !p || chunk_is_mmapped (mem2chunk (p)) || ar_ptr == arena_for_chunk (mem2chunk (p))
230601  2:03:17 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.11.3-MariaDB-log source revision: 0bb31039f54bd6a0dc8f0fc7d40e6b58a51998b0
key_buffer_size=536870912
read_buffer_size=262144
max_used_connections=30
max_threads=2050
thread_count=19
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5301706 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f546c313328
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f614c0d5d28 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x3d)[0x560692c23a5d]
/usr/sbin/mysqld(handle_fatal_signal+0x575)[0x5606927198f5]

Any idea why the rest of the trace is missing compared to the original report?

I'm in the process of enabling cores and hope that I'll be able to provide more info when the next core is dumped.
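
For anyone checking their own setup: the first crash log above reports `Core pattern: |/bin/false`, which pipes every core to a program that exits without reading it, so no core file is kept even though the core size limit is unlimited. Below is a minimal Python sketch of that check. The helper name `core_dump_destination` is hypothetical, and the classification is a simplification of the kernel's rules rather than a complete model of them.

```python
import resource


def core_dump_destination(core_pattern, soft_limit):
    """Classify where a core dump would go (simplified sketch).

    Returns a (kind, target) tuple where kind is "pipe", "disabled" or "file".
    """
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' tells the kernel to pipe the core to a program.
        # Only the program path is reported here, not its arguments.
        parts = pattern[1:].split()
        return ("pipe", parts[0] if parts else "")
    if soft_limit == 0:
        # A soft RLIMIT_CORE of 0 means no core file is written.
        return ("disabled", "")
    return ("file", pattern)


if __name__ == "__main__":
    # Inspect the live system (Linux only).
    with open("/proc/sys/kernel/core_pattern") as fh:
        live_pattern = fh.read()
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(core_dump_destination(live_pattern, soft))
```

Note that the limits seen by this script are those of the shell running it, not those of the mysqld service, so the "Resource Limits" table in the crash log remains the authoritative source for the server process.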

Comment by Artem Russakovskii [ 2023-06-02 ]

Another crash this morning on one of the slaves. Here's the output. I also got a core file which I will attempt to process next and post the output here.

Fatal glibc error: malloc.c:3617 (_mid_memalign): assertion failed: !p || chunk_is_mmapped (mem2chunk (p)) || ar_ptr == arena_for_chunk (mem2chunk (p))
230602  2:00:08 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.11.3-MariaDB-log source revision: 0bb31039f54bd6a0dc8f0fc7d40e6b58a51998b0
key_buffer_size=536870912
read_buffer_size=262144
max_used_connections=59
max_threads=2050
thread_count=36
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5301706 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f273400c7b8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f34f3bfed28 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x3d)[0x561e039dea5d]
/usr/sbin/mysqld(handle_fatal_signal+0x575)[0x561e034d48f5]
/lib64/libc.so.6(+0x570a0)[0x7f34f48570a0]
/lib64/libc.so.6(+0xa7cec)[0x7f34f48a7cec]
/lib64/libc.so.6(raise+0x14)[0x7f34f4856fe2]
/lib64/libc.so.6(abort+0xd5)[0x7f34f483f34f]
/lib64/libc.so.6(+0x400c9)[0x7f34f48400c9]
/lib64/libc.so.6(+0x4f183)[0x7f34f484f183]
/lib64/libc.so.6(+0xb72a2)[0x7f34f48b72a2]
/usr/sbin/mysqld(+0xd13256)[0x561e037ea256]
/usr/sbin/mysqld(+0xe48509)[0x561e0391f509]
2023-06-02  2:02:20 577241 [Note] InnoDB: Number of transaction pools: 2
/usr/sbin/mysqld(+0xe48b72)[0x561e0391fb72]
/usr/sbin/mysqld(+0xe48c48)[0x561e0391fc48]
/usr/sbin/mysqld(+0xe349be)[0x561e0390b9be]
/usr/sbin/mysqld(+0xe350b3)[0x561e0390c0b3]
/usr/sbin/mysqld(+0xe26aee)[0x561e038fdaee]
/usr/sbin/mysqld(+0xcaa2e6)[0x561e037812e6]
/usr/sbin/mysqld(+0xcb6e5b)[0x561e0378de5b]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcijP11st_mem_rootP4ListI6StringE+0x4a)[0x561e034daa1a]
/usr/sbin/mysqld(_Z21open_table_from_shareP3THDP11TABLE_SHAREPK25st_mysql_const_lex_stringjjjP5TABLEbP4ListI6StringE+0xaed)[0x561e03375e0d]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0xb8e)[0x561e0322f5ce]
/usr/sbin/mysqld(_Z11open_tablesP3THDRK14DDL_options_stPP10TABLE_LISTPjjP19Prelocking_strategy+0xa27)[0x561e032326e7]
/usr/sbin/mysqld(_Z29mysqld_show_create_get_fieldsP3THDP10TABLE_LISTP4ListI4ItemEP6String+0x1c0)[0x561e03312fd0]
/usr/sbin/mysqld(_Z18mysqld_show_createP3THDP10TABLE_LIST+0x1c3)[0x561e03313a43]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x3281)[0x561e032992c1]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x1fe)[0x561e0329e4be]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x1635)[0x561e03293bc5]
/usr/sbin/mysqld(_Z10do_commandP3THDb+0x127)[0x561e03291fd7]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECTb+0x3ff)[0x561e033a4bdf]
include/os0file.inl:232(fil_node_t::read_page0())[0x561e033a4f74]
fil/fil0fil.cc:381(fil_node_open_file_low(fil_node_t*))[0x561e036e012d]
/lib64/libc.so.6(+0xa5eb4)[0x7f34f48a5eb4]
/lib64/libc.so.6(+0x12d1c8)[0x7f34f492d1c8]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f27340282f0): show create table `wp_commentmeta`
 
Connection ID (thread ID): 575245
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257143               257143               processes
Max open files            1048576              1048576              files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257143               257143               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: /tmp/cores/core.%e.%p.%h.%t
 
Kernel version: Linux version 6.0.10-x86_64-linode158 (maker@build.linode.com) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP PREEMPT_DYNAMIC Thu Dec 1 13:16:43 EST 2022
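
Many frames in the trace above are unresolved and carry only a module-relative offset, e.g. `/usr/sbin/mysqld(+0xd13256)`. Such offsets can usually be resolved offline with a tool like `addr2line -e /usr/sbin/mysqld -f -C 0xd13256`, provided debug symbols matching the exact binary are installed. The Python sketch below only extracts those offsets from a pasted trace. The names `FRAME_RE` and `unresolved_offsets` are hypothetical, and the pattern assumes the glibc-style `module(symbol+offset)[address]` frame format seen in this log.

```python
import re

# Matches glibc-style backtrace frames such as:
#   /usr/sbin/mysqld(+0xd13256)[0x561e037ea256]
#   /lib64/libc.so.6(raise+0x14)[0x7f34f4856fe2]
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)"
    r"\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def unresolved_offsets(lines):
    """Return (module, offset) pairs for frames that have no symbol name.

    Frames with a symbol are skipped because their offset is relative to
    that symbol rather than to the module. Lines that are not frames
    (for example interleaved server log lines) are ignored.
    """
    result = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match is None or match.group("symbol"):
            continue
        result.append((match.group("module"), match.group("offset")))
    return result


if __name__ == "__main__":
    sample = [
        "/usr/sbin/mysqld(+0xd13256)[0x561e037ea256]",
        "/lib64/libc.so.6(raise+0x14)[0x7f34f4856fe2]",
    ]
    for module, offset in unresolved_offsets(sample):
        print(module, offset)
```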

Comment by Artem Russakovskii [ 2023-06-03 ]

I have a sneaking suspicion that the issue is with glibc or the way mariadb interacts with glibc. The issues started roughly when glibc was updated on the system, and I think the newest version may be buggy. You see, we're using OpenSUSE, and at some point I saw a new repo called gcc next https://download.opensuse.org/repositories/devel:/gcc:/next/ which installed a newer glibc and a bunch of other libs (2.31 -> currently 2.37.9000.286.ge275690332).

At some point, the gcc next repo's maintainers removed the OpenSUSE 15.4 repo, and the updates stopped. Perhaps that glibc version is buggy and is no longer getting updated, so we're stuck with a bug.

So, I just downgraded glibc and related libs back to 2.31, which is coming from https://download.opensuse.org/distribution/leap/15.4/repo/oss/. I'm really hoping I'm right and the crashes will stop.
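
A side note on the version string: as far as I understand glibc's numbering, a third component of `9000` (as in `2.37.9000.286.ge275690332`) marks a build from the development branch after 2.37, not the stable 2.37 release. Below is a small Python sketch for telling the two apart. The function name `parse_glibc_version` and the "9000 or higher means development snapshot" rule are my assumptions for illustration, not an official API.

```python
import platform
import re


def parse_glibc_version(version):
    """Split a glibc version string into (major, minor, is_dev_snapshot)."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized glibc version: %r" % version)
    major = int(match.group(1))
    minor = int(match.group(2))
    patch = int(match.group(3)) if match.group(3) else 0
    # Assumption: a third component of 9000 or more denotes a
    # development snapshot rather than a stable point release.
    return major, minor, patch >= 9000


if __name__ == "__main__":
    # Report the libc that the running Python interpreter is linked against.
    print(platform.libc_ver())
    print(parse_glibc_version("2.37.9000.286.ge275690332"))
    print(parse_glibc_version("2.31"))
```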

Comment by Artem Russakovskii [ 2023-06-07 ]

Sergei: I generated a full all-threads gdb dump and shared it with your s***@mariadb.org email via Dropbox. Please let me know if you find anything good inside.

Comment by Artem Russakovskii [ 2023-06-20 ]

Well, the crashes stopped once I downgraded glibc two weeks ago. So either the newer versions of glibc have a bug, or MariaDB crashes because of glibc changes that aren't necessarily bugs. If those changes land in a widely used stable release, these crashes may start showing up for more and more people.

Comment by Sergei Golubchik [ 2023-10-13 ]

The file appears to be already deleted.

I have glibc 2.37 on my laptop and haven't seen those crashes. Maybe it takes some special kind of load to trigger them.
Anyway, without any way to repeat it there isn't much we can do, unfortunately.
