Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.3.27, 10.4.18
Description
While running into memory allocation failures on server startup due to memory overcommitting being disallowed, e.g. via:
echo 2 > /proc/sys/vm/overcommit_memory
the MariaDB server does not terminate gracefully when not being able to allocate requested memory, and may even get stuck in the critical signal handerl completely.
The actual error log output can differ depending on where exactly the memory allocation failure happens.
E.g. on a system with MariaDB 10.3.27, CentOS 6, 64GB RAM, 2GB swap, overcommit_memory=2 and overcommit_ratio=50, the server can still start with innodb_buffer_pool_memory=29GB.
With 30GB buffer pool it fails with
2021-03-01 17:09:31 0 [Note] InnoDB: Completed initialization of buffer pool
|
2021-03-01 17:09:32 0x7fe94304b7e0 InnoDB: Assertion failure in file /home/jenkins/workspace/MariaDBE-Custom-RPM/label/rhel-6/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/os/os0thread.cc line 140
|
InnoDB: Failing assertion: ret == 0
|
InnoDB: We intentionally generate a memory trap.
|
With 31GB buffer pool and above this changes to:
2021-03-01 17:15:03 0 [Note] InnoDB: Initializing buffer pool, total size = 31G, instances = 8, chunk size = 128M
|
2021-03-01 17:15:04 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
|
2021-03-01 17:15:04 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
|
2021-03-01 17:15:04 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
|
2021-03-01 17:15:04 0 [Note] InnoDB: Starting shutdown...
|
*** glibc detected *** /usr/sbin/mysqld: double free or corruption (out): 0x000055da3a4740a0 ***
|
The same setup on CentOS 7 fails a bit differently:
With 31GB pool:
Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Initializing buffer pool, total size = 3
|
Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Completed initialization of buffer pool
|
Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 210301 18:18:05 [ERROR] mysqld got signal 11 ;
|
With 32GB:
Mar 01 18:23:27 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:27 0 [Note] InnoDB: Initializing buffer pool, total size = 3
|
Mar 01 18:23:28 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:28 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
|
Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Cannot allocate memory for the buffer p
|
Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Plugin initialization aborted with erro
|
Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [Note] InnoDB: Starting shutdown...
|
Mar 01 18:23:29 localhost.localdomain mysqld[2182]: *** Error in `/usr/sbin/mysqld': double free or corruption (out): 0x000056052
|
In the worst case though, the mysqld process just seems to get stuck in the signal handler, e.g. on RHEL 8 with 32GB
1 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [Note] InnoDB: Initializing buffer pool, total size = 32G, instances = 8, chunk size = 128M
|
Mar 01 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [Note] InnoDB: Starting shutdown...
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: double free or corruption (out)
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 210301 18:57:54 [ERROR] mysqld got signal 6 ;
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: This could be because you hit a bug. It is also possible that this binary
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or one of the libraries it was linked against is corrupt, improperly built,
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or misconfigured. This error can also be caused by malfunctioning hardware.
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: We will try our best to scrape up some info that will hopefully help
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: diagnose the problem, but since we have already crashed,
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: something is definitely wrong and this may fail.
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Server version: 10.3.27-10-MariaDB-enterprise
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size=134217728
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: read_buffer_size=131072
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_used_connections=0
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_threads=153
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: thread_count=0
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: It is possible that mysqld could use up to
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467449 K bytes of memory
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Hope that's ok; if not, decrease some variables in the equation.
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Thread pointer: 0x0
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Attempting backtrace. You can use the following information to find out
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: where mysqld died. If you see no messages after this, something went
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: terribly wrong...
|
Mar 01 18:57:54 localhost.localdomain mysqld[2744]: stack_bottom = 0x0 thread_stack 0x49000
|
After this the mysqld process is still running, and only stops half an hour later after exceeding both systemd startup and shutdown timeout of 900 seconds each:
Mar 01 19:12:53 localhost.localdomain systemd[1]: mariadb.service: start operation timed out. Terminating.
|
Mar 01 19:27:53 localhost.localdomain systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
|