[MDEV-25019] memory allocation failures during startup cause server failure in different, confusing ways - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3.27, 10.4.18
Fix Version/s: 10.2.38, 10.3.29, 10.4.19
Component/s: Storage Engine - InnoDB
Labels:
- not-10.5

Description

While running into memory allocation failures on server startup due to memory overcommitting being disallowed, e.g. via:

echo 2 > /proc/sys/vm/overcommit_memory

the MariaDB server does not terminate gracefully when not being able to allocate requested memory, and may even get stuck in the critical signal handerl completely.

The actual error log output can differ depending on where exactly the memory allocation failure happens.

E.g. on a system with MariaDB 10.3.27, CentOS 6, 64GB RAM, 2GB swap, overcommit_memory=2 and overcommit_ratio=50, the server can still start with innodb_buffer_pool_memory=29GB.

With 30GB buffer pool it fails with

2021-03-01 17:09:31 0 [Note] InnoDB: Completed initialization of buffer pool

2021-03-01 17:09:32 0x7fe94304b7e0  InnoDB: Assertion failure in file /home/jenkins/workspace/MariaDBE-Custom-RPM/label/rhel-6/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/os/os0thread.cc line 140

InnoDB: Failing assertion: ret == 0

InnoDB: We intentionally generate a memory trap.

With 31GB buffer pool and above this changes to:

2021-03-01 17:15:03 0 [Note] InnoDB: Initializing buffer pool, total size = 31G, instances = 8, chunk size = 128M

2021-03-01 17:15:04 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12

2021-03-01 17:15:04 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool

2021-03-01 17:15:04 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error

2021-03-01 17:15:04 0 [Note] InnoDB: Starting shutdown...

*** glibc detected *** /usr/sbin/mysqld: double free or corruption (out): 0x000055da3a4740a0 ***

The same setup on CentOS 7 fails a bit differently:

With 31GB pool:

Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Initializing buffer pool, total size = 3

Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Completed initialization of buffer pool

Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 210301 18:18:05 [ERROR] mysqld got signal 11 ;

With 32GB:

Mar 01 18:23:27 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:27 0 [Note] InnoDB: Initializing buffer pool, total size = 3

Mar 01 18:23:28 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:28 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12

Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Cannot allocate memory for the buffer p

Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Plugin initialization aborted with erro

Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [Note] InnoDB: Starting shutdown...

Mar 01 18:23:29 localhost.localdomain mysqld[2182]: *** Error in `/usr/sbin/mysqld': double free or corruption (out): 0x000056052

In the worst case though, the mysqld process just seems to get stuck in the signal handler, e.g. on RHEL 8 with 32GB

1 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [Note] InnoDB: Initializing buffer pool, total size = 32G, instances = 8, chunk size = 128M

Mar 01 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [Note] InnoDB: Starting shutdown...

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: double free or corruption (out)

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 210301 18:57:54 [ERROR] mysqld got signal 6 ;

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: This could be because you hit a bug. It is also possible that this binary

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or one of the libraries it was linked against is corrupt, improperly built,

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or misconfigured. This error can also be caused by malfunctioning hardware.

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: We will try our best to scrape up some info that will hopefully help

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: diagnose the problem, but since we have already crashed,

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: something is definitely wrong and this may fail.

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Server version: 10.3.27-10-MariaDB-enterprise

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size=134217728

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: read_buffer_size=131072

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_used_connections=0

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_threads=153

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: thread_count=0

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: It is possible that mysqld could use up to

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467449 K  bytes of memory

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Hope that's ok; if not, decrease some variables in the equation.

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Thread pointer: 0x0

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Attempting backtrace. You can use the following information to find out

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: where mysqld died. If you see no messages after this, something went

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: terribly wrong...

Mar 01 18:57:54 localhost.localdomain mysqld[2744]: stack_bottom = 0x0 thread_stack 0x49000

After this the mysqld process is still running, and only stops half an hour later after exceeding both systemd startup and shutdown timeout of 900 seconds each:

Mar 01 19:12:53 localhost.localdomain systemd[1]: mariadb.service: start operation timed out. Terminating.

Mar 01 19:27:53 localhost.localdomain systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.

Attachments

Activity

People

Assignee:: Thirunarayanan Balathandayuthapani

Reporter:: Hartmut Holzgraefe

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2021-03-01 19:34

Updated:: 2021-06-24 08:00

Resolved:: 2021-04-07 11:39

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.