Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25019

memory allocation failures during startup cause server failure in different, confusing ways

    XMLWordPrintable

    Details

      Description

      While running into memory allocation failures on server startup due to memory overcommitting being disallowed, e.g. via:

      echo 2 > /proc/sys/vm/overcommit_memory

      the MariaDB server does not terminate gracefully when not being able to allocate requested memory, and may even get stuck in the critical signal handerl completely.

      The actual error log output can differ depending on where exactly the memory allocation failure happens.

      E.g. on a system with MariaDB 10.3.27, CentOS 6, 64GB RAM, 2GB swap, overcommit_memory=2 and overcommit_ratio=50, the server can still start with innodb_buffer_pool_memory=29GB.

      With 30GB buffer pool it fails with

      2021-03-01 17:09:31 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-03-01 17:09:32 0x7fe94304b7e0  InnoDB: Assertion failure in file /home/jenkins/workspace/MariaDBE-Custom-RPM/label/rhel-6/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/os/os0thread.cc line 140
      InnoDB: Failing assertion: ret == 0
      InnoDB: We intentionally generate a memory trap.
      

      With 31GB buffer pool and above this changes to:

      2021-03-01 17:15:03 0 [Note] InnoDB: Initializing buffer pool, total size = 31G, instances = 8, chunk size = 128M
      2021-03-01 17:15:04 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
      2021-03-01 17:15:04 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
      2021-03-01 17:15:04 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      2021-03-01 17:15:04 0 [Note] InnoDB: Starting shutdown...
      *** glibc detected *** /usr/sbin/mysqld: double free or corruption (out): 0x000055da3a4740a0 ***
      

      The same setup on CentOS 7 fails a bit differently:

      With 31GB pool:

      Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Initializing buffer pool, total size = 3
      Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 2021-03-01 18:18:05 0 [Note] InnoDB: Completed initialization of buffer pool
      Mar 01 18:18:05 localhost.localdomain mysqld[2504]: 210301 18:18:05 [ERROR] mysqld got signal 11 ;
      

      With 32GB:

      Mar 01 18:23:27 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:27 0 [Note] InnoDB: Initializing buffer pool, total size = 3
      Mar 01 18:23:28 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:28 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
      Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Cannot allocate memory for the buffer p
      Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [ERROR] InnoDB: Plugin initialization aborted with erro
      Mar 01 18:23:29 localhost.localdomain mysqld[2182]: 2021-03-01 18:23:29 0 [Note] InnoDB: Starting shutdown...
      Mar 01 18:23:29 localhost.localdomain mysqld[2182]: *** Error in `/usr/sbin/mysqld': double free or corruption (out): 0x000056052
      

      In the worst case though, the mysqld process just seems to get stuck in the signal handler, e.g. on RHEL 8 with 32GB

      1 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [Note] InnoDB: Initializing buffer pool, total size = 32G, instances = 8, chunk size = 128M
      Mar 01 18:57:53 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:53 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 2021-03-01 18:57:54 0 [Note] InnoDB: Starting shutdown...
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: double free or corruption (out)
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: 210301 18:57:54 [ERROR] mysqld got signal 6 ;
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: This could be because you hit a bug. It is also possible that this binary
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or one of the libraries it was linked against is corrupt, improperly built,
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: or misconfigured. This error can also be caused by malfunctioning hardware.
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: We will try our best to scrape up some info that will hopefully help
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: diagnose the problem, but since we have already crashed,
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: something is definitely wrong and this may fail.
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Server version: 10.3.27-10-MariaDB-enterprise
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size=134217728
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: read_buffer_size=131072
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_used_connections=0
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: max_threads=153
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: thread_count=0
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: It is possible that mysqld could use up to
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467449 K  bytes of memory
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Hope that's ok; if not, decrease some variables in the equation.
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Thread pointer: 0x0
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: Attempting backtrace. You can use the following information to find out
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: where mysqld died. If you see no messages after this, something went
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: terribly wrong...
      Mar 01 18:57:54 localhost.localdomain mysqld[2744]: stack_bottom = 0x0 thread_stack 0x49000
      

      After this the mysqld process is still running, and only stops half an hour later after exceeding both systemd startup and shutdown timeout of 900 seconds each:

      Mar 01 19:12:53 localhost.localdomain systemd[1]: mariadb.service: start operation timed out. Terminating.
      Mar 01 19:27:53 localhost.localdomain systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
      

        Attachments

          Activity

            People

            Assignee:
            thiru Thirunarayanan Balathandayuthapani
            Reporter:
            hholzgra Hartmut Holzgraefe
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: