  1. MariaDB Server
  2. MDEV-21707

MariaDB doesn't exit correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.4.10, 10.4.12
    • N/A
    • None
    • Environment: Linux 4.19.102-gentoo x86_64, AMD EPYC 7451

    Description

      Attempting to stop mysqld with signal 15 (SIGTERM) fails on members of a Galera cluster.
      The setup is three identical servers running Galera nodes, all with similar configuration (attached).

      When signal 15 is sent to the mysqld process, it writes /var/lib/mysql/grastate.dat with all fields filled in, for example:

      # GALERA saved state
      version: 2.1
      uuid:    0b58193e-****-****-****-b2ddbb52b5f6
      seqno:   37261
      safe_to_bootstrap: 1
      

      It then refuses all connection attempts, but the process stays in memory and does not exit:

      # ps aux|grep 'sbin/mysqld'
      mysql    50683  0.5  0.6 68765724 24362700 ?    Sl   13:57   0:24 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf
       
      # strace -p 50683|head -n 20
      strace: Process 50683 attached
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
      select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
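
The trace above shows nothing but select() calls, almost all with a 1 ms timeout, which looks more like a sleep-and-poll loop than blocked I/O. To confirm that over a longer capture, the syscall names can be tallied. The sketch below is only an illustration: `tally_syscalls` and its regex are hypothetical helpers, not part of strace or MariaDB, and they assume the one-call-per-line output format shown above.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, with an optional
# "[pid 1234] " prefix as printed when strace follows several threads.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:  # skips banner lines such as "strace: Process ... attached"
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "strace: Process 50683 attached",
        "select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)",
        "select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)",
    ]
    print(tally_syscalls(sample).most_common())
```

Note that `strace -p` without `-f` follows only the thread it attaches to, so the capture above most likely covers a single thread of a multi-threaded process (the `Sl` state in the ps output). Adding `-f` would show what the other threads are doing.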
      

      This can last for days. mysqld ignores signal 15, and if it is finally killed with signal 9 (SIGKILL), it fails to recover from the binary log on restart, with an error such as:

      [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      

      Recovery always ends with an SST (state snapshot transfer).
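
Before restarting a node in this situation it can be useful to check what grastate.dat actually recorded. The following is a minimal sketch that parses the `key: value` format shown in the example earlier in this description. The helper names are hypothetical, and the check relies on the assumption that a `seqno` of -1 means no final position was saved.

```python
def parse_grastate(text):
    """Parse the contents of a grastate.dat file into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# GALERA saved state" header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def has_recorded_position(state):
    """Return True if a non-negative seqno was saved.

    Assumption: seqno -1 means the node did not save a final position.
    """
    try:
        return int(state.get("seqno", "-1")) >= 0
    except ValueError:
        return False


if __name__ == "__main__":
    sample = """# GALERA saved state
version: 2.1
uuid:    0b58193e-****-****-****-b2ddbb52b5f6
seqno:   37261
safe_to_bootstrap: 1
"""
    state = parse_grastate(sample)
    print(state["seqno"], has_recorded_position(state))
```

For the file quoted in this report the check passes, which matches the observation that the state file is written in full before the process hangs.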

      In this hung state mysqld performs no I/O and its memory usage stays constant. The process appears to be stuck in an infinite loop, yet it uses no CPU either.
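
The "no I/O, no CPU" observation can be put into numbers by sampling the Linux procfs counters twice and comparing them. This is a rough sketch, not a diagnostic tool: the function names are made up for illustration, and reading `/proc/<pid>/io` normally requires running as the process owner or root.

```python
def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split after the last ')' instead of on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def parse_io_counters(io_text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)
    return counters


def read_activity(pid):
    """Return (cpu_ticks, read_bytes, write_bytes) for a running process."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    with open(f"/proc/{pid}/io") as f:
        io = parse_io_counters(f.read())
    return ticks, io["read_bytes"], io["write_bytes"]
```

Calling `read_activity(50683)` a minute apart and getting identical tuples would back up the description above. A small but nonzero tick delta would instead be consistent with the 1 ms polling seen in the strace output.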

      This happens on a cluster with gigabytes of data, but it was also seen on a freshly installed cluster with no data other than the default database.

      Expected behavior: mysqld exits within a reasonable time after signal 15, flushing cached data and closing its files first.
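
The expected behavior can be expressed as a simple check: send SIGTERM, then wait up to a deadline for the PID to disappear. The sketch below assumes a Linux/POSIX system. `terminate_and_wait` and the 60-second default are illustrative choices, not values taken from MariaDB or any init script.

```python
import os
import signal
import time


def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_and_wait(pid, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM and return True if the process is gone within timeout seconds."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(poll_interval)
    return not pid_exists(pid)
```

With the hang described here, `terminate_and_wait(50683)` would return False. One caveat: a child of the calling process that has exited but not yet been reaped still counts as existing, so this check is meant for processes started by something else, such as an init system.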

            People

              Assignee: Jan Lindström (janlindstrom)
              Reporter: Eugene (euglorg)