Details

    Description

      Hello.
      We have an issue where the DB keeps getting killed for an unknown reason.

      ==================================================================================================

      * Production server environment (identical for DB01 and DB02)
      OS : CentOS 7.8
      CPU : 20
      Memory : 30G

      ==================================================================================================

      * Production DB configuration at the time of the failure

      MMM

      DB01(MariaDB 10.5.15, slave) <---- DB02(MariaDB 10.5.13, master)

      ==================================================================================================

      * Failure history

      1. 11.24, around 09:00: first failure occurred (DB02 server)
      MMM triggered a master <-> slave failover
      before : db01(slave), db02(master) / after : db01(master), db02(slave)

      ----------------------------------------------------------------------------------

      2. 11.25, around 09:00: failure occurred (DB02 server)
      Resolved by restarting MariaDB
      Upgraded the MariaDB minor version
      DB01 : 10.5.15 => 10.5.18 / DB02 : 10.5.13 => 10.5.18

      ----------------------------------------------------------------------------------

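      3. 11.26, around 00:36: failure occurred (DB02 server)
      Crash caused by data corruption; recovery attempts kept repeating
      Tried starting with a recovery option set in my.cnf - failed
      Tried starting after reinstalling MariaDB - failed
      Re-initialized MariaDB and rebuilt DB02 from a dump of DB01 (master)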

      ==================================================================================================

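      The problem has been resolved since the rebuild, but to identify the cause we set up a test server with the same configuration, generated a core dump, and debugged it with gdb.

      We are attaching the log recorded at the time of the failure together with the core dump debugging output file.

      Thank you.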

          Activity

            marko Marko Mäkelä added a comment -

            The mariadb_error.7z archive contains a file core_dump_gdb.txt with the stack trace of the crashing thread. This looks like a possible duplicate of MDEV-21098.

            Core dumps (such as the one in the archive) are useless without a copy of the dynamic libraries and the server executable that created the core dump (all files listed by ldd /usr/sbin/mariadbd). In this particular case, a core dump would also be useless due to MDEV-10814.
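
            For illustration, the files that must accompany a core dump (the server executable plus every library that ldd lists for it) could be gathered roughly as follows. This is a minimal sketch, not an official MariaDB procedure: the function names extract_library_paths and list_core_dependencies are hypothetical, and it assumes a Linux host with ldd and awk available. Entries without an on-disk path, such as the kernel-provided linux-vdso.so.1, are skipped.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of MariaDB).

# Read `ldd` output on stdin and print only the absolute library paths.
extract_library_paths() {
    awk '
        # libfoo.so.1 => /path/libfoo.so.1 (0x...)
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # /lib64/ld-linux-x86-64.so.2 (0x...)
        $1 ~ /^\// { print $1 }
    '
}

# Print the executable itself followed by every library file it links to.
list_core_dependencies() {
    binary="$1"
    printf '%s\n' "$binary"
    ldd "$binary" | extract_library_paths
}
```

            With GNU tar, the resulting list could then be archived with something like `list_core_dependencies /usr/sbin/mariadbd | tar -czhf mariadbd-deps.tar.gz -T -` (the archive name is arbitrary; -h stores the files that symbolic links point to).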

            marko Marko Mäkelä added a comment -

            Does the database crash if you start the latest 10.6 server on the 10.5 data directory?

            sucong 김수빈 added a comment -

            Hi.

            We will upgrade from 10.5 to 10.6 as you suggested, and then monitor the server.

            Regarding the core dump: if we also attach the library files (the files with the .so extension) listed by ldd /usr/sbin/mariadbd, will it be possible to analyze the core dump?

            Please check. Thank you.

            marko Marko Mäkelä added a comment -

            sucong, yes, generally, analyzing a core dump requires a copy of the code: not only the mariadbd executable (or its download location), but also all dynamic libraries listed by ldd. Note that linux-vdso.so.1 is a virtual shared library that is provided by the Linux kernel. I have never encountered a case where a mismatch in that library (due to a Linux kernel version mismatch) was an issue.

            However, for this particular case, I do not expect a normal core dump to be useful, because it would exclude a copy of the buffer pool and therefore also exclude a copy of the corrupted page that is causing the assertion to fail. To analyze this type of problem, I find it most convenient to attach a debugger to the server first (say, gdb -p $(pgrep mariadbd)), then execute an SQL statement that would cause the crash, and then, once the signal has been trapped by the debugger, issue some debugging commands. I see that https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ does not currently mention this.

            In any case, I think that it is a good idea to upgrade to MariaDB Server 10.6.10 or later. You may want to avoid 10.6.11 because of MDEV-29988.
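
            The attach-then-crash workflow described in the comment above might look roughly like this. It is only a sketch under stated assumptions: the function name attach_and_backtrace is hypothetical, gdb and debug symbols for the server are assumed to be installed, and the choice of debugging commands (a backtrace of all threads) is an illustration rather than what the comment prescribes.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to a running process, resume it, and print
# the stacks of all threads once a signal (for example SIGABRT raised by a
# failed assertion) stops the process under the debugger.
attach_and_backtrace() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: attach_and_backtrace PID" >&2
        return 1
    fi
    # gdb runs the -ex commands in order; "continue" blocks until the
    # process stops again, after which the backtrace is printed.
    gdb -p "$pid" \
        -ex 'set pagination off' \
        -ex 'continue' \
        -ex 'thread apply all backtrace'
}
```

            A possible use, assuming a single mariadbd process on the host: run `attach_and_backtrace "$(pgrep -x mariadbd)"` in one terminal, then execute the crashing SQL statement from another session. Because gdb is not started in batch mode, it stays at its prompt after the backtrace, with the process memory still available for further inspection.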

            marko Marko Mäkelä added a comment -

            sucong, did the upgrade to MariaDB 10.6.10 or later help?

            People

              marko Marko Mäkelä
              sucong 김수빈