[MDEV-30316] mariadb server crash Created: 2022-12-29  Updated: 2023-02-27  Resolved: 2023-02-27

Status: Closed
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: 김수빈 Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: crash, replication

Attachments: File mariadb_error.7z    
Issue Links:
Duplicate
duplicates MDEV-21098 Crash in rec_get_offsets_func() due t... Closed

 Description   

Hello.
The DB keeps getting killed for an unknown reason.

==================================================================================================

* Production server environment (identical on DB01 and DB02)
OS : CentOS 7.8
CPU : 20
Memory : 30G

==================================================================================================

* Production DB configuration at the time of the failure

MMM

DB01(MariaDB 10.5.15, slave) <---- DB02(MariaDB 10.5.13, master)

==================================================================================================

* Failure history

1. 11.24, around 09:00: first failure (DB02 server)
MMM triggered a master <-> slave failover
as-is : db01(slave), db02(master) / to-be : db01(master), db02(slave)

----------------------------------------------------------------------------------

2. 11.25, around 09:00: failure (DB02 server)
Resolved by restarting MariaDB
MariaDB minor version upgrade
DB01 : 10.5.15 => 10.5.18 / DB02 : 10.5.13 => 10.5.18

----------------------------------------------------------------------------------

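3. 11.26, around 00:36: failure (DB02 server)
Crash caused by data corruption; recovery was attempted over and over
Tried starting with a recovery option in my.cnf - failed
Tried starting after reinstalling MariaDB - failed
Initialized MariaDB and rebuilt DB02 from a dump of DB01 (master)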

==================================================================================================

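The problem has been resolved since the rebuild, but to find the cause we set up the same configuration on a test server, generated a core dump, and debugged it with gdb.

I am attaching the logs recorded at the time of the failure together with the core dump debugging results.

Thank you.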



 Comments   
Comment by Marko Mäkelä [ 2022-12-29 ]

The mariadb_error.7z contains a file core_dump_gdb.txt that contains the stack trace of the crashing thread. This looks like a possible duplicate of MDEV-21098.

Core dumps (such as the one in the archive) are useless without having a copy of the dynamic libraries and the server executable that created the core dump (all files listed by ldd /usr/sbin/mariadbd). In this particular case, a core dump would be useless also due to MDEV-10814.

Comment by Marko Mäkelä [ 2022-12-29 ]

Does the database crash if you start the latest 10.6 server on the 10.5 data directory?

Comment by 김수빈 [ 2023-01-10 ]

hi.

We will upgrade from 10.5 to 10.6 version by referring to the answers you provided, and then monitor it.

Regarding the core dump: if we attach the library files (the files with a .so extension) listed in the output of ldd /usr/sbin/mariadbd, would it be possible to analyze the core dump?

Please check. Thank you.

Comment by Marko Mäkelä [ 2023-01-10 ]

sucong, yes, generally, for analyzing a core dump, a copy of the code is needed. Not only the mariadbd executable (or its download location), but all dynamic libraries listed by ldd. Note that linux-vdso.so.1 is a virtual shared library that is provided by the Linux kernel. I haven’t even encountered a case where a mismatch in that library (due to a Linux kernel version mismatch) would have been an issue.
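A minimal sketch of gathering that file list, assuming a Linux host where ldd is available. The helper names parse_ldd_output and files_to_collect are hypothetical and are not part of any MariaDB tooling. Entries without a path on disk, such as linux-vdso.so.1, are skipped.

```python
import subprocess


def parse_ldd_output(text):
    """Return the on-disk library paths named in ldd output."""
    paths = []
    for line in text.splitlines():
        # Drop the trailing load address, e.g. "(0x00007f...)".
        entry = line.split("(0x")[0].strip()
        # "libfoo.so.1 => /lib64/libfoo.so.1": keep the resolved path.
        if "=>" in entry:
            entry = entry.split("=>", 1)[1].strip()
        # Virtual libraries (linux-vdso.so.1) and "not found" entries
        # have no absolute path and are skipped.
        if entry.startswith("/"):
            paths.append(entry)
    return paths


def files_to_collect(executable="/usr/sbin/mariadbd"):
    """List the executable plus every shared library that ldd resolves."""
    result = subprocess.run(
        ["ldd", executable], capture_output=True, text=True, check=True
    )
    return [executable] + parse_ldd_output(result.stdout)
```

The returned paths could then be archived together with the core dump, for example with tar.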

However, for this particular case, I do not expect a normal core dump to be useful, because it would exclude a copy of the buffer pool and therefore also exclude a copy of the corrupted page that is causing the assertion to fail. To analyze this type of problem, I find it most convenient to attach a debugger to the server first (say, gdb -p $(pgrep mariadbd)), then execute an SQL statement that would cause the crash, and then, once the signal has been trapped by the debugger, issue some debugging commands. I see that https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ does not currently mention this.
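The attach step could be scripted roughly as follows. This is a sketch under assumptions: the helper names find_server_pid and gdb_attach_command are hypothetical, and the initial "continue" command is one possible choice that lets the server keep running until gdb traps the crash signal.

```python
import subprocess


def find_server_pid(process_name="mariadbd"):
    """Return the first PID reported by pgrep for the given process name."""
    result = subprocess.run(
        ["pgrep", process_name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.split()[0])


def gdb_attach_command(pid, commands=("continue",)):
    """Build the argv for attaching gdb to a running process.

    Each entry in `commands` is passed with -ex, so gdb runs it right
    after attaching.
    """
    argv = ["gdb", "-p", str(pid)]
    for command in commands:
        argv += ["-ex", command]
    return argv


if __name__ == "__main__":
    # Attaching usually requires root or the same user as the server.
    subprocess.run(gdb_attach_command(find_server_pid()))
```

Once the crashing SQL statement has been executed and gdb stops on the signal, commands such as bt or thread apply all bt can be entered at the gdb prompt.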

In any case, I think that it is a good idea to upgrade to MariaDB Server 10.6.10 or later. You may want to avoid 10.6.11 because of MDEV-29988.

Comment by Marko Mäkelä [ 2023-01-26 ]

sucong, did the upgrade to MariaDB 10.6.10 or later help?

Generated at Thu Feb 08 10:15:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.