[MDEV-24423] MariaDB 10.3 crashing and restarting intermittently - segfault at 0

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.3.27
Fix Version/s: N/A
Component/s: Storage Engine - InnoDB
Labels:
- crash
- innodb
- zabbix
Environment:
Debian 10 / 4 vCore / 10G RAM / EMC VNX LUNs

Description

Hello,

We are running a standard Debian 10 server with the latest MariaDB 10.3.27. It worked well for months, but for the past week we have been seeing regular crashes after a few hours of uptime. Applications (Zabbix, etc.) then lose their DB connections and some transactions are broken.

System specs:
- 4 vCPU
- 10G of RAM
- Disks are LUNs on an EMC VNX

Here is an example of the syslog messages:

Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749484] show_signal_msg: 18 callbacks suppressed

Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749487] mysqld[60145]: segfault at 0 ip 0000557197badfb3 sp 00007f2dbbe2d310 error 6 in mysqld[5571973f0000+80a000]

Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749491] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 52 2f 84 ff 49 89 c7 49 39 c4 0f 84 06 01 00 00 e8 21 18 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 ad 00 00 00 f6 c3 06 75 28 4d

Dec 16 19:44:48 mysqlbddvprd1 systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

Dec 16 19:44:48 mysqlbddvprd1 systemd[1]: mariadb.service: Failed with result 'signal'.

Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.

Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.

Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: Stopped MariaDB 10.3.27 database server.

Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: Starting MariaDB 10.3.27 database server...

Dec 16 19:44:53 mysqlbddvprd1 mysqld[43693]: 2020-12-16 19:44:53 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 43693 ...

Dec 16 19:45:00 mysqlbddvprd1 systemd[1]: Started MariaDB 10.3.27 database server.

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43750]: Upgrading MySQL tables if necessary.

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: Looking for 'mysql' as: /usr/bin/mysql

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: This installation of MySQL is already upgraded to 10.3.27-MariaDB, use --force if you still need to run mysql_upgrade

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43765]: Checking for insecure root accounts.

Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43769]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
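The kernel "segfault at" line in the syslog excerpt above can be decoded a little further. The following is a minimal sketch, assuming the x86 message layout shown in that line and the standard x86 page-fault error-code bits (bit 0: protection fault vs. page not present, bit 1: write, bit 2: user mode). The function name `decode_segfault` is only illustrative. The computed offset is relative to the start of the reported mapping, which may differ from the offset in the binary file if the mapping does not begin at file offset 0.

```python
import re

# Layout of the x86 kernel "segfault at" message, as seen in the syslog excerpt.
# All numeric fields, including the error code, are printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset_in_mapping": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error-code bits.
        "page_present": bool(err & 0x1),
        "write_access": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
    }


line = (
    "Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749487] mysqld[60145]: "
    "segfault at 0 ip 0000557197badfb3 sp 00007f2dbbe2d310 error 6 "
    "in mysqld[5571973f0000+80a000]"
)
info = decode_segfault(line)
print(hex(info["offset_in_mapping"]))  # 0x7bdfb3
print(info["write_access"], info["user_mode"], info["page_present"])
```

Read this way, "segfault at 0 ... error 6" describes a user-mode write to address 0 on a page that is not mapped, which is consistent with a write through a null pointer. It does not say which code path produced that pointer.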

Here is part of the configuration file we use, /etc/mysql/mariadb.conf.d/50-server.cnf:

# * Fine Tuning

myisam_recover_options  = BACKUP

max_connections         = 150

# * Fine Tuning for InnoDB

innodb_buffer_pool_size = 7G            # Go up to 70% to 80% of your available RAM

innodb_buffer_pool_instances = 4        # Bigger if huge InnoDB Buffer Pool or high concurrency

innodb_file_per_table   = 1             # Is the recommended way nowadays

innodb_flush_method     = O_DIRECT

innodb_write_io_threads = 8             # If you have a strong I/O system or SSD

innodb_read_io_threads  = 8             # If you have a strong I/O system or SSD

innodb_io_capacity      = 1000          # If you have a strong I/O system or SSD

innodb_flush_log_at_trx_commit = 1      # 1 for durability, 0 or 2 for performance

innodb_log_buffer_size  = 8M            # Bigger if innodb_flush_log_at_trx_commit = 0

innodb_log_file_size    = 128M          # Bigger means more write throughput but longer recovery time

# * Query Cache Configuration

query_cache_type        = 0

query_cache_size        = 0
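As a quick sanity check on the values above, the buffer pool setting can be compared with the 10G of RAM listed in the system specs. This is a minimal sketch; the helper names `parse_size` and `buffer_pool_fraction` are illustrative only, and the calculation ignores other memory consumers such as per-connection buffers.

```python
def parse_size(text):
    """Convert a my.cnf-style size such as '7G' or '128M' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def buffer_pool_fraction(pool_size, total_ram):
    """Fraction of total RAM taken by the InnoDB buffer pool."""
    return parse_size(pool_size) / parse_size(total_ram)


# Values taken from the configuration and system specs in this report.
fraction = buffer_pool_fraction("7G", "10G")
print(f"innodb_buffer_pool_size is {fraction:.0%} of RAM")
```

With these numbers the buffer pool sits at the lower end of the 70% to 80% range mentioned in the configuration comment.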

The error.log files are attached.
Any comments are welcome.

Best regards,

Attachments

- error.log (3.63 MB, 2020-12-16 19:26)
- error.log.1 (3.77 MB, 2020-12-16 19:26)

Activity


Elena Stepanova added a comment - 2021-01-11 23:26

Long semaphore wait crash


Greg1258 added a comment - 2021-01-12 10:40 - edited

Hello,

We have the same error on MariaDB 10.2.36 (crash in a long transaction).
To be more precise, the transaction that causes the server to crash contains a single query, a multi-row INSERT (hundreds of rows).

Seems linked to: https://jira.mariadb.org/browse/MDEV-24375

Could it also be linked to this performance regression: https://jira.mariadb.org/browse/MDEV-24272 ?


D (Inactive) added a comment - 2021-01-18 07:37

Hello,

We have the same problems on multiple systems: long semaphore waits and crashes at various recurring intervals.
The mysqld segfault with "error 6" also shows up in dmesg sometimes, but not on every system.

It seems that the last two minor versions (both November 2020 releases) are affected:

10.5.7, 10.5.8 = affected
10.3.26, 10.3.27 = affected
Workaround: downgrade by two versions, to a pre-November 2020 release.
The 10.5.6 and 10.3.25 releases seem to have none of these problems.
The corresponding versions of 10.1, 10.2 and 10.4 could be affected too, but we don't run any of those.
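The version list in this comment can be written as a small lookup, for example to scan an inventory of servers. This is only a sketch: the sets reflect this one commenter's observations, not an official list of affected releases, and the names `REPORTED_AFFECTED`, `REPORTED_OK` and `reported_status` are illustrative.

```python
# Versions as reported in this comment (one user's observation, not an official list).
REPORTED_AFFECTED = {"10.3.26", "10.3.27", "10.5.7", "10.5.8"}
REPORTED_OK = {"10.3.25", "10.5.6"}


def base_version(version_string):
    """Strip packaging suffixes, e.g. '10.3.27-MariaDB-0+deb10u1' -> '10.3.27'."""
    return version_string.split("-", 1)[0]


def reported_status(version_string):
    """Classify a server version according to the lists in this comment."""
    version = base_version(version_string)
    if version in REPORTED_AFFECTED:
        return "reported affected"
    if version in REPORTED_OK:
        return "reported unaffected"
    return "not mentioned"


print(reported_status("10.3.27-MariaDB-0+deb10u1"))  # reported affected
```

A result of "not mentioned" means only that this comment says nothing about the version, not that the version is safe.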


Sara Artiglieri added a comment - 2021-02-04 13:13

Hello, I have the same problem on two machines. Both run CentOS 8 (8.3.1-5) and MariaDB 10.3.27.

MariaDB crashes every day at the same hour.

[Warning] InnoDB: A long semaphore wait:

In fact, the problems started when MariaDB was upgraded to version 10.3.27.

Sara


D (Inactive) added a comment - 2021-03-16 09:46

The most recent update (release date: 22 Feb 2021), v10.5.9 in my case, seems to have fixed the aforementioned problems.
So far there have been no recurring crashes or similar problems.


Marko Mäkelä added a comment - 2023-04-14 13:41

Can anyone enable core dumps or attach a debugger to a hung server, to produce fully resolved stack traces of all threads during the hang? Without such output, it is impossible to diagnose hangs.

In MariaDB Server 10.6, the "long semaphore wait" diagnostics were replaced with a simple watchdog on dict_sys.latch.
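One possible way to collect the requested stack traces from a hung server is to attach gdb in batch mode and dump the backtraces of all threads. The sketch below assumes gdb is installed, that the caller has permission to attach to the mysqld process, and that debug symbols are available (without them the traces will not be fully resolved). The function names are illustrative.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to pid and prints all thread backtraces."""
    return [
        "gdb", "--batch",
        "-p", str(pid),
        "-ex", "set pagination off",
        "-ex", "thread apply all backtrace",
    ]


def collect_backtraces(pid, output_path):
    """Run gdb against a live process and write its output to output_path."""
    with open(output_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )


# Example: show the command for a given PID without running it.
print(" ".join(gdb_backtrace_command(43693)))
```

Attaching a debugger pauses the server while the traces are written, so this is best done only while the server is already hung.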


People

Assignee: Marko Mäkelä

Reporter: Stéphane BOCQUET

Votes: 3

Watchers: 8

Dates

Created: 2020-12-16 19:29

Updated: 2023-05-12 19:31

Resolved: 2023-05-12 19:31
