[MDEV-24423] MariaDB 10.3 crashing and restarting intermittently - segfault at 0 Created: 2020-12-16  Updated: 2023-05-12  Resolved: 2023-05-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.27
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Stéphane BOCQUET Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 3
Labels: crash, innodb, zabbix
Environment:

Debian 10 / 4 vCore / 10G RAM / EMC VNX LUNs


Attachments: error.log, error.log.1

 Description   

Hello,

We are using a regular Debian 10 server with the latest MariaDB 10.3.27. It used to work fine for months, but for the past week we have been facing regular crashes after a few hours of running. When it crashes, applications (Zabbix, etc.) lose their DB connections and some transactions are broken.

System specs:
- 4 vCPU
- 10G of RAM
- Disks are LUNs on an EMC VNX

Here is an example of the syslog messages:

Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749484] show_signal_msg: 18 callbacks suppressed
Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749487] mysqld[60145]: segfault at 0 ip 0000557197badfb3 sp 00007f2dbbe2d310 error 6 in mysqld[5571973f0000+80a000]
Dec 16 19:44:48 mysqlbddvprd1 kernel: [503847.749491] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 52 2f 84 ff 49 89 c7 49 39 c4 0f 84 06 01 00 00 e8 21 18 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 49 83 ff ff 0f 84 ad 00 00 00 f6 c3 06 75 28 4d
Dec 16 19:44:48 mysqlbddvprd1 systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
Dec 16 19:44:48 mysqlbddvprd1 systemd[1]: mariadb.service: Failed with result 'signal'.
Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: Stopped MariaDB 10.3.27 database server.
 
Dec 16 19:44:53 mysqlbddvprd1 systemd[1]: Starting MariaDB 10.3.27 database server...
Dec 16 19:44:53 mysqlbddvprd1 mysqld[43693]: 2020-12-16 19:44:53 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 43693 ...
Dec 16 19:45:00 mysqlbddvprd1 systemd[1]: Started MariaDB 10.3.27 database server.
 
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43750]: Upgrading MySQL tables if necessary.
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: Looking for 'mysql' as: /usr/bin/mysql
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43753]: This installation of MySQL is already upgraded to 10.3.27-MariaDB, use --force if you still need to run mysql_upgrade
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43765]: Checking for insecure root accounts.
Dec 16 19:45:00 mysqlbddvprd1 /etc/mysql/debian-start[43769]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables

And here is a part of the conf file we use: /etc/mysql/mariadb.conf.d/50-server.cnf

#
# * Fine Tuning
#
myisam_recover_options  = BACKUP
max_connections         = 150
 
#
# * Fine Tuning for InnoDB
#
innodb_buffer_pool_size = 7G            # Go up to 70% to 80% of your available RAM
innodb_buffer_pool_instances = 4        # Bigger if huge InnoDB Buffer Pool or high concurrency
 
innodb_file_per_table   = 1             # Is the recommended way nowadays
innodb_flush_method     = O_DIRECT
innodb_write_io_threads = 8             # If you have a strong I/O system or SSD
innodb_read_io_threads  = 8             # If you have a strong I/O system or SSD
innodb_io_capacity      = 1000          # If you have a strong I/O system or SSD
 
innodb_flush_log_at_trx_commit = 1      # 1 for durability, 0 or 2 for performance
innodb_log_buffer_size  = 8M            # Bigger if innodb_flush_log_at_trx_commit = 0
innodb_log_file_size    = 128M          # Bigger means more write throughput but longer recovery time
 
#
# * Query Cache Configuration
#
query_cache_type        = 0
query_cache_size        = 0
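The comments in this config size the buffer pool relative to RAM. Here is a small Python sketch of that arithmetic for the values shown. It assumes that "10G of RAM" means 10 GiB and that `innodb_buffer_pool_chunk_size` is left at its 128M default. InnoDB adjusts `innodb_buffer_pool_size` to a multiple of `innodb_buffer_pool_chunk_size` × `innodb_buffer_pool_instances`, which the sketch models by rounding up. `buffer_pool_report` is a hypothetical name.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def buffer_pool_report(ram, pool, instances, chunk=128 * MIB):
    """Hypothetical helper: effective InnoDB buffer pool sizing for given settings."""
    # The pool must be a multiple of chunk * instances; InnoDB adjusts it upward.
    unit = chunk * instances
    effective = -(-pool // unit) * unit  # ceiling division, then back to bytes
    return {
        "pool_fraction_of_ram": pool / ram,
        "effective_pool_size": effective,
        "per_instance": effective // instances,
        "left_for_os_and_sessions": ram - effective,
    }


if __name__ == "__main__":
    r = buffer_pool_report(ram=10 * GIB, pool=7 * GIB, instances=4)
    print(f"{r['pool_fraction_of_ram']:.0%} of RAM, "
          f"{r['per_instance'] / GIB:.2f} GiB per instance, "
          f"{r['left_for_os_and_sessions'] / GIB:.0f} GiB left")
```

With 7G and 4 instances, the pool is exactly 70% of RAM. It needs no rounding (7 GiB is 14 × 512 MiB), gives 1.75 GiB per instance, and leaves about 3 GiB for the OS and for the per-connection buffers of up to 150 connections (`max_connections`).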

The error.log files are attached.
Any comments are welcome.

Best regards,



 Comments   
Comment by Elena Stepanova [ 2021-01-11 ]

Long semaphore wait crash

Comment by Greg1258 [ 2021-01-12 ]

Hello,

We have the same error on MariaDB 10.2.36 (a crash during a long transaction).
To be more precise, the transaction that causes the server to crash contains a single query: a multi-row INSERT (hundreds of rows).

It seems linked to: https://jira.mariadb.org/browse/MDEV-24375

Could it also be linked to this performance regression: https://jira.mariadb.org/browse/MDEV-24272 ?

Comment by D (Inactive) [ 2021-01-18 ]

Hello,

We have the same problems on multiple systems: long semaphore waits and crashes at various recurring intervals.
The "error 6" segfault of mysqld also shows up in dmesg on some systems, but not on all of them.

It seems that the last two minor versions (both November 2020 releases) are affected:

  • 10.5.7, 10.5.8 = affected
  • 10.3.26, 10.3.27 = affected

Workaround: downgrade by two versions, to a pre-November 2020 release. The 10.5.6 and 10.3.25 releases seem to have none of these problems.
The corresponding versions of 10.1, 10.2 and 10.4 could be affected too, but we do not run any of those.
Comment by Sara Artiglieri [ 2021-02-04 ]

Hello, I have the same problems on two machines. Both run CentOS 8 (8.3.1-5) and MariaDB 10.3.27.

MariaDB crashes every day at the same hour.

[Warning] InnoDB: A long semaphore wait:

In fact, the problems started when MariaDB was upgraded to version 10.3.27.

Sara
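A daily crash "at the same hour" can be checked against the error log by counting the long semaphore wait warnings per hour of day. Below is a minimal Python sketch. It assumes the MariaDB 10.3 error-log line layout seen in this report (`YYYY-MM-DD HH:MM:SS <thread> [Warning] InnoDB: A long semaphore wait:`), optionally preceded by a syslog prefix. `semaphore_waits_by_hour` is a hypothetical helper.

```python
import re
import sys
from collections import Counter

# Matches e.g. "2020-12-16 19:44:53 0 [Warning] InnoDB: A long semaphore wait:"
# search() rather than match() so syslog-prefixed lines are accepted too.
WAIT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} \d+ \[Warning\] InnoDB: A long semaphore wait"
)


def semaphore_waits_by_hour(lines):
    """Hypothetical helper: count long semaphore wait warnings per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            counts[int(m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 waits.py /var/log/mysql/error.log
    with open(sys.argv[1], errors="replace") as f:
        for hour, n in sorted(semaphore_waits_by_hour(f).items()):
            print(f"{hour:02d}:00  {n}")
```

A single dominant hour would suggest a scheduled job (backup, cron, application batch) coinciding with the hangs. This is only a correlation aid, not a diagnosis.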

Comment by D (Inactive) [ 2021-03-16 ]

The most recent update (release date: 22 Feb 2021), in my case v10.5.9, seems to have fixed the aforementioned problems.
So far there have been no recurring crashes or similar problems.

Comment by Marko Mäkelä [ 2023-04-14 ]

Can anyone enable core dumps or attach a debugger to a hung server, to produce fully resolved stack traces of all threads during the hang? Without such output, it is impossible to diagnose hangs.
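One way to collect stack traces of all threads from a hung server is to attach gdb in batch mode. Here is a minimal Python sketch that builds and runs such a gdb invocation. It assumes gdb is installed, the script runs with permission to ptrace mysqld (typically root), the binary lives at `/usr/sbin/mysqld` as on Debian 10, and the matching debug symbols are installed so that frames are resolved. `gdb_backtrace_cmd` and `capture_backtraces` are hypothetical names. Attaching briefly pauses the server. (Core dumps, the other option mentioned, are typically enabled with the server's `core_file` option plus `LimitCORE=infinity` in the systemd unit.)

```python
import shutil
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int, binary: str = "/usr/sbin/mysqld") -> List[str]:
    """Hypothetical helper: gdb argv that prints full backtraces of every thread, then detaches."""
    return [
        "gdb", "-batch",
        "-ex", "set pagination off",
        "-ex", "thread apply all backtrace full",
        binary,
        "-p", str(pid),
    ]


def capture_backtraces(pid: int, out_path: str) -> int:
    """Run gdb against a live process and write its output to out_path; returns gdb's exit code."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed")
    with open(out_path, "w") as out:
        proc = subprocess.run(gdb_backtrace_cmd(pid), stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode


if __name__ == "__main__":
    import sys
    # Usage (as root): python3 stacks.py <mysqld-pid> /tmp/mysqld-stacks.txt
    sys.exit(capture_backtraces(int(sys.argv[1]), sys.argv[2]))
```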

In MariaDB Server 10.6, the "long semaphore wait" diagnostics was replaced with a simple watchdog on dict_sys.latch.

Generated at Thu Feb 08 09:29:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.