[MDEV-32178] MariaDB crashed with mysqld got signal 11 Created: 2023-09-15  Updated: 2024-01-23

Status: Open
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.7, 10.11.5
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: mansi dadheech Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Master-slave replication across 3 nodes: 2 masters and 1 slave in an MMM cluster.

[root@DB1~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"

[root@DB1~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:          503Gi        82Gi       409Gi       2.0Mi        11Gi       417Gi
Swap:          31Gi          0B        31Gi

Disk:
/dev/mapper/rhel-var ext4 5.7T 1.2T 4.2T 22% /var


Attachments: DB1messages.txt, DB1my.cnf.txt, my.cnf_10.11.5.txt, my.cnf_10.7.8.txt

 Description   

Our environment is a three-node DB replication cluster with 2 master nodes and 1 slave. (When the writer points to the first database, the second one behaves like a slave.)
We previously ran MariaDB 10.7.4 and faced crashes at that time as well, so we upgraded to 10.7.8 and changed innodb_flush_method from O_DIRECT to fsync.

Even after those changes, we are still facing these crashes on production nodes.

MariaDB [(none)]> select @@innodb_flush_method;
+-----------------------+
| @@innodb_flush_method |
+-----------------------+
| fsync                 |
+-----------------------+
 
 
 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 1135222
Server version: 10.7.8-MariaDB-log MariaDB Server

The log written during the crash looks like this:

230915 13:38:39 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.7.8-MariaDB-log source revision: bc656c4fa54c12ceabd857e8ae134f8979d82944
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=459
max_threads=802
thread_count=463
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831789 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f7918034a58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7953ffebd8 thread_stack 0x49000
??:0(my_print_stacktrace)[0x556f01e903fe]
??:0(handle_fatal_signal)[0x556f01998db5]
??:0(__restore_rt)[0x7fbd3e7f4cf0]
??:0(Pushdown_query::execute(JOIN*))[0x556f017fcbba]
??:0(JOIN::exec_inner())[0x556f017e2d09]
??:0(JOIN::exec())[0x556f017e3627]
??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x556f017e188f]
??:0(handle_select(THD*, LEX*, select_result*, unsigned long))[0x556f017e201b]
??:0(LEX::mark_first_table_as_inserting())[0x556f0176d15d]
??:0(mysql_execute_command(THD*, bool))[0x556f01775551]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x556f0176832f]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x556f017720ad]
??:0(do_command(THD*, bool))[0x556f017737d7]
??:0(do_handle_one_connection(CONNECT*, bool))[0x556f01881177]
??:0(handle_one_connection)[0x556f018814bd]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x556f01b9c6ed]
??:0(start_thread)[0x7fbd3e7ea1ca]
:0(__GI___clone)[0x7fbd3db3ae73]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7919053bc0): Select UserIndex, NASIP, NASPort, GroupIndex ,NetworkServiceName From ActiveSessions Where UserIndex=55288821 And MainGroupIndex=1009
 
Connection ID (thread ID): 1569281
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             2061710              2061710              processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       2061710              2061710              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
 
Kernel version: Linux version 4.18.0-477.13.1.el8_8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Thu May 18 10:27:05 EDT 2023

The main issue is that after the crash, the log says a core file is being written, but we are not able to find the core file.

[root@DB1 ~]# coredumpctl gdb 1721871
           PID: 1721871 (mariadbd)
           UID: 993 (mysql)
           GID: 989 (mysql)
        Signal: 11 (SEGV)
     Timestamp: Fri 2023-09-15 13:38:41 (4h 52min ago)
  Command Line: /usr/sbin/mariadbd
    Executable: /usr/sbin/mariadbd
 Control Group: /
         Slice: -.slice
       Boot ID: 583beffe0ac24952864fddfa37351800
    Machine ID: 8b546902a59548fbad7672d7b993bc35
      Hostname: DB1
       Storage: none
       Message: Process 1721871 (mariadbd) of user 993 dumped core.
 
Coredump entry has no core attached (neither internally in the journal nor externally on disk).
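
The `Storage: none` field above means that systemd-coredump recorded the crash but kept no core image. One possible cause (an assumption, not confirmed anywhere in this report) is the `Storage=`, `ProcessSizeMax=` or `ExternalSizeMax=` settings in `/etc/systemd/coredump.conf`, since a mariadbd process on a host with this much memory could exceed the size limits. A minimal sketch for reading those settings follows; `coredump_setting` is a hypothetical helper name, not a systemd tool.

```shell
# Hypothetical helper: print the last uncommented value of KEY in an
# ini-style file such as /etc/systemd/coredump.conf.
# Prints nothing if the key is unset (systemd then uses its built-in default).
coredump_setting() {
    file=$1
    key=$2
    # Match "KEY=value" lines, allowing spaces around "=" and skipping
    # commented-out lines; if the key appears twice, the last one wins.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Usage on the affected host (assumed default path):
#   coredump_setting /etc/systemd/coredump.conf Storage
#   coredump_setting /etc/systemd/coredump.conf ProcessSizeMax
#   coredump_setting /etc/systemd/coredump.conf ExternalSizeMax
```

If `Storage` turns out to be `none`, or the size limits are smaller than the mariadbd process, that would be consistent with the missing core. The sketch reads a single file only and does not look at drop-in files under `/etc/systemd/coredump.conf.d/`.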



 Comments   
Comment by Alice Sherepa [ 2023-09-15 ]

Is it possible for you to try a more recent MariaDB version (10.7 is EOL)?

Comment by mansi dadheech [ 2023-09-18 ]

Is there any other workaround?

Comment by Alice Sherepa [ 2023-09-18 ]

It is hard to guess without a way to repeat the crash.
The bug might already be fixed, or it could still exist, but even then the fix will not go into 10.7.
Could you please add SHOW CREATE TABLE ActiveSessions? Is the crash repeatable (and does it happen on both master and slave, or only on some nodes)?
Please add the output of EXPLAIN EXTENDED for that query. Also, what non-default configuration do you use? (You could attach your .cnf file(s).) You could try to set optimizer_switch='index_condition_pushdown=off' and see if it helps, but that is just a guess.
https://mariadb.com/kb/en/enabling-core-dumps/
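
The current state of the suggested flag can be read from the `Optimizer switch:` line that the server prints in the crash log. A minimal sketch follows; `switch_flag` is a hypothetical helper written for illustration, not part of MariaDB.

```shell
# Hypothetical helper: print the value (on/off) of one flag from an
# optimizer_switch string as it appears in the crash log.
switch_flag() {
    switches=$1
    flag=$2
    # Put one flag per line, then keep the exact-name match and strip "name=".
    printf '%s\n' "$switches" | tr ',' '\n' | sed -n "s/^${flag}=//p"
}

# Example with an excerpt of the string from the report; prints: on
switch_flag 'engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on' index_condition_pushdown

# The workaround itself is a plain SQL statement, run in the mariadb client:
#   SET GLOBAL optimizer_switch='index_condition_pushdown=off';
#   SELECT @@optimizer_switch;
```

Note that `SET GLOBAL` only affects connections opened afterwards and does not survive a server restart unless the setting is also placed in the configuration file.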

Comment by mansi dadheech [ 2023-09-20 ]

| ActiveSessions | CREATE TABLE `activesessions` (
  `ALEPOSESSIONID` varchar(255) NOT NULL,
  `PARENTSESSIONID` varchar(255) DEFAULT NULL,
  `NASIDENTIFIER` varchar(255) DEFAULT NULL,
  `NASPORT` bigint(20) DEFAULT NULL,
  `ACCTSESSIONID` varchar(255) DEFAULT NULL,
  `NASINDEX` int(11) DEFAULT 0,
  `USERINDEX` int(11) DEFAULT 0,
  `USERID` varchar(255) NOT NULL,
  `GROUPINDEX` smallint(6) DEFAULT 0,
  `MAINGROUPINDEX` smallint(6) DEFAULT 0,
  `SERVICE` smallint(6) DEFAULT 0,
  `USERIP` bigint(20) DEFAULT NULL,
  `STARTDATETIME` datetime DEFAULT NULL,
  `CALLERID` varchar(255) DEFAULT NULL,
  `CALLEDSTATIONID` varchar(255) DEFAULT NULL,
  `LASTUPDATETIME` datetime DEFAULT NULL,
  `INTERIMSESSIONTIME` int(11) DEFAULT 0,
  `INTERIMBYTES` bigint(20) DEFAULT 0,
  `INTERIMINPUTOCTETS` bigint(20) DEFAULT 0,
  `INTERIMOUTPUTOCTETS` bigint(20) DEFAULT 0,
  `SESSIONEXPIRYTIME` datetime DEFAULT NULL,
  `PRECLEANUPACTION` smallint(6) DEFAULT 0,
  `EXTERNALCHARGINGIDENTIFIER` varchar(255) DEFAULT NULL,
  `UNIQUEUSERSESSIONID` varchar(255) DEFAULT NULL,
  `SESSIONTIMEOUT` int(11) DEFAULT NULL,
  `NASPORTTYPE` int(11) DEFAULT NULL,
  `USERSPEED` varchar(255) DEFAULT NULL,
  `RADIUSSERVERNAME` varchar(255) DEFAULT NULL,
  `NASPORTID` varchar(255) DEFAULT NULL,
  `CLASS` varchar(255) DEFAULT NULL,
  `FRAMEDINTERFACEID` varchar(255) DEFAULT NULL,
  `FRAMEDIPV6PREFIX` varchar(255) DEFAULT NULL,
  `DELEGATEIPV6PREFIX` varchar(255) DEFAULT NULL,
  `FRAMEDPOOL` varchar(255) DEFAULT NULL,
  `FILTERID` varchar(255) DEFAULT NULL,
  `BILLINGSYSTEMCORRELATIONID` varchar(255) DEFAULT NULL,
  `CHARGINGSYSTEMCORRELATIONID` varchar(255) DEFAULT NULL,
  `CRMSYSTEMCORRELATIONID` varchar(255) DEFAULT NULL,
  `ACCTMULTISESSIONID` varchar(253) DEFAULT NULL,
  `FRAMEDIPV6ADDRESS` varchar(64) DEFAULT NULL,
  `NETWORKSERVICENAME` varchar(64) DEFAULT NULL,
  `NASIP` varchar(255) DEFAULT NULL,
  `CUSTOMFIELD1` varchar(255) DEFAULT NULL,
  `CUSTOMFIELD2` varchar(255) DEFAULT NULL,
  `CUSTOMFIELD3` varchar(255) DEFAULT NULL,
  `CUSTOMFIELD4` varchar(255) DEFAULT NULL,
  `UPLINKSPEED` varchar(255) DEFAULT NULL,
  `DOWNLINKSPEED` varchar(255) DEFAULT NULL,
  `SPEEDHINT` varchar(255) DEFAULT NULL,
  `CUSTOMFIELD5` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`ALEPOSESSIONID`),
  KEY `ACTIVESESSIONSGROUPINDEX` (`GROUPINDEX`),
  KEY `ACTIVESESSIONSMAINGROUPINDEX` (`MAINGROUPINDEX`),
  KEY `ACTIVESESSIONSUSERINDEX` (`USERINDEX`),
  KEY `ACTIVESESSIONSUSERID` (`USERID`),
  KEY `ACTIVESESSIONPARENTSESSIONID` (`PARENTSESSIONID`),
  KEY `UQ_ACTIVESESSIONSUSERIPINDEX` (`USERIP`),
  KEY `UQ_ACTIVESESSIONSUNIQUEUSERSESSIONID` (`UNIQUEUSERSESSIONID`),
  KEY `NASIP_IDX` (`NASIP`),
  KEY `CALLERIDSESSIONS_IDX` (`CALLERID`),
  KEY `FRAMEDIPV6ADDRESS_IDX_ACTIVESESSIONS` (`FRAMEDIPV6ADDRESS`),
  KEY `UQ_ACTIVESESSIONSNAS` (`NASIDENTIFIER`,`NASPORT`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci |

Yes, it is repeatable on both master and slave. my.cnf has already been attached. Core dumps are already enabled, but no core file is being generated.

Also, is this issue similar to MDEV-32179?

Comment by mansi dadheech [ 2023-09-21 ]

According to the official documentation, optimizer_switch='index_condition_pushdown=off' has a performance impact. Can it be used on a production system?

Comment by mansi dadheech [ 2023-09-25 ]

Any update on this, please?

Comment by Sergei Golubchik [ 2023-09-28 ]

Yes, it can. But as you'll disable an optimization strategy, queries might become slower.

Comment by Sergei Golubchik [ 2023-09-28 ]

10.7 has reached EOL, there will be no more 10.7 releases. Any bug you have hit in 10.7 can only be fixed in 10.10 or a later version.

If you can repeat this bug in a non-EOL release, please add a comment here and we'll reopen the issue.

Comment by mansi dadheech [ 2024-01-18 ]

Hello,

We have faced this issue again, with a different error in the mysqld logs. As suggested, we have changed innodb_flush_method to fsync. We are getting this issue on both versions of MariaDB (10.7.8 and 10.11.5). If this was a bug in the 10.7.x versions, it should be fixed in the 10.11.x versions.

We got the following crashes in both versions:

In 10.7.8:

MariaDB crashed with a similar error and has not recovered yet:
240115 10:26:10 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.7.8-MariaDB-log source revision: bc656c4fa54c12ceabd857e8ae134f8979d82944
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=456
max_threads=802
thread_count=457
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831789 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f410c000c58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f42ac315bd8 thread_stack 0x49000
Can't start addr2line
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5573d1d583fe]
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x5573d1860db5]
/lib64/libpthread.so.0(+0x12cf0)[0x7f91918b8cf0]
/usr/sbin/mariadbd(_Z10MYSQLparseP3THD+0xf6b7)[0x5573d17fff77]
/usr/sbin/mariadbd(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctxb+0x136)[0x5573d16343ac]
/usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0xef)[0x5573d16301c2]
/usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0xf83)[0x5573d163a0ad]
/usr/sbin/mariadbd(_Z10do_commandP3THDb+0x12f)[0x5573d163b7d7]
/usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3f7)[0x5573d1749177]
/usr/sbin/mariadbd(handle_one_connection+0x5d)[0x5573d17494bd]
/usr/sbin/mariadbd(+0xc016ed)[0x5573d1a646ed]
/lib64/libpthread.so.0(+0x81ca)[0x7f91918ae1ca]
/lib64/libc.so.6(clone+0x43)[0x7f9190bfee73]
 

In 10.11.5:

2024-01-18  0:42:02 6415776 [Warning] Aborted connection 6415776 to db: 'unconnected' user: 'gtid' host: '10.76.25.17' (Got an error writing communication packets)
2024-01-18  0:52:56 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
240118  0:52:56 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.11.5-MariaDB-log source revision: 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=648
max_threads=802
thread_count=60
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831964 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
Can't start addr2line
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x555f8aba612e]
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x555f8a69f5b5]
/lib64/libpthread.so.0(+0x12cf0)[0x7f4156b47cf0]
/lib64/libc.so.6(gsignal+0x10f)[0x7f4155ea3acf]
/lib64/libc.so.6(abort+0x127)[0x7f4155e76ea5]
/usr/sbin/mariadbd(+0x690f45)[0x555f8a2f4f45]
/usr/sbin/mariadbd(+0x688c88)[0x555f8a2ecc88]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x38)[0x555f8ab3a698]
/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x2b)[0x555f8ab3b3bb]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x555f8ab38eff]
/lib64/libstdc++.so.6(+0xc2b13)[0x7f415665eb13]
/lib64/libpthread.so.0(+0x81ca)[0x7f4156b3d1ca]
/lib64/libc.so.6(clone+0x43)[0x7f4155e8ee73]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             2013326              2013326              processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       2013326              2013326              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
 
Kernel version: Linux version 4.18.0-477.15.1.el8_8.x86_64 (mockbuild@x86-64-02.build.eng.rdu2.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Fri Jun 2 08:27:19 EDT 2023

Please find attached the my.cnf of both versions: my.cnf_10.7.8.txt and my.cnf_10.11.5.txt.

Comment by mansi dadheech [ 2024-01-23 ]

Hello @Marko Mäkelä,

Any update on this issue?

Thanks

Generated at Thu Feb 08 10:29:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.