Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.11.5, 10.7 (EOL)
Fix Version/s: None
Environment:
Master-slave replication between 3 nodes: 2 masters and 1 slave in an MMM cluster.
[root@DB1 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
[root@DB1 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:          503Gi        82Gi       409Gi       2.0Mi        11Gi       417Gi
Swap:          31Gi          0B        31Gi

Disk:
/dev/mapper/rhel-var  ext4  5.7T  1.2T  4.2T  22%  /var
Description
Our environment has a three-node DB replication cluster with 2 master nodes and 1 slave (when the writer role points to the 1st database, the 2nd one behaves as a slave).
We were previously on MariaDB 10.7.4 and were already seeing crashes, so we upgraded to MariaDB 10.7.8 and changed innodb_flush_method from O_DIRECT to fsync (a config sketch follows the query output below).
Even after these changes we are still hitting crashes on the production nodes.
MariaDB [(none)]> select @@innodb_flush_method;
+-----------------------+
| @@innodb_flush_method |
+-----------------------+
| fsync                 |
+-----------------------+

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1135222
Server version: 10.7.8-MariaDB-log MariaDB Server
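For reference, this is a minimal sketch of how the flush-method change above was applied on our side (the file path and option-group name are assumptions; adjust to the actual setup). innodb_flush_method is not a dynamic variable, so the server has to be restarted for it to take effect:

# /etc/my.cnf.d/server.cnf (path is an assumption)
[mariadb]
# changed from O_DIRECT; requires a server restart
innodb_flush_method = fsync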
The error-log output during the crash looks like this:
230915 13:38:39 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.7.8-MariaDB-log source revision: bc656c4fa54c12ceabd857e8ae134f8979d82944
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=459
max_threads=802
thread_count=463
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831789 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f7918034a58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7953ffebd8 thread_stack 0x49000
??:0(my_print_stacktrace)[0x556f01e903fe]
??:0(handle_fatal_signal)[0x556f01998db5]
??:0(__restore_rt)[0x7fbd3e7f4cf0]
??:0(Pushdown_query::execute(JOIN*))[0x556f017fcbba]
??:0(JOIN::exec_inner())[0x556f017e2d09]
??:0(JOIN::exec())[0x556f017e3627]
??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x556f017e188f]
??:0(handle_select(THD*, LEX*, select_result*, unsigned long))[0x556f017e201b]
??:0(LEX::mark_first_table_as_inserting())[0x556f0176d15d]
??:0(mysql_execute_command(THD*, bool))[0x556f01775551]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x556f0176832f]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x556f017720ad]
??:0(do_command(THD*, bool))[0x556f017737d7]
??:0(do_handle_one_connection(CONNECT*, bool))[0x556f01881177]
??:0(handle_one_connection)[0x556f018814bd]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x556f01b9c6ed]
??:0(start_thread)[0x7fbd3e7ea1ca]
:0(__GI___clone)[0x7fbd3db3ae73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7919053bc0): Select UserIndex, NASIP, NASPort, GroupIndex ,NetworkServiceName From ActiveSessions Where UserIndex=55288821 And MainGroupIndex=1009

Connection ID (thread ID): 1569281
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.

Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        unlimited    unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2061710      2061710      processes
Max open files            32768        32768        files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2061710      2061710      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Kernel version: Linux version 4.18.0-477.13.1.el8_8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Thu May 18 10:27:05 EDT 2023
The main issue is that, after crashing, the server logs that it is writing a core file, but we are not able to find the core file either (see the coredumpctl output below and the coredump.conf sketch after it).
[root@DB1 ~]# coredumpctl gdb 1721871
           PID: 1721871 (mariadbd)
           UID: 993 (mysql)
           GID: 989 (mysql)
        Signal: 11 (SEGV)
     Timestamp: Fri 2023-09-15 13:38:41 (4h 52min ago)
  Command Line: /usr/sbin/mariadbd
    Executable: /usr/sbin/mariadbd
 Control Group: /
         Slice: -.slice
       Boot ID: 583beffe0ac24952864fddfa37351800
    Machine ID: 8b546902a59548fbad7672d7b993bc35
      Hostname: DB1
       Storage: none
       Message: Process 1721871 (mariadbd) of user 993 dumped core.

Coredump entry has no core attached (neither internally in the journal nor externally on disk).
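"Storage: none" plus the missing core suggests systemd-coredump handled the crash but did not persist the dump on disk. A minimal sketch of what we plan to try in /etc/systemd/coredump.conf, assuming the default size limits are what truncated or discarded the dump (the sizes below are assumptions for this ~500 GiB host, not recommended values):

# /etc/systemd/coredump.conf  (sizes are assumptions; tune for the host)
[Coredump]
Storage=external      # keep cores on disk under /var/lib/systemd/coredump
Compress=yes
ProcessSizeMax=32G    # default limits can drop very large mariadbd cores
ExternalSizeMax=32G
MaxUse=64G

systemd-coredump is started per-crash via the kernel core_pattern shown above, so the new settings should apply to the next crash without a daemon reload; after that, coredumpctl list / coredumpctl gdb should show a core attached, provided there is enough free space under /var.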