Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.11.5, 10.7 (EOL)
Fix Version/s: None
Environment:
Master-slave replication between 3 nodes: 2 masters and 1 slave in an MMM cluster.
[root@DB1 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
[root@DB1 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:          503Gi        82Gi       409Gi       2.0Mi        11Gi       417Gi
Swap:          31Gi          0B        31Gi

Disk:
/dev/mapper/rhel-var  ext4  5.7T  1.2T  4.2T  22%  /var
Description
Our environment has a three-node DB replication cluster with 2 master nodes and 1 slave (when the writer role points to the 1st database, the 2nd one behaves as a slave).
We were previously on MariaDB 10.7.4 and were already seeing crashes, so we upgraded to MariaDB 10.7.8 and changed innodb_flush_method from O_DIRECT to fsync (a config sketch follows the query output below).
Even after these changes we are still hitting crashes on the production nodes.
MariaDB [(none)]> select @@innodb_flush_method;
+-----------------------+
| @@innodb_flush_method |
+-----------------------+
| fsync                 |
+-----------------------+

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1135222
Server version: 10.7.8-MariaDB-log MariaDB Server
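For reference, this is a minimal sketch of how the flush-method change above was applied on our side (the file path and option-group name are assumptions; adjust to the actual setup). innodb_flush_method is not a dynamic variable, so the server has to be restarted for it to take effect:

# /etc/my.cnf.d/server.cnf (path is an assumption)
[mariadb]
# changed from O_DIRECT; requires a server restart
innodb_flush_method = fsync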
The error-log output during the crash looks like this:
230915 13:38:39 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.7.8-MariaDB-log source revision: bc656c4fa54c12ceabd857e8ae134f8979d82944
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=459
max_threads=802
thread_count=463
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831789 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f7918034a58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7953ffebd8 thread_stack 0x49000
??:0(my_print_stacktrace)[0x556f01e903fe]
??:0(handle_fatal_signal)[0x556f01998db5]
??:0(__restore_rt)[0x7fbd3e7f4cf0]
??:0(Pushdown_query::execute(JOIN*))[0x556f017fcbba]
??:0(JOIN::exec_inner())[0x556f017e2d09]
??:0(JOIN::exec())[0x556f017e3627]
??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x556f017e188f]
??:0(handle_select(THD*, LEX*, select_result*, unsigned long))[0x556f017e201b]
??:0(LEX::mark_first_table_as_inserting())[0x556f0176d15d]
??:0(mysql_execute_command(THD*, bool))[0x556f01775551]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x556f0176832f]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x556f017720ad]
??:0(do_command(THD*, bool))[0x556f017737d7]
??:0(do_handle_one_connection(CONNECT*, bool))[0x556f01881177]
??:0(handle_one_connection)[0x556f018814bd]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x556f01b9c6ed]
??:0(start_thread)[0x7fbd3e7ea1ca]
:0(__GI___clone)[0x7fbd3db3ae73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7919053bc0): Select UserIndex, NASIP, NASPort, GroupIndex ,NetworkServiceName From ActiveSessions Where UserIndex=55288821 And MainGroupIndex=1009

Connection ID (thread ID): 1569281
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.

Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        unlimited    unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2061710      2061710      processes
Max open files            32768        32768        files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2061710      2061710      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Kernel version: Linux version 4.18.0-477.13.1.el8_8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Thu May 18 10:27:05 EDT 2023
The main issue is that, after crashing, the server logs that it is writing a core file, but we are not able to find the core file either (see the coredumpctl output below and the coredump.conf sketch after it).
[root@DB1 ~]# coredumpctl gdb 1721871
           PID: 1721871 (mariadbd)
           UID: 993 (mysql)
           GID: 989 (mysql)
        Signal: 11 (SEGV)
     Timestamp: Fri 2023-09-15 13:38:41 (4h 52min ago)
  Command Line: /usr/sbin/mariadbd
    Executable: /usr/sbin/mariadbd
 Control Group: /
         Slice: -.slice
       Boot ID: 583beffe0ac24952864fddfa37351800
    Machine ID: 8b546902a59548fbad7672d7b993bc35
      Hostname: DB1
       Storage: none
       Message: Process 1721871 (mariadbd) of user 993 dumped core.

Coredump entry has no core attached (neither internally in the journal nor externally on disk).
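"Storage: none" plus the missing core suggests systemd-coredump handled the crash but did not persist the dump on disk. A minimal sketch of what we plan to try in /etc/systemd/coredump.conf, assuming the default size limits are what truncated or discarded the dump (the sizes below are assumptions for this ~500 GiB host, not recommended values):

# /etc/systemd/coredump.conf  (sizes are assumptions; tune for the host)
[Coredump]
Storage=external      # keep cores on disk under /var/lib/systemd/coredump
Compress=yes
ProcessSizeMax=32G    # default limits can drop very large mariadbd cores
ExternalSizeMax=32G
MaxUse=64G

systemd-coredump is started per-crash via the kernel core_pattern shown above, so the new settings should apply to the next crash without a daemon reload; after that, coredumpctl list / coredumpctl gdb should show a core attached, provided there is enough free space under /var.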