Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Cannot Reproduce
-
10.6.11
-
None
-
Red Hat Enterprise Linux release 8.7 (Ootpa)
MariaDB-server-10.6.11-1.el8.x86_64
Description
Hello,
We have exactly the same MariaDB data (copied from the slaves as a full binary copy), kept in sync on three Intel servers and two AMD servers acting as slaves, plus one AMD server acting as master.
The three Intel servers have differing specs (all lower than the AMD servers), while the AMD servers are all dual EPYC 7773X machines with 1 TB of RAM.
Two major issues were observed, and only on the AMD servers.
First, we noticed that the AMD servers perform much worse than the Intel servers when used as master (the slaves are largely unnecessary and carry little load). Under high load the AMD servers became nearly unresponsive, while the Intel servers, despite much lower specs, were far more stable as master.
We noticed that this happens exactly when both CPUs come into use because of increased load.
We tested in production with
kernel.numa_balancing=0
Then with
kernel.numa_balancing=1
and ExecStart=numactl --interleave=all /usr/sbin/mariadbd......
Both configurations performed very poorly exactly when both CPUs came into use because of increased load. Of note, we did not use any of these options on the Intel servers, because we never had problems there and were not aware that the official MariaDB documentation recommends them.
We had much better stability with
ExecStart=numactl --cpunodebind=1..... with or without the interleave option.
Still, several metrics and the stability under high load lead us to suspect that the AMD server has many more slow queries and spikes than the Intel servers.
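Before comparing --interleave=all with --cpunodebind=1, it may help to confirm how the kernel exposes the two sockets as NUMA nodes, since the number of nodes per socket depends on firmware settings. The following is a minimal Python sketch, assuming a Linux host that exposes /sys/devices/system/node/nodeN/cpulist. The helper names parse_cpulist and numa_nodes are hypothetical and are not part of MariaDB or numactl.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} read from the sysfs NUMA directory (assumed layout)."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if not match:
            continue
        with open(os.path.join(path, "cpulist")) as handle:
            nodes[int(match.group(1))] = parse_cpulist(handle.read())
    return nodes


if __name__ == "__main__":
    # Print how many CPUs each NUMA node owns on the current host.
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: {len(cpus)} CPUs")
```

If this reports more than two nodes on a dual-socket machine, --cpunodebind=1 pins the server to a fraction of one socket rather than to a whole socket, which would be worth stating alongside the measurements.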
Second, we ran OPTIMIZE TABLE on one large table (about 120 GB) on all five slaves.
All the Intel servers finished relatively quickly, while both AMD servers crashed with
[ERROR] mysqld got signal 11 ;
Here is the error log:
221129 16:48:25 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.6.11-MariaDB-log
key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=29
max_threads=65537
thread_count=13
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 142902243 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ef7fc0008e8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7efb1007bb18 thread_stack 0x49000
??:0(my_print_stacktrace)[0x564074e6533e]
??:0(handle_fatal_signal)[0x564074945295]
??:0(__restore_rt)[0x7f7c8e59ccf0]
??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074ccff0c]
??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074cd191c]
??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074cd2243]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x564074c31d2d]
??:0(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool))[0x5640747ce4fc]
??:0(mysql_recreate_table(THD*, TABLE_LIST*, bool))[0x5640747cfea0]
??:0(MDL_ticket::~MDL_ticket())[0x56407483a38d]
??:0(fill_check_table_metadata_fields(THD*, List<Item>*))[0x56407483c851]
??:0(Sql_cmd_optimize_table::execute(THD*))[0x56407483d7ed]
??:0(mysql_execute_command(THD*, bool))[0x564074732519]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x564074724703]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x56407472e7a6]
??:0(do_command(THD*, bool))[0x56407472fed0]
??:0(tp_callback(TP_connection*))[0x5640748d48d3]
??:0(get_event(worker_thread_t*, thread_group_t*, timespec*))[0x564074ac30b0]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x564074b6713d]
??:0(start_thread)[0x7f7c8e5921ca]
:0(__GI___clone)[0x7f7c8d8e2e73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7ef7fc013130): optimize table xe_comments_list

Connection ID (thread ID): 1790600
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /db/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4127465              4127465              processes
Max open files            300000               300000               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       4127465              4127465              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Kernel version: Linux version 4.18.0-425.3.1.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)) #1 SMP Fri Sep 30 11:45:06 EDT 2022

2022-11-29 16:48:33 0 [Note] mariadbd: Aria engine: starting recovery
recovered pages: 0% 14% 48% 64% 100% (0.0 seconds); tables to flush: 1 0
(0.0 seconds);
2022-11-29 16:48:33 0 [Note] mariadbd: Aria engine: recovery done
2022-11-29 16:48:33 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-29 16:48:33 0 [Note] InnoDB: Number of pools: 1
2022-11-29 16:48:33 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-11-29 16:48:33 0 [Note] InnoDB: Using Linux native AIO
2022-11-29 16:48:33 0 [Note] InnoDB: Initializing buffer pool, total size = 549755813888, chunk size = 134217728
2022-11-29 16:48:35 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-29 16:48:36 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=35451806789271,35458406739008
2022-11-29 16:48:50 0 [Note] InnoDB: Read redo log up to LSN=35453331680768
2022-11-29 16:49:05 0 [Note] InnoDB: Read redo log up to LSN=35454931545600
2022-11-29 16:49:20 0 [Note] InnoDB: Read redo log up to LSN=35456493792768
2022-11-29 16:49:35 0 [Note] InnoDB: Read redo log up to LSN=35458012786176
2022-11-29 16:49:39 0 [Note] InnoDB: Starting final batch to recover 306852 pages from redo log.
2022-11-29 16:49:50 0 [Note] InnoDB: To recover: 232421 pages from log
2022-11-29 16:50:05 0 [Note] InnoDB: To recover: 205875 pages from log
2022-11-29 16:50:20 0 [Note] InnoDB: To recover: 186210 pages from log
2022-11-29 16:50:35 0 [Note] InnoDB: To recover: 169740 pages from log
2022-11-29 16:50:50 0 [Note] InnoDB: To recover: 155305 pages from log
2022-11-29 16:51:05 0 [Note] InnoDB: To recover: 142354 pages from log
2022-11-29 16:51:20 0 [Note] InnoDB: To recover: 130471 pages from log
2022-11-29 16:51:35 0 [Note] InnoDB: To recover: 119584 pages from log
2022-11-29 16:51:50 0 [Note] InnoDB: To recover: 109353 pages from log
2022-11-29 16:52:05 0 [Note] InnoDB: To recover: 99638 pages from log
2022-11-29 16:52:20 0 [Note] InnoDB: To recover: 90445 pages from log
2022-11-29 16:52:35 0 [Note] InnoDB: To recover: 81628 pages from log
2022-11-29 16:52:50 0 [Note] InnoDB: To recover: 73197 pages from log
2022-11-29 16:53:05 0 [Note] InnoDB: To recover: 65106 pages from log
2022-11-29 16:53:20 0 [Note] InnoDB: To recover: 57337 pages from log
2022-11-29 16:53:35 0 [Note] InnoDB: To recover: 49803 pages from log
2022-11-29 16:53:50 0 [Note] InnoDB: To recover: 42528 pages from log
2022-11-29 16:54:05 0 [Note] InnoDB: To recover: 35449 pages from log
2022-11-29 16:54:20 0 [Note] InnoDB: To recover: 28553 pages from log
2022-11-29 16:54:35 0 [Note] InnoDB: To recover: 21831 pages from log
2022-11-29 16:54:50 0 [Note] InnoDB: To recover: 15281 pages from log
2022-11-29 16:55:05 0 [Note] InnoDB: To recover: 8870 pages from log
2022-11-29 16:55:20 0 [Note] InnoDB: To recover: 2608 pages from log
2022-11-29 16:55:24 0 [Note] InnoDB: Last binlog file '/db/mysql/mysql-bin.001023', position 155517573
2022-11-29 16:55:24 0 [Note] InnoDB: 128 rollback segments are active.
2022-11-29 16:55:24 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-11-29 16:55:24 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-29 16:55:24 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-29 16:55:24 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-29 16:55:24 0 [Note] InnoDB: 10.6.11 started; log sequence number 35458407103175; transaction id 79095061155
2022-11-29 16:55:24 0 [Note] InnoDB: Loading buffer pool(s) from /db/mysql/ib_buffer_pool
2022-11-29 16:55:24 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-29 16:55:24 0 [Note] Recovering after a crash using /db/mysql/mysql-bin
2022-11-29 16:55:24 0 [Note] Starting table crash recovery...
2022-11-29 16:55:24 0 [Note] Crash table recovery finished.
2022-11-29 16:55:25 0 [Note] DDL_LOG: Crash recovery executed 1 entries
2022-11-29 16:55:25 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-11-29 16:55:25 0 [Note] Server socket created on IP: '::'.
2022-11-29 16:55:25 5 [Note] Slave I/O thread: Start asynchronous replication to master 'slave_user@[REDACTED]:3306' in log 'mysql-bin.000176' at position 331827117
2022-11-29 16:55:25 5 [Note] Slave I/O thread: connected to master 'slave_user@[REDACTED]:3306',replication started in log 'mysql-bin.000176' at position 331827117
2022-11-29 16:55:25 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.6.11-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2022-11-29 16:55:25 6 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000176' at position 331827117, relay log '/db/mysql/relay-bin.000266' position: 238805436
2022-11-29 16:56:22 0 [Note] InnoDB: Buffer pool(s) load completed at 221129 16:56:22
Of note, a week ago we used one of the AMD servers as the source of a binary copy (after shutting down MariaDB completely) and mirrored all of the database data with rsync to the Intel servers and the other AMD servers, and had no replication errors. The Intel servers therefore hold a copy of the AMD server's data and showed no errors, so perhaps this is a bug specific to the combination of AMD servers and MariaDB.
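To compare the crashes on the two AMD servers, the backtrace frames in the error log can be pulled out and compared side by side. The following is a minimal Python sketch, assuming each frame is printed as `<file>:<line>(<symbol>)[0x<address>]`, as in the log above. The names FRAME_RE and parse_frames are hypothetical. Note that every frame here is reported at `??:0`, so the symbol names may be only the nearest exported symbols rather than the functions that actually ran.

```python
import re

# One backtrace frame: "<file>:<line>(<symbol>)[0x<address>]".
# The symbol may itself contain parentheses (C++ argument lists), so the
# greedy ".*" is anchored by the closing ")[" that precedes the address.
FRAME_RE = re.compile(
    r"^(?P<location>\S*:\d+)\((?P<symbol>.*)\)\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(log_text):
    """Return [(symbol, address)] for every backtrace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), int(match.group("address"), 16)))
    return frames


def short_name(symbol):
    """Drop the argument list so frames are easier to compare by eye."""
    return symbol.split("(", 1)[0]
```

Running parse_frames over the error log of each crashed server and comparing the short_name sequences would show whether both servers failed along the same call path below Sql_cmd_optimize_table::execute.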