MariaDB Server / MDEV-30125

"[ERROR] mysqld got signal 11 ;" only on AMD EPYC 7773X servers versus intel servers, and suspected poor performance/stability on AMD servers versus intel servers

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.6.11
    • Fix Version/s: N/A
    • Component/s: N/A
    • Labels: None
    • Environment: Red Hat Enterprise Linux release 8.7 (Ootpa)
      MariaDB-server-10.6.11-1.el8.x86_64

    Description

      Hello,
      We had the exact same MariaDB data (copied from slaves as a full binary copy) synced on three Intel servers and two AMD servers acting as slaves, plus one AMD server acting as master.
      The three Intel servers have different specs (all lower than the AMD servers), while the AMD servers are all dual EPYC 7773X with 1 TB of RAM.

      We noticed two major issues, both only on the AMD servers.

      First, the AMD servers perform much worse than the Intel servers when used as master (the slaves are largely unnecessary and carry little load). Under high load an AMD master became nearly unresponsive, while the Intel servers, despite much lower specs, were much more stable in the same role.
      We noticed that this occurs exactly when both CPUs come into use as the load increases.
      We tested in production with
      kernel.numa_balancing=0

      Then with
      kernel.numa_balancing=1
      and ExecStart=numactl --interleave=all /usr/sbin/mariadbd......

      Both configurations performed very poorly exactly when both CPUs came into use as the load increased. Of note, we did not use such options on the Intel servers, because we never had any problems there and were not aware that the official MariaDB documentation recommends them.

      We had much better stability with
      ExecStart=numactl --cpunodebind=1..... with or without the interleave option.

      Still, based on several metrics and on stability under high load, we suspect that the server has many more slow queries and spikes than the Intel servers.
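
      For anyone trying to confirm which of the NUMA settings above is actually in effect on a similar host, here is a minimal Python sketch. It assumes a Linux system that exposes /proc/sys/kernel/numa_balancing and /sys/devices/system/node. The helper names (parse_cpulist, numa_nodes, nodes_used_by, report) are made up for illustration and are not part of MariaDB or numactl. It only inspects CPU affinity, so it can show the effect of --cpunodebind but not the memory policy set by --interleave.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes(sys_root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids, read from sysfs (Linux only)."""
    nodes = {}
    for path in glob.glob(os.path.join(sys_root, "node[0-9]*", "cpulist")):
        # The parent directory is named "node<N>"; strip the prefix to get N.
        node_dir = os.path.basename(os.path.dirname(path))
        node_id = int(node_dir[len("node"):])
        with open(path) as fh:
            nodes[node_id] = parse_cpulist(fh.read())
    return nodes


def nodes_used_by(pid, nodes):
    """Return the NUMA node ids whose CPUs the given process is allowed to run on."""
    allowed = os.sched_getaffinity(pid)  # pid 0 means the calling process
    return sorted(n for n, cpus in nodes.items() if allowed & set(cpus))


def report(pid=0):
    """Build a short text summary of NUMA balancing, topology and process affinity."""
    lines = []
    try:
        with open("/proc/sys/kernel/numa_balancing") as fh:
            lines.append("kernel.numa_balancing = " + fh.read().strip())
    except OSError:
        lines.append("kernel.numa_balancing is not available on this kernel")
    topo = numa_nodes()
    for node, cpus in sorted(topo.items()):
        lines.append(f"node{node}: {len(cpus)} CPUs")
    lines.append(f"nodes usable by pid {pid}: {nodes_used_by(pid, topo)}")
    return "\n".join(lines)
```

      Calling print(report(pid)) with the process id of mariadbd would be expected to list only node 1 when the server was started with --cpunodebind=1, and both nodes otherwise, assuming the affinity was not changed after startup.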

      Second, we ran an OPTIMIZE TABLE command on one large table (about 120 GB) on all 5 slaves.
      All the Intel servers finished relatively quickly, while both AMD servers crashed with
      [ERROR] mysqld got signal 11 ;

      Here is the error log:

      221129 16:48:25 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.6.11-MariaDB-log
      key_buffer_size=33554432
      read_buffer_size=131072
      max_used_connections=29
      max_threads=65537
      thread_count=13
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 142902243 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7ef7fc0008e8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7efb1007bb18 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x564074e6533e]
      ??:0(handle_fatal_signal)[0x564074945295]
      ??:0(__restore_rt)[0x7f7c8e59ccf0]
      ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074ccff0c]
      ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074cd191c]
      ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x564074cd2243]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x564074c31d2d]
      ??:0(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool))[0x5640747ce4fc]
      ??:0(mysql_recreate_table(THD*, TABLE_LIST*, bool))[0x5640747cfea0]
      ??:0(MDL_ticket::~MDL_ticket())[0x56407483a38d]
      ??:0(fill_check_table_metadata_fields(THD*, List<Item>*))[0x56407483c851]
      ??:0(Sql_cmd_optimize_table::execute(THD*))[0x56407483d7ed]
      ??:0(mysql_execute_command(THD*, bool))[0x564074732519]
      ??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x564074724703]
      ??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x56407472e7a6]
      ??:0(do_command(THD*, bool))[0x56407472fed0]
      ??:0(tp_callback(TP_connection*))[0x5640748d48d3]
      ??:0(get_event(worker_thread_t*, thread_group_t*, timespec*))[0x564074ac30b0]
      ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x564074b6713d]
      ??:0(start_thread)[0x7f7c8e5921ca]
      :0(__GI___clone)[0x7f7c8d8e2e73]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7ef7fc013130): optimize table xe_comments_list
       
      Connection ID (thread ID): 1790600
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /db/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             4127465              4127465              processes 
      Max open files            300000               300000               files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       4127465              4127465              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
       
      Kernel version: Linux version 4.18.0-425.3.1.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)) #1 SMP Fri Sep 30 11:45:06 EDT 2022
       
      2022-11-29 16:48:33 0 [Note] mariadbd: Aria engine: starting recovery
      recovered pages: 0% 14% 48% 64% 100% (0.0 seconds); tables to flush: 1 0
       (0.0 seconds); 
      2022-11-29 16:48:33 0 [Note] mariadbd: Aria engine: recovery done
      2022-11-29 16:48:33 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2022-11-29 16:48:33 0 [Note] InnoDB: Number of pools: 1
      2022-11-29 16:48:33 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-11-29 16:48:33 0 [Note] InnoDB: Using Linux native AIO
      2022-11-29 16:48:33 0 [Note] InnoDB: Initializing buffer pool, total size = 549755813888, chunk size = 134217728
      2022-11-29 16:48:35 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-11-29 16:48:36 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=35451806789271,35458406739008
      2022-11-29 16:48:50 0 [Note] InnoDB: Read redo log up to LSN=35453331680768
      2022-11-29 16:49:05 0 [Note] InnoDB: Read redo log up to LSN=35454931545600
      2022-11-29 16:49:20 0 [Note] InnoDB: Read redo log up to LSN=35456493792768
      2022-11-29 16:49:35 0 [Note] InnoDB: Read redo log up to LSN=35458012786176
      2022-11-29 16:49:39 0 [Note] InnoDB: Starting final batch to recover 306852 pages from redo log.
      2022-11-29 16:49:50 0 [Note] InnoDB: To recover: 232421 pages from log
      2022-11-29 16:50:05 0 [Note] InnoDB: To recover: 205875 pages from log
      2022-11-29 16:50:20 0 [Note] InnoDB: To recover: 186210 pages from log
      2022-11-29 16:50:35 0 [Note] InnoDB: To recover: 169740 pages from log
      2022-11-29 16:50:50 0 [Note] InnoDB: To recover: 155305 pages from log
      2022-11-29 16:51:05 0 [Note] InnoDB: To recover: 142354 pages from log
      2022-11-29 16:51:20 0 [Note] InnoDB: To recover: 130471 pages from log
      2022-11-29 16:51:35 0 [Note] InnoDB: To recover: 119584 pages from log
      2022-11-29 16:51:50 0 [Note] InnoDB: To recover: 109353 pages from log
      2022-11-29 16:52:05 0 [Note] InnoDB: To recover: 99638 pages from log
      2022-11-29 16:52:20 0 [Note] InnoDB: To recover: 90445 pages from log
      2022-11-29 16:52:35 0 [Note] InnoDB: To recover: 81628 pages from log
      2022-11-29 16:52:50 0 [Note] InnoDB: To recover: 73197 pages from log
      2022-11-29 16:53:05 0 [Note] InnoDB: To recover: 65106 pages from log
      2022-11-29 16:53:20 0 [Note] InnoDB: To recover: 57337 pages from log
      2022-11-29 16:53:35 0 [Note] InnoDB: To recover: 49803 pages from log
      2022-11-29 16:53:50 0 [Note] InnoDB: To recover: 42528 pages from log
      2022-11-29 16:54:05 0 [Note] InnoDB: To recover: 35449 pages from log
      2022-11-29 16:54:20 0 [Note] InnoDB: To recover: 28553 pages from log
      2022-11-29 16:54:35 0 [Note] InnoDB: To recover: 21831 pages from log
      2022-11-29 16:54:50 0 [Note] InnoDB: To recover: 15281 pages from log
      2022-11-29 16:55:05 0 [Note] InnoDB: To recover: 8870 pages from log
      2022-11-29 16:55:20 0 [Note] InnoDB: To recover: 2608 pages from log
      2022-11-29 16:55:24 0 [Note] InnoDB: Last binlog file '/db/mysql/mysql-bin.001023', position 155517573
      2022-11-29 16:55:24 0 [Note] InnoDB: 128 rollback segments are active.
      2022-11-29 16:55:24 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2022-11-29 16:55:24 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2022-11-29 16:55:24 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2022-11-29 16:55:24 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2022-11-29 16:55:24 0 [Note] InnoDB: 10.6.11 started; log sequence number 35458407103175; transaction id 79095061155
      2022-11-29 16:55:24 0 [Note] InnoDB: Loading buffer pool(s) from /db/mysql/ib_buffer_pool
      2022-11-29 16:55:24 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-11-29 16:55:24 0 [Note] Recovering after a crash using /db/mysql/mysql-bin
      2022-11-29 16:55:24 0 [Note] Starting table crash recovery...
      2022-11-29 16:55:24 0 [Note] Crash table recovery finished.
      2022-11-29 16:55:25 0 [Note] DDL_LOG: Crash recovery executed 1 entries
      2022-11-29 16:55:25 0 [Note] Server socket created on IP: '0.0.0.0'.
      2022-11-29 16:55:25 0 [Note] Server socket created on IP: '::'.
      2022-11-29 16:55:25 5 [Note] Slave I/O thread: Start asynchronous replication to master 'slave_user@[REDACTED]:3306' in log 'mysql-bin.000176' at position 331827117
      2022-11-29 16:55:25 5 [Note] Slave I/O thread: connected to master 'slave_user@[REDACTED]:3306',replication started in log 'mysql-bin.000176' at position 331827117
      2022-11-29 16:55:25 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.6.11-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
      2022-11-29 16:55:25 6 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000176' at position 331827117, relay log '/db/mysql/relay-bin.000266' position: 238805436
      2022-11-29 16:56:22 0 [Note] InnoDB: Buffer pool(s) load completed at 221129 16:56:22
      

      Of note, a week ago we used one of the AMD servers as the source for a binary copy (after shutting down MariaDB completely), using rsync to mirror all the database data to the Intel servers and the other AMD servers, and we had no replication errors. So each Intel server is actually a copy of the AMD server data and had no errors, which suggests that the bug may be related to the combination of AMD servers and MariaDB.

          People

            danblack Daniel Black
            FK F K