  1. MariaDB Server
  2. MDEV-36700

MariaDB process crashes on one node of a 3-node Galera Cluster with "mysqld got signal 11" while running a select statement with multiple inner joins at the same time as Galera is updating the cluster status.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.20
    • Fix Version/s: None
    • Component/s: Galera
    • Environment: Operating System: Ubuntu Jammy running on a VM.
      RAM: 32 GB.
      Galera 26.4.12 (upstream Codership Galera, not the MariaDB fork of Galera).

    Description

      We have a Galera cluster consisting of 3 MariaDB nodes. This month, the segmentation fault described in this report has happened twice.

      First occurrence:

      The cluster had 3 nodes, but one of them (mariadb/2) was in a broken state and kept trying to join the cluster, to no avail.

      During one of these join attempts, mariadb/0 was running a complex select statement and the process crashed with signal 11. The crash happened at the same time that Galera was updating the cluster status to forget the IP address of mariadb/2.

      Logs from mariadb/0:

      2025-04-08 15:40:54 0 [Note] WSREP: declaring mariadb/1 at ssl://xxx.xxx.xx.26:4567 stable
      2025-04-08 15:40:54 0 [Note] WSREP: forgetting mariadb/2 (ssl://xxx.xxx.xx.27:4567)
      250408 15:40:54 [ERROR] mysqld got signal 11 ;
      Sorry, we probably made a mistake, and this is a bug. 
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.6.20-MariaDB-log source revision: f00711bba2cd383825d0be1867f7d7d7f641c9e4
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=111
      max_threads=1502
      thread_count=114
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3439059 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      

      Stack trace from mariadb/0:

      mysys/stacktrace.c:216(my_print_stacktrace)[0x55cf7f6152fe]
      sql/signal_handler.cc:247(handle_fatal_signal)[0x55cf7f03cfe7]
      /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0d07c1c520]
      /lib/x86_64-linux-gnu/libc.so.6(+0x1a0741)[0x7f0d07d7a741]
      bits/string3.h:51(memcpy)[0x55cf7f25dddc]
      maria/ma_blockrec.c:3577(allocate_and_write_block_record)[0x55cf7f25fe73]
      maria/ma_write.c:157(maria_write)[0x55cf7f26bb84]
      sql/sql_class.h:7707(handler::ha_write_tmp_row(unsigned char*))[0x55cf7ee75a9f]
      sql/sql_select.cc:23940(end_write(JOIN*, st_join_table*, bool))[0x55cf7ee6ae47]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41ae7]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41ae7]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41ae7]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41ae7]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41ae7]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee3b81b]
      sql/sql_select.cc:22392(sub_select(JOIN*, st_join_table*, bool))[0x55cf7ee41b45]
      sql/sql_select.cc:21908(JOIN::exec_inner())[0x55cf7ee73336]
      sql/sql_select.cc:4715(JOIN::exec())[0x55cf7ee73679]
      sql/sql_select.cc:5195(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55cf7ee718c6]
      sql/sql_select.cc:585(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55cf7ee72124]
      sql/sql_parse.cc:6408(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55cf7ecbfa00]
      sql/sql_parse.cc:3999(mysql_execute_command(THD*, bool))[0x55cf7ee121b1]
      sql/sql_parse.cc:8195(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55cf7ee1468b]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55cf7ee14e21]
      sql/sql_parse.cc:1895(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x55cf7ee1740a]
      sql/sql_parse.cc:1423(do_command(THD*, bool))[0x55cf7ee1804e]
      sql/sql_connect.cc:1407(do_handle_one_connection(CONNECT*, bool))[0x55cf7ef12a2f]
      sql/sql_connect.cc:1325(handle_one_connection)[0x55cf7ef12cc4]
      perfschema/pfs.cc:2204(pfs_spawn_thread)[0x55cf7f2b686c]
      /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f0d07c6eac3]
      /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f0d07d00850]
      Query (0x7f06e4010c00): select <redacted>

      Second occurrence:

      This time the cluster was healthy, and we triggered a rolling restart, taking the nodes down one at a time so that no downtime occurs.

      • mariadb/0 was successfully restarted.
      • When mariadb/1 shut down for its restart, mariadb/2 crashed with the same error while running the same select query as in the first occurrence. At the same moment, Galera was forgetting the IP address of the node that had just left.

      Logs from mariadb/2:

      2025-04-23  9:05:23 0 [Note] WSREP: declaring mariadb/0 at ssl://xxx.xxx.xx.28:4567 stable
      2025-04-23  9:05:23 0 [Note] WSREP: forgetting mariadb/1 (ssl://xxx.xxx.xx.26:4567)
      250423  9:05:23 [ERROR] mysqld got signal 11 ;
      Sorry, we probably made a mistake, and this is a bug.
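
The coincidence in time between the crash and the membership change can be checked mechanically. The following is a minimal sketch in Python, assuming only the two timestamp formats seen in the excerpts of this report (`YYYY-MM-DD H:MM:SS` for server notes and `YYMMDD H:MM:SS` for the crash banner). The function names and the one-second window are illustrative choices, not part of MariaDB or Galera.

```python
import re
from datetime import datetime

# Log lines copied from the two excerpts in this report.
LOG_LINES = [
    "2025-04-08 15:40:54 0 [Note] WSREP: forgetting mariadb/2 (ssl://xxx.xxx.xx.27:4567)",
    "250408 15:40:54 [ERROR] mysqld got signal 11 ;",
    "2025-04-23  9:05:23 0 [Note] WSREP: forgetting mariadb/1 (ssl://xxx.xxx.xx.26:4567)",
    "250423  9:05:23 [ERROR] mysqld got signal 11 ;",
]

# Server notes use a four-digit year with dashes; the crash banner uses YYMMDD.
LONG_TS = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(\d{1,2}):(\d{2}):(\d{2})\s")
SHORT_TS = re.compile(r"^(\d{2})(\d{2})(\d{2})\s+(\d{1,2}):(\d{2}):(\d{2})\s")


def parse_timestamp(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    m = LONG_TS.match(line)
    if m:
        y, mo, d, h, mi, s = map(int, m.groups())
        return datetime(y, mo, d, h, mi, s)
    m = SHORT_TS.match(line)
    if m:
        y, mo, d, h, mi, s = map(int, m.groups())
        # Assumption: two-digit years belong to the 2000s.
        return datetime(2000 + y, mo, d, h, mi, s)
    return None


def crashes_with_membership_change(lines, window_seconds=1):
    """Pair each signal-11 line with 'forgetting' lines logged within the window."""
    forgets = [(parse_timestamp(l), l) for l in lines if "WSREP: forgetting" in l]
    crashes = [parse_timestamp(l) for l in lines if "got signal 11" in l]
    pairs = []
    for crash_ts in crashes:
        for forget_ts, forget_line in forgets:
            if crash_ts is None or forget_ts is None:
                continue
            if abs((crash_ts - forget_ts).total_seconds()) <= window_seconds:
                pairs.append((crash_ts, forget_line))
    return pairs


if __name__ == "__main__":
    for crash_ts, forget_line in crashes_with_membership_change(LOG_LINES):
        print(crash_ts.isoformat(), "<->", forget_line)
```

Run against a full error log instead of `LOG_LINES`, the same function would show whether every signal 11 on a node has a matching "forgetting" event, or whether some crashes happen without one.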
      

      Stack trace from mariadb/2:

       
      mysys/stacktrace.c:216(my_print_stacktrace)[0x55d093c152fe]
      sql/signal_handler.cc:247(handle_fatal_signal)[0x55d09363cfe7]
      /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f257c783520]
      /lib/x86_64-linux-gnu/libc.so.6(+0x1a0741)[0x7f257c8e1741]
      bits/string3.h:51(memcpy)[0x55d093861d78]
      maria/ma_blockrec.c:5513(_ma_scan_block_record)[0x55d09386284a]
      sql/handler.cc:3532(handler::ha_rnd_next(unsigned char*))[0x55d093643a07]
      sql/filesort.cc:914(filesort(THD*, TABLE*, Filesort*, Filesort_tracker*, JOIN*, unsigned long long))[0x55d09363b83b]
      sql/sql_select.cc:25929(create_sort_index(THD*, JOIN*, st_join_table*, Filesort*))[0x55d09344c613]
      sql/sql_select.cc:23436(st_join_table::sort_table())[0x55d09344c93e]
      sql/sql_select.cc:23373(join_init_read_record(st_join_table*))[0x55d09344ca00]
      sql/sql_select.cc:31573(AGGR_OP::end_send())[0x55d0934529f3]
      sql/sql_select.cc:22073(sub_select_postjoin_aggr(JOIN*, st_join_table*, bool))[0x55d093452bb1]
      sql/sql_select.cc:21909(JOIN::exec_inner())[0x55d093473231]
      sql/sql_select.cc:4715(JOIN::exec())[0x55d093473679]
      sql/sql_select.cc:5195(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55d0934718c6]
      sql/sql_select.cc:585(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55d093472124]
      sql/sql_parse.cc:6408(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55d0932bfa00]
      sql/sql_parse.cc:3999(mysql_execute_command(THD*, bool))[0x55d0934121b1]
      sql/sql_parse.cc:8195(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55d09341468b]
      sql/sql_class.h:4563(THD::get_stmt_da())[0x55d093414e21]
      sql/sql_parse.cc:1895(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x55d09341740a]
      sql/sql_parse.cc:1423(do_command(THD*, bool))[0x55d09341804e]
      sql/sql_connect.cc:1407(do_handle_one_connection(CONNECT*, bool))[0x55d093512a2f]
      sql/sql_connect.cc:1325(handle_one_connection)[0x55d093512cc4]
      perfschema/pfs.cc:2204(pfs_spawn_thread)[0x55d0938b686c]
      /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f257c7d5ac3]
      /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f257c867850]
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f2110030040): select
      

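      Both crashes fault in memcpy called from the Aria block-record code (maria/ma_blockrec.c), beneath JOIN::exec_inner() for the same select query. The first crash happens while writing a row to a temporary table (maria_write via ha_write_tmp_row), the second while scanning rows for a filesort (_ma_scan_block_record via ha_rnd_next). The circumstances are the same in both cases, so we believe it's the same bug.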
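
One way to check how closely the two traces match is to compare their source-located frames. The sketch below, in Python, parses the topmost frames copied from both traces in this report and lists which (file, function) pairs are shared and which appear in only one trace. The regular expression and helper name are assumptions made for illustration; they cover only the frame layout seen here.

```python
import re

# Topmost frames of each trace, copied from this report.
FIRST_TRACE = """
mysys/stacktrace.c:216(my_print_stacktrace)[0x55cf7f6152fe]
sql/signal_handler.cc:247(handle_fatal_signal)[0x55cf7f03cfe7]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0d07c1c520]
bits/string3.h:51(memcpy)[0x55cf7f25dddc]
maria/ma_blockrec.c:3577(allocate_and_write_block_record)[0x55cf7f25fe73]
maria/ma_write.c:157(maria_write)[0x55cf7f26bb84]
sql/sql_class.h:7707(handler::ha_write_tmp_row(unsigned char*))[0x55cf7ee75a9f]
"""

SECOND_TRACE = """
mysys/stacktrace.c:216(my_print_stacktrace)[0x55d093c152fe]
sql/signal_handler.cc:247(handle_fatal_signal)[0x55d09363cfe7]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f257c783520]
bits/string3.h:51(memcpy)[0x55d093861d78]
maria/ma_blockrec.c:5513(_ma_scan_block_record)[0x55d09386284a]
sql/handler.cc:3532(handler::ha_rnd_next(unsigned char*))[0x55d093643a07]
"""

# file:line(function[(args)])[0xaddress]; frames without file:line are skipped.
FRAME = re.compile(
    r"^(?P<file>[^:()\s]+):(?P<line>\d+)\((?P<func>.+)\)\[0x[0-9a-f]+\]$"
)


def parse_frames(trace):
    """Return (file, function) for every frame that carries source information."""
    frames = []
    for raw in trace.strip().splitlines():
        m = FRAME.match(raw.strip())
        if m:
            # Drop the argument list so overload signatures do not affect matching.
            func = m.group("func").split("(", 1)[0]
            frames.append((m.group("file"), func))
    return frames


first = parse_frames(FIRST_TRACE)
second = parse_frames(SECOND_TRACE)

shared = [f for f in first if f in second]
only_first = [f for f in first if f not in second]
only_second = [f for f in second if f not in first]

if __name__ == "__main__":
    print("shared:     ", shared)
    print("only first: ", only_first)
    print("only second:", only_second)
```

On these frames, the crash handler and `memcpy` are common to both traces, and both reach `memcpy` from `maria/ma_blockrec.c`, but through different functions: the write path in the first trace and the scan path in the second.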

      Important note: We run upstream Galera from Codership, not the MariaDB fork of Galera.

            People

              Assignee: Unassigned
              Reporter: Mohammad Abdulhai (moe_dev)