Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
11.4.4
-
None
-
None
Description
We have a sporadic bug with two of our MariaDB databases. Each affected MariaDB Galera cluster runs with 3 replicas as a StatefulSet inside Kubernetes. When the StatefulSet is restarted, the whole cluster sometimes goes down: while one replica is restarting, one of the two remaining running replicas crashes with the following trace log:
2025-03-20T13:59:06.64224658Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: declaring 25f9f2bb-9537 at ssl://10.224.96.112:4567 stable
2025-03-20T13:59:06.642303762Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: forgetting 363e33a0-9d2c (ssl://10.225.211.126:4567)
2025-03-20T13:59:06.642734268Z stderr F 250320 13:59:06 [ERROR] mysqld got signal 11 ;
2025-03-20T13:59:06.642813253Z stderr F Sorry, we probably made a mistake, and this is a bug.
2025-03-20T13:59:06.642825969Z stderr F
2025-03-20T13:59:06.642837695Z stderr F Your assistance in bug reporting will enable us to fix this for the next release.
2025-03-20T13:59:06.64284839Z stderr F To report this bug, see https://mariadb.com/kb/en/reporting-bugs
2025-03-20T13:59:06.642858358Z stderr F
2025-03-20T13:59:06.642870589Z stderr F We will try our best to scrape up some info that will hopefully help
2025-03-20T13:59:06.642881836Z stderr F diagnose the problem, but since we have already crashed,
2025-03-20T13:59:06.642893398Z stderr F something is definitely wrong and this may fail.
2025-03-20T13:59:06.642903946Z stderr F
2025-03-20T13:59:06.642915553Z stderr F Server version: 11.4.4-MariaDB-log source revision: e9a502df08bad16aa8a354e854f3c014b1380e32
2025-03-20T13:59:06.64292668Z stderr F key_buffer_size=33554432
2025-03-20T13:59:06.642937378Z stderr F read_buffer_size=131072
2025-03-20T13:59:06.64294791Z stderr F max_used_connections=182
2025-03-20T13:59:06.642958723Z stderr F max_threads=2002
2025-03-20T13:59:06.642989803Z stderr F thread_count=184
2025-03-20T13:59:06.64300108Z stderr F It is possible that mysqld could use up to
2025-03-20T13:59:06.643011866Z stderr F key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4442693 K bytes of memory
2025-03-20T13:59:06.643022561Z stderr F Hope that's ok; if not, decrease some variables in the equation.
2025-03-20T13:59:06.643032527Z stderr F
2025-03-20T13:59:06.643043335Z stderr F WSREP: Suppressing further logging
2025-03-20T13:59:06.643066766Z stderr F WSREP: Shutting down network communications
2025-03-20T13:59:06.643076987Z stderr F
2025-03-20T13:59:06.6430881Z stderr F Thread pointer: 0x7f9824000c68
2025-03-20T13:59:06.643098798Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Node 1635ca57-848c state prim
2025-03-20T13:59:06.643109405Z stderr F Attempting backtrace. You can use the following information to find out
2025-03-20T13:59:06.643120104Z stderr F where mysqld died. If you see no messages after this, something went
2025-03-20T13:59:06.643130709Z stderr F terribly wrong...
2025-03-20T13:59:06.646835918Z stderr F stack_bottom = 0x7f99992f2000 thread_stack 0x49000
2025-03-20T13:59:06.746768805Z stderr F Printing to addr2line failed
2025-03-20T13:59:06.747750936Z stderr F /opt/bitnami/mariadb/sbin/mysqld(my_print_stacktrace+0x2e)[0x556a53722dee]
2025-03-20T13:59:06.749385269Z stderr F /opt/bitnami/mariadb/sbin/mysqld(handle_fatal_signal+0x2c3)[0x556a53267653]
2025-03-20T13:59:06.836347072Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f9aecf45050]
2025-03-20T13:59:06.920166459Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0xc89a28)[0x556a53440a28]
2025-03-20T13:59:06.921527343Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN8ha_maria4openEPKcij+0x66)[0x556a5341d366]
2025-03-20T13:59:06.922859619Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcijP11st_mem_rootP4ListI6StringE+0x6a)[0x556a5326df9a]
2025-03-20T13:59:06.923956458Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z14open_tmp_tableP5TABLE+0x34)[0x556a530609a4]
2025-03-20T13:59:06.925122108Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x80aeca)[0x556a52fc1eca]
2025-03-20T13:59:06.92600811Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z27mysql_handle_single_derivedP3LEXP10TABLE_LISTj+0x9a)[0x556a52fc2d4a]
2025-03-20T13:59:06.926952812Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN13st_join_table12preread_initEv+0x78)[0x556a53055248]
2025-03-20T13:59:06.928014567Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x430)[0x556a53055740]
2025-03-20T13:59:06.928976362Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x103a)[0x556a5308353a]
2025-03-20T13:59:06.929972166Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN4JOIN4execEv+0x2e)[0x556a530838fe]
2025-03-20T13:59:06.930982447Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x128)[0x556a530818a8]
2025-03-20T13:59:06.931971481Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resulty+0x13f)[0x556a5308207f]
2025-03-20T13:59:06.932931325Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x8451b0)[0x556a52ffc1b0]
2025-03-20T13:59:06.933969509Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x3ecb)[0x556a53009edb]
2025-03-20T13:59:06.934910899Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x1c6)[0x556a5300b4c6]
2025-03-20T13:59:06.935895195Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x854cba)[0x556a5300bcba]
2025-03-20T13:59:06.936871457Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x2535)[0x556a5300e915]
2025-03-20T13:59:06.937914348Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z10do_commandP3THDb+0x13f)[0x556a5300f29f]
2025-03-20T13:59:06.938925397Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECTb+0x37d)[0x556a5312b6dd]
2025-03-20T13:59:06.940056043Z stderr F /opt/bitnami/mariadb/sbin/mysqld(handle_one_connection+0x5d)[0x556a5312ba3d]
2025-03-20T13:59:06.941314897Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0xce60b7)[0x556a5349d0b7]
2025-03-20T13:59:06.973108692Z stderr F 2025-03-20 13:59:06 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (initiated by: unknown): Normal shutdown
2025-03-20T13:59:06.973133973Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Shutdown replication
2025-03-20T13:59:06.973156065Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Server status change synced -> disconnecting
2025-03-20T13:59:06.973403795Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Closing send monitor...
2025-03-20T13:59:06.973425872Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Closed send monitor.
2025-03-20T13:59:06.973573029Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: terminating thread
2025-03-20T13:59:06.973656905Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: joining thread
2025-03-20T13:59:06.975294385Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: closing backend
2025-03-20T13:59:07.02183498Z stderr F 2025-03-20 13:59:07 0 [Note] WSREP: (1635ca57-848c, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.224.96.112:4567
2025-03-20T13:59:07.021885322Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x891c4)[0x7f9aecf921c4]
2025-03-20T13:59:07.021914836Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x10985c)[0x7f9aed01285c]
2025-03-20T13:59:07.021931321Z stderr F
2025-03-20T13:59:07.021950784Z stderr F Trying to get some variables.
2025-03-20T13:59:07.021963749Z stderr F Some pointers may be invalid and cause the dump to abort.
2025-03-20T13:59:07.029232569Z stderr F Query (0x7f9824013100): SELECT anon_1.instances_created_at AS anon_1_instances_created_at, anon_1.instances_updated_at AS anon_1_instances_updated_at, anon_1.instances_deleted_at AS anon_1_instances_deleted_at, anon_1.instances_deleted AS anon_1_instances_deleted, anon_1.instances_id AS anon_1_instances_id, anon_1.instances_user_id AS anon_1_instances_user_id, anon_1.instances_project_id AS anon_1_instances_project_id, anon_1.instances_image_ref AS anon_1_instances_image_ref, anon_1.instances_kernel_id AS anon_1_instances_kernel_id, anon_1.instances_ramdisk_id AS anon_1_instances_ramdisk_id, anon_1.instances_hostname AS anon_1_instances_hostname, anon_1.instances_launch_index AS anon_1_instances_launch_index, anon_1.instances_key_name AS anon_1_instances_key_name, anon_1.instances_key_data AS anon_1_instances_key_data, anon_1.instances_power_state AS anon_1_instances_power_state, anon_1.instances_vm_state AS anon_1_instances_vm_state, anon_1.instances_task_state AS anon_1_instances_task_state, anon_1.instances_memory_mb AS anon_1_instances_memory_mb, anon_1.instances_vcpus AS anon_1_instances_vcpus, anon_1.instances_root_gb AS anon_1_instances_root_gb, anon_1.instances_ephemeral_gb AS anon_1_instances_ephemeral_gb, anon_1.instances_ephemeral_key_uuid AS anon_1_instances_ephemeral_key_uuid, anon_1.instances_host AS anon_1_instances_host, anon_1.instances_node AS anon_1_instances_node, anon_1.instances_instance_type_id AS anon_1_instances_instance_type_id, anon_1.instances_user_data AS anon_1_instances_user_data, anon_1.instances_reservation_id AS anon_1_instances_reservation_id, anon_1.instances_launched_at AS anon_1_instances_launched_at, anon_1.instances_terminated_at AS anon_1_instances_terminated_at, anon_1.instances_availability_zone AS anon_1_instances_availability_zone, anon_1.instances_display_name AS anon_1_instances_display_name, anon_1.instances_display_description AS anon_1_instances_display_description, 
anon_1.instances_launched_on AS anon_1_instances_launched_on, anon_1.instances_locked AS anon_1_instances_locked, anon_1.instances_locked_by AS anon_1_instances_locked_by, anon_1.instances_os_type AS anon_1_instances_os_type, anon_1.instances_architecture AS anon_1_instances_architecture, anon_1.instances_vm_mode AS anon_1_instances_vm_mode, anon_1.instances_uuid AS anon_1_instances_uuid, anon_1.instances_root_device_name AS anon_1_instances_root_device_name, anon_1.instances_default_ephemeral_device AS anon_1_instances_default_ephemeral_device, anon_1.instances_default_swap_device AS anon_1_instances_default_swap_device, anon_1.instances_config_drive AS anon_1_instances_config_drive, anon_1.instances_access_ip_v4 AS anon_1_instances_access_ip_v4, anon_1.instances_access_ip_v6 AS anon_1_instances_access_ip_v6, anon_1.instances_auto_disk_config AS anon_1_instances_auto_disk_config, anon_1.instances_progress AS anon_1_instances_progress, anon_1.instances_shutdown_terminate AS anon_1_instances_shutdown_terminate, anon_1.instances_disable_terminate AS anon_1_instances_disable_terminate, anon_1.instances_cell_name AS anon_1_instances_cell_name, anon_1.instances_cleaned AS anon_1_instances_cleaned, anon_1.instances_hidden AS anon_1_instances_hidden, instance_info_caches_1.created_at AS instance_info_caches_1_created_at, instance_info_caches_1.updated_at AS instance_info_caches_1_updated_at, instance_info_caches_1.deleted_at AS instance_info_caches_1_deleted_at, instance_info_caches_1.deleted AS instance_info_caches_1_deleted, instance_info_caches_1.id AS instance_info_caches_1_id, instance_info_caches_1.network_info AS instance_info_caches_1_network_info, instance_info_caches_1.instance_uuid AS instance_info_caches_1_instance_uuid, instance_extra_1.flavor AS instance_extra_1_flavor, instance_extra_1.created_at AS instance_extra_1_created_at, instance_extra_1.updated_at AS instance_extra_1_updated_at, instance_extra_1.deleted_at AS instance_extra_1_deleted_at, 
instance_extra_1.deleted AS instance_extra_1_deleted, instance_extra_1.id AS instance_extra_1_id, instance_extra_1.instance_uuid AS instance_extra_1_instance_uuid, security_groups_1.created_at AS security_groups_1_created_at, security_groups_1.updated_at AS security_groups_1_updated_at, security_groups_1.deleted_at AS security_groups_1_deleted_at, security_groups_1.deleted AS security_groups_1_deleted, security_groups_1.id AS security_groups_1_id, security_groups_1.name AS security_groups_1_name, security_groups_1.description AS security_groups_1_description, security_groups_1.user_id AS security_groups_1_user_id, security_groups_1.project_id AS security_groups_1_project_id
2025-03-20T13:59:07.032838724Z stderr F FROM (SELECT instances.created_at AS instances_created_at, instances.updated_at AS instances_updated_at, instances.deleted_at AS instances_deleted_at, instances.deleted AS instances_deleted, instances.id AS instances_id, instances.user_id AS instances_user_id, instances.project_id AS instances_project_id, instances.image_ref AS instances_image_ref, instances.kernel_id AS instances_kernel_id, instances.ramdisk_id AS instances_ramdisk_id, instances.hostname AS instances_hostname, instances.launch_index AS instances_launch_index, instances.key_name AS instances_key_name, instances.key_data AS instances_key_data, instances.power_state AS instances_power_state, instances.vm_state AS instances_vm_state, instances.task_state AS instances_task_state, instances.memory_mb AS instances_memory_mb, instances.vcpus AS instances_vcpus, instances.root_gb AS instances_root_gb, instances.ephemeral_gb AS instances_ephemeral_gb, instances.ephemeral_key_uuid AS instances_ephemeral_key_uuid, instances.host AS instances_host, instances.node AS instances_node, instances.instance_type_id AS instances_instance_type_id, instances.user_data AS instances_user_data, instances.reservation_id AS instances_reservation_id, instances.launched_at AS instances_launched_at, instances.terminated_at AS instances_terminated_at, instances.availability_zone AS instances_availability_zone, instances.display_name AS instances_display_name, instances.display_description AS instances_display_description, instances.launched_on AS instances_launched_on, instances.locked AS instances_locked, instances.locked_by AS instances_locked_by, instances.os_type AS instances_os_type, instances.architecture AS instances_architecture, instances.vm_mode AS instances_vm_mode, instances.uuid AS instances_uuid, instances.root_device_name AS instances_root_device_name, instances.default_ephemeral_device AS instances_default_ephemeral_device, instances.default_swap_device AS 
instances_default_swap_device, instances.config_drive AS instances_config_drive, instances.access_ip_v4 AS instances_access_ip_v4, instances.access_ip_v6 AS instances_access_ip_v6, instances.auto_disk_config AS instances_auto_disk_config, instances.progress AS instances_progress, instances.shutdown_terminate AS instances_shutdown_terminate, instances.disable_terminate AS instances_disable_terminate, instances.cell_name AS instances_cell_name, instances.cleaned AS instances_cleaned, instances.hidden AS instances_hidden
2025-03-20T13:59:07.032888337Z stderr F FROM instances
2025-03-20T13:59:07.033441527Z stderr F WHERE instances.deleted = 0 AND (instances.vm_state != 'soft-delete' OR instances.vm_state IS NULL) AND (instances.hidden = false OR instances.hidden IS NULL) AND instances.project_id = 'b7c8434d1e264bd1a64f1226dc812b9e' AND (instances.display_name REGEXP 'k8s-svc-fdf88f04-ccce-4764-bda4-9ac7a53a52aa') ORDER BY instances.created_at DESC, instances.uuid ASC, instances.id DESC
2025-03-20T13:59:07.034577916Z stderr F LIMIT 550) AS anon_1 LEFT OUTER JOIN instance_info_caches AS instance_info_caches_1 ON instance_info_caches_1.instance_uuid = anon_1.instances_uuid LEFT OUTER JOIN instance_extra AS instance_extra_1 ON instance_extra_1.instance_uuid = anon_1.instances_uuid LEFT OUTER JOIN (security_group_instance_association AS security_group_instance_association_1 INNER JOIN security_groups AS security_groups_1 ON security_groups_1.id = security_group_instance_association_1.security_group_id AND security_group_instance_association_1.deleted = 0 AND security_groups_1.deleted = 0) ON security_group_instance_association_1.instance_uuid = anon_1.instances_uuid AND anon_1.instances_deleted = 0 ORDER BY anon_1.instances_created_at DESC, anon_1.instances_uuid ASC, anon_1.instances_id DESC
2025-03-20T13:59:07.034602408Z stderr F
2025-03-20T13:59:07.034615693Z stderr F Connection ID (thread ID): 1143
2025-03-20T13:59:07.034627261Z stderr F Status: NOT_KILLED
2025-03-20T13:59:07.034637186Z stderr F
2025-03-20T13:59:07.034703783Z stderr F Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=off,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off,sargable_casefold=on
2025-03-20T13:59:07.034766713Z stderr F
2025-03-20T13:59:07.034793323Z stderr F The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
2025-03-20T13:59:07.034805306Z stderr F information that should help you find out what is causing the crash.
2025-03-20T13:59:07.034837588Z stderr F Writing a core file...
2025-03-20T13:59:07.034852731Z stderr F Working directory at /bitnami/mariadb/data
2025-03-20T13:59:07.034864986Z stderr F Resource Limits:
2025-03-20T13:59:07.034877318Z stderr F Limit Soft Limit Hard Limit Units
2025-03-20T13:59:07.034889626Z stderr F Max cpu time unlimited unlimited seconds
2025-03-20T13:59:07.034901936Z stderr F Max file size unlimited unlimited bytes
2025-03-20T13:59:07.034913949Z stderr F Max data size unlimited unlimited bytes
2025-03-20T13:59:07.034926705Z stderr F Max stack size 8388608 unlimited bytes
2025-03-20T13:59:07.034938985Z stderr F Max core file size unlimited unlimited bytes
2025-03-20T13:59:07.034951205Z stderr F Max resident set unlimited unlimited bytes
2025-03-20T13:59:07.034963252Z stderr F Max processes unlimited unlimited processes
2025-03-20T13:59:07.034975093Z stderr F Max open files 1048576 1048576 files
2025-03-20T13:59:07.034986981Z stderr F Max locked memory 8388608 8388608 bytes
2025-03-20T13:59:07.034999105Z stderr F Max address space unlimited unlimited bytes
2025-03-20T13:59:07.035011091Z stderr F Max file locks unlimited unlimited locks
2025-03-20T13:59:07.035023157Z stderr F Max pending signals 767625 767625 signals
2025-03-20T13:59:07.035035927Z stderr F Max msgqueue size 819200 819200 bytes
2025-03-20T13:59:07.035048045Z stderr F Max nice priority 0 0
2025-03-20T13:59:07.035059858Z stderr F Max realtime priority 0 0
2025-03-20T13:59:07.035071518Z stderr F Max realtime timeout unlimited unlimited us
2025-03-20T13:59:07.035083151Z stderr F Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
2025-03-20T13:59:07.035094424Z stderr F
2025-03-20T13:59:07.035136595Z stderr F Kernel version: Linux version 6.1.44-custom (root@dbhost-2) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP PREEMPT_DYNAMIC Mon Sep 11 13:55:21 UTC 2023
2025-03-20T13:59:07.035148719Z stderr F
The crash log is identical to the one in the closed bug MDEV-33010 (https://jira.mariadb.org/browse/MDEV-33010), except that optimize is replaced with exec in our case.
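To make such a comparison between two crash logs easier, the backtrace frames can be pulled out of the container log and compared symbol by symbol. The following is a minimal sketch, not part of the original report; the helper names extract_frames and join_symbols are hypothetical, and the regular expression assumes the frame layout shown in the log above (binary path, optional mangled symbol, offset, address).

```python
import re

# Matches backtrace frames such as
#   ... stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN4JOIN4execEv+0x2e)[0x556a530838fe]
# The symbol group is empty for frames printed as "(+0xc89a28)".
FRAME_RE = re.compile(
    r"(?P<binary>/\S+?)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-f]+)\)\[0x[0-9a-f]+\]"
)


def extract_frames(log_text):
    """Return (binary, symbol, offset) tuples for every backtrace frame found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("binary"), match.group("symbol"), match.group("offset"))
            )
    return frames


def join_symbols(frames):
    """Return the mangled symbols that belong to the JOIN class (prefix _ZN4JOIN)."""
    return [symbol for _, symbol, _ in frames if symbol.startswith("_ZN4JOIN")]
```

Running both logs through extract_frames and comparing the results of join_symbols would show at a glance whether a frame sits in an exec or an optimize method. The mangled names can then be piped through a demangler such as c++filt, if one is available on the host.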
The bug only happens in our production cluster; we have not been able to reproduce it in a test environment. In the test setup we used the MariaDB backup made some hours before the crash and ran the SELECT query printed in the logs for days while restarting the StatefulSet, but the database stayed up and the query returned a response every time. Please let me know if further details are needed.
It's fairly different from MDEV-33010, which was fixed one version before the version in this bug report.
What are the filesystem and storage where MariaDB would create a temporary file inside the Bitnami container (tmpdir)?
How is the current kernel customised? Anything filesystem/storage related?
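One way to answer the filesystem question is to look up which mount backs the temporary directory. The sketch below is only an illustration: the function names are hypothetical, and it assumes a Linux mount table in the /proc/mounts layout (device, mount point, filesystem type, options). The actual tmpdir path is whatever SHOW VARIABLES LIKE 'tmpdir' reports on the affected server.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, device) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fs_type = fields[0], fields[1], fields[2]
            entries.append((mount_point, fs_type, device))
    return entries


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type, device) for the longest mount point containing path.

    Returns None if no mount point matches. Later entries win ties, since a
    later mount on the same mount point shadows the earlier one.
    """
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type, device in parse_mounts(mounts_text):
        mp = os.path.normpath(mount_point)
        inside = path == mp or mp == "/" or path.startswith(mp + "/")
        if inside and (best is None or len(mp) >= len(best[0])):
            best = (mp, fs_type, device)
    return best
```

Inside the container, the function would be called with the tmpdir value and the contents of /proc/mounts, for example filesystem_for(tmpdir, open("/proc/mounts").read()). Mount points containing spaces are escaped in /proc/mounts and are not handled by this sketch.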