  1. MariaDB Server
  2. MDEV-36534

Crash in ha_maria::open on replicated temporary table

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 11.4.4
    • None
    • None

    Description

      We have a sporadic bug with two of our MariaDB databases. Each affected MariaDB Galera cluster runs with 3 replicas as a StatefulSet inside Kubernetes. When the StatefulSet is restarted, the whole cluster sometimes crashes: while one replica is restarting, one of the remaining two running replicas crashes with the following log:

      2025-03-20T13:59:06.64224658Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: declaring 25f9f2bb-9537 at ssl://10.224.96.112:4567 stable
      2025-03-20T13:59:06.642303762Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: forgetting 363e33a0-9d2c (ssl://10.225.211.126:4567)
      2025-03-20T13:59:06.642734268Z stderr F 250320 13:59:06 [ERROR] mysqld got signal 11 ;
      2025-03-20T13:59:06.642813253Z stderr F Sorry, we probably made a mistake, and this is a bug.
      2025-03-20T13:59:06.642825969Z stderr F
      2025-03-20T13:59:06.642837695Z stderr F Your assistance in bug reporting will enable us to fix this for the next release.
      2025-03-20T13:59:06.64284839Z stderr F To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      2025-03-20T13:59:06.642858358Z stderr F
      2025-03-20T13:59:06.642870589Z stderr F We will try our best to scrape up some info that will hopefully help
      2025-03-20T13:59:06.642881836Z stderr F diagnose the problem, but since we have already crashed,
      2025-03-20T13:59:06.642893398Z stderr F something is definitely wrong and this may fail.
      2025-03-20T13:59:06.642903946Z stderr F
      2025-03-20T13:59:06.642915553Z stderr F Server version: 11.4.4-MariaDB-log source revision: e9a502df08bad16aa8a354e854f3c014b1380e32
      2025-03-20T13:59:06.64292668Z stderr F key_buffer_size=33554432
      2025-03-20T13:59:06.642937378Z stderr F read_buffer_size=131072
      2025-03-20T13:59:06.64294791Z stderr F max_used_connections=182
      2025-03-20T13:59:06.642958723Z stderr F max_threads=2002
      2025-03-20T13:59:06.642989803Z stderr F thread_count=184
      2025-03-20T13:59:06.64300108Z stderr F It is possible that mysqld could use up to
      2025-03-20T13:59:06.643011866Z stderr F key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4442693 K  bytes of memory
      2025-03-20T13:59:06.643022561Z stderr F Hope that's ok; if not, decrease some variables in the equation.
      2025-03-20T13:59:06.643032527Z stderr F
      2025-03-20T13:59:06.643043335Z stderr F WSREP: Suppressing further logging
      2025-03-20T13:59:06.643066766Z stderr F WSREP: Shutting down network communications
      2025-03-20T13:59:06.643076987Z stderr F
      2025-03-20T13:59:06.6430881Z stderr F Thread pointer: 0x7f9824000c68
      2025-03-20T13:59:06.643098798Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Node 1635ca57-848c state prim
      2025-03-20T13:59:06.643109405Z stderr F Attempting backtrace. You can use the following information to find out
      2025-03-20T13:59:06.643120104Z stderr F where mysqld died. If you see no messages after this, something went
      2025-03-20T13:59:06.643130709Z stderr F terribly wrong...
      2025-03-20T13:59:06.646835918Z stderr F stack_bottom = 0x7f99992f2000 thread_stack 0x49000
      2025-03-20T13:59:06.746768805Z stderr F Printing to addr2line failed
      2025-03-20T13:59:06.747750936Z stderr F /opt/bitnami/mariadb/sbin/mysqld(my_print_stacktrace+0x2e)[0x556a53722dee]
      2025-03-20T13:59:06.749385269Z stderr F /opt/bitnami/mariadb/sbin/mysqld(handle_fatal_signal+0x2c3)[0x556a53267653]
      2025-03-20T13:59:06.836347072Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f9aecf45050]
      2025-03-20T13:59:06.920166459Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0xc89a28)[0x556a53440a28]
      2025-03-20T13:59:06.921527343Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN8ha_maria4openEPKcij+0x66)[0x556a5341d366]
      2025-03-20T13:59:06.922859619Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcijP11st_mem_rootP4ListI6StringE+0x6a)[0x556a5326df9a]
      2025-03-20T13:59:06.923956458Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z14open_tmp_tableP5TABLE+0x34)[0x556a530609a4]
      2025-03-20T13:59:06.925122108Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x80aeca)[0x556a52fc1eca]
      2025-03-20T13:59:06.92600811Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z27mysql_handle_single_derivedP3LEXP10TABLE_LISTj+0x9a)[0x556a52fc2d4a]
      2025-03-20T13:59:06.926952812Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN13st_join_table12preread_initEv+0x78)[0x556a53055248]
      2025-03-20T13:59:06.928014567Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x430)[0x556a53055740]
      2025-03-20T13:59:06.928976362Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x103a)[0x556a5308353a]
      2025-03-20T13:59:06.929972166Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_ZN4JOIN4execEv+0x2e)[0x556a530838fe]
      2025-03-20T13:59:06.930982447Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x128)[0x556a530818a8]
      2025-03-20T13:59:06.931971481Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resulty+0x13f)[0x556a5308207f]
      2025-03-20T13:59:06.932931325Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x8451b0)[0x556a52ffc1b0]
      2025-03-20T13:59:06.933969509Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x3ecb)[0x556a53009edb]
      2025-03-20T13:59:06.934910899Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x1c6)[0x556a5300b4c6]
      2025-03-20T13:59:06.935895195Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0x854cba)[0x556a5300bcba]
      2025-03-20T13:59:06.936871457Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x2535)[0x556a5300e915]
      2025-03-20T13:59:06.937914348Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z10do_commandP3THDb+0x13f)[0x556a5300f29f]
      2025-03-20T13:59:06.938925397Z stderr F /opt/bitnami/mariadb/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECTb+0x37d)[0x556a5312b6dd]
      2025-03-20T13:59:06.940056043Z stderr F /opt/bitnami/mariadb/sbin/mysqld(handle_one_connection+0x5d)[0x556a5312ba3d]
      2025-03-20T13:59:06.941314897Z stderr F /opt/bitnami/mariadb/sbin/mysqld(+0xce60b7)[0x556a5349d0b7]
      2025-03-20T13:59:06.973108692Z stderr F 2025-03-20 13:59:06 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (initiated by: unknown): Normal shutdown
      2025-03-20T13:59:06.973133973Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Shutdown replication
      2025-03-20T13:59:06.973156065Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Server status change synced -> disconnecting
      2025-03-20T13:59:06.973403795Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Closing send monitor...
      2025-03-20T13:59:06.973425872Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: Closed send monitor.
      2025-03-20T13:59:06.973573029Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: terminating thread
      2025-03-20T13:59:06.973656905Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: joining thread
      2025-03-20T13:59:06.975294385Z stderr F 2025-03-20 13:59:06 0 [Note] WSREP: gcomm: closing backend
      2025-03-20T13:59:07.02183498Z stderr F 2025-03-20 13:59:07 0 [Note] WSREP: (1635ca57-848c, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.224.96.112:4567
      2025-03-20T13:59:07.021885322Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x891c4)[0x7f9aecf921c4]
      2025-03-20T13:59:07.021914836Z stderr F /lib/x86_64-linux-gnu/libc.so.6(+0x10985c)[0x7f9aed01285c]
      2025-03-20T13:59:07.021931321Z stderr F
      2025-03-20T13:59:07.021950784Z stderr F Trying to get some variables.
      2025-03-20T13:59:07.021963749Z stderr F Some pointers may be invalid and cause the dump to abort.
      2025-03-20T13:59:07.029232569Z stderr F Query (0x7f9824013100): SELECT anon_1.instances_created_at AS anon_1_instances_created_at, anon_1.instances_updated_at AS anon_1_instances_updated_at, anon_1.instances_deleted_at AS anon_1_instances_deleted_at, anon_1.instances_deleted AS anon_1_instances_deleted, anon_1.instances_id AS anon_1_instances_id, anon_1.instances_user_id AS anon_1_instances_user_id, anon_1.instances_project_id AS anon_1_instances_project_id, anon_1.instances_image_ref AS anon_1_instances_image_ref, anon_1.instances_kernel_id AS anon_1_instances_kernel_id, anon_1.instances_ramdisk_id AS anon_1_instances_ramdisk_id, anon_1.instances_hostname AS anon_1_instances_hostname, anon_1.instances_launch_index AS anon_1_instances_launch_index, anon_1.instances_key_name AS anon_1_instances_key_name, anon_1.instances_key_data AS anon_1_instances_key_data, anon_1.instances_power_state AS anon_1_instances_power_state, anon_1.instances_vm_state AS anon_1_instances_vm_state, anon_1.instances_task_state AS anon_1_instances_task_state, anon_1.instances_memory_mb AS anon_1_instances_memory_mb, anon_1.instances_vcpus AS anon_1_instances_vcpus, anon_1.instances_root_gb AS anon_1_instances_root_gb, anon_1.instances_ephemeral_gb AS anon_1_instances_ephemeral_gb, anon_1.instances_ephemeral_key_uuid AS anon_1_instances_ephemeral_key_uuid, anon_1.instances_host AS anon_1_instances_host, anon_1.instances_node AS anon_1_instances_node, anon_1.instances_instance_type_id AS anon_1_instances_instance_type_id, anon_1.instances_user_data AS anon_1_instances_user_data, anon_1.instances_reservation_id AS anon_1_instances_reservation_id, anon_1.instances_launched_at AS anon_1_instances_launched_at, anon_1.instances_terminated_at AS anon_1_instances_terminated_at, anon_1.instances_availability_zone AS anon_1_instances_availability_zone, anon_1.instances_display_name AS anon_1_instances_display_name, anon_1.instances_display_description AS anon_1_instances_display_description, 
anon_1.instances_launched_on AS anon_1_instances_launched_on, anon_1.instances_locked AS anon_1_instances_locked, anon_1.instances_locked_by AS anon_1_instances_locked_by, anon_1.instances_os_type AS anon_1_instances_os_type, anon_1.instances_architecture AS anon_1_instances_architecture, anon_1.instances_vm_mode AS anon_1_instances_vm_mode, anon_1.instances_uuid AS anon_1_instances_uuid, anon_1.instances_root_device_name AS anon_1_instances_root_device_name, anon_1.instances_default_ephemeral_device AS anon_1_instances_default_ephemeral_device, anon_1.instances_default_swap_device AS anon_1_instances_default_swap_device, anon_1.instances_config_drive AS anon_1_instances_config_drive, anon_1.instances_access_ip_v4 AS anon_1_instances_access_ip_v4, anon_1.instances_access_ip_v6 AS anon_1_instances_access_ip_v6, anon_1.instances_auto_disk_config AS anon_1_instances_auto_disk_config, anon_1.instances_progress AS anon_1_instances_progress, anon_1.instances_shutdown_terminate AS anon_1_instances_shutdown_terminate, anon_1.instances_disable_terminate AS anon_1_instances_disable_terminate, anon_1.instances_cell_name AS anon_1_instances_cell_name, anon_1.instances_cleaned AS anon_1_instances_cleaned, anon_1.instances_hidden AS anon_1_instances_hidden, instance_info_caches_1.created_at AS instance_info_caches_1_created_at, instance_info_caches_1.updated_at AS instance_info_caches_1_updated_at, instance_info_caches_1.deleted_at AS instance_info_caches_1_deleted_at, instance_info_caches_1.deleted AS instance_info_caches_1_deleted, instance_info_caches_1.id AS instance_info_caches_1_id, instance_info_caches_1.network_info AS instance_info_caches_1_network_info, instance_info_caches_1.instance_uuid AS instance_info_caches_1_instance_uuid, instance_extra_1.flavor AS instance_extra_1_flavor, instance_extra_1.created_at AS instance_extra_1_created_at, instance_extra_1.updated_at AS instance_extra_1_updated_at, instance_extra_1.deleted_at AS instance_extra_1_deleted_at, 
instance_extra_1.deleted AS instance_extra_1_deleted, instance_extra_1.id AS instance_extra_1_id, instance_extra_1.instance_uuid AS instance_extra_1_instance_uuid, security_groups_1.created_at AS security_groups_1_created_at, security_groups_1.updated_at AS security_groups_1_updated_at, security_groups_1.deleted_at AS security_groups_1_deleted_at, security_groups_1.deleted AS security_groups_1_deleted, security_groups_1.id AS security_groups_1_id, security_groups_1.name AS security_groups_1_name, security_groups_1.description AS security_groups_1_description, security_groups_1.user_id AS security_groups_1_user_id, security_groups_1.project_id AS security_groups_1_project_id
      2025-03-20T13:59:07.032838724Z stderr F FROM (SELECT instances.created_at AS instances_created_at, instances.updated_at AS instances_updated_at, instances.deleted_at AS instances_deleted_at, instances.deleted AS instances_deleted, instances.id AS instances_id, instances.user_id AS instances_user_id, instances.project_id AS instances_project_id, instances.image_ref AS instances_image_ref, instances.kernel_id AS instances_kernel_id, instances.ramdisk_id AS instances_ramdisk_id, instances.hostname AS instances_hostname, instances.launch_index AS instances_launch_index, instances.key_name AS instances_key_name, instances.key_data AS instances_key_data, instances.power_state AS instances_power_state, instances.vm_state AS instances_vm_state, instances.task_state AS instances_task_state, instances.memory_mb AS instances_memory_mb, instances.vcpus AS instances_vcpus, instances.root_gb AS instances_root_gb, instances.ephemeral_gb AS instances_ephemeral_gb, instances.ephemeral_key_uuid AS instances_ephemeral_key_uuid, instances.host AS instances_host, instances.node AS instances_node, instances.instance_type_id AS instances_instance_type_id, instances.user_data AS instances_user_data, instances.reservation_id AS instances_reservation_id, instances.launched_at AS instances_launched_at, instances.terminated_at AS instances_terminated_at, instances.availability_zone AS instances_availability_zone, instances.display_name AS instances_display_name, instances.display_description AS instances_display_description, instances.launched_on AS instances_launched_on, instances.locked AS instances_locked, instances.locked_by AS instances_locked_by, instances.os_type AS instances_os_type, instances.architecture AS instances_architecture, instances.vm_mode AS instances_vm_mode, instances.uuid AS instances_uuid, instances.root_device_name AS instances_root_device_name, instances.default_ephemeral_device AS instances_default_ephemeral_device, instances.default_swap_device AS 
instances_default_swap_device, instances.config_drive AS instances_config_drive, instances.access_ip_v4 AS instances_access_ip_v4, instances.access_ip_v6 AS instances_access_ip_v6, instances.auto_disk_config AS instances_auto_disk_config, instances.progress AS instances_progress, instances.shutdown_terminate AS instances_shutdown_terminate, instances.disable_terminate AS instances_disable_terminate, instances.cell_name AS instances_cell_name, instances.cleaned AS instances_cleaned, instances.hidden AS instances_hidden
      2025-03-20T13:59:07.032888337Z stderr F FROM instances
      2025-03-20T13:59:07.033441527Z stderr F WHERE instances.deleted = 0 AND (instances.vm_state != 'soft-delete' OR instances.vm_state IS NULL) AND (instances.hidden = false OR instances.hidden IS NULL) AND instances.project_id = 'b7c8434d1e264bd1a64f1226dc812b9e' AND (instances.display_name REGEXP 'k8s-svc-fdf88f04-ccce-4764-bda4-9ac7a53a52aa') ORDER BY instances.created_at DESC, instances.uuid ASC, instances.id DESC
      2025-03-20T13:59:07.034577916Z stderr F  LIMIT 550) AS anon_1 LEFT OUTER JOIN instance_info_caches AS instance_info_caches_1 ON instance_info_caches_1.instance_uuid = anon_1.instances_uuid LEFT OUTER JOIN instance_extra AS instance_extra_1 ON instance_extra_1.instance_uuid = anon_1.instances_uuid LEFT OUTER JOIN (security_group_instance_association AS security_group_instance_association_1 INNER JOIN security_groups AS security_groups_1 ON security_groups_1.id = security_group_instance_association_1.security_group_id AND security_group_instance_association_1.deleted = 0 AND security_groups_1.deleted = 0) ON security_group_instance_association_1.instance_uuid = anon_1.instances_uuid AND anon_1.instances_deleted = 0 ORDER BY anon_1.instances_created_at DESC, anon_1.instances_uuid ASC, anon_1.instances_id DESC
      2025-03-20T13:59:07.034602408Z stderr F
      2025-03-20T13:59:07.034615693Z stderr F Connection ID (thread ID): 1143
      2025-03-20T13:59:07.034627261Z stderr F Status: NOT_KILLED
      2025-03-20T13:59:07.034637186Z stderr F
      2025-03-20T13:59:07.034703783Z stderr F Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=off,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off,sargable_casefold=on
      2025-03-20T13:59:07.034766713Z stderr F
      2025-03-20T13:59:07.034793323Z stderr F The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      2025-03-20T13:59:07.034805306Z stderr F information that should help you find out what is causing the crash.
      2025-03-20T13:59:07.034837588Z stderr F Writing a core file...
      2025-03-20T13:59:07.034852731Z stderr F Working directory at /bitnami/mariadb/data
      2025-03-20T13:59:07.034864986Z stderr F Resource Limits:
      2025-03-20T13:59:07.034877318Z stderr F Limit                     Soft Limit           Hard Limit           Units
      2025-03-20T13:59:07.034889626Z stderr F Max cpu time              unlimited            unlimited            seconds
      2025-03-20T13:59:07.034901936Z stderr F Max file size             unlimited            unlimited            bytes
      2025-03-20T13:59:07.034913949Z stderr F Max data size             unlimited            unlimited            bytes
      2025-03-20T13:59:07.034926705Z stderr F Max stack size            8388608              unlimited            bytes
      2025-03-20T13:59:07.034938985Z stderr F Max core file size        unlimited            unlimited            bytes
      2025-03-20T13:59:07.034951205Z stderr F Max resident set          unlimited            unlimited            bytes
      2025-03-20T13:59:07.034963252Z stderr F Max processes             unlimited            unlimited            processes
      2025-03-20T13:59:07.034975093Z stderr F Max open files            1048576              1048576              files
      2025-03-20T13:59:07.034986981Z stderr F Max locked memory         8388608              8388608              bytes
      2025-03-20T13:59:07.034999105Z stderr F Max address space         unlimited            unlimited            bytes
      2025-03-20T13:59:07.035011091Z stderr F Max file locks            unlimited            unlimited            locks
      2025-03-20T13:59:07.035023157Z stderr F Max pending signals       767625               767625               signals
      2025-03-20T13:59:07.035035927Z stderr F Max msgqueue size         819200               819200               bytes
      2025-03-20T13:59:07.035048045Z stderr F Max nice priority         0                    0
      2025-03-20T13:59:07.035059858Z stderr F Max realtime priority     0                    0
      2025-03-20T13:59:07.035071518Z stderr F Max realtime timeout      unlimited            unlimited            us
      2025-03-20T13:59:07.035083151Z stderr F Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
      2025-03-20T13:59:07.035094424Z stderr F
      2025-03-20T13:59:07.035136595Z stderr F Kernel version: Linux version 6.1.44-custom (root@dbhost-2) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP PREEMPT_DYNAMIC Mon Sep 11 13:55:21 UTC 2023
      2025-03-20T13:59:07.035148719Z stderr F
      

      The crash log is identical to the one in the closed bug MDEV-33010 (https://jira.mariadb.org/browse/MDEV-33010), except that optimize is replaced with exec in our stack trace.
      The bug only happens in our prod cluster, and we have not been able to reproduce it in a test environment. In the test setup we used the MariaDB backup made some hours before the crash and ran the SELECT query printed in the logs for days while restarting the StatefulSet, but the database was fine and the query returned a response every time. Please let me know if further details are needed.

      Attachments

        Activity

          danblack Daniel Black added a comment - - edited

          It's fairly different from MDEV-33010, which was fixed one version before the version in this bug report.

          What filesystem and storage back the location where MariaDB would create a temporary file inside the bitnami container (tmpdir)?

          How is the current kernel customised? Anything filesystem/storage related?

          umarfarooq Umar Farooq added a comment -

          The tmp directory setup for the container is using tmpfs:

          # df -h
          Filesystem                                                          Size  Used Avail Use% Mounted on
          /dev/mapper/csi--lvm-pvc--4b825b41--0554--4072--9c12--18fc3cfec3df  246G  115G  119G  50% /bitnami/mariadb
          shm                                                                  64M     0   64M   0% /dev/shm
          tmpfs                                                               188G   36K  188G   1% /bitnami/mariadb/certs
          tmpfs                                                               188G  2.5M  188G   1% /opt/bitnami/mariadb/tmp
          

          The persistent volume has storageClassName set to lvm-nvme and volumeMode set to Filesystem, so the affected databases are using a local NVMe device on the underlying physical node.
          The custom kernel has no modifications related to the filesystem or storage.

          umarfarooq Umar Farooq added a comment - - edited

          We discovered that the bug does not occur when the StatefulSet is scaled up to 5 replicas. Only databases with 3 replicas have this issue. We have not tested with more than 5 replicas.

          danblack Daniel Black added a comment -

          Thanks for the further information. Glad you found at least a workaround. As it's a SELECT query that is crashing, it isn't replicated over the cluster, so the increase in cluster replicas is adjusting the load somehow.

          You've listed max_used_connections: 2566; could it be that it has run out of file descriptors for a temporary table (though 1048576 is the limit per the crash output)? If so, that is bad handling on MariaDB's part. Could you share SHOW GLOBAL STATUS output showing the differences between the 5-replica set and the 3-replica set?
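
          A comparison like the one requested above could be produced with a small script. The following is a minimal sketch, assuming the SHOW GLOBAL STATUS result from each cluster has been saved as tab-separated text (the format the command-line client prints in batch mode). The function names and the sample values are hypothetical and are not taken from the affected clusters.

```python
def parse_status(text):
    """Parse tab-separated SHOW GLOBAL STATUS output into a dict of strings."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        # Skip lines without a tab and the "Variable_name<TAB>Value" header row.
        if not sep or name == "Variable_name":
            continue
        status[name] = value
    return status


def diff_status(a, b):
    """Return {name: (value_a, value_b)} for every variable whose value differs.

    A variable present on only one side is reported with None on the other.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Made-up sample data standing in for the two saved outputs.
three_replicas = parse_status(
    "Variable_name\tValue\n"
    "Created_tmp_disk_tables\t120\n"
    "Created_tmp_tables\t900\n"
    "Open_files\t64\n"
)
five_replicas = parse_status(
    "Variable_name\tValue\n"
    "Created_tmp_disk_tables\t40\n"
    "Created_tmp_tables\t900\n"
    "Open_files\t64\n"
)

for name, (left, right) in diff_status(three_replicas, five_replicas).items():
    print(f"{name}: 3 replicas={left}  5 replicas={right}")
```

          Values are compared as strings, so counters that merely grew with uptime will also show up; comparing snapshots taken after similar uptime keeps the list short.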


          People

            Unassigned Unassigned
            umarfarooq Umar Farooq