[MDEV-32278] segmentation fault Created: 2023-09-28  Updated: 2023-12-01

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.3.38
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Marco Lerda Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Ubuntu 20.04 LTS
mysqld Ver 10.3.38-MariaDB-0ubuntu0.20.04.1 for debian-linux-gnu on x86_64 (Ubuntu 20.04)
using Galera in a cluster with 2 nodes and one arbiter.



 Description   

hello,
we have MariaDb server that serveral times a day goes in segmentation fault.
Our version of MariaDb is always updated with the latest available on Ubuntu,
The problem was present in every version of MariaDb, Ubuntu 20.04 LTS provided us.
Lately the frequancy of the issue increased (from twice a month to dozens a day) and the increase was not related to newer version of MariaDB.

Here one example of the log.
Thank you for your help.

Sep 28 10:00:14 mysql01 mysqld: 230928 10:00:14 [ERROR] mysqld got signal 11 ;
Sep 28 10:00:14 mysql01 mysqld: This could be because you hit a bug. It is also possible that this binary
Sep 28 10:00:14 mysql01 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Sep 28 10:00:14 mysql01 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 28 10:00:14 mysql01 mysqld:
Sep 28 10:00:14 mysql01 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 28 10:00:14 mysql01 mysqld:
Sep 28 10:00:14 mysql01 mysqld: We will try our best to scrape up some info that will hopefully help
Sep 28 10:00:14 mysql01 mysqld: diagnose the problem, but since we have already crashed,
Sep 28 10:00:14 mysql01 mysqld: something is definitely wrong and this may fail.
Sep 28 10:00:14 mysql01 mysqld:
Sep 28 10:00:14 mysql01 mysqld: Server version: 10.3.38-MariaDB-0ubuntu0.20.04.1 source revision: c73985f2ce8a391582787f3e310a011c1a712bec
Sep 28 10:00:14 mysql01 mysqld: key_buffer_size=134217728
Sep 28 10:00:14 mysql01 mysqld: read_buffer_size=131072
Sep 28 10:00:14 mysql01 mysqld: max_used_connections=122
Sep 28 10:00:14 mysql01 mysqld: max_threads=1502
Sep 28 10:00:14 mysql01 mysqld: thread_count=100
Sep 28 10:00:14 mysql01 mysqld: It is possible that mysqld could use up to
Sep 28 10:00:14 mysql01 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3433348 K bytes of memory
Sep 28 10:00:14 mysql01 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Sep 28 10:00:14 mysql01 mysqld:
Sep 28 10:00:14 mysql01 mysqld: Thread pointer: 0x7fdd3c016778
Sep 28 10:00:14 mysql01 mysqld: Attempting backtrace. You can use the following information to find out
Sep 28 10:00:14 mysql01 mysqld: where mysqld died. If you see no messages after this, something went
Sep 28 10:00:14 mysql01 mysqld: terribly wrong...
Sep 28 10:00:14 mysql01 mysqld: stack_bottom = 0x7fde949cfdd8 thread_stack 0x49000
Sep 28 10:00:15 mysql01 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x32)[0x5648f5de7be2]
Sep 28 10:00:15 mysql01 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x55d)[0x5648f58db1cd]
Sep 28 10:00:15 mysql01 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7fe1764db3c0]
Sep 28 10:00:15 mysql01 mysqld: [0x7fdd3c061d90]
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: Trying to get some variables.
Sep 28 10:00:15 mysql01 mysqld: Some pointers may be invalid and cause the dump to abort.
Sep 28 10:00:15 mysql01 mysqld: Query (0x0): (null)
Sep 28 10:00:15 mysql01 mysqld: Connection ID (thread ID): 156081
Sep 28 10:00:15 mysql01 mysqld: Status: KILL_CONNECTION
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 28 10:00:15 mysql01 mysqld: information that should help you find out what is causing the crash.
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: We think the query pointer is invalid, but we will try to print it anyway.
Sep 28 10:00:15 mysql01 mysqld: Query:
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: Writing a core file...
Sep 28 10:00:15 mysql01 mysqld: Working directory at /var/lib/mysql
Sep 28 10:00:15 mysql01 mysqld: Resource Limits:
Sep 28 10:00:15 mysql01 mysqld: Limit Soft Limit Hard Limit Units
Sep 28 10:00:15 mysql01 mysqld: Max cpu time unlimited unlimited seconds
Sep 28 10:00:15 mysql01 mysqld: Max file size unlimited unlimited bytes
Sep 28 10:00:15 mysql01 mysqld: Max data size unlimited unlimited bytes
Sep 28 10:00:15 mysql01 mysqld: Max stack size 8388608 unlimited bytes
Sep 28 10:00:15 mysql01 mysqld: Max core file size 0 unlimited bytes
Sep 28 10:00:15 mysql01 kernel: [26244942.842063] mysqld[983414]: segfault at 7fdd3c061d90 ip 00007fdd3c061d90 sp 00007fde949cfc68 error 15
Sep 28 10:00:15 mysql01 kernel: [26244942.842070] Code: 00 00 08 ce bf 77 dc 7f 00 00 40 f2 f6 f5 48 56 00 00 00 00 00 00 00 00 00 00 00 00 00 26 00 00 00 00 95 02 00 00 00 00 00 00 <07> 00 02 00 01 00 07 00 07 00 01 00 07 00 06 00 30 69 01 3c dd 7f
Sep 28 10:00:15 mysql01 mysqld: Max resident set unlimited unlimited bytes
Sep 28 10:00:15 mysql01 mysqld: Max processes 96022 96022 processes
Sep 28 10:00:15 mysql01 mysqld: Max open files 65535 65535 files
Sep 28 10:00:15 mysql01 mysqld: Max locked memory 65536 65536 bytes
Sep 28 10:00:15 mysql01 mysqld: Max address space unlimited unlimited bytes
Sep 28 10:00:15 mysql01 mysqld: Max file locks unlimited unlimited locks
Sep 28 10:00:15 mysql01 mysqld: Max pending signals 96022 96022 signals
Sep 28 10:00:15 mysql01 mysqld: Max msgqueue size 819200 819200 bytes
Sep 28 10:00:15 mysql01 mysqld: Max nice priority 0 0
Sep 28 10:00:15 mysql01 mysqld: Max realtime priority 0 0
Sep 28 10:00:15 mysql01 mysqld: Max realtime timeout unlimited unlimited us
Sep 28 10:00:15 mysql01 mysqld: Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g – %E
Sep 28 10:00:15 mysql01 mysqld:
Sep 28 10:00:15 mysql01 mysqld: Kernel version: Linux version 5.4.0-132-generic (buildd@lcy02-amd64-059) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022



 Comments   
Comment by Daniel Black [ 2023-09-28 ]

Sorry to hear you having increased troubles.

Currently our 10.3 series is out of maintaince. Even if we identify a cause its not something we will fix on a 10.3 version.

If you can use these instructions to install the mariadb-server-core-10.3-dbg package, use apport-retrace to "Examine locally" and use the GDB commands:

set logging on
set pagination off
set print frame-arguments all
thread apply all bt full
set logging off

Attach that information here, and with that we might be able to determine if it already has been fixed or not and give greater confidence for when you upgrade.

ref; using apport

Comment by Marco Lerda [ 2023-10-03 ]

Thank you.
I followed the instruction but running:
apport-retrace -g /var/crash/_usr_sbin_mysqld.112.crash
says:
ERROR: report file does not contain one of the required fields: Package

Are there other ways to provide you the information you need?
Thank you.

Comment by Marco Lerda [ 2023-11-29 ]

Hello all,
eventually we found out that the problem was related to the usage of GET_LOCK() function even though the server crashes after several others queries.
After replacing database based locks with file system locks in our application the problem disappeared.
Thank you a lot for your time and your help.
We don't need further help, as we found a workaround.

Comment by Daniel Black [ 2023-11-30 ]

marco.lerda, thanks for getting back to us.

Can you describe your happen of GET_LOCK usages so we can check if is still happening?

Are multiple locks acquired by the same thread.

Was the aquire/release in the same connection?

Was METADATA_LOCK_INFO used (or any other channel of inspecting locks)?

Rough code pattern would be much appreciated.

Comment by Marco Lerda [ 2023-11-30 ]

Hello,
the locks were used by a lot of different procedures, but all of them were single threaded,
and they acquire and release the locks in the same connection.
Usually there is only an acquire at the beginning and a release at the end,
but sometimes the same lock is acquired and released multiple times by the same procedure (inside a loop).
We haven't used METADATA_LOCK_INFO.
I think it will be very difficult to reproduce because the same procedure sometimes produce the problem and sometimes not.
For sure it depends also on what's going on in the system as a whole.
The system serves a lot of web applications and scheduled procedures, but some days it crashed more and some days less without any apparent reason.

Comment by Marco Lerda [ 2023-12-01 ]

Hello,
here is another information that may be helpful.
We noticed that the procedure which causes the crash, after having acquired the lock,
ends with a deadlock error on a query "Deadlock: wsrep aborted transaction".
Maybe the problem happen when a deadlock occurs after a lock is acquired,
and that can also explain why it doesn't happen every time.
Maybe only in Galera clusters.
We know that GET_LOCK() is not managed by Galera, but we used that because we just needed a local lock.

Generated at Thu Feb 08 10:30:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.