[MDEV-20323] MariaDB asserts in srv_error_monitor_thread at srv0srv.cc:2410 Created: 2019-08-12  Updated: 2020-12-08  Resolved: 2020-05-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.41
Fix Version/s: 10.1.44

Type: Bug Priority: Major
Reporter: Sander Pilon Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 18.04, kernel 4.15.0-45-generic.
MariaDB 10.1.41


Attachments: Text File gdb.txt    

 Description   

The 10.1 branch has been stable for us for a while now.
10.2 crashes for us on a regular basis, so upgrading is not an option. (See MDEV-16923, and others)

Unfortunately, 10.1.40 crashed for us on an UPDATE with JOINs.
So we upgraded to 10.1.41, but since then the server has crashed multiple times a day with the following log message:

Server version: 10.1.41-MariaDB-1~bionic
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=192
max_threads=502
thread_count=45
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1233831 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fa04ec10008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f9f52fb3d68 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
Fatal signal 6 while backtracing



 Comments   
Comment by Sander Pilon [ 2019-08-14 ]

... and again today.

Comment by Elena Stepanova [ 2019-08-15 ]

Is there any way for you to find out which query it crashes on?
Can you enable the coredump to see if it happens to produce a meaningful core file? (Unlikely, but not impossible)
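For reference, core dumps usually need to be enabled both in the server and at the OS level. A minimal sketch of the server side (file name and include directory are placeholders; adjust for your distribution):

```ini
# coredump.cnf — hedged sketch; drop into your my.cnf include directory
[mysqld]
# Ask mysqld to write a core file when it dies on a fatal signal.
core_file
```

On systemd-based hosts the service unit also needs LimitCORE=infinity (the equivalent of `ulimit -c unlimited` for manual starts), and kernel.core_pattern should point somewhere writable by mysqld.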

Comment by Jaime Crespo [ 2019-10-25 ]

I believe I may be suffering from the same issue. MariaDB 10.1.39 (which in code should be the same as 10.1.40) doesn't show these symptoms, but 10.1.41 has crashed approximately every 16 minutes since the last start with our workload.

First, threads get stalled and InnoDB starts emitting contention messages:

Oct 24 13:25:16 clouddb1001 mysqld[1622]: =====================================
Oct 24 13:25:16 clouddb1001 mysqld[1622]: 2019-10-24 13:25:16 7f7a7d7ff700 INNODB MONITOR OUTPUT
Oct 24 13:25:16 clouddb1001 mysqld[1622]: =====================================
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Per second averages calculated from the last 20 seconds
Oct 24 13:25:16 clouddb1001 mysqld[1622]: -----------------
Oct 24 13:25:16 clouddb1001 mysqld[1622]: BACKGROUND THREAD
Oct 24 13:25:16 clouddb1001 mysqld[1622]: -----------------
Oct 24 13:25:16 clouddb1001 mysqld[1622]: srv_master_thread loops: 1684 srv_active, 0 srv_shutdown, 4 srv_idle
Oct 24 13:25:16 clouddb1001 mysqld[1622]: srv_master_thread log flush and writes: 1688
Oct 24 13:25:16 clouddb1001 mysqld[1622]: ----------
Oct 24 13:25:16 clouddb1001 mysqld[1622]: SEMAPHORES
Oct 24 13:25:16 clouddb1001 mysqld[1622]: ----------
Oct 24 13:25:16 clouddb1001 mysqld[1622]: OS WAIT ARRAY INFO: reservation count 152461
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140163862734592 has waited at dict0dict.cc line 984 for 949.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140200399333120 has waited at dict0dict.cc line 1138 for 949.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140200693630720 has waited at dict0dict.cc line 1138 for 948.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140200397285120 has waited at row0undo.cc line 300 for 947.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: S-lock on RW-latch at 0x55a4bf2cace0 '&dict_operation_lock'
Oct 24 13:25:16 clouddb1001 mysqld[1622]: a writer (thread id 140163879520000) has reserved it in mode  exclusive
Oct 24 13:25:16 clouddb1001 mysqld[1622]: number of readers 0, waiters flag 1, lock_word: 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time read locked in file row0purge.cc line 772
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time write locked in file srv0srv.cc line 2837
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Holder thread 0 file not yet reserved line 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140163854341888 has waited at row0purge.cc line 772 for 946.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: S-lock on RW-latch at 0x55a4bf2cace0 '&dict_operation_lock'
Oct 24 13:25:16 clouddb1001 mysqld[1622]: a writer (thread id 140163879520000) has reserved it in mode  exclusive
Oct 24 13:25:16 clouddb1001 mysqld[1622]: number of readers 0, waiters flag 1, lock_word: 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time read locked in file row0purge.cc line 772
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time write locked in file srv0srv.cc line 2837
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Holder thread 0 file not yet reserved line 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140164661778176 has waited at dict0boot.ic line 37 for 946.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140164058654464 has waited at row0ins.cc line 1875 for 943.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: S-lock on RW-latch at 0x55a4bf2cace0 '&dict_operation_lock'
Oct 24 13:25:16 clouddb1001 mysqld[1622]: a writer (thread id 140163879520000) has reserved it in mode  exclusive
Oct 24 13:25:16 clouddb1001 mysqld[1622]: number of readers 0, waiters flag 1, lock_word: 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time read locked in file row0purge.cc line 772
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time write locked in file srv0srv.cc line 2837
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Holder thread 0 file not yet reserved line 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140164057835264 has waited at dict0dict.cc line 1138 for 942.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140200396670720 has waited at dict0boot.ic line 37 for 934.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140163819337472 has waited at row0undo.cc line 300 for 933.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: S-lock on RW-latch at 0x55a4bf2cace0 '&dict_operation_lock'
Oct 24 13:25:16 clouddb1001 mysqld[1622]: a writer (thread id 140163879520000) has reserved it in mode  exclusive
Oct 24 13:25:16 clouddb1001 mysqld[1622]: number of readers 0, waiters flag 1, lock_word: 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time read locked in file row0purge.cc line 772
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time write locked in file srv0srv.cc line 2837
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Holder thread 0 file not yet reserved line 0
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140200399128320 has waited at dict0dict.cc line 1138 for 933.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140164662392576 has waited at dict0dict.cc line 1138 for 933.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140163818518272 has waited at dict0boot.ic line 37 for 932.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: --Thread 140163818723072 has waited at dict0dict.cc line 1138 for 932.00 seconds the semaphore:
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Mutex at 0x7f8305412568 '&dict_sys->mutex', lock var 1
Oct 24 13:25:16 clouddb1001 mysqld[1622]: Last time reserved by thread 140163879520000 in file not yet reserved line 0, waiters flag 1

Then it crashes due to an assertion failure:

Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: ###### Diagnostic info printed to the standard error stream
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: Error: semaphore wait has lasted > 600 seconds
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: We intentionally crash the server, because it appears to be hung.
Oct 24 13:25:21 clouddb1001 mysqld[1622]: 2019-10-24 13:25:21 7f7a72bfd700  InnoDB: Assertion failure in thread 140163887912704 in file srv0srv.cc line 2410
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: We intentionally generate a memory trap.
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: If you get repeated assertion failures or crashes, even
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: immediately after the mysqld startup, there may be
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
Oct 24 13:25:21 clouddb1001 mysqld[1622]: InnoDB: about forcing recovery.
Oct 24 13:25:21 clouddb1001 mysqld[1622]: 191024 13:25:21 [ERROR] mysqld got signal 6 ;

And a buffer overflow?

Oct 24 13:25:21 clouddb1001 mysqld[1622]: Attempting backtrace. You can use the following information to find out
Oct 24 13:25:21 clouddb1001 mysqld[1622]: where mysqld died. If you see no messages after this, something went
Oct 24 13:25:21 clouddb1001 mysqld[1622]: terribly wrong...
Oct 24 13:25:21 clouddb1001 mysqld[1622]: stack_bottom = 0x0 thread_stack 0x30000
Oct 24 13:25:21 clouddb1001 mysqld[1622]: *** buffer overflow detected ***: /opt/wmf-mariadb101/bin/mysqld terminated
Oct 24 13:25:21 clouddb1001 mysqld[1622]: ======= Backtrace: =========
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f83070ddbfb]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f8307166437]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(+0xf7570)[0x7f8307164570]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(+0xf93aa)[0x7f83071663aa]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /opt/wmf-mariadb101/bin/mysqld(my_addr_resolve+0xd8)[0x55a4be1e6e18]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /opt/wmf-mariadb101/bin/mysqld(my_print_stacktrace+0x1bb)[0x55a4be1d016b]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /opt/wmf-mariadb101/bin/mysqld(handle_fatal_signal+0x3bd)[0x55a4bdd108cd]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f83089c10e0]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7f830709ffff]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f83070a142a]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /opt/wmf-mariadb101/bin/mysqld(+0x91acbe)[0x55a4be077cbe]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f83089b74a4]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8307155d0f]
Oct 24 13:25:21 clouddb1001 mysqld[1622]: ======= Memory map: ========

This is a regression: it doesn't happen with the same workload on earlier versions. 10.1.42 is not yet available.

I am using 10.1.41 on Debian stretch, compiled from source with mostly the same options as the binary tarball.

Comment by paladox [ 2019-10-25 ]

We also had the same issue whilst upgrading from mariadb 10.2.24 -> 10.2.27 a few days ago.

We had to downgrade mariadb to fix it.

2019-10-23 22:45:38 140643262588672 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
191023 22:45:38 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.2.27-MariaDB-10.2.27+maria~stretch-log
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=453
max_threads=502
thread_count=460
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1119425 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out  
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated                                
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7fec3fdecbfb]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fec3fe75437]
/lib/x86_64-linux-gnu/libc.so.6(+0xf7570)[0x7fec3fe73570]
/lib/x86_64-linux-gnu/libc.so.6(+0xf93aa)[0x7fec3fe753aa]
/usr/sbin/mysqld(my_addr_resolve+0xe2)[0x5590f0e5dda2]
/usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x5590f0e4465b]
/usr/sbin/mysqld(handle_fatal_signal+0x3ad)[0x5590f08de9ad]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7fec417100e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7fec3fdaefff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fec3fdb042a]
/usr/sbin/mysqld(+0x953c91)[0x5590f0bc7c91]
/usr/sbin/mysqld(+0x8fd1c6)[0x5590f0b711c6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7fec417064a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fec3fe64d0f]

Comment by George To [ 2019-11-30 ]

Hello MariaDB support,

We encountered the same - InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410.

Do you have a procedure for producing a core dump?

Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
Dec  1 03:14:32 xxxx mysqld: --Thread 140581379458816 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
Dec  1 03:14:32 xxxx mysqld: --Thread 140581378537216 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
Dec  1 03:14:32 xxxx mysqld: --Thread 140581378230016 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
Dec  1 03:15:02 xxxx journal: Suppressed 921 messages from /system.slice/mysqld.service
Dec  1 03:15:02 xxxx mysqld: InnoDB: ###### Diagnostic info printed to the standard error stream
Dec  1 03:15:02 xxxx mysqld: InnoDB: Error: semaphore wait has lasted > 600 seconds
Dec  1 03:15:02 xxxx mysqld: InnoDB: We intentionally crash the server, because it appears to be hung.
Dec  1 03:15:02 xxxx mysqld: 2019-12-01 03:15:02 7fdb957fe700  InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410
Dec  1 03:15:02 xxxx mysqld: InnoDB: We intentionally generate a memory trap.
Dec  1 03:15:02 xxxx mysqld: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Dec  1 03:15:02 xxxx mysqld: InnoDB: If you get repeated assertion failures or crashes, even
Dec  1 03:15:02 xxxx mysqld: InnoDB: immediately after the mysqld startup, there may be
Dec  1 03:15:02 xxxx mysqld: InnoDB: corruption in the InnoDB tablespace. Please refer to
Dec  1 03:15:02 xxxx mysqld: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
Dec  1 03:15:02 xxxx mysqld: InnoDB: about forcing recovery.
Dec  1 03:15:02 xxxx mysqld: 191201  3:15:02 [ERROR] mysqld got signal 6 ;
Dec  1 03:15:02 xxxx mysqld: This could be because you hit a bug. It is also possible that this binary
Dec  1 03:15:02 xxxx mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Dec  1 03:15:02 xxxx mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Dec  1 03:15:02 xxxx mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Dec  1 03:15:02 xxxx mysqld: We will try our best to scrape up some info that will hopefully help
Dec  1 03:15:02 xxxx mysqld: diagnose the problem, but since we have already crashed,
Dec  1 03:15:02 xxxx mysqld: something is definitely wrong and this may fail.
Dec  1 03:15:02 xxxx mysqld: Server version: 10.1.41-MariaDB
Dec  1 03:15:02 xxxx mysqld: key_buffer_size=134217728
Dec  1 03:15:02 xxxx mysqld: read_buffer_size=2097152
Dec  1 03:15:02 xxxx mysqld: max_used_connections=174
Dec  1 03:15:02 xxxx mysqld: max_threads=502
Dec  1 03:15:02 xxxx mysqld: thread_count=173
Dec  1 03:15:02 xxxx mysqld: It is possible that mysqld could use up to
Dec  1 03:15:02 xxxx mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2197671 K  bytes of memory
Dec  1 03:15:02 xxxx mysqld: Hope that's ok; if not, decrease some variables in the equation.
Dec  1 03:15:02 xxxx mysqld: Thread pointer: 0x0
Dec  1 03:15:02 xxxx mysqld: Attempting backtrace. You can use the following information to find out
Dec  1 03:15:02 xxxx mysqld: where mysqld died. If you see no messages after this, something went
Dec  1 03:15:02 xxxx mysqld: terribly wrong...
Dec  1 03:15:02 xxxx mysqld: stack_bottom = 0x0 thread_stack 0x49000
Dec  1 03:15:02 xxxx mysqld: *** buffer overflow detected ***: /usr/sbin/mysqld terminated
Dec  1 03:15:02 xxxx mysqld: ======= Backtrace: =========
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(__fortify_fail+0x37)[0x7fdc83d53527]
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(+0x1166a2)[0x7fdc83d516a2]
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(+0x118487)[0x7fdc83d53487]
Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(my_addr_resolve+0xd0)[0x55a270e96c40]
Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55a270e82e82]
Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55a2709a55d5]
Dec  1 03:15:02 xxxx mysqld: /lib64/libpthread.so.0(+0xf5f0)[0x7fdc85bd45f0]
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(gsignal+0x37)[0x7fdc83c71337]
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(abort+0x148)[0x7fdc83c72a28]
Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(+0x941359)[0x55a270d20359]
Dec  1 03:15:02 xxxx mysqld: /lib64/libpthread.so.0(+0x7e65)[0x7fdc85bcce65]
Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(clone+0x6d)[0x7fdc83d3988d]

Comment by Sander Pilon [ 2019-12-01 ]

We have migrated to Percona, with much success. No more crashes.

Comment by Elena Stepanova [ 2019-12-07 ]

Stack trace from george.ccto's occurrence of the failure is attached as gdb.txt

Comment by George To [ 2020-04-01 ]

FYI: a few days ago we updated MariaDB from 10.1.41 to 10.1.44, and the problem seems to have disappeared (no crashes for a few days now).

Comment by Marko Mäkelä [ 2020-04-07 ]

In gdb.txt (as well as the Description), I see that the server crashed while trying to produce a stack trace:

mariadb-10.1.41

#6  0x000055f142b40c40 in my_addr_resolve (ptr=<optimized out>, loc=loc@entry=0x7fdb42ffd0b0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/my_addr_resolve.c:222
#7  0x000055f142b2ce82 in print_with_addr_resolve (n=<optimized out>, addrs=0x7fdb42ffd0d0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:253
#8  my_print_stacktrace (stack_bottom=<optimized out>, thread_stack=299008) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:271
#9  0x000055f14264f5d5 in handle_fatal_signal (sig=6) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/sql/signal_handler.cc:166
#10 <signal handler called>
#11 0x00007fdb7082d337 in raise () from /lib64/libc.so.6
#12 0x00007fdb7082ea28 in abort () from /lib64/libc.so.6
#13 0x000055f1429ca359 in srv_error_monitor_thread () at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/storage/xtradb/srv/srv0srv.cc:2410

That is, XtraDB intentionally killed the server after detecting a hang, and the server then crashed while trying to produce the stack trace. cvicentiu should be aware of this issue and try to remove the use of functions that are not safe to use in signal handlers. There is similar crash-while-crashing output in MDEV-19310.

Now, what could cause the hang? I was going to suggest MDEV-13983, but that regression was already fixed in 10.1.41.

Most threads are waiting for dict_sys->mutex while trying to open an InnoDB table.

Thread 5 (purge) is attempting to acquire dict_operation_lock in shared mode. That should prevent others from acquiring it in exclusive mode.

A possible cause of the deadlock could be between Thread 4 and Thread 2. This strongly looks like a duplicate of MDEV-20621, which was fixed in 10.1.42. Also george.ccto’s comment supports this.

CrewOne, do you use fulltext indexes on InnoDB or XtraDB tables? If not, your crash should be due to something else. Also, you mention "update on JOINs". Were there any messages in the server error log before the crash?
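To answer that question on the reporter's side, a hedged information_schema query (assuming the STATISTICS and TABLES views are accessible to the account used) can list InnoDB/XtraDB tables carrying FULLTEXT indexes:

```sql
-- Hedged sketch: find FULLTEXT indexes on InnoDB (or XtraDB) tables.
SELECT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM information_schema.STATISTICS s
JOIN information_schema.TABLES t
  ON t.TABLE_SCHEMA = s.TABLE_SCHEMA
 AND t.TABLE_NAME   = s.TABLE_NAME
WHERE s.INDEX_TYPE = 'FULLTEXT'
  AND t.ENGINE     = 'InnoDB'
GROUP BY s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME;
```

An empty result set would point away from the MDEV-20621 fulltext deadlock as the cause.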

paladox, jcrespo, the InnoDB diagnostic output is next to useless for diagnosing hangs. We need stack traces of all threads during the hang, and to ease debugging, the information on who is holding the mutexes that almost everyone is waiting for:

gdb -ex 'set height 0' -ex 'thread apply all backtrace' -ex 'print *dict_operation_lock' -ex 'print dict_sys->mutex' -ex detach -ex quit -p $(pgrep -x mysqld)

Generated at Thu Feb 08 08:58:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.