[MDEV-11624] mysqld got signal 11 0x0x7f9ff8323008 Created: 2016-12-21  Updated: 2019-05-20  Resolved: 2019-05-20

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB
Affects Version/s: 10.1.17, 10.1.18, 10.1.19, 10.1.20
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Andrew Ensley Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Ubuntu 16.04 x64
2x Intel Xeon E5506 Quad-core 2.13GHz
96GB (6x16GB) DDR3 1333MHz ECC Registered



 Description   

We have a 4-server Galera cluster distributed over a WAN (2 in Florida, 2 in Texas). All 4 machines are essentially identical: same hardware, OS, and installed software (2 have slightly faster CPUs). The cluster runs flawlessly IF we only use the servers in one location for production.

However, whenever we enable both geographic locations in production at the same time, 1 server dies within a couple of hours. The only output is what is pasted at the bottom of this issue.

It is always this 1 server (in Texas), and it is also the only server that has been running a version of MariaDB newer than 10.1.14 (10.1.17 - 10.1.20 over the past few months). The other 3 servers are still on 10.1.14. I began performing upgrades 1 server at a time after the fiasco in MDEV-10420. Until this upgraded server is stable, I don't feel comfortable upgrading the others.

I've checked everything I know to check and can't find anything wrong, but I can reproduce the crash every time I re-enable our Texas location for production, usually within 2 hours. The only difference seems to be accepting user connections (from our web application) at both geographic locations simultaneously.

161219 13:29:02 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.20-MariaDB-1~xenial
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=31
max_threads=10002
thread_count=9
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 22100835 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f9ff8323008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fb1ffa8b0b8 thread_stack 0x48400
InnoDB: Warning: a long semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 241.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 241.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
 
=====================================
2016-12-19 13:33:19 7fa00abfc700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 67 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 169514 srv_active, 0 srv_shutdown, 89081 srv_idle
srv_master_thread log flush and writes: 258591
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1345020
--Thread 140402491964160 has waited at read0read.ic line 117 for 256.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
OS WAIT ARRAY INFO: signal count 1344552
Mutex spin waits 5592109, rounds 57361529, OS waits 1219317
RW-shared spins 58887, rounds 1685423, OS waits 54177
RW-excl spins 142252, rounds 950616, OS waits 18185
Spin rounds per wait: 10.26 mutex, 28.62 RW-shared, 6.68 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 508222612
Purge done for trx's n:o < 508222565 undo n:o < 0 state: running but idle
History list length 1057
LIST OF TRANSACTIONS FOR EACH SESSION:
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Warning: a long semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 272.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 272.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140325351835392 has waited at lock0lock.cc line 6323 for 16.000 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Warning: a long semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 303.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140402491964160 has waited at read0read.ic line 117 for 303.00 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140325351835392 has waited at lock0lock.cc line 6323 for 47.000 seconds the semaphore:
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream
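As an aside on the numbers in the crash header: the ~22 GB figure comes from mysqld's standard worst-case estimate, key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads, and is driven almost entirely by max_threads=10002. A minimal sketch of that arithmetic, assuming the default 2 MiB sort_buffer_size (the report does not print the actual value):

```python
# Sanity check of mysqld's crash-header memory estimate:
#   key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
key_buffer_size = 134_217_728       # from the report
read_buffer_size = 131_072          # from the report
max_threads = 10_002                # from the report
sort_buffer_size = 2 * 1024 * 1024  # ASSUMED default; not printed in the report

estimated_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
print(f"{estimated_bytes // 1024} K ({estimated_bytes / 2**30:.1f} GiB)")
```

With the assumed default this lands near, but not exactly on, the printed 22100835 K, which suggests the server's actual sort_buffer_size was somewhat above the default; either way, the worst-case estimate is well under the machine's 96 GB.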



 Comments   
Comment by Elena Stepanova [ 2016-12-25 ]

jplindst,

Could you please take a look? Doesn't it appear that the crash happens during the long semaphore wait output itself? We've obviously seen many "long semaphore waits" before, but they usually end in SIGABRT, not SIGSEGV; this is new.

Comment by Andrew Ensley [ 2017-01-17 ]

Any update on this?

Comment by Jan Lindström (Inactive) [ 2019-05-20 ]

Can you try to reproduce this with a more recent version? If it is still repeatable, please provide the cluster configuration and the full, unedited error log.

Generated at Thu Feb 08 07:51:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.