Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.1.17, 10.1.18, 10.1.19, 10.1.20
-
None
-
Ubuntu 16.04 x64
2x Intel Xeon E5506 Quad-core 2.13GHz
96GB (6x16GB) DDR3 1333MHz ECC Registered
Description
We have a 4-server galera cluster distributed over WAN (2 in Florida, 2 in Texas). All 4 machines are essentially identical; same hardware, OS, and software loaded (2 have slightly faster CPUs). The cluster is great and runs flawlessly IF we only use servers in one location for production.
However, whenever we enable both geographic locations at the same time in production, 1 server dies within a couple of hours. The only output is what is pasted at the bottom of this issue.
It is always this 1 server (in Texas), and it is also the only server that has been running a version of MariaDB newer than 10.1.14 (10.1.17 - 10.1.20 over the past few months). The other 3 servers are still on 10.1.14. I began performing upgrades 1 server at a time after the fiasco in MDEV-10420. Until this upgraded server is stable, I don't feel comfortable upgrading the others.
I've checked everything I know to check, and can't find anything wrong, but I can reproduce it every time I re-enable our Texas location for production, usually within 2 hours. The difference seems to be accepting user connections (from our web application) at both geographic locations simultaneously.
161219 13:29:02 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
|
Server version: 10.1.20-MariaDB-1~xenial
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=31
|
max_threads=10002
|
thread_count=9
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 22100835 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
|
Thread pointer: 0x0x7f9ff8323008
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7fb1ffa8b0b8 thread_stack 0x48400
|
InnoDB: Warning: a long semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 241.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: Warning: semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 241.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
|
InnoDB: Pending preads 0, pwrites 0
|
|
|
=====================================
|
2016-12-19 13:33:19 7fa00abfc700 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 67 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 169514 srv_active, 0 srv_shutdown, 89081 srv_idle
|
srv_master_thread log flush and writes: 258591
|
----------
|
SEMAPHORES
|
----------
|
OS WAIT ARRAY INFO: reservation count 1345020
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 256.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
OS WAIT ARRAY INFO: signal count 1344552
|
Mutex spin waits 5592109, rounds 57361529, OS waits 1219317
|
RW-shared spins 58887, rounds 1685423, OS waits 54177
|
RW-excl spins 142252, rounds 950616, OS waits 18185
|
Spin rounds per wait: 10.26 mutex, 28.62 RW-shared, 6.68 RW-excl
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 508222612
|
Purge done for trx's n:o < 508222565 undo n:o < 0 state: running but idle
|
History list length 1057
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
InnoDB: ###### Diagnostic info printed to the standard error stream
|
InnoDB: Warning: a long semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 272.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: Warning: semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 272.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: Warning: semaphore wait:
|
--Thread 140325351835392 has waited at lock0lock.cc line 6323 for 16.000 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
|
InnoDB: Pending preads 0, pwrites 0
|
InnoDB: ###### Diagnostic info printed to the standard error stream
|
InnoDB: Warning: a long semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 303.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: Warning: semaphore wait:
|
--Thread 140402491964160 has waited at read0read.ic line 117 for 303.00 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: Warning: semaphore wait:
|
--Thread 140325351835392 has waited at lock0lock.cc line 6323 for 47.000 seconds the semaphore:
|
Mutex at 0x7fd245965868 '&trx_sys->mutex', lock var 1
|
Last time reserved by thread 140402541828864 in file not yet reserved line 0, waiters flag 1
|
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
|
InnoDB: Pending preads 0, pwrites 0
|
InnoDB: ###### Diagnostic info printed to the standard error stream
|