[MDEV-10699] Mariadb crash with assertion error Created: 2016-08-29  Updated: 2017-10-11  Resolved: 2017-10-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5.33a
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: TYNDALE BANZA Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: innodb
Environment:

NAME="Amazon Linux AMI"
VERSION="2015.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2015.03"
PRETTY_NAME="Amazon Linux AMI 2015.03"
ANSI_COLOR="0;33"


Attachments: File casbptwcdb027.rar    
Issue Links:
Blocks
is blocked by MDEV-11896 thd_get_error_context_description rac... Closed
Sprint: 5.5.54

 Description   

Maria DB crashes with semaphore wait followed by ascertion error .



 Comments   
Comment by Elena Stepanova [ 2016-08-31 ]

Excerpt from the attached log

=====================================
160824 21:01:02 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 12774809 1_second, 12774413 sleeps, 1277452 10_second, 1121 background, 1120 flush
srv_master_thread log flush and writes: 5885197
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 34669673, signal count 214168191
--Thread 140430801172224 has waited at row0ins.c line 2028 for 937.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7fb605d8de78 '&block->lock'
a writer (thread id 140430801172224) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.c line 4084
Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.30/storage/xtradb/row/row0ins.c line 2028
--Thread 140419152877312 has waited at btr0cur.c line 3632 for 936.00 seconds the semaphore:
S-lock on RW-latch at 0x7fb605d8de78 '&block->lock'
a writer (thread id 140430801172224) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.c line 4084
Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.30/storage/xtradb/row/row0ins.c line 2028
...
<cut>
...
--Thread 140419150784256 has waited at btr0cur.c line 3632 for 252.00 seconds the semaphore:
S-lock on RW-latch at 0x7fb605d8de78 '&block->lock'
a writer (thread id 140430801172224) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.c line 4084
Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.30/storage/xtradb/row/row0ins.c line 2028
Mutex spin waits 162094122, rounds 809826605, OS waits 22915353
RW-shared spins 77276872, rounds 479297342, OS waits 9166519
RW-excl spins 6712811, rounds 409455743, OS waits 2259595
Spin rounds per wait: 5.00 mutex, 6.20 RW-shared, 61.00 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (read thread)
I/O thread 7 state: waiting for completed aio requests (read thread)
I/O thread 8 state: waiting for completed aio requests (read thread)
I/O thread 9 state: waiting for completed aio requests (read thread)
I/O thread 10 state: waiting for completed aio requests (write thread)
I/O thread 11 state: waiting for completed aio requests (write thread)
I/O thread 12 state: waiting for completed aio requests (write thread)
I/O thread 13 state: waiting for completed aio requests (write thread)
I/O thread 14 state: waiting for completed aio requests (write thread)
I/O thread 15 state: waiting for completed aio requests (write thread)
I/O thread 16 state: waiting for completed aio requests (write thread)
I/O thread 17 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0, 0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0, 0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
11733079 OS file reads, 1709580060 OS file writes, 17594964 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.40 writes/s, 0.20 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 524, seg size 526, 1049150 merges
merged operations:
 insert 762426, delete mark 70572327, delete 83457
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 21249841, node heap has 44760 buffer(s)
8777.86 hash searches/s, 514.27 non-hash searches/s
---
LOG
---
Log sequence number 14359276334189
Log flushed up to   14359276334189
Last checkpoint at  14359276334179
Max checkpoint age    339017565
Checkpoint age target 328423267
Modified age          10
Checkpoint age        10
0 pending log writes, 0 pending chkp writes
1523572186 log i/o's done, 0.20 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 11031019520; in additional pool allocated 0
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 903350784 	(169998728 + 733352056)
    Page hash           10625704 (buffer pool 0 only)
    Dictionary cache    44249462 	(42501104 + 1748358)
    File system         180736 	(82672 + 98064)
    Lock system         26638592 	(26563016 + 75576)
    Recovery system     0 	(0 + 0)
Dictionary memory allocated 1748358
Buffer pool size        655359
Buffer pool size, bytes 10737401856
Free buffers            0
Database pages          610599
Old database pages      225376
Modified db pages       1
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 29263677, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 11732882, created 10461558, written 179409726
0.00 reads/s, 0.00 creates/s, 0.10 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 610599, unzip_LRU len: 0
I/O sum[183]:cur[1], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
8 read views open inside InnoDB
---OLDEST VIEW---
Normal read view
Read view low limit trx n:o 302B3B598B
Read view up limit trx id 302B3B598A
Read view low limit trx id 302B3B598B
Read view individually stored trx ids:
Read view trx id 302B3B598A
-----------------
Main thread process no. 5215, id 140419230009088, state: sleeping
Number of rows inserted 751668373, updated 856359788, deleted 682417032, read 237915096500
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 10897.06 reads/s
------------
TRANSACTIONS
------------
Trx id counter 3031EBDA29
Purge done for trx's n:o < 302B3B5994 undo n:o < 0
History list length 7305082
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 3031EBB8C1, not started estimating records in index range
mysql tables in use 1, locked 0
MySQL thread id 99255259, OS thread handle 0x7fb5e10cb700, query id 48785934830 10.10.130.9 ws_user statistics
SELECT TIMESTAMPDIFF( MINUTE, timestamp, UTC_TIMESTAMP() ) AS Age
FROM radar
WHERE loc_type = 'RADAR'
AND loc_code = '002'
AND data_source = 'BOM'
AND format = 'POLAR_ASCII'
AND return_type = 'REFLECTIVITY'
ORDER BY timestamp DESC, loc_code
LIMIT 1
...
<cut>
...
---TRANSACTION 3031E2B404, not started estimating records in index range
mysql tables in use 1, locked 0
MySQL thread id 99200853, OS thread handle 0x7fb5e4269700, query id 48770282047 10.10.2.29 ws_user statistics
SELECT TIMESTAMPDIFF( MINUTE, timestamp, UTC_TIMESTAMP() ) AS Age
FROM radar
WHERE loc_type = 'RADAR'
AND loc_code = '002'
AND data_source = 'BOM'
AND format = 'POLAR_ASCII'
AND return_type = 'REFLECTIVITY'
ORDER BY timestamp DESC, loc_code
LIMIT 1
---TRANSACTION 3031E29FDC, not started estimating records in index range
mysql tables in use 1, locked 0
MySQL thread id 99200600, OS thread handle 0x7fb5e1ba1700, query id 48770158687 10.10.130.9 ws_user statistics
SELECT TIMESTAMPDIFF( MINUTE, timestamp, UTC_TIMESTAMP() ) AS Age
FROM radar
WHERE loc_type = 'RADAR'
AND loc_code = '002'
AND data_source = 'BOM'
AND format = 'POLAR_ASCII'
AND return_type = 'REFLECTIVITY'
ORDER BY timestamp DESC, loc_code
LIMIT 1
---TRANSACTION 3031EBDA28, not started
MySQL thread id 99173022, OS thread handle 0x7fb5e4543700, query id 48786013669 10.10.130.9 elders_user
---TRANSACTION 3031EBDA1D, not started
MySQL thread id 99168796, OS thread handle 0x7fb5e1959700, query id 48786013658 10.10.130.9 public
---TRANSACTION 3031EBDA15, not started
MySQL thread id 99103726, OS thread handle 0x7fb5e0da8700, query id 48786013577 10.10.130.9 public
---TRANSACTION 3031EBD7B3, not started
MySQL thread id 94268674, OS thread handle 0x7fb5e2d4f700, query id 48786012143 10.10.130.9 uwas_user
---TRANSACTION 3031EBD8A5, not started
MySQL thread id 43095182, OS thread handle 0x7fb5e3ab6700, query id 48786012918 10.10.2.29 uwas_user
---TRANSACTION 3031EBC602, ACTIVE 252 sec estimating records in index range
mysql tables in use 1, locked 0
MySQL thread id 99257713, OS thread handle 0x7fb5e19a2700, query id 48785997581 10.10.130.9 public statistics
select data_source, format, return_type, loc_type, loc_code, timestamp, image, convert_tz(timestamp, 'UTC', 'Australia/Sydney') as timestamp_system from radar where loc_type = 'radar' and loc_code = '002' and data_source = 'BOM' and format = 'POLAR_ASCII' and return_type = 'reflectivity' and timestamp >= '2016-08-23 10:56' order by timestamp desc limit 21
---TRANSACTION 3031E82B56, ACTIVE 708 sec estimating records in index range
mysql tables in use 2, locked 0
MySQL thread id 99218874, OS thread handle 0x7fb5e39db700, query id 48779632744 10.10.130.9 public statistics
select r.loc_type, r.loc_code, r.data_source, r.format, r.return_type, r.timestamp, r.image, convert_tz(r.timestamp, 'UTC', 'Australia/Sydney') as timestamp_system, abs(time_to_sec(timediff(r.timestamp, '2016-08-24 10:40'))) as dt from radar r inner join locations l on l.sType = r.loc_type and l.sCode = r.loc_code where r.data_source = 'BOM' and r.format = 'POLAR_ASCII' and r.return_type = 'REFLECTIVITY' and r.loc_type = 'RADAR' and r.loc_code not in ( '054', '051', '050', '046', '026', '010', '034', '043', '011', '038', '012', '020', '021', '037' ) and r.timestamp >= subdate('2016-08-24 10:40
---TRANSACTION 3031E82A39, ACTIVE 708 sec estimating records in index range
mysql tables in use 2, locked 0
MySQL thread id 99218859, OS thread handle 0x7fb5e2fe0700, query id 48779625587 10.10.130.9 public statistics
select r.loc_type, r.loc_code, r.data_source, r.format, r.return_type, r.timestamp, r.image, convert_tz(r.timestamp, 'UTC', 'Australia/Sydney') as timestamp_system, abs(time_to_sec(timediff(r.timestamp, '2016-08-24 10:40'))) as dt from radar r inner join locations l on l.sType = r.loc_type and l.sCode = r.loc_code where r.data_source = 'BOM' and r.format = 'POLAR_ASCII' and r.return_type = 'REFLECTIVITY' and r.loc_type = 'RADAR' and r.loc_code not in ( '054', '051', '050', '046', '026', '010', '034', '043', '011', '038', '012', '020', '021', '037' ) and r.timestamp >= subdate('2016-08-24 10:40
---TRANSACTION 3031E79EFF, ACTIVE 730 sec starting index read
mysql tables in use 1, locked 0
MySQL thread id 99211883, OS thread handle 0x7fb5e3793700, query id 48778651629 10.10.2.29 public statistics
select data_source, format, return_type, loc_type, loc_code, timestamp, image, convert_tz(timestamp, 'UTC', 'Australia/Sydney') as timestamp_system from radar where loc_type = 'radar' and loc_code = '002' and data_source = 'BOM' and format = 'POLAR_ASCII' and return_type = 'reflectivity' and timestamp = '2016-08-24 10:42:00.000' order by timestamp
Trx read view will not see trx with id >= 3031E79F00, sees < 302B3B598A
---TRANSACTION 3031E68F5A, ACTIVE 773 sec estimating records in index range
mysql tables in use 2, locked 0
MySQL thread id 99213548, OS thread handle 0x7fb5e11ef700, query id 48776780163 10.10.130.9 public statistics
select r.loc_type, r.loc_code, r.data_source, r.format, r.return_type, r.timestamp, r.image, convert_tz(r.timestamp, 'UTC', 'Australia/Sydney') as timestamp_system, abs(time_to_sec(timediff(r.timestamp, '2016-08-24 10:40'))) as dt from radar r inner join locations l on l.sType = r.loc_type and l.sCode = r.loc_code where r.data_source = 'BOM' and r.format = 'POLAR_ASCII' and r.return_type = 'REFLECTIVITY' and r.loc_type = 'RADAR' and r.loc_code not in ( '054', '051', '050', '046', '026', '010', '034', '043', '011', '038', '012', '020', '021', '037' ) and r.timestamp >= subdate('2016-08-24 10:40
---TRANSACTION 3031E679AA, ACTIVE 777 sec starting index read
mysql tables in use 1, locked 0
MySQL thread id 99211672, OS thread handle 0x7fb5e2f05700, query id 48776489557 10.10.2.29 public statistics
select data_source, format, return_type, loc_type, loc_code, timestamp, image, convert_tz(timestamp, 'UTC', 'Australia/Sydney') as timestamp_system from radar where loc_type = 'radar' and loc_code = '002' and data_source = 'BOM' and format = 'POLAR_ASCII' and return_type = 'reflectivity' and timestamp = '2016-08-24 10:42:00.000' order by timestamp
Trx read view will not see trx with id >= 3031E679AB, sees < 302B3B598A
...
<cut>
...
---TRANSACTION 3031E5E82A, ACTIVE 799 sec estimating records in index range
mysql tables in use 2, locked 0
MySQL thread id 99211475, OS thread handle 0x7fb5e45d5700, query id 48775630786 10.10.130.9 public statistics
select r.loc_type, r.loc_code, r.data_source, r.format, r.return_type, r.timestamp, r.image, convert_tz(r.timestamp, 'UTC', 'Australia/Sydney') as timestamp_system, abs(time_to_sec(timediff(r.timestamp, '2016-08-24 10:10'))) as dt from radar r inner join locations l on l.sType = r.loc_type and l.sCode = r.loc_code where r.data_source = 'BOM' and r.format = 'POLAR_ASCII' and r.return_type = 'REFLECTIVITY' and r.loc_type = 'RADAR' and r.loc_code not in ( '054', '051', '050', '046', '026', '010', '034', '043', '011', '038', '012', '020', '021', '037' ) and r.timestamp >= subdate('2016-08-24 10:10
---TRANSACTION 3031E5E80F, ACTIVE 799 sec estimating records in index range
mysql tables in use 2, locked 0
MySQL thread id 99211473, OS thread handle 0x7fb5e2677700, query id 48775629828 10.10.130.9 public statistics
select r.loc_type, r.loc_code, r.data_source, r.format, r.return_type, r.timestamp, r.image, convert_tz(r.timestamp, 'UTC', 'Australia/Sydney') as timestamp_system, abs(time_to_sec(timediff(r.timestamp, '2016-08-24 10:10'))) as dt from radar r inner join locations l on l.sType = r.loc_type and l.sCode = r.loc_code where r.data_source = 'BOM' and r.format = 'POLAR_ASCII' and r.return_type = 'REFLECTIVITY' and r.loc_type = 'RADAR' and r.loc_code not in ( '054', '051', '050', '046', '026', '010', '034', '043', '011', '038', '012', '020', '021', '037' ) and r.timestamp >= subdate('2016-08-24 10:10
---TRANSACTION 3031E2985B, ACTIVE 937 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1
MySQL thread id 30, OS thread handle 0x7fb89804f700, query id 48770108518 update
INSERT INTO radar
( loc_type, loc_code, data_source, format, return_type, levels, timestamp, image,
create_time, create_system, create_version, create_source )
VALUES ( 'RADAR', '002', 'BOM', 'POLAR_ASCII', 'REFLECTIVITY', '16', '2016-08-24 10:42', '/IMAGE:   02 1608241042\n/IMAGESCANS: 1\n/IMAGESIZE:     6580\n/SCAN    1:   02 1608241042  2   0.5  0  1        143       6409\n/IMAGEHEADER END:\nCOUNTRY: 036\nNAME: Melb\nSTNID: 02\nSTN_NUM: 087031\nLATITUDE: 37.8520\nLONGITUDE: 144.7520\nHEIGHT: 14\nDATE: 23716\nTIME: 10.42\nTIMESTAMP: 20160824104228\nVERS: 2.64_RapicComms\nFREQUENCY: 2860\nPEA
TABLE LOCK table `weather`.`radar` trx id 3031E2985B lock mode IX
---TRANSACTION 3031E1C409, ACTIVE 962 sec
mysql tables in use 5, locked 0
MySQL thread id 99196089, OS thread handle 0x7fb5e3efd700, query id 48768958834 10.10.2.29 wz_user User sleep
select loc.sType, loc.sCode, loc.sScreenName from locations loc inner join location_mappings_2008 distx on distx.custom_code = 'TWC' and distx.type1 = loc.sType and distx.code1 = loc.sCode and distx.context = 'districtforecast' inner join locations dist on dist.sType = distx.type2 and dist.sCode = distx.code2 inner join location_positions lp on lp.sCustomCode = 'WZ' and lp.sMapType = 'fcast_360' and lp.sContextType = 'wzstate' and lp.sContextCode = 'wa' and sleep(30) -- ' and lp.sLocType = loc.sType and lp.sLocCode = loc.sCode and lp.iActive = 1 inner join forecasts_daily fc on fc.sLocType = l
Trx read view will not see trx with id >= 3031E1C40A, sees < 302B3B598A
---TRANSACTION 3030EF3970, ACTIVE 9068 sec starting index read
mysql tables in use 1, locked 0
MySQL thread id 98531399, OS thread handle 0x7fb5e461e700, query id 48442968068 10.10.2.29 public User sleep
select data_source, format, return_type, loc_type, loc_code, timestamp, image, convert_tz(timestamp, 'UTC', 'Australia/Sydney') as timestamp_system from radar where loc_type = 'radar' and loc_code = '002' and sleep(30) -- ' and data_source = 'BOM' and format = 'POLAR_ASCII' and return_type = 'reflectivity' and timestamp >= '2016-08-23 08:29' order by timestamp desc limit 3
Trx read view will not see trx with id >= 3030EF3971, sees < 302B3B598A
---TRANSACTION 302B75F84D, ACTIVE 52677 sec
mysql tables in use 5, locked 0
MySQL thread id 94999947, OS thread handle 0x7fb5e3b91700, query id 46629323500 10.10.130.9 wz_user User sleep
ep(30) -- ' and lp.sLocType = loc.sType and lp.sLocCode = loc.sCode and lp.iActive = 1 inner join forecasts_daily fc on fc.sLocType = l
Trx read view will not see trx with id >= 302B75F84E, sees < 302B3B598A
---TRANSACTION 302B3B598A, ACTIVE 54890 sec
mysql tables in use 5, locked 0
MySQL thread id 94846719, OS thread handle 0x7fb5e2ebc700, query id 46559275412 10.10.2.29 wz_user User sleep
select loc.sType, loc.sCode, loc.sScreenName from locations loc inner join location_mappings_2008 distx on distx.custom_code = 'TWC' and distx.type1 = loc.sType and distx.code1 = loc.sCode and distx.context = 'districtforecast' inner join locations dist on dist.sType = distx.type2 and dist.sCode = distx.code2 inner join location_positions lp on lp.sCustomCode = 'WZ' and lp.sMapType = 'fcast_360' and lp.sContextType = 'wzstate' and lp.sContextCode = 'wa' and sleep(30) -- ' and lp.sLocType = loc.sType and lp.sLocCode = loc.sCode and lp.iActive = 1 inner join forecasts_daily fc on fc.sLocType = l
Trx read view will not see trx with id >= 302B3B598B, sees < 302B3B598B
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
160824 21:01:18  InnoDB: Assertion failure in thread 140419255187200 in file srv0srv.c line 2915
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
160824 21:01:18 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 5.5.30-MariaDB-log
key_buffer_size=262144
read_buffer_size=8388608
max_used_connections=201
max_threads=202
thread_count=200
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2485267 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
(my_addr_resolve failure: fork)
/usr/sbin/mysqld(my_print_stacktrace+0x2e) [0xa7f34e]
/usr/sbin/mysqld(handle_fatal_signal+0x40b) [0x6cb4ab]
/lib64/libpthread.so.0(+0xf130) [0x7fb89d17d130]
/lib64/libc.so.6(gsignal+0x39) [0x7fb89b9315c9]
/lib64/libc.so.6(abort+0x148) [0x7fb89b932cd8]
/usr/sbin/mysqld() [0x86e195]
/lib64/libpthread.so.0(+0x7df3) [0x7fb89d175df3]
/lib64/libc.so.6(clone+0x6d) [0x7fb89b9f21ad]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Elena Stepanova [ 2016-08-31 ]

The real version in the log is 5.5.30, but it's not on the drop-down list, I assume that's why it was set to 5.5.33a.

The version is rather old, and long semaphore wait problems have generally non-specific appearance, but here is one that looks similar:
https://bugs.launchpad.net/percona-server/+bug/1576128
At least, it also has numerous threads in {[estimating records in index range}} state – it's not something I remember seeing often.

jplindst, please take a look to check if you see here anything that can be used for further analysis, otherwise I guess we'll consider it an upstream bug for now and see what XtraDB comes up with.

Comment by Marko Mäkelä [ 2016-12-13 ]

This is the interesting part:

OS WAIT ARRAY INFO: reservation count 34669673, signal count 214168191
--Thread 140430801172224 has waited at row0ins.c line 2028 for 937.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7fb605d8de78 '&block->lock'
a writer (thread id 140430801172224) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff

It seems like some kind of starvation to me, or alternatively a bug in the implementation or the use of XtraDB rw-latches.
This thread is attempting to acquire an exclusive page latch, which should block subsequent RW_S_LATCH requests.
The thread is also marked as the X-latch holder. If it is actually holding the RW_X_LATCH, it should instantaneously be able to increase its recursive x-latch count.
But maybe this thread only is marked as waiting for RW_X_LATCH, and some earlier thread (we cannot see that easily) is still holding RW_S_LATCH or forgot to release it.

Is it possible to get a core dump for this?

Comment by Sergey Vojtovich [ 2016-12-21 ]

Taking into account number of pending S-locks on the same latch, I'd say second assumption looks more probable.

This doesn't immediately look like a bug in rw-latches implementation to me. Famous memory barrier issues had different symptoms with rw-locks.

The following transaction look suspicious:

---TRANSACTION 302B3B598A, ACTIVE 54890 sec
mysql tables in use 5, locked 0
MySQL thread id 94846719, OS thread handle 0x7fb5e2ebc700, query id 46559275412 10.10.2.29 wz_user User sleep
select loc.sType, loc.sCode, loc.sScreenName from locations loc inner join location_mappings_2008 distx on distx.custom_code = 'TWC' and distx.type1 = loc.sType and distx.code1 = loc.sCode and distx.context = 'districtforecast' inner join locations dist on dist.sType = distx.type2 and dist.sCode = distx.code2 inner join location_positions lp on lp.sCustomCode = 'WZ' and lp.sMapType = 'fcast_360' and lp.sContextType = 'wzstate' and lp.sContextCode = 'wa' and sleep(30) -- ' and lp.sLocType = loc.sType and lp.sLocCode = loc.sCode and lp.iActive = 1 inner join forecasts_daily fc on fc.sLocType = l
Trx read view will not see trx with id >= 302B3B598B, sees < 302B3B598B

It is running for over 15 hours, has many joins and has sleep(30). Can this be cause for a long S-lock?

It should be possible to reduce scope of the search by eliminating threads waiting for S-locks: they're not allowed to hold S-locks on the same latch (covered by assertions in debug builds). Doable, but annoying taking into account number of threads.

Comment by Marko Mäkelä [ 2017-01-30 ]

InnoDB should generally never hold any latches when returning control to the SQL layer. The only exception is the btr_search_latch, which is explicitly released by a handlerton method release_temporary_latches. And that delayed release has been removed in MySQL 8.0; maybe we should do that too.

So, the long-running transaction should not explain this, other than possibly by causing rarely-visited code paths to be executed more frequently, and potentially exposing bugs in those code paths that fail to release a buffer page latch. Normally these latch acquisitions are recorded in the mini-transaction memo and released by mtr_commit(). It is of course possible that some code path is forgetting the mtr_commit().

Comment by Marko Mäkelä [ 2017-08-10 ]

For the record, release_temporary_latches was removed in MariaDB 10.2.7.

Generated at Thu Feb 08 07:44:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.