[MDEV-8337] Disabling Change Buffer Dynamically Can Corrupt InnoDB Created: 2015-06-18  Updated: 2016-04-22  Resolved: 2016-04-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.19
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Kenny Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: need_feedback
Environment:

debian wheezy 7.2


Attachments: File bt.applyall     File bt.applyall.full    
Sprint: 10.0.22, 10.1.9-1, 10.1.9-2

 Description   

I changed some dynamic mysql settings, MySQL then got stuck on &dict_operation_lock
stop slave, show slave status, all hung....

MySQL died after 600+ seconds, and when starting again it was corrupted.

The very weird thing that happened is that 'checkpoint age' went above 'max checkpoint age', as you can see below.

This is what I originally did:

~18:45
set global innodb_change_buffering=none
set global innodb_change_buffer_max_size=0
 
~19:00:
set global innodb_flush_neighbors=0
set global innodb_io_capacity=2000
set global innodb_io_capacity_max=4000

 
Status:
=====================================
2015-06-15 19:01:40 7f332a54d700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 6 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 32047 srv_active, 0 srv_shutdown, 0 srv_idle
srv_master_thread log flush and writes: 32046
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1056771
--Thread 139377363625728 has waited at row0purge.cc line 754 for 124.00 seconds the semaphore:
S-lock on RW-latch at 0x7f332d85bd40 '&dict_operation_lock'
a writer (thread id 139377338447616) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 754
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.19/storage/xtradb/dict/dict0stats.cc line 2391
OS WAIT ARRAY INFO: signal count 2085166
Mutex spin waits 10415704, rounds 35180513, OS waits 259109
RW-shared spins 1806036, rounds 26717440, OS waits 448318
RW-excl spins 726736, rounds 17572495, OS waits 308890
Spin rounds per wait: 3.38 mutex, 14.79 RW-shared, 24.18 RW-excl

the replication thread above hang ^^
(no other workload besides from myself and the monitoring)

---
LOG
---
Log sequence number 38171722465891
Log flushed up to   38171718868849
Pages flushed up to 38169449975858
Last checkpoint at  38169277050527
Max checkpoint age    2173616088
Checkpoint age target 2105690586
Modified age          2272490033
Checkpoint age        2445415364

check the checkpoint age ^^ it grows above the max checkpoint age.. I have never seen that...

MySQL crashes after +600 seconds....

If we then try to start MySQL... I get InnoDB Corruption...

Here's the log:

InnoDB: Warning: a long semaphore wait:
--Thread 139377363625728 has waited at row0purge.cc line 754 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x7f332d85bd40 '&dict_operation_lock'
a writer (thread id 139377338447616) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 754
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.19/storage/xtradb/dict/dict0stats.cc line 2391
InnoDB: Warning: semaphore wait:
--Thread 139377363625728 has waited at row0purge.cc line 754 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x7f332d85bd40 '&dict_operation_lock'
a writer (thread id 139377338447616) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 754
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.19/storage/xtradb/dict/dict0stats.cc line 2391
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2015-06-15 19:03:52 7ec3533fc700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 18 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 32047 srv_active, 0 srv_shutdown, 0 srv_idle
srv_master_thread log flush and writes: 32046
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1144040
--Thread 139377363625728 has waited at row0purge.cc line 754 for 256.00 seconds the semaphore:
S-lock on RW-latch at 0x7f332d85bd40 '&dict_operation_lock'
a writer (thread id 139377338447616) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 754
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.19/storage/xtradb/dict/dict0stats.cc line 2391
OS WAIT ARRAY INFO: signal count 2277512
Mutex spin waits 10648226, rounds 36452467, OS waits 276298
RW-shared spins 1964714, rounds 29203557, OS waits 488786
RW-excl spins 751320, rounds 19137297, OS waits 335185
Spin rounds per wait: 3.42 mutex, 14.86 RW-shared, 25.47 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 25231378400
Purge done for trx's n:o < 25231378391 undo n:o < 0 state: running
History list length 2068
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 31911, OS thread handle 0x7f332a54d700, query id 95931899 localhost monitoring Killing slave
stop slave
---TRANSACTION 0, not started
MySQL thread id 31294, OS thread handle 0x7f332a458700, query id 95931232 localhost monitoring cleaning up
---TRANSACTION 0, not started
MySQL thread id 27950, OS thread handle 0x7f332a489700, query id 94905598 localhost monitoring cleaning up
---TRANSACTION 0, not started
MySQL thread id 26534, OS thread handle 0x7f332a5e0700, query id 81668923 localhost monitoring cleaning up
---TRANSACTION 0, not started
MySQL thread id 23535, OS thread handle 0x7f332a4ba700, query id 77440689 localhost monitoring cleaning up
---TRANSACTION 0, not started
MySQL thread id 18311, OS thread handle 0x7f332c472700, query id 83833644 192.168.10.98 basar cleaning up
---TRANSACTION 25176649341, not started
MySQL thread id 13490, OS thread handle 0x7f332c567700, query id 39325280 localhost monitoring cleaning up
---TRANSACTION 25161437356, not started
MySQL thread id 7851, OS thread handle 0x7f332a57e700, query id 23520106 localhost monitoring cleaning up
---TRANSACTION 25138080305, not started
MySQL thread id 49, OS thread handle 0x7f332c536700, query id 0 Waiting for master to send event
---TRANSACTION 25138080003, not started
MySQL thread id 1, OS thread handle 0x7f332c5c9700, query id 0 Waiting for background binlog tasks
---TRANSACTION 25231378399, ACTIVE 212 sec
2 lock struct(s), heap size 360, 2 row lock(s), undo log entries 2
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
---TRANSACTION 25231378397, ACTIVE 222 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
MySQL thread id 50, OS thread handle 0x7f332c505700, query id 95931231 After opening tables
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: complete io for buf page (read thread)
I/O thread 3 state: complete io for buf page (read thread)
I/O thread 4 state: complete io for buf page (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 141 [31, 48, 38, 24] , aio writes: 3 [0, 0, 3, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
17258393 OS file reads, 119962887 OS file writes, 1331255 OS fsyncs
4242.71 reads/s, 8192 avg bytes/read, 2228.88 writes/s, 62.94 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 289363, free list len 5056703, seg size 5346067, 5307139 merges
merged operations:
insert 30319500, delete mark 3853746, delete 3409100
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 17819.34 non-hash searches/s
---
LOG
---
Log sequence number 38171503789955
Log flushed up to   38171503147091
Pages flushed up to 38169318504989
Last checkpoint at  38169277050527
Max checkpoint age    2173616088
Checkpoint age target 2105690586
Modified age          2185284966
Checkpoint age        2226739428
0 pending log writes, 0 pending chkp writes
93340556 log i/o's done, 0.50 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 450552135680; in additional pool allocated 0
Total memory allocated by read views 400
Internal hash tables (constant factor + variable factor)
   Adaptive hash index 7030300544      (6969921304 + 60379240)
   Page hash           54453352 (buffer pool 0 only)
   Dictionary cache    1744717883      (1742481904 + 2235979)
   File system         948616  (812272 + 136344)
   Lock system         1133959960      (1133955176 + 4784)
   Recovery system     0       (0 + 0)
Dictionary memory allocated 2235979
Buffer pool size        26869752
Buffer pool size, bytes 440234016768
Free buffers            1543851
Database pages          17461313
Old database pages      6445837
Modified db pages       4017882
Percent of dirty pages(LRU & free pages): 21.141
Max dirty pages percent: 75.000
Pending reads 142
Pending writes: LRU 0, flush list 5, single page 0
Pages made young 866317, not young 0
5.78 youngs/s, 0.00 non-youngs/s
Pages read 17258003, created 203168, written 26153839
4244.76 reads/s, 0.00 creates/s, 2190.04 writes/s
Buffer pool hit rate 969 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 17461313, unzip_LRU len: 15721802
I/O sum[0]:cur[271432], unzip sum[0]:cur[516176]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            306515
Database pages          2114209
Old database pages      780460
Modified db pages       480184
Percent of dirty pages(LRU & free pages): 19.836
Max dirty pages percent: 75.000
Pending reads 3
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 103241, not young 0
0.89 youngs/s, 0.00 non-youngs/s
Pages read 2067330, created 46876, written 4603108
503.25 reads/s, 0.00 creates/s, 258.32 writes/s
Buffer pool hit rate 989 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2114209, unzip_LRU len: 1875082
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 1
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            174424
Database pages          2194238
Old database pages      810002
Modified db pages       507323
Percent of dirty pages(LRU & free pages): 21.418
Max dirty pages percent: 75.000
Pending reads 29
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 108435, not young 0
0.61 youngs/s, 0.00 non-youngs/s
Pages read 2171118, created 23091, written 3074893
530.80 reads/s, 0.00 creates/s, 277.98 writes/s
Buffer pool hit rate 950 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2194238, unzip_LRU len: 1979222
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 2
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            155456
Database pages          2206778
Old database pages      814631
Modified db pages       512286
Percent of dirty pages(LRU & free pages): 21.686
Max dirty pages percent: 75.000
Pending reads 17
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 109491, not young 0
0.61 youngs/s, 0.00 non-youngs/s
Pages read 2183081, created 23680, written 3084998
528.92 reads/s, 0.00 creates/s, 267.10 writes/s
Buffer pool hit rate 969 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2206778, unzip_LRU len: 1992095
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 3
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            185198
Database pages          2186920
Old database pages      807301
Modified db pages       502713
Percent of dirty pages(LRU & free pages): 21.193
Max dirty pages percent: 75.000
Pending reads 31
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 108819, not young 0
0.89 youngs/s, 0.00 non-youngs/s
Pages read 2164414, created 22475, written 3079785
541.36 reads/s, 0.00 creates/s, 255.65 writes/s
Buffer pool hit rate 948 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2186920, unzip_LRU len: 1972324
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 4
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            180871
Database pages          2189403
Old database pages      808217
Modified db pages       505407
Percent of dirty pages(LRU & free pages): 21.323
Max dirty pages percent: 75.000
Pending reads 15
Pending writes: LRU 0, flush list 3, single page 0
Pages made young 108851, not young 0
0.78 youngs/s, 0.00 non-youngs/s
Pages read 2166708, created 22680, written 3076868
551.97 reads/s, 0.00 creates/s, 293.87 writes/s
Buffer pool hit rate 954 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2189403, unzip_LRU len: 1975928
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 5
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            170970
Database pages          2196200
Old database pages      810726
Modified db pages       507713
Percent of dirty pages(LRU & free pages): 21.448
Max dirty pages percent: 75.000
Pending reads 23
Pending writes: LRU 0, flush list 2, single page 0
Pages made young 109639, not young 0
1.06 youngs/s, 0.00 non-youngs/s
Pages read 2174763, created 21414, written 3074767
538.25 reads/s, 0.00 creates/s, 292.71 writes/s
Buffer pool hit rate 967 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2196200, unzip_LRU len: 1982171
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 6
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            192186
Database pages          2182484
Old database pages      805663
Modified db pages       500428
Percent of dirty pages(LRU & free pages): 21.074
Max dirty pages percent: 75.000
Pending reads 13
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 108655, not young 0
0.39 youngs/s, 0.00 non-youngs/s
Pages read 2160463, created 22008, written 3072401
522.19 reads/s, 0.00 creates/s, 283.32 writes/s
Buffer pool hit rate 971 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2182484, unzip_LRU len: 1967125
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
---BUFFER POOL 7
Buffer pool size        3358719
Buffer pool size, bytes 55029252096
Free buffers            178231
Database pages          2191081
Old database pages      808837
Modified db pages       501828
Percent of dirty pages(LRU & free pages): 21.180
Max dirty pages percent: 75.000
Pending reads 11
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 109186, not young 0
0.56 youngs/s, 0.00 non-youngs/s
Pages read 2170126, created 20944, written 3087019
528.03 reads/s, 0.00 creates/s, 261.10 writes/s
Buffer pool hit rate 945 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2191081, unzip_LRU len: 1977855
I/O sum[0]:cur[33929], unzip sum[0]:cur[64522]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
2 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
2 out of 1000 descriptors used
Main thread process no. 14695, id 139377372018432, state: doing insert buffer merge
Number of rows inserted 30151297, updated 17208774, deleted 146396, read 40041959
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 46640217, updated 0, deleted 46640217, read 46640218
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 46640217, updated 0, deleted 46640217, read 46640218
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
 
…. (plenty of more of these) …
 
015-06-15 19:15:28 7ec55fffb700 InnoDB: ERROR: the age of the last checkpoint is 2961997331,
InnoDB: which exceeds the log group capacity 2415915418.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2015-06-15 19:15:29 7ec353bfd700  InnoDB: Assertion failure in thread 139377388803840 in file srv0srv.cc line 2196
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150615 19:15:29 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.19-MariaDB-1~wheezy-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=30
max_threads=5052
thread_count=30
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15636003 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
Fatal signal 11 while backtracing
150615 19:16:05 mysqld_safe Number of processes running now: 0
150615 19:16:05 mysqld_safe mysqld restarted
150615 19:16:07 [Note] /usr/sbin/mysqld (mysqld 10.0.19-MariaDB-1~wheezy-log) starting as process 25650 ...
150615 19:16:07 [Note] InnoDB: Using mutexes to ref count buffer pool pages
150615 19:16:07 [Note] InnoDB: The InnoDB memory heap is disabled
150615 19:16:07 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
150615 19:16:07 [Note] InnoDB: Memory barrier is not used
150615 19:16:07 [Note] InnoDB: Compressed tables use zlib 1.2.7
150615 19:16:07 [Note] InnoDB: Using Linux native AIO
150615 19:16:07 [Note] InnoDB: Using CPU crc32 instructions
150615 19:16:07 [Note] InnoDB: Initializing buffer pool, size = 410.0G
150615 19:16:28 [Note] InnoDB: Completed initialization of buffer pool
150615 19:16:32 [Note] InnoDB: Highest supported file format is Barracuda.
150615 19:16:32 [Note] InnoDB: The log sequence numbers 38104412999604 and 38104412999604 in ibdata files do not match the log sequence number 38170086942868 in the ib_logfiles!
150615 19:16:32 [Note] InnoDB: Database was not shutdown normally!
150615 19:16:32 [Note] InnoDB: Starting crash recovery.
150615 19:16:32 [Note] InnoDB: Reading tablespace information from the .ibd files...
150615 19:16:33 [Note] InnoDB: Restoring possible half-written data pages
150615 19:16:33 [Note] InnoDB: from the doublewrite buffer...
150615 19:16:33 [ERROR] InnoDB: We scanned the log up to 38170086942720. A checkpoint was at 38170086942868 and the maximum LSN on a database page was 0. It is possible that the database is now corrupt!
2015-06-15 19:16:34 7f8d06f0b760 InnoDB: Error: page 2 log sequence number 38170738963477
InnoDB: is in the future! Current system log sequence number 38170086942868.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
2015-06-15 19:16:34 7f8d06f0b760 InnoDB: Error: page 4 log sequence number 38170404171544
InnoDB: is in the future! Current system log sequence number 38170086942868.
...
 
2015-06-15 19:16:34 7f8d06f0b760 InnoDB: Error: page 3653636 log sequence number 38170839882425
InnoDB: is in the future! Current system log sequence number 38170086942868.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
InnoDB: 128 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 2 row operations to undo
InnoDB: Trx id counter is 25231318784
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 2082770, file name bin.070522
InnoDB: Last MySQL binlog file position 0 2887918, file name /data/log/db/mysql/binary-log/bin.025191
InnoDB: Cleaning up trx with id 25229118446
2015-06-15 19:16:41 7f8d06f0b760  InnoDB: Assertion failure in thread 140243683555168 in file trx0trx.cc line 462
InnoDB: Failing assertion: trx->update_undo == NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150615 19:16:41 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
....
 
 
InnoDB: Last MySQL binlog file position 0 2887918, file name /data/log/db/mysql/binary-log/bin.025191
InnoDB: Cleaning up trx with id 25229118446
2015-06-15 19:19:19 7ff800a96760  InnoDB: Assertion failure in thread 140703139719008 in file trx0trx.cc line 462
InnoDB: Failing assertion: trx->update_undo == NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150615 19:19:19 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.19-MariaDB-1~wheezy-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=0
max_threads=5052
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15636003 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7ff8014a290b]
/usr/sbin/mysqld(handle_fatal_signal+0x422)[0x7ff80102cc42]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7ff8006790a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7ff7fecd4165]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x180)[0x7ff7fecd73e0]
/usr/sbin/mysqld(+0x806452)[0x7ff8012ae452]
/usr/sbin/mysqld(+0x7fd4b5)[0x7ff8012a54b5]
/usr/sbin/mysqld(+0x76a089)[0x7ff801212089]
/usr/sbin/mysqld(+0x7e9495)[0x7ff801291495]
/usr/sbin/mysqld(+0x7233db)[0x7ff8011cb3db]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x68)[0x7ff80102f2b8]
/usr/sbin/mysqld(+0x434a02)[0x7ff800edca02]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0x702)[0x7ff800edd7b2]
/usr/sbin/mysqld(+0x39d2a5)[0x7ff800e452a5]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x497)[0x7ff800e45ee7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff7fecc0ead]
/usr/sbin/mysqld(+0x392b5d)[0x7ff800e3ab5d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150615 19:19:24 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended



 Comments   
Comment by Kenny [ 2015-06-18 ]

I then compiled mariadb with debugging symbols....

How To Reproduce:

  • get a server that has some IOPS capacity! (4k?)
  • Set innodb_log_file_size to 256MB
  • set innodb_io_capacity(_max) very low, like 200
  • Make sure buffer pool is quite large (so insert buffer will grow)
  • create some write workload
  • Make sure insert buffer is growing, like for example 100.000 pages
  • set innodb_change_buffering=0 innodb_change_buffer_max_size=0
  • set innodb_io_capacity=4000 innodb_io_capacity
  • Innodb will now reduce the insert buffer size
  • Watch Ibuf Size, and also watch checkpoint age!!!

At some point you will get errors like::

2015-06-18 14:01:05 7f54a83fb700 InnoDB: ERROR: the age of the last checkpoint is 483181057,
InnoDB: which exceeds the log group capacity 483180135.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

Then I saw this:

 
Log sequence number 38726137452302
Log flushed up to   38726137452292
Pages flushed up to 38725388385449
Last checkpoint at  38725312940027
Max checkpoint age    434154333
Checkpoint age target 420587011
Modified age          749066853
Checkpoint age        824512275
0 pending log writes, 0 pending chkp writes
859354 log i/o's done, 0.00 log i/o's/second

The checkpoint age is more than 500MB, modified age too. This is not possible....

Then the &dict_operation_lock was held for quite a while:

 
--Thread 139992760297216 has waited at row0purge.cc line 754 for 284.00 seconds the semaphore:
S-lock on RW-latch at 0x7fc275c02040 '&dict_operation_lock'
a writer (thread id 139992735119104) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 754
Last time write locked in file /home/vagrant/mariadb-10.0.19/storage/xtradb/dict/dict0stats.cc line 2391

`bt thread apply` and `bt thread apply full` are attached.

In the end, we have a crashed system, with resolved symbols now! (Hurray).
This is MariaDB 10.0.19.

Crash::

 
	InnoDB: ###### Diagnostic info printed to the standard error stream
	InnoDB: Error: semaphore wait has lasted > 600 seconds
	InnoDB: We intentionally crash the server, because it appears to be hung.
	2015-06-18 14:25:30 7f529c3fd700  InnoDB: Assertion failure in thread 139992785475328 in file srv0srv.cc line 2196
	InnoDB: We intentionally generate a memory trap.
	InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
	InnoDB: If you get repeated assertion failures or crashes, even
	InnoDB: immediately after the mysqld startup, there may be
	InnoDB: corruption in the InnoDB tablespace. Please refer to
	InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
	InnoDB: about forcing recovery.
	150618 14:25:30 [ERROR] mysqld got signal 6 ;
	This could be because you hit a bug. It is also possible that this binary
	or one of the libraries it was linked against is corrupt, improperly built,
	or misconfigured. This error can also be caused by malfunctioning hardware.
 
	To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
	We will try our best to scrape up some info that will hopefully help
	diagnose the problem, but since we have already crashed, 
	something is definitely wrong and this may fail.
 
	Server version: 10.0.19-MariaDB-2~wheezy-log
	key_buffer_size=16777216
	read_buffer_size=2097152
	max_used_connections=4
	max_threads=5052
	thread_count=3
	It is possible that mysqld could use up to 
	key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15636003 K  bytes of memory
	Hope that's ok; if not, decrease some variables in the equation.
 
	Thread pointer: 0x0x0
	Attempting backtrace. You can use the following information to find out
	where mysqld died. If you see no messages after this, something went
	terribly wrong...
	stack_bottom = 0x0 thread_stack 0x30000
	/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fc2753a666b]
	/usr/sbin/mysqld(handle_fatal_signal+0x422)[0x7fc274f30ab2]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7fc27457d030]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fc272bd8475]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x180)[0x7fc272bdb6f0]
	/usr/sbin/mysqld(+0x79cd83)[0x7fc275148d83]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7fc274574b50]
	/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc272c820ed]
	The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
	information that should help you find out what is causing the crash.

Comment by Kenny [ 2015-06-18 ]

So to elaborate:

I crashed 1 slave system, then I was able to crash a second slave as well.

Comment by Elena Stepanova [ 2015-06-18 ]

jplindst,

I don't want to jump into reproducing it blindly, I suppose you'll do it much faster, and besides, maybe there is already enough information for you to proceed.

Comment by Roel Van de Paar [ 2015-06-19 ]

Preliminary bt analysis by a colleague;

There are a lot of things going in the server:
1) several threads trying to do ibuf merge after buffer I/O, blocked on fil_system->mutex;
2) master thread trying to do ibuf merge, blocked on fil_system->mutex;
3) purge trying to do its thing, blocked on dict_operation_lock;
4) one thread actually in progress, the doublewrite write at the end of flush batch;
5) background stats thread waiting for the free space in the log. This must be the one holding dict_operation_lock;
6) several parallel slave threads in the binlog group commit;
7) several parallel slave threads waiting for the free space in the log;

It does not look strictly deadlocked, as the flush appears to be progressing, but apparently too slowly or we wouldn't get 600 s lock timeouts. No idea re. checkpoint age anomaly, nor what is the root cause behind all this.

Comment by Jan Lindström (Inactive) [ 2015-07-21 ]

Changing the IO capacity to too low could be behind this but not sure. I did not see indication that server would be strictly deadlocked but you did not include full unedited error log so this is based on bt output. If you think the server is hang you should do

(1) show processlist; (do this several times e.g. every 1 minute)
(2) show innodb status; (do this also several times e.g. every 1 minute)
(3) wait at least 600s
(4) provide full unedited error log (make sure you have that enabled)
(5) do several gdb stack outputs e.g. 3 on 1 minute wait between
(7) see top or similar several times e.g. every 1 minute

and provide all of this for further analysis. With given data, I do not have enough information to continue.

R: Jan

Comment by Jan Lindström (Inactive) [ 2015-10-26 ]

Hi,

I could not repeat the problem, either I had too slow IO (I used SSD) but there is something strange on how to repeat as I can't set :

MariaDB [(none)]> set global innodb_change_buffering=0;
--------------
set global innodb_change_buffering=0
--------------
 
ERROR 1232 (42000): Incorrect argument type to variable 'innodb_change_buffering'

This is MariaDB 10.0.22 (to be released).

Comment by Elena Stepanova [ 2015-10-26 ]

jplindst,
innodb_change_buffering needs to be set to none, not to 0, there is some weird extra check for the values (probably incorrect).
I suppose in "How To Reproduce" steps Kenny was just describing the idea, not always the exact commands that were used.

Comment by Jan Lindström (Inactive) [ 2015-10-26 ]

Yes, I noted that but still, can't repeat.

Comment by Jan Lindström (Inactive) [ 2015-11-05 ]

Can you provide step by step instructions how to repeat and does this repeat on every time (not sure why you would disable change buffering dynamically) ?

Comment by Jan Lindström (Inactive) [ 2016-04-22 ]

Could not repeat after several different ways of repeating.

Generated at Thu Feb 08 07:26:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.