[MDEV-9747] XtraDB thread never exits. Always in 'cleaning up' Created: 2016-03-16  Updated: 2017-05-03  Resolved: 2016-04-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - OQGRAPH, Storage Engine - XtraDB
Affects Version/s: 10.1.12, 10.0, 10.1
Fix Version/s: 10.0.25

Type: Bug Priority: Major
Reporter: Brian Evans Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File my.cnf     File mysqld.err     Text File thread-trace.txt    
Issue Links:
Duplicate
is duplicated by MDEV-8206 [PATCH] OQGraph crash caused by SELEC... Closed
Sprint: 10.0.25

 Description   

I've been having an issue with an instance of MariaDB 10.1.12 on a non-production machine which makes it never shutdown. It just hangs forever with "waiting for transaction".

This was odd as nothing was showing in "SHOW FULL PROCESSLIST"

I discovered a transaction thread never finishes.
This is the data collected. If you need more, let met know.

MariaDB [(none)]> show full processlist\G
*************************** 1. row ***************************
      Id: 80
    User: root
    Host: localhost
      db: NULL
 Command: Query
    Time: 0
   State: init
    Info: show full processlist
Progress: 0.000
1 row in set (0.00 sec)
 
MariaDB [(none)]> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2016-03-16 10:03:21 7ff7fc505700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 25 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 50 srv_active, 0 srv_shutdown, 673746 srv_idle
srv_master_thread log flush and writes: 673567
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 118
OS WAIT ARRAY INFO: signal count 120
Mutex spin waits 2904, rounds 3613, OS waits 53
RW-shared spins 68, rounds 1956, OS waits 64
RW-excl spins 3, rounds 50, OS waits 1
Spin rounds per wait: 1.24 mutex, 28.76 RW-shared, 16.67 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 5443238
Purge done for trx's n:o < 5443232 undo n:o < 0 state: running but idle
History list length 541
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 80, OS thread handle 0x7ff7fc505700, query id 2271 localhost root init
show engine innodb status
---TRANSACTION 0, not started
MySQL thread id 24, OS thread handle 0x7ff7ff267700, query id 174 cleaning up
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
2049 OS file reads, 621 OS file writes, 280 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1842, seg size 1844, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 225203203047
Log flushed up to   225203203047
Pages flushed up to 225203203047
Last checkpoint at  225203203047
Max checkpoint age    80354304
Checkpoint age target 77843232
Modified age          0
Checkpoint age        0
0 pending log writes, 0 pending chkp writes
201 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 138412032; in additional pool allocated 0
Total memory allocated by read views 200
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 2561632         (2213368 + 348264)
    Page hash           139112 (buffer pool 0 only)
    Dictionary cache    2001389         (554768 + 1446621)
    File system         938224  (812272 + 125952)
    Lock system         333592  (332872 + 720)
    Recovery system     0       (0 + 0)
Dictionary memory allocated 1446621
Buffer pool size        8191
Buffer pool size, bytes 134201344
Free buffers            6575
Database pages          1595
Old database pages      607
Modified db pages       0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1572, created 23, written 369
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1595, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
0 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
0 out of 1000 descriptors used
Main thread process no. 3914, id 140702637872896, state: sleeping
Number of rows inserted 157, updated 37, deleted 850, read 188280
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
 
1 row in set (0.00 sec)
 
MariaDB [(none)]>



 Comments   
Comment by Elena Stepanova [ 2016-03-17 ]

grknight,
Where do you see "waiting for transaction"?
Does it also happen when you run without TokuDB?
Could you please attach the error log and the cnf file(s)?

Comment by Brian Evans [ 2016-03-17 ]

Added files requested. The Signal 6's are me doing SIGABRT to get the daemon to stop

Comment by Elena Stepanova [ 2016-03-17 ]

Thanks, I am able to reproduce the problem.

It happens between OQGRAPH and InnoDB. I'm not sure which one is to blame, so I'm assigning it to jplindst for now; Jan, if it turns out to be not an InnoDB problem, please reassign it back to me or directly to andymc73.

The described issue with shutdown happens on a non-debug build. Debug build goes down with an assertion failure earlier.

The MTR test case below is for manual reproducing/debugging only, please don't put it like this into the regression test suite (especially since it passes either way).

If it's run on a debug build, it fails on SELECT * FROM oq_graph (assertion failure is at the end of the comment). On a non-debug build, it does not crash, but returns an empty result set, which is already wrong, then the test proceeds to shutdown, hangs there until the timeout is exceeded (you'll see messages from the test), then kills the server, cleans up and exits.

MTR test case

--source include/have_innodb.inc
--enable_connect_log
 
install soname 'ha_oqgraph';
 
CREATE TABLE oq_backing (
  origid INT UNSIGNED NOT NULL, 
  destid INT UNSIGNED NOT NULL,  
  PRIMARY KEY (origid, destid), 
  KEY (destid)
) ENGINE=InnoDB;
 
INSERT INTO oq_backing(origid, destid) 
 VALUES (1,2), (2,3), (3,4), (4,5), (2,6), (5,6);
 
CREATE TABLE oq_graph (
  latch VARCHAR(32) NULL,
  origid BIGINT UNSIGNED NULL,
  destid BIGINT UNSIGNED NULL,
  weight DOUBLE NULL,
  seq BIGINT UNSIGNED NULL,
  linkid BIGINT UNSIGNED NULL,
  KEY (latch, origid, destid) USING HASH,
  KEY (latch, destid, origid) USING HASH
) 
ENGINE=OQGRAPH 
data_table='oq_backing' origid='origid' destid='destid';
 
--enable_reconnect
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--shutdown_server 10
--source include/wait_until_disconnected.inc
 
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc
 
--connect (con1,localhost,root,,)
SELECT * FROM oq_graph;
--disconnect con1
 
--connect (con2,localhost,root,,)
--enable_reconnect
 
SELECT * FROM oq_graph;
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
 
--echo
--echo #########################
--echo # If the test proceeds right away, then the bug was not reproduced.
--echo # If nothing happens for a long time, your are likely to have hit the problem,
--echo # wait for 70 sec until the timeout finishes and we'll check what the error log says...
--echo #########################
--echo
 
--shutdown_server 70
 
--exec sh -c "tail $MYSQLTEST_VARDIR/log/mysqld.1.err"
 
--echo
--echo #########################
--echo # MESSAGE FROM THE TEST:
--echo # If you see 'Waiting for X active transactions' in the error log above,
--echo # the problem was definitely reproduced. 
--echo #########################
--echo
 
--source include/wait_until_disconnected.inc
 
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc
 
--disable_query_log
drop table oq_graph, oq_backing;
uninstall soname 'ha_oqgraph';

Assertion failure on 10.0 commit 0125e588f3967bcaf01cf60b6a64e92758158ac4 debug build

mysqld: /src/10.0/sql/handler.cc:2473: THD* handler::ha_thd() const: Assertion `!table || !table->in_use || table->in_use == _current_thd()' failed.
160317 20:08:09 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f78ce74d1d2 in __assert_fail () from /lib64/libc.so.6
#8  0x000000000086060a in handler::ha_thd (this=0x7f78bba79888) at /src/10.0/sql/handler.cc:2473
#9  0x00007f78c6f561ed in ha_innodb::extra (this=0x7f78bba79888, operation=HA_EXTRA_IS_ATTACHED_CHILDREN) at /src/10.0/storage/innobase/handler/ha_innodb.cc:12544
#10 0x00007f78c6bc95c3 in ha_oqgraph::extra (this=0x7f78bba5d888, operation=HA_EXTRA_IS_ATTACHED_CHILDREN) at /src/10.0/storage/oqgraph/ha_oqgraph.cc:1102
#11 0x00000000007ed36c in tc_acquire_table (thd=0x7f78c85716f0, share=0x7f78bba4b088) at /src/10.0/sql/table_cache.cc:342
#12 0x00000000007ee25b in tdc_acquire_share (thd=0x7f78c85716f0, db=0x7f78bb822868 "test", table_name=0x7f78bb822240 "oq_graph", key=0x7f78bb8226c9 "test", key_length=14, hash_value=3220223330, flags=3, out_table=0x7f78d07623e8) at /src/10.0/sql/table_cache.cc:765
#13 0x000000000060c249 in open_table (thd=0x7f78c85716f0, table_list=0x7f78bb822288, mem_root=0x7f78d07625f0, ot_ctx=0x7f78d07625b0) at /src/10.0/sql/sql_base.cc:2384
#14 0x000000000060ee71 in open_and_process_table (thd=0x7f78c85716f0, lex=0x7f78c8574f98, tables=0x7f78bb822288, counter=0x7f78d07626f4, flags=0, prelocking_strategy=0x7f78d0762730, has_prelocking_list=false, ot_ctx=0x7f78d07625b0, new_frm_mem=0x7f78d07625f0) at /src/10.0/sql/sql_base.cc:4045
#15 0x000000000060ff36 in open_tables (thd=0x7f78c85716f0, start=0x7f78d07626b0, counter=0x7f78d07626f4, flags=0, prelocking_strategy=0x7f78d0762730) at /src/10.0/sql/sql_base.cc:4579
#16 0x0000000000610f22 in open_and_lock_tables (thd=0x7f78c85716f0, tables=0x7f78bb822288, derived=true, flags=0, prelocking_strategy=0x7f78d0762730) at /src/10.0/sql/sql_base.cc:5222
#17 0x0000000000605313 in open_and_lock_tables (thd=0x7f78c85716f0, tables=0x7f78bb822288, derived=true, flags=0) at /src/10.0/sql/sql_base.h:489
#18 0x0000000000674664 in execute_sqlcom_select (thd=0x7f78c85716f0, all_tables=0x7f78bb822288) at /src/10.0/sql/sql_parse.cc:5249
#19 0x000000000066ccdb in mysql_execute_command (thd=0x7f78c85716f0) at /src/10.0/sql/sql_parse.cc:2562
#20 0x0000000000677479 in mysql_parse (thd=0x7f78c85716f0, rawbuf=0x7f78bb822088 "SELECT * FROM oq_graph", length=22, parser_state=0x7f78d0763650) at /src/10.0/sql/sql_parse.cc:6565
#21 0x0000000000669f45 in dispatch_command (command=COM_QUERY, thd=0x7f78c85716f0, packet=0x7f78c85784b1 "SELECT * FROM oq_graph", packet_length=22) at /src/10.0/sql/sql_parse.cc:1308
#22 0x0000000000669212 in do_command (thd=0x7f78c85716f0) at /src/10.0/sql/sql_parse.cc:998
#23 0x00000000007878a7 in do_handle_one_connection (thd_arg=0x7f78c85716f0) at /src/10.0/sql/sql_connect.cc:1377
#24 0x0000000000787619 in handle_one_connection (arg=0x7f78c85716f0) at /src/10.0/sql/sql_connect.cc:1292
#25 0x0000000000df4358 in pfs_spawn_thread (arg=0x7f78c84a0e70) at /src/10.0/storage/perfschema/pfs.cc:1860
#26 0x00007f78d04700a4 in start_thread () from /lib64/libpthread.so.0
#27 0x00007f78ce80404d in clone () from /lib64/libc.so.6

Comment by Jan Lindström (Inactive) [ 2016-04-21 ]

Problem is that oqgraph used incorrect thd, this leads to fact that transaction inside InnoDB is not committed/aborted leaving it active => xtradb log thread waits forever transaction to end. Fix in https://github.com/MariaDB/server/pull/153 for https://jira.mariadb.org/browse/MDEV-8206 fixes also both problems reported in this bug report.

Comment by Jan Lindström (Inactive) [ 2016-04-26 ]

Fixed by commits:
commit 23b1b69b101f8f22319a8e661ebe8b4587c435d4
Author: Andrew McDonnell <bugs@andrewmcdonnell.net>
Date: Sat Apr 9 22:39:22 2016 +0930

Comment from https://github.com/MariaDB/server/pull/153

commit 6651005e623e3a12ca0af6a645483a165941f8fe
Author: Andrew McDonnell <bugs@andrewmcdonnell.net>
Date: Fri Jan 22 21:13:33 2016 +1030

Fix for MDEV-8206, as per Jira comments of 2015-11-16 and 2015-11-30

Comment by Nilnandan Joshi [ 2017-05-03 ]

I can reproduce this issue with MariaDB 5.0.29, 5.1.21.
Simple login into mysql from three different sessions and check innodb status.

------------
TRANSACTIONS
------------
Trx id counter 2822
Purge done for trx's n:o < 0 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 3, OS thread handle 0x7f5a704ca700, query id 56 localhost root cleaning up
---TRANSACTION 0, not started
MySQL thread id 4, OS thread handle 0x7f5a70481700, query id 22 localhost root cleaning up
---TRANSACTION 2821, not started
MySQL thread id 2, OS thread handle 0x7f5a70513700, query id 58 localhost root init
show engine innodb status

Comment by Marko Mäkelä [ 2017-05-03 ]

The output lines "MySQL thread id" are emitted by thd_get_error_context_description(), which is called by innobase_mysql_print_thd(), called by trx_print_low() which displays the line "TRANSACTION 0, not started".

From the InnoDB point of view, the 2 transactions are not active. The trx objects should be cached objects that only exist in the handler interface and in trx_sys->mysql_trx_list. The objects should be freed when the connection is closed. Maybe these connections did access InnoDB in the past, but no transaction is active any more?

There only is one transaction allocated for the SHOW ENGINE INNODB STATUS.

Do the "cleaning up" entries go away when the two connections are disconnected?
I do not know where the "cleaning up" message is coming from. One possibility is PSI_stage_info stage_cleaning_up defined in mysqld.cc.

Generated at Thu Feb 08 07:37:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.