[MDEV-9650] server stall on sync index (insert into large innodb table with fulltext index) Created: 2016-02-28  Updated: 2016-10-25  Resolved: 2016-10-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.28, 10.1.19

Type: Bug Priority: Critical
Reporter: David Schnall Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: upstream-fixed
Environment:

linux


Attachments: Text File innodb_status_2016-09-08_0926.txt     Text File innodb_status_2016-09-08_0929.txt     Text File innodb_status_2016-09-08_0936.txt     File sync.test    
Issue Links:
PartOf
is part of MDEV-10224 10.0.26 merge Closed
is part of MDEV-11133 10.0.28 merge Closed
Sprint: 10.0.26

 Description   

I encounter exactly the same issue as described in this mysql bug
http://bugs.mysql.com/bug.php?id=73816
"In an instance of ours, where we are using FULLTEXT indexes on InnoDB tables, we see occasional blocked DML operations on any table using any engine (InnoDB as well as MyISAM) for the duration of ~2 minutes. The SHOW INNODB STATUS reveals that the oldest running transaction is an INSERT into a table with a FULLTEXT index"

As mariadb doesn't have the parameter binlog_order_commits the only workaround found so far was to disable binary logging.



 Comments   
Comment by Elena Stepanova [ 2016-02-29 ]

Since the fix is in InnoDB, it should be pulled to MariaDB automatically with the merge of InnoDB 5.6.30 (and XtraDB, when the fix comes through Percona).

Comment by David Schnall [ 2016-05-27 ]

I just installed version 10.0.25 (which should have pulled the updates done for mysql 5.6.30) but issue is not solved on mariadb. Turning binlog on stalls the server the same way as before on inserts into large innodb table with a FTS index)

bugfix release 5.6.30 (https://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-30.html)
InnoDB: MySQL stalled when synchronizing the InnoDB full-text index cache. (Bug #22516559, Bug #16510576, Bug #73816)

Comment by David Schnall [ 2016-09-08 ]

It was fixed in 10.0.26 but it seems that 10.0.27 broke it again. It looks like there is a deadlock. Attaching innodb status log during lock. I will downgrade again to 10.0.26 to fix it innodb_status_2016-09-08_0929.txt innodb_status_2016-09-08_0926.txt innodb_status_2016-09-08_0936.txt
Downgrading indeed again fixes it

Comment by Elena Stepanova [ 2016-09-09 ]

Thanks for the update.
Indeed, the bug seems to have an unfortunate history in MySQL 5.6 branch, and consequently in MariaDB 10.x. it can be seen by running innodb_fts.sync test from the current MySQL 5.6 (to make sure it's not the test changes but the code changes, take the latest version of the test, and run it on different versions).
So, the bug was fixed in 5.6.30.

Then, it re-appeared in 5.6.31, innodb_fts.sync test causes there an assertion failure. It happened after this commit:

commit 5ba2d8751bdf900a5481c89cdddde5e240483e70
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date:   Mon Apr 18 12:39:36 2016 +0800
 
    Followup: BUG#22996488 - CRASH IN FTS_SYNC_INDEX WHEN DOING DDL IN A LOOP
    
    Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
    RB: 12340

Then, it was fixed again in 5.6.32, by this commit:

commit 802b76add9c4e74459e01de021fa829b0977d8e8
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date:   Wed Jun 1 14:39:49 2016 +0800
 
    Followup: BUG#22516559 MYSQL INSTANCE STALLS WHEN SYNCING FTS INDEX
    
    BUG#23320569 INNODB FTS:TRY TO ACQUIRE DICT_OPERATION_LOCK AGAIN
    WHEN HOLD IT IN FTS SYNC
    
    Problem:
    We hold dict_operation_lock when sync fts in background to prevent
    DDL like DROP INDEX, etc. But if sync fails, we need to rollback
    (row_undo()), in which we acquire dict_operation_lock again, so
    assert fails.
    
    Solution:
    Set trx->dict_operation_lock_mode in fts_sync() to avoid acquiring
    dict_operation_lock again in rollback.
    
    Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
    RB: 12840

MariaDB 10.0 now has InnoDB plugin 5.6.32, so it should be okay; but XtraDB (which is default InnoDB) is still of 5.6.31, hence the problem.

Normally we wouldn't re-open a closed bug, but create a new one instead; but in this particular case, since the report did not show up anywhere in git history or the change log, it should be safe to re-open.

Comment by Elena Stepanova [ 2016-09-09 ]

Additionally, please check why innodb_fts.sync test didn't make it to our tree. While innodb_fts.block_sync is not applicable because it uses a variable which is not in MariaDB, innodb_fts.sync seems to run fine on MariaDB (unless it crashes, I mean).

The test has been attached to the issue for convenience (sync.test). I didn't change anything in it, just took it as is from MySQL 5.6 (maybe not the latest version).

perl ./mtr bug.sync --mysqld=--innodb --mysqld=--innodb-ft-index-cache --mysqld=--innodb-ft-index-table --nocheck-testcases

Stack trace from 10.0 6c74ef8ae9c

2016-09-09 13:07:31 7fb491ffb700  InnoDB: Assertion failure in thread 140413520295680 in file sync0rw.ic line 416
InnoDB: Failing assertion: !rw_lock_own(lock, 352)
...
#5  0x00007f17dc469448 in __GI_abort () at abort.c:89
#6  0x0000000000b0846b in rw_lock_s_lock_func (lock=0x17d0200 <dict_operation_lock>, pass=0, file_name=0xfcddb0 "/data/src/10.0/storage/xtradb/row/row0undo.cc", line=299) at /data/src/10.0/storage/xtradb/include/sync0rw.ic:416
#7  0x0000000000b08b19 in pfs_rw_lock_s_lock_func (lock=0x17d0200 <dict_operation_lock>, pass=0, file_name=0xfcddb0 "/data/src/10.0/storage/xtradb/row/row0undo.cc", line=299) at /data/src/10.0/storage/xtradb/include/sync0rw.ic:1040
#8  0x0000000000b0fdc0 in row_mysql_freeze_data_dictionary_func (trx=0x7f17cbc42678, file=0xfcddb0 "/data/src/10.0/storage/xtradb/row/row0undo.cc", line=299) at /data/src/10.0/storage/xtradb/row/row0mysql.cc:2155
#9  0x0000000000b40ea4 in row_undo (node=0x7f17cbc67478, thr=0x7f17cbc35440) at /data/src/10.0/storage/xtradb/row/row0undo.cc:299
#10 0x0000000000b40fef in row_undo_step (thr=0x7f17cbc35440) at /data/src/10.0/storage/xtradb/row/row0undo.cc:351
#11 0x0000000000acdc0b in que_thr_step (thr=0x7f17cbc35440) at /data/src/10.0/storage/xtradb/que/que0que.cc:1087
#12 0x0000000000acde2b in que_run_threads_low (thr=0x7f17cbc35440) at /data/src/10.0/storage/xtradb/que/que0que.cc:1151
#13 0x0000000000acdfb7 in que_run_threads (thr=0x7f17cbc35440) at /data/src/10.0/storage/xtradb/que/que0que.cc:1192
#14 0x0000000000b81d52 in trx_rollback_to_savepoint_low (trx=0x7f17cbc42678, savept=0x0) at /data/src/10.0/storage/xtradb/trx/trx0roll.cc:112
#15 0x0000000000b8207b in trx_rollback_to_savepoint (trx=0x7f17cbc42678, savept=0x0) at /data/src/10.0/storage/xtradb/trx/trx0roll.cc:154
#16 0x0000000000c8b887 in fts_sql_rollback (trx=0x7f17cbc42678) at /data/src/10.0/storage/xtradb/fts/fts0sql.cc:362
#17 0x0000000000c745de in fts_sync_rollback (sync=0x7f17dbcd4ab8) at /data/src/10.0/storage/xtradb/fts/fts0fts.cc:4517
#18 0x0000000000c748c0 in fts_sync (sync=0x7f17dbcd4ab8, unlock_cache=true, wait=false) at /data/src/10.0/storage/xtradb/fts/fts0fts.cc:4612
#19 0x0000000000c749f8 in fts_sync_table (table=0x7f17d62c8278, unlock_cache=true, wait=false) at /data/src/10.0/storage/xtradb/fts/fts0fts.cc:4652
#20 0x0000000000c8112e in fts_optimize_sync_table (table_id=54) at /data/src/10.0/storage/xtradb/fts/fts0opt.cc:2989
#21 0x0000000000c81467 in fts_optimize_thread (arg=0x7f17dbc2e4f8) at /data/src/10.0/storage/xtradb/fts/fts0opt.cc:3121
#22 0x00007f17de3630a4 in start_thread (arg=0x7f17ccbfb700) at pthread_create.c:309
#23 0x00007f17dc51b87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Generated at Thu Feb 08 07:36:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.