[MDEV-14154] Failing assertion: slot->last_run <= current_time in fts0opt.cc Created: 2017-10-26  Updated: 2023-11-27  Resolved: 2019-07-25

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB, Storage Engine - XtraDB, Tests
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.26, 10.1.41, 10.3.17, 10.4.7

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: fulltext
Environment:

https://travis-ci.org/elenst/travis-tests/jobs/394371314


Attachments: HTML File threads_full    
Issue Links:
Blocks
blocks MDEV-20133 Merge new release of InnoDB 5.7.27 to... Closed
Relates
relates to MDEV-20126 Semaphore timeout due to large fullte... Open

 Description   

Note: The bug report was initially filed due to a buildbot failure on 10.0 tree. According to buildbot cross-reference, the test stopped failing some time at the end of 2017 – maybe the test has changed, or something else did. I'm keeping it here for the record though, the second part of the description relates to one of those old test failures.

The first part is the same assertion failure which happened on the current 10.3 (as of 2018-06-20) during non-buildbot concurrent tests. It's not reproducible so far, but for it we have a full stack trace, coredump, binary, datadir and logs.

Failure on current 10.3

10.3 956b296248acbe35aa87f056ccf99dbb999eea02

2018-06-20 04:01:30 0x7f9d927fc700  InnoDB: Assertion failure in file /home/travis/src/storage/innobase/fts/fts0opt.cc line 2844
InnoDB: Failing assertion: slot->last_run <= current_time
 
Thread 1 (Thread 0x7f9d927fc700 (LWP 23704)):
#0  0x00007f9dc9dd4611 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000055e3c07716a3 in my_write_core (sig=6) at /home/travis/src/mysys/stacktrace.c:481
#2  0x000055e3bffc735f in handle_fatal_signal (sig=6) at /home/travis/src/sql/signal_handler.cc:305
#3  <signal handler called>
#4  0x00007f9dc9214c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f9dc9218028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x000055e3c0409095 in ut_dbg_assertion_failed (expr=0x55e3c0b185f8 "slot->last_run <= current_time", file=0x55e3c0b17a48 "/home/travis/src/storage/innobase/fts/fts0opt.cc", line=2844) at /home/travis/src/storage/innobase/ut/ut0dbg.cc:61
#7  0x000055e3c054ff0f in fts_optimize_how_many (tables=0x55e3c2e16c98) at /home/travis/src/storage/innobase/fts/fts0opt.cc:2844
#8  0x000055e3c0550522 in fts_optimize_thread (arg=0x55e3c2e109c0) at /home/travis/src/storage/innobase/fts/fts0opt.cc:3101
#9  0x00007f9dc9dcf184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f9dc92dbffd in clone () from /lib/x86_64-linux-gnu/libc.so.6

All threads' stack traces are attached as threads_full.
Coredump, datadir etc. are available separately.

Old innodb_fts.innodb-fts-fic failure on 10.0

innodb_fts.innodb-fts-fic 'xtradb'       w1 [ fail ]
        Test ended at 2017-08-28 20:34:08
 
CURRENT_TEST: innodb_fts.innodb-fts-fic
mysqltest: At line 181: query 'CREATE FULLTEXT INDEX idx2 ON articles (body)' failed: 2013: Lost connection to MySQL server during query
 
The result from queries just before the failure was:
< snip >
1	aaaa	DBMS stands for DataBase ...
drop table articles;
CREATE TABLE articles (
FTS_DOC_ID BIGINT UNSIGNED NOT NULL ,
title VARCHAR(200),
body TEXT
) ENGINE=InnoDB;
CREATE FULLTEXT INDEX idx on articles (title);
INSERT INTO articles  VALUES (9, 'MySQL Tutorial','DBMS stands for DataBase ...');
UPDATE articles set title = 'bbbb'  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
ERROR HY000: Invalid InnoDB FTS Doc ID
UPDATE articles set title = 'bbbb', FTS_DOC_ID=8  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
ERROR HY000: Invalid InnoDB FTS Doc ID
UPDATE articles set title = 'bbbb', FTS_DOC_ID=10  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
SELECT * FROM articles WHERE MATCH (title) AGAINST ('bbbb' IN NATURAL LANGUAGE MODE);
FTS_DOC_ID	title	body
10	bbbb	DBMS stands for DataBase ...
SELECT * FROM articles WHERE MATCH (title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
FTS_DOC_ID	title	body
CREATE FULLTEXT INDEX idx2 ON articles (body);
 
More results from queries before failure can be found in /dev/shm/var/1/log/innodb-fts-fic.log
 
 
Server [mysqld.1 - pid: 26345, winpid: 26345, exit: 256] failed during test run
Server log from this test:

----------SERVER LOG START-----------
2017-08-28 20:33:54 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
2017-08-28 20:33:54 b17c3b40 InnoDB: A new Doc ID must be supplied while updating FTS indexed columns.
InnoDB: FTS Doc ID must be larger than 9 for table`test`.`articles`
2017-08-28 20:33:54 aa7f9b40  InnoDB: Assertion failure in thread 2860489536 in file fts0opt.cc line 2856
InnoDB: Failing assertion: slot->last_run <= current_time
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
"170828 20:33:54 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.0.33-MariaDB-1~xenial
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61689 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x36)[0x87dc6a6]
/usr/sbin/mysqld(handle_fatal_signal+0x3b3)[0x83639b3]
[0xb7769c14]
[0xb7769c31]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x39)[0xb70b2e89]
/lib/i386-linux-gnu/libc.so.6(abort+0x157)[0xb70b43e7]
/usr/sbin/mysqld[0x8647ecc]
/usr/sbin/mysqld[0x864c1d8]
/lib/i386-linux-gnu/libpthread.so.0(+0x62b5)[0xb76832b5]
/lib/i386-linux-gnu/libc.so.6(clone+0x6e)[0xb716e16e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
----------SERVER LOG END-------------

Logs are not available.



 Comments   
Comment by Marko Mäkelä [ 2019-07-22 ]

The problem is limited to FULLTEXT INDEX in InnoDB tables. The assertion expression seems to assume that ut_time() will be returning monotonically increasing values. Maybe this is not always the case?

Comment by Marko Mäkelä [ 2019-07-22 ]

Could it be that the assertion can fail whenever the system time is moved backwards? For example, if the system clock was drifting (moving too fast), and it was subsequently moved backwards, for example by NTP? If we can provoke a crash by moving the system time backwards, then the logic must be rewritten to be more stable.

There currently are about 60 calls to ut_time() in InnoDB. Also, btr_defragment_thread is the only user of a function pointer ut_timer_now. Maybe we should just use a 64-bit my_timer_cycles everywhere where we assume monotonically increasing time.

Comment by Marko Mäkelä [ 2019-07-23 ]

Another problematic call is ut_time_ms(). Percona fixed one piece of code by introducing ut_monotonic_time_ms(), a wrapper for clock_gettime(CLOCK_MONOTONIC, &timespec).

Comment by Marko Mäkelä [ 2019-07-23 ]

I think that the portability wrapper my_interval_timer() is the way to go. I did not yet touch ut_time():

diff --git a/storage/xtradb/ut/ut0ut.cc b/storage/xtradb/ut/ut0ut.cc
index 7b76d968e28..eda3921c5de 100644
--- a/storage/xtradb/ut/ut0ut.cc
+++ b/storage/xtradb/ut/ut0ut.cc
@@ -1,6 +1,7 @@
 /*****************************************************************************
 
 Copyright (c) 1994, 2017, Oracle and/or its affiliates. All Rights Reserved.
+Copyright (c) 2019, MariaDB Corporation.
 
 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
@@ -189,12 +190,7 @@ ut_time_us(
 /*=======*/
 	ullint*	tloc)	/*!< out: us since epoch, if non-NULL */
 {
-	struct timeval	tv;
-	ullint		us;
-
-	ut_gettimeofday(&tv, NULL);
-
-	us = (ullint) tv.tv_sec * 1000000 + tv.tv_usec;
+	ullint us = my_interval_timer() / 1000;
 
 	if (tloc != NULL) {
 		*tloc = us;
@@ -213,11 +209,7 @@ ulint
 ut_time_ms(void)
 /*============*/
 {
-	struct timeval	tv;
-
-	ut_gettimeofday(&tv, NULL);
-
-	return((ulint) tv.tv_sec * 1000 + tv.tv_usec / 1000);
+	return static_cast<ulint>(my_interval_timer() / 1000000);
 }
 #endif /* !UNIV_HOTBACKUP */
 

Comment by Marko Mäkelä [ 2019-07-23 ]

It looks like MySQL 5.7.27 might fix this. I believe that we must fix this differently. I already replaced ut_time_us() and ut_time_ms() with the use of the monotonic clock in my_interval_timer(). I believe that we must do the same for most or all use of ut_time().

Comment by Marko Mäkelä [ 2019-07-24 ]

The high-precision timers may incur significantly more overhead than time(NULL). I will not introduce new calls to my_interval_timer() or equivalent in normal operation, because it could reduce performance.

What we can and will do is remove some redundant calls to time(NULL). The failing assertion is simply too strict. I will change that logic so that if the system clock was moved to the past, the time margins will expire immediately.

Comment by Marko Mäkelä [ 2019-07-25 ]

As related cleanup, I also removed ut_timer_now(), which was only used by innodb_defragment. The call to ut_init_timer() was missing in 10.2, so it is possible that the innodb_defragment_frequency had been broken in MariaDB 10.2.2.

Generated at Thu Feb 08 08:11:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.