Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14154

Failing assertion: slot->last_run <= current_time in fts0opt.cc

Details

    Description

      Note: The bug report was initially filed due to a buildbot failure on 10.0 tree. According to buildbot cross-reference, the test stopped failing some time at the end of 2017 – maybe the test has changed, or something else did. I'm keeping it here for the record though, the second part of the description relates to one of those old test failures.

      The first part is the same assertion failure which happened on the current 10.3 (as of 2018-06-20) during non-buildbot concurrent tests. It's not reproducible so far, but for it we have a full stack trace, coredump, binary, datadir and logs.

      Failure on current 10.3

      10.3 956b296248acbe35aa87f056ccf99dbb999eea02

      2018-06-20 04:01:30 0x7f9d927fc700  InnoDB: Assertion failure in file /home/travis/src/storage/innobase/fts/fts0opt.cc line 2844
      InnoDB: Failing assertion: slot->last_run <= current_time
       
      Thread 1 (Thread 0x7f9d927fc700 (LWP 23704)):
      #0  0x00007f9dc9dd4611 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
      #1  0x000055e3c07716a3 in my_write_core (sig=6) at /home/travis/src/mysys/stacktrace.c:481
      #2  0x000055e3bffc735f in handle_fatal_signal (sig=6) at /home/travis/src/sql/signal_handler.cc:305
      #3  <signal handler called>
      #4  0x00007f9dc9214c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #5  0x00007f9dc9218028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #6  0x000055e3c0409095 in ut_dbg_assertion_failed (expr=0x55e3c0b185f8 "slot->last_run <= current_time", file=0x55e3c0b17a48 "/home/travis/src/storage/innobase/fts/fts0opt.cc", line=2844) at /home/travis/src/storage/innobase/ut/ut0dbg.cc:61
      #7  0x000055e3c054ff0f in fts_optimize_how_many (tables=0x55e3c2e16c98) at /home/travis/src/storage/innobase/fts/fts0opt.cc:2844
      #8  0x000055e3c0550522 in fts_optimize_thread (arg=0x55e3c2e109c0) at /home/travis/src/storage/innobase/fts/fts0opt.cc:3101
      #9  0x00007f9dc9dcf184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #10 0x00007f9dc92dbffd in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      All threads' stack traces are attached as threads_full.
      Coredump, datadir etc. are available separately.

      Old innodb_fts.innodb-fts-fic failure on 10.0

      innodb_fts.innodb-fts-fic 'xtradb'       w1 [ fail ]
              Test ended at 2017-08-28 20:34:08
       
      CURRENT_TEST: innodb_fts.innodb-fts-fic
      mysqltest: At line 181: query 'CREATE FULLTEXT INDEX idx2 ON articles (body)' failed: 2013: Lost connection to MySQL server during query
       
      The result from queries just before the failure was:
      < snip >
      1	aaaa	DBMS stands for DataBase ...
      drop table articles;
      CREATE TABLE articles (
      FTS_DOC_ID BIGINT UNSIGNED NOT NULL ,
      title VARCHAR(200),
      body TEXT
      ) ENGINE=InnoDB;
      CREATE FULLTEXT INDEX idx on articles (title);
      INSERT INTO articles  VALUES (9, 'MySQL Tutorial','DBMS stands for DataBase ...');
      UPDATE articles set title = 'bbbb'  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
      ERROR HY000: Invalid InnoDB FTS Doc ID
      UPDATE articles set title = 'bbbb', FTS_DOC_ID=8  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
      ERROR HY000: Invalid InnoDB FTS Doc ID
      UPDATE articles set title = 'bbbb', FTS_DOC_ID=10  WHERE MATCH(title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
      SELECT * FROM articles WHERE MATCH (title) AGAINST ('bbbb' IN NATURAL LANGUAGE MODE);
      FTS_DOC_ID	title	body
      10	bbbb	DBMS stands for DataBase ...
      SELECT * FROM articles WHERE MATCH (title) AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
      FTS_DOC_ID	title	body
      CREATE FULLTEXT INDEX idx2 ON articles (body);
       
      More results from queries before failure can be found in /dev/shm/var/1/log/innodb-fts-fic.log
       
       
      Server [mysqld.1 - pid: 26345, winpid: 26345, exit: 256] failed during test run
      Server log from this test:
      

      ----------SERVER LOG START-----------
      2017-08-28 20:33:54 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
      2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
      2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
      2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
      2017-08-28 20:33:55 aa7f9b40 InnoDB: FTS Optimize Removing table test/articles
      2017-08-28 20:33:54 b17c3b40 InnoDB: A new Doc ID must be supplied while updating FTS indexed columns.
      InnoDB: FTS Doc ID must be larger than 9 for table`test`.`articles`
      2017-08-28 20:33:54 aa7f9b40  InnoDB: Assertion failure in thread 2860489536 in file fts0opt.cc line 2856
      InnoDB: Failing assertion: slot->last_run <= current_time
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
      InnoDB: about forcing recovery.
      "170828 20:33:54 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.0.33-MariaDB-1~xenial
      key_buffer_size=1048576
      read_buffer_size=131072
      max_used_connections=1
      max_threads=153
      thread_count=1
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61689 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x48000
      /usr/sbin/mysqld(my_print_stacktrace+0x36)[0x87dc6a6]
      /usr/sbin/mysqld(handle_fatal_signal+0x3b3)[0x83639b3]
      [0xb7769c14]
      [0xb7769c31]
      /lib/i386-linux-gnu/libc.so.6(gsignal+0x39)[0xb70b2e89]
      /lib/i386-linux-gnu/libc.so.6(abort+0x157)[0xb70b43e7]
      /usr/sbin/mysqld[0x8647ecc]
      /usr/sbin/mysqld[0x864c1d8]
      /lib/i386-linux-gnu/libpthread.so.0(+0x62b5)[0xb76832b5]
      /lib/i386-linux-gnu/libc.so.6(clone+0x6e)[0xb716e16e]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      Writing a core file
      ----------SERVER LOG END-------------
      

      Logs are not available.

      Attachments

        1. threads_full
          242 kB
          Elena Stepanova

        Issue Links

          Activity

            The problem is limited to FULLTEXT INDEX in InnoDB tables. The assertion expression seems to assume that ut_time() will be returning monotonically increasing values. Maybe this is not always the case?

            marko Marko Mäkelä added a comment - The problem is limited to FULLTEXT INDEX in InnoDB tables. The assertion expression seems to assume that ut_time() will be returning monotonically increasing values. Maybe this is not always the case?

            Could it be that the assertion can fail whenever the system time is moved backwards? For example, if the system clock was drifting (moving too fast), and it was subsequently moved backwards, for example by NTP? If we can provoke a crash by moving the system time backwards, then the logic must be rewritten to be more stable.

            There currently are about 60 calls to ut_time() in InnoDB. Also, btr_defragment_thread is the only user of a function pointer ut_timer_now. Maybe we should just use a 64-bit my_timer_cycles everywhere where we assume monotonically increasing time.

            marko Marko Mäkelä added a comment - Could it be that the assertion can fail whenever the system time is moved backwards? For example, if the system clock was drifting (moving too fast), and it was subsequently moved backwards, for example by NTP? If we can provoke a crash by moving the system time backwards, then the logic must be rewritten to be more stable. There currently are about 60 calls to ut_time() in InnoDB. Also, btr_defragment_thread is the only user of a function pointer ut_timer_now . Maybe we should just use a 64-bit my_timer_cycles everywhere where we assume monotonically increasing time.

            Another problematic call is ut_time_ms(). Percona fixed one piece of code by introducing ut_monotonic_time_ms(), a wrapper for clock_gettime(CLOCK_MONOTONIC, &timespec).

            marko Marko Mäkelä added a comment - Another problematic call is ut_time_ms() . Percona fixed one piece of code by introducing ut_monotonic_time_ms() , a wrapper for clock_gettime(CLOCK_MONOTONIC, &timespec) .

            I think that the portability wrapper my_interval_timer() is the way to go. I did not yet touch ut_time():

            diff --git a/storage/xtradb/ut/ut0ut.cc b/storage/xtradb/ut/ut0ut.cc
            index 7b76d968e28..eda3921c5de 100644
            --- a/storage/xtradb/ut/ut0ut.cc
            +++ b/storage/xtradb/ut/ut0ut.cc
            @@ -1,6 +1,7 @@
             /*****************************************************************************
             
             Copyright (c) 1994, 2017, Oracle and/or its affiliates. All Rights Reserved.
            +Copyright (c) 2019, MariaDB Corporation.
             
             This program is free software; you can redistribute it and/or modify it under
             the terms of the GNU General Public License as published by the Free Software
            @@ -189,12 +190,7 @@ ut_time_us(
             /*=======*/
             	ullint*	tloc)	/*!< out: us since epoch, if non-NULL */
             {
            -	struct timeval	tv;
            -	ullint		us;
            -
            -	ut_gettimeofday(&tv, NULL);
            -
            -	us = (ullint) tv.tv_sec * 1000000 + tv.tv_usec;
            +	ullint us = my_interval_timer() / 1000;
             
             	if (tloc != NULL) {
             		*tloc = us;
            @@ -213,11 +209,7 @@ ulint
             ut_time_ms(void)
             /*============*/
             {
            -	struct timeval	tv;
            -
            -	ut_gettimeofday(&tv, NULL);
            -
            -	return((ulint) tv.tv_sec * 1000 + tv.tv_usec / 1000);
            +	return static_cast<ulint>(my_interval_timer() / 1000000);
             }
             #endif /* !UNIV_HOTBACKUP */
             
            

            marko Marko Mäkelä added a comment - I think that the portability wrapper my_interval_timer() is the way to go. I did not yet touch ut_time() : diff --git a/storage/xtradb/ut/ut0ut.cc b/storage/xtradb/ut/ut0ut.cc index 7b76d968e28..eda3921c5de 100644 --- a/storage/xtradb/ut/ut0ut.cc +++ b/storage/xtradb/ut/ut0ut.cc @@ -1,6 +1,7 @@ /***************************************************************************** Copyright (c) 1994, 2017, Oracle and/or its affiliates. All Rights Reserved. +Copyright (c) 2019, MariaDB Corporation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -189,12 +190,7 @@ ut_time_us( /*=======*/ ullint* tloc) /*!< out: us since epoch, if non-NULL */ { - struct timeval tv; - ullint us; - - ut_gettimeofday(&tv, NULL); - - us = (ullint) tv.tv_sec * 1000000 + tv.tv_usec; + ullint us = my_interval_timer() / 1000; if (tloc != NULL) { *tloc = us; @@ -213,11 +209,7 @@ ulint ut_time_ms(void) /*============*/ { - struct timeval tv; - - ut_gettimeofday(&tv, NULL); - - return((ulint) tv.tv_sec * 1000 + tv.tv_usec / 1000); + return static_cast<ulint>(my_interval_timer() / 1000000); } #endif /* !UNIV_HOTBACKUP */

            It looks like MySQL 5.7.27 might fix this. I believe that we must fix this differently. I already replaced ut_time_us() and ut_time_ms() with the use of the monotonic clock in my_interval_timer(). I believe that we must do the same for most or all use of ut_time().

            marko Marko Mäkelä added a comment - It looks like MySQL 5.7.27 might fix this. I believe that we must fix this differently. I already replaced ut_time_us() and ut_time_ms() with the use of the monotonic clock in my_interval_timer() . I believe that we must do the same for most or all use of ut_time() .

            The high-precision timers may incur significantly more overhead than time(NULL). I will not introduce new calls to my_interval_timer() or equivalent in normal operation, because it could reduce performance.

            What we can and will do is remove some redundant calls to time(NULL). The failing assertion is simply too strict. I will change that logic so that if the system clock was moved to the past, the time margins will expire immediately.

            marko Marko Mäkelä added a comment - The high-precision timers may incur significantly more overhead than time(NULL) . I will not introduce new calls to my_interval_timer() or equivalent in normal operation, because it could reduce performance. What we can and will do is remove some redundant calls to time(NULL) . The failing assertion is simply too strict. I will change that logic so that if the system clock was moved to the past, the time margins will expire immediately.

            As related cleanup, I also removed ut_timer_now(), which was only used by innodb_defragment. The call to ut_init_timer() was missing in 10.2, so it is possible that the innodb_defragment_frequency had been broken in MariaDB 10.2.2.

            marko Marko Mäkelä added a comment - As related cleanup, I also removed ut_timer_now() , which was only used by innodb_defragment . The call to ut_init_timer() was missing in 10.2, so it is possible that the innodb_defragment_frequency had been broken in MariaDB 10.2.2.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.