Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14643

InnoDB: Failing assertion: !cursor->index->is_committed()

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.2.10, 10.2(EOL), 10.3(EOL)
    • 10.5.0, 10.2.37, 10.3.28, 10.4.18
    • None
    • Ubuntu 64 bits. CPU: Intel Xeon E5-1660v3 - 8c/16t - 3GHz /3.5GHz
      RAM: 128GB DDR4 ECC 2400 MHz
      Disks: SoftRaid 2x4TB

    Description

      we are having the following assertion error and server is crashing on production on every 3-5 days.

      Oct 24 17:46:15 ns535551 mysqld[30669]: 2017-10-24 17:46:15 0x7f1044218700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.9/storage/innobase/row/row0ins.cc line 274
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: Failing assertion: !cursor->index->is_committed()
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: We intentionally generate a memory trap.
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: If you get repeated assertion failures or crashes, even
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: immediately after the mysqld startup, there may be
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: corruption in the InnoDB tablespace. Please refer to
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
      Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: about forcing recovery.
      Oct 24 17:46:15 ns535551 mysqld[30669]: 171024 17:46:15 [ERROR] mysqld got signal 6 ;
      Oct 24 17:46:15 ns535551 mysqld[30669]: This could be because you hit a bug. It is also possible that this binary
      Oct 24 17:46:15 ns535551 mysqld[30669]: or one of the libraries it was linked against is corrupt, improperly built,
      Oct 24 17:46:15 ns535551 mysqld[30669]: or misconfigured. This error can also be caused by malfunctioning hardware.
      Oct 24 17:46:15 ns535551 mysqld[30669]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Oct 24 17:46:15 ns535551 mysqld[30669]: We will try our best to scrape up some info that will hopefully help
      Oct 24 17:46:15 ns535551 mysqld[30669]: diagnose the problem, but since we have already crashed,
      Oct 24 17:46:15 ns535551 mysqld[30669]: something is definitely wrong and this may fail.
      Oct 24 17:46:15 ns535551 mysqld[30669]: Server version: 10.2.9-MariaDB-10.2.9+maria~zesty-log
      Oct 24 17:46:15 ns535551 mysqld[30669]: key_buffer_size=134217728
      Oct 24 17:46:15 ns535551 mysqld[30669]: read_buffer_size=2097152
      Oct 24 17:46:15 ns535551 mysqld[30669]: max_used_connections=887
      Oct 24 17:46:15 ns535551 mysqld[30669]: max_threads=20002
      Oct 24 17:46:15 ns535551 mysqld[30669]: thread_count=845
      Oct 24 17:46:15 ns535551 mysqld[30669]: It is possible that mysqld could use up to
      Oct 24 17:46:15 ns535551 mysqld[30669]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 82388470 K  bytes of memory
      Oct 24 17:46:15 ns535551 mysqld[30669]: Hope that's ok; if not, decrease some variables in the equation.
      Oct 24 17:46:15 ns535551 mysqld[30669]: Thread pointer: 0x7f0f3c13f0a8
      Oct 24 17:46:15 ns535551 mysqld[30669]: Attempting backtrace. You can use the following information to find out
      Oct 24 17:46:15 ns535551 mysqld[30669]: where mysqld died. If you see no messages after this, something went
      Oct 24 17:46:15 ns535551 mysqld[30669]: terribly wrong...
      Oct 24 17:46:15 ns535551 mysqld[30669]: stack_bottom = 0x7f1044217cf8 thread_stack 0x49000
      Oct 24 17:46:15 ns535551 mysqld[30669]: *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      Oct 24 17:46:15 ns535551 mysqld[30669]: ======= Backtrace: =========
      Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f18bc07308b]
      Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x54)[0x7f18bc114f84]
      Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x118f00)[0x7f18bc112f00]
      Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x11aeda)[0x7f18bc114eda]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(my_addr_resolve+0x4c)[0x56114b087c4c]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x56114b06e8eb]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(handle_fatal_signal+0x3cd)[0x56114ab0c6cd]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11670)[0x7f18bca63670]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x9f)[0x7f18bc02f77f]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f18bc03137a]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x409d3b)[0x56114a8c2d3b]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8918ea)[0x56114ad4a8ea]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8945dc)[0x56114ad4d5dc]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8d801e)[0x56114ad9101e]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8d8619)[0x56114ad91619]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8a7b14)[0x56114ad60b14]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x7e8813)[0x56114aca1813]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhPh+0x152)[0x56114ab16652]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x1c3d)[0x56114aa0ef6d]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x26af)[0x56114a974a5f]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x28a)[0x56114a97b40a]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1fb6)[0x56114a97e266]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z10do_commandP3THD+0x167)[0x56114a97eba7]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x25a)[0x56114aa451ca]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(handle_one_connection+0x3d)[0x56114aa4533d]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da)[0x7f18bca596da]
      Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f)[0x7f18bc102d7f]
      

      I checked the source code
      It is happening here

      if (!rec_get_deleted_flag(rec, rec_offs_comp(*offsets))) {
      		/* We should never insert in place of a record that
      		has not been delete-marked. The only exception is when
      		online CREATE INDEX copied the changes that we already
      		made to the clustered index, and completed the
      		secondary index creation before we got here. In this
      		case, the change would already be there. The CREATE
      		INDEX should be waiting for a MySQL meta-data lock
      		upgrade at least until this INSERT or UPDATE
      		returns. After that point, set_committed(true)
      		would be invoked in commit_inplace_alter_table(). */
      		ut_a(update->n_fields == 0);
      		ut_a(!cursor->index->is_committed());
      		ut_ad(!dict_index_is_online_ddl(cursor->index));
      		return(DB_SUCCESS);
      	}
      

      We have started to have this problem after doing the following configuration

      innodb_flush_log_at_trx_commit = 0
      

      we have a chat application and inserting 1 million records to message table every day
      one of these inserts is trying to use a secondary index which is already deleted and committed but not delete marked. That is why it is database is crashing with assertion failure. It seems to be a synchronization problem between change buffer and disk. Delete transaction on index is committed to table but it does not have delete marked flag probably waiting for change buffer to be merged.

      This case can be reproducible with many inserts and deletes on the same table with parameter innodb_flush_log_at_trx_commit = 0 or innodb_flush_log_at_trx_commit = 2
      in our case, it is happening after 3 - 5 days.

      I configured database with the following parameter

      innodb_change_buffering        = inserts
      

      and restarted. I disabled keeping change buffer for delete operations then the problem has been resolved database is working without crash for 17 days.

      But this time delete operations are very slow.. 10 times slower than before and we are having replication delays

      Attachments

        1. t65.test
          7.84 MB
        2. my.cnf
          6 kB
        3. all_threads_Jan9
          54 kB

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              mehmet Mehmet
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.