  MariaDB Server / MDEV-23670

Crash during OPTIMIZE TABLE mysql.innodb_table_stats

Details

    Description

      We run rather large production servers hosting hundreds of databases each, ranging in size from a few MB to many GB.

      To improve performance, we optimize tables periodically with a cron job that runs this command:

      /usr/bin/mysqlcheck --optimize --all-databases --auto-repair
      

      We first noticed MariaDB servers crashing during the optimize run on MariaDB 10.1. We have since switched to MariaDB 10.3.

      With MariaDB 10.3, the problem recurs on almost every optimize run on our busiest production servers. On less busy servers it seems to be rarer. We are unable to reproduce the issue in a test environment.

      Log:

      2020-09-04 08:10:17 0x7fc16c0b0700  InnoDB: Assertion failure in file /builddir/build/BUILD/mariadb-10.3.23/storage/innobase/row/row0merge.cc line 4492
      InnoDB: Failing assertion: table->get_ref_count() == 0
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      200904  8:10:17 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.3.23-MariaDB-log-cll-lve
      key_buffer_size=67108864
      read_buffer_size=1048576
      max_used_connections=253
      max_threads=502
      thread_count=49
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1104963 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7fc0b0178be8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fc16c0afd68 thread_stack 0x40000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55f8152d3d5e]
      *** stack smashing detected ***: /usr/sbin/mysqld terminated
      

      Config:

      mysqld would have been started with the following arguments:
      --basedir=/usr --bind-address=:: --binlog_checksum=NONE --binlog_format=STATEMENT --datadir=/var/lib/mysql --expire_logs_days=10 --ft_min_word_len=3 --innodb_buffer_pool_size=256M --innodb_checksum_algorithm=innodb --innodb_doublewrite=0 --innodb_file_format=barracuda --innodb_file_per_table=1 --innodb_large_prefix=ON --innodb_log_file_size=192M --innodb_strict_mode=false --innodb_use_native_aio=0 --join_buffer_size=1M --key_buffer_size=64M --local-infile=1 --log-error=/var/log/mysqld.log --log_warnings=2 --long_query_time=2 --max_allowed_packet=24M --max_binlog_size=100M --max_connections=500 --max_heap_table_size=20M --max_user_connections=100 --myisam_sort_buffer_size=32M --open_files_limit=51200 --pid-file=/var/run/mysqld/mysqld.pid --port=3306 --query_cache_size=32M --read_buffer_size=1M --read_rnd_buffer_size=1M --skip-external-locking --slow_query_log=1 --slow_query_log_file=/var/lib/mysql/slow_query.log --socket=/var/lib/mysql_sock/mysql.sock --sort_buffer_size=1M --sql_mode=NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION --ssl --ssl-cert=/etc/pki/tls/certs/server.fullchain --ssl-key=/etc/pki/tls/private/server.mysql.key --ssl_cipher=TLSv1.2 --symbolic-links=0 --table_cache=2048 --table_definition_cache=2048 --thread_cache_size=8 --thread_stack=256K --tmp_table_size=10M --tmpdir=/var/lib/mysql_tmp --user=mysql 
      

          Activity

            Marko Mäkelä (marko) added a comment:

            I would need a stack trace of all threads during the crash, to analyze this. Please try to preserve the core dump for further questions (I may need to run some debugger commands to extract more information).
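A minimal sketch of how such an all-threads stack trace might be extracted from a preserved core dump with gdb. The binary path is taken from the crash log above; the core file location is hypothetical and depends on the system's core dump settings. The function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: print a gdb command line that dumps backtraces of all threads
# from a core file. Adjust both paths to the actual system.
MYSQLD_BIN="${MYSQLD_BIN:-/usr/sbin/mysqld}"    # binary path as seen in the crash log
CORE_FILE="${CORE_FILE:-/var/lib/mysql/core}"   # hypothetical core dump location

backtrace_cmd() {
    # --batch makes gdb exit after running the commands given with -ex.
    # "thread apply all bt full" prints a full backtrace for every thread.
    printf 'gdb --batch -ex "thread apply all bt full" %s %s\n' \
        "$MYSQLD_BIN" "$CORE_FILE"
}

# Print the command for review instead of running it directly.
backtrace_cmd
```

Redirecting the output of the printed command to a file gives a trace that can be attached to the issue. Resolved symbols require the matching debug information to be installed.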

            Marko Mäkelä (marko) added a comment:

            Joriz, I believe that running any DDL on the InnoDB statistics tables was always crash-prone before the fix of MDEV-25919 in the upcoming MariaDB Server 10.6.5 release. OPTIMIZE TABLE is internally implemented as ALTER TABLE…FORCE.

            valerii, does the table of the support customer contain a FULLTEXT INDEX? If yes, that could be a duplicate of MDEV-25702. Do we have fully resolved stack traces of those crashes?

            Marko Mäkelä (marko) added a comment:

            Joriz, I believe that the originally reported issue (crash during OPTIMIZE TABLE mysql.innodb_index_stats or OPTIMIZE TABLE mysql.innodb_table_stats) has been fixed in MDEV-25919 in the upcoming 10.6.5 release. The fix is that any internal InnoDB operation that would access the statistics tables will acquire a shared metadata lock (MDL) on the statistics table names. That will prevent concurrent execution of such background operations and DDL (such as OPTIMIZE TABLE or ALTER TABLE) on the statistics tables.

            In older release series this bug is not feasible to fix, because that fix depends on many large refactoring tasks that were only implemented in the 10.6 release series.

            The crash that a support customer reported for 10.1.34 may or may not have occurred due to the same reason. There is no such assertion expression table->get_ref_count() == 0 in the 10.1 source code.
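Since the crash is tied to DDL on the statistics tables, one possible mitigation on servers that cannot be upgraded is to keep the periodic optimize run away from the mysql schema. The sketch below reuses the cron command from the description and adds mysqlcheck's --skip-database option. That skipping the whole mysql database is acceptable, and that it is enough to avoid the assertion, are assumptions that this thread does not confirm.

```shell
#!/bin/sh
# Sketch of a cron wrapper that leaves the mysql schema (and with it
# mysql.innodb_table_stats and mysql.innodb_index_stats) out of the run.
# Assumption: skipping the whole mysql database is acceptable here.

run_optimize() {
    # Any arguments are placed in front of the command line, so
    # "run_optimize echo" prints the command instead of executing it.
    "$@" /usr/bin/mysqlcheck --optimize --all-databases --auto-repair \
        --skip-database=mysql
}

# Dry run: show the command that the cron job would execute.
run_optimize echo
```

Calling run_optimize without arguments executes the command itself, which is what the cron entry would do once the dry run output looks right.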

            Jan Ingvoldstad (frettled) added a comment:

            This issue also appears to apply to MariaDB 10.5.13. After rebooting a Debian system with 10.5.13, an automatic repair of crashed tables is attempted, and this results in crashes a few times per minute with the following error:

            InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.13/storage/innobase/row/row0merge.cc line 4338

            After upgrading to MariaDB 10.6.5, the server appears stable, and repairing tables no longer crashes it.

            It would therefore be nice if 10.5.13 were added to the list of affected versions where this crash will not be fixed.

            Marko Mäkelä (marko) added a comment:

            I do not think that it is feasible to port the fix (MDEV-25919) from 10.6 to earlier versions. Any releases between 10.0 and 10.5 are affected by this.
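The version statements in this thread can be turned into a small check, for example to decide per server whether the optimize job should avoid the statistics tables. This is only a sketch: the function name is hypothetical, it expects a major.minor.patch version string, and treating 10.6.0 through 10.6.4 as affected is an assumption derived from the fix being announced for 10.6.5.

```shell
#!/bin/sh
# Sketch: classify a MariaDB version string against the statements above
# (10.0 through 10.5 affected; fix shipped in 10.6.5).
# Assumption: 10.6.0 to 10.6.4 are treated as affected as well.

is_affected() {
    ver=${1%%-*}        # drop suffixes such as "-MariaDB-log-cll-lve"
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}
    patch=${rest#*.}
    [ "$major" -eq 10 ] || return 1
    if [ "$minor" -le 5 ]; then
        return 0
    fi
    # Only 10.6 releases older than 10.6.5 remain as candidates.
    [ "$minor" -eq 6 ] && [ "$patch" -lt 5 ]
}

# Example with the server version from the crash log in the description.
if is_affected "10.3.23-MariaDB-log-cll-lve"; then
    echo "affected"
fi
```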

            People

              marko Marko Mäkelä
              Joriz Joris de Leeuw
