Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20475

Assertion `flushed_lsn == log_get_lsn()' failed in srv_prepare_to_delete_redo_log_files upon server startup

Details

    Description

      10.2 9bf424bc

      2019-08-30  1:12:28 140084597409600 [Note] InnoDB: Highest supported file format is Barracuda.
      2019-08-30  1:12:29 140084597409600 [Note] InnoDB: Encrypting redo log: 1*524288000 bytes; LSN=1656361
      mysqld: /data/src/10.2/storage/innobase/srv/srv0start.cc:1432: lsn_t srv_prepare_to_delete_redo_log_files(ulint): Assertion `flushed_lsn == log_get_lsn()' failed.
      190830  1:12:29 [ERROR] mysqld got signal 6 ;
       
      #3  <signal handler called>
      #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #5  0x00007f67fab2742a in __GI_abort () at abort.c:89
      #6  0x00007f67fab1ee67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55e58b010ff4 "flushed_lsn == log_get_lsn()", file=file@entry=0x55e58b010520 "/data/src/10.2/storage/innobase/srv/srv0start.cc", line=line@entry=1432, function=function@entry=0x55e58b012ca0 <srv_prepare_to_delete_redo_log_files(unsigned long)::__PRETTY_FUNCTION__> "lsn_t srv_prepare_to_delete_redo_log_files(ulint)") at assert.c:92
      #7  0x00007f67fab1ef12 in __GI___assert_fail (assertion=0x55e58b010ff4 "flushed_lsn == log_get_lsn()", file=0x55e58b010520 "/data/src/10.2/storage/innobase/srv/srv0start.cc", line=1432, function=0x55e58b012ca0 <srv_prepare_to_delete_redo_log_files(unsigned long)::__PRETTY_FUNCTION__> "lsn_t srv_prepare_to_delete_redo_log_files(ulint)") at assert.c:101
      #8  0x000055e58aa0a0e4 in srv_prepare_to_delete_redo_log_files (n_files=1) at /data/src/10.2/storage/innobase/srv/srv0start.cc:1432
      #9  0x000055e58aa0c999 in innobase_start_or_create_for_mysql () at /data/src/10.2/storage/innobase/srv/srv0start.cc:2381
      #10 0x000055e58a853675 in innobase_init (p=0x55e58cfe2940) at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:4371
      #11 0x000055e58a64cc1b in ha_initialize_handlerton (plugin=0x55e58cfb6678) at /data/src/10.2/sql/handler.cc:555
      #12 0x000055e58a3e5d64 in plugin_initialize (tmp_root=0x7ffddebb4450, plugin=0x55e58cfb6678, argc=0x55e58b93e928 <remaining_argc>, argv=0x55e58cf51b90, options_only=false) at /data/src/10.2/sql/sql_plugin.cc:1417
      #13 0x000055e58a3e6986 in plugin_init (argc=0x55e58b93e928 <remaining_argc>, argv=0x55e58cf51b90, flags=2) at /data/src/10.2/sql/sql_plugin.cc:1698
      #14 0x000055e58a2ef7bb in init_server_components () at /data/src/10.2/sql/mysqld.cc:5335
      #15 0x000055e58a2f08a5 in mysqld_main (argc=36, argv=0x55e58cf51b90) at /data/src/10.2/sql/mysqld.cc:5931
      #16 0x000055e58a2e57b0 in main (argc=30, argv=0x7ffddebb51e8) at /data/src/10.2/sql/main.cc:25
      

      Reproducing doesn't require any inflow on the server. It happens randomly, quite rarely, upon server startup right after bootstrap, it's enough to run bootstrap-startup in a loop with given options for a long enough time (hours). I use this crude script for that, but it can really be anything, as long as the server started with the right options. Scrubbing is important, but also on some reason innodb_log_files_in_group (or possibly some other InnoDB log options) – maybe resizing redo log is essential for the scenario.

      Unfortunately, I don't know how to make it any faster / more probable.

      if [ -n "$1" ] ; then
          port=$1
      else
          port=8307
      fi
       
      while [ true ] ; do
          rm -rf data data.bootstrap tmp
          mkdir data
          mkdir tmp
       
          echo `date` "Running bootstrap..."
          scripts/mysql_install_db --no-defaults --basedir=`pwd` --lc-messages-dir=`pwd`/share --character-sets-dir=`pwd`/share/charsets --tmpdir=`pwd`/tmp --datadir=`pwd`/data --innodb_log_files_in_group=1 > `pwd`/boot.log 2>&1
       
          cp -r data data.bootstrap
       
          echo `date` "Starting server on port $port..."
          `pwd`/bin/mysqld --no-defaults --basedir=`pwd` --lc-messages-dir=`pwd`/share --character-sets-dir=`pwd`/share/charsets --tmpdir=`pwd`/tmp --core-file --datadir=`pwd`/data --port=$port --socket=`pwd`/mysql.sock --pid-file=`pwd`/mysql.pid --innodb-encryption-threads=4 --loose-innodb_scrub_log=ON --log-error=`pwd`/mysql.err &
       
          res=1;
          echo `date` "Waiting for the server to start..."
          for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ; do
              sleep 1
              `pwd`/bin/mysqladmin -uroot --socket=`pwd`/mysql.sock ping 2>/dev/null
      #        `pwd`/bin/mysqladmin --socket=`pwd`/mysql.sock ping 2>/dev/null
              if [ $? == 0 ] ; then
                  echo "Success"
                  res=0
                  break
              fi
          done
          if [ $res == 1 ] ; then
              echo `date` "Failed to start the server"
              break
          fi
          echo `date` "Trying to shut down the server..."
          `pwd`/bin/mysql -uroot --socket=`pwd`/mysql.sock -e shutdown 2>/dev/null
      #    `pwd`/bin/mysql --socket=`pwd`/mysql.sock -e shutdown 2>/dev/null
          if [ $? != 0 ] ; then
              echo `date` "Shut down failed on some reason"
              break
          else
              echo `date` "Success"
          fi
          fg
      done
      

      The problem happens interchangeably with MDEV-18370 and MDEV-20474, they're likely to be related.

      Attachments

        Issue Links

          Activity

            My main suspects are innodb_scrub_log=ON and much less likely innodb_stats_traditional=OFF. The other options should not have any direct impact on what is being written to the redo log.
            Side note: I would recommend to stay away from innodb_lock_schedule_algorithm=VATS until MDEV-16664 has been fixed. I do not think that that a correctness issue with transactional locks would lead to an issue with the redo log, though. It might cause higher level corruption of the MDEV-9663 kind.

            marko Marko Mäkelä added a comment - My main suspects are innodb_scrub_log=ON and much less likely innodb_stats_traditional=OFF . The other options should not have any direct impact on what is being written to the redo log. Side note: I would recommend to stay away from innodb_lock_schedule_algorithm=VATS  until MDEV-16664 has been fixed. I do not think that that a correctness issue with transactional locks would lead to an issue with the redo log, though. It might cause higher level corruption of the MDEV-9663 kind.

            innodb_scrub_log is most certainly a part of the deal, but not the only part. I am still trying to figure out which other options make a difference.

            elenst Elena Stepanova added a comment - innodb_scrub_log is most certainly a part of the deal, but not the only part. I am still trying to figure out which other options make a difference.

            Can you reproduce this at all without innodb_scrub_log, which we disabled in MDEV-21870? If yes, I would like to analyze a rr replay.

            marko Marko Mäkelä added a comment - Can you reproduce this at all without innodb_scrub_log , which we disabled in MDEV-21870 ? If yes, I would like to analyze a rr replay .

            If the description says "scrubbing is important", it means I tried really hard to reproduce it without scrubbing and didn't succeed. If you want to close it as "won't fix" in 10.2-10.4 because it was removed in 10.5 anyway, please feel free to do so, I have no objections.

            elenst Elena Stepanova added a comment - If the description says "scrubbing is important", it means I tried really hard to reproduce it without scrubbing and didn't succeed. If you want to close it as "won't fix" in 10.2-10.4 because it was removed in 10.5 anyway, please feel free to do so, I have no objections.

            Starting with MDEV-21870, the parameter innodb_scrub_log is being ignored.

            marko Marko Mäkelä added a comment - Starting with MDEV-21870 , the parameter innodb_scrub_log is being ignored.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.