Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13495

Crash in rollback of a recovered INSERT transaction after upgrade

Details

    Description

      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Completed initialization of buffer pool
      2017-08-10 17:48:57 140404724037376 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Trx id counter is 7424
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Upgrading redo log: 2*50331648 bytes; LSN=9779797
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Starting to delete and rewrite log files.
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
      2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
      2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2017-08-10 17:48:58 140405466861632 [Note] InnoDB: New log files created, LSN=9779797
      2017-08-10 17:48:58 140404715644672 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2017-08-10 17:48:58 140405466861632 [Note] InnoDB: 128 out of 128 rollback segments are active.
      2017-08-10 17:48:58 140404715644672 [Note] InnoDB: Rolling back trx with id 6568, 1 rows to undo
      2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Creating shared tablespace for temporary tables
      2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      mysqld: /data/src/10.3/storage/innobase/trx/trx0trx.cc:1918: void trx_commit_low(trx_t*, mtr_t*): Assertion `!mtr == !trx->has_logged()' failed.
      170810 17:48:58 [ERROR] mysqld got signal 6 ;
      

      10.3 73297f532fba89e33785460448e93f87478c2505

      #3  0x00007f3ff87f7ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #4  0x0000565165cfad75 in trx_commit_low (trx=0x7f3ff5406878, mtr=0x7f3fce337470) at /data/src/10.3/storage/innobase/trx/trx0trx.cc:1918
      #5  0x0000565165cfafcc in trx_commit (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0trx.cc:2018
      #6  0x0000565165cee5c2 in trx_rollback_finish (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:1147
      #7  0x0000565165cecf5b in trx_rollback_active (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:689
      #8  0x0000565165ced427 in trx_rollback_resurrected (trx=0x7f3ff5406878, all=1) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:773
      #9  0x0000565165ced6d6 in trx_rollback_or_clean_recovered (all=1) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:841
      #10 0x0000565165ced889 in trx_rollback_or_clean_all_recovered (arg=0x0) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:881
      #11 0x00007f3ffa73c494 in start_thread (arg=0x7f3fce338700) at pthread_create.c:333
      #12 0x00007f3ff88b493f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      The attached datadir was created as a part of MDEV-13269 workflow.

      • 10.2 server (86f9b7714791) was started with --innodb-change-buffering=none --innodb-page-size=8K --innodb-compression-algorithm=zlib, otherwise defaults;
      • a number of tables was created and concurrent DML was executed in 4 threads for about a minute;
      • the server was SIGKILL-ed;
      • the server was started again with the same options plus --innodb-force-recovery=3;
      • the server was shut down normally.

      Now, to reproduce the reported assertion failure, start 10.3 server on the provided datadir with --innodb-change-buffering=none --innodb-page-size=8K --innodb-compression-algorithm=zlib, the failure happens upon startup.

      10.2 starts all right on this datadir.

      Note: in the initial scenario, there was an intermediate step when the 10.3 server is started on the datadir with --innodb-read-only and then gets shut down normally. It starts all right, but the step does not change the outcome when the server is finally restarted in a normal mode, so it's omitted above.

      Attachments

        1. data.tar.gz
          5.91 MB
        2. threads1
          24 kB

        Issue Links

          Activity

            The crash happens in the rollback of a recovered transaction:

            2017-08-11 11:22:34 140736992429824 [Note] InnoDB: Rolling back trx with id 6568, 1 rows to undo
            …
            #3  0x00007ffff6550f12 in __GI___assert_fail (
                assertion=0x10140f5f3 "!mtr == !trx->has_logged()", 
                file=0x10140e758 "/mariadb/10.3/storage/innobase/trx/trx0trx.cc", 
                line=1918, 
                function=0x101411a00 <trx_commit_low(trx_t*, mtr_t*)::__PRETTY_FUNCTION__> "void trx_commit_low(trx_t*, mtr_t*)") at assert.c:101
            #4  0x0000000100d1dc77 in trx_commit_low (trx=0x7ffff03dc888, 
                mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1918
            #5  0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888)
                at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018
            #6  0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888)
                at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147
            #7  0x0000000100d0f143 in trx_rollback_active (trx=0x7ffff03dc888)
                at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:689
            #8  0x0000000100d0f63a in trx_rollback_resurrected (trx=0x7ffff03dc888, all=1)
                at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:773
            #9  0x0000000100d0f91e in trx_rollback_or_clean_recovered (all=1)
                at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:841
            

            InnoDB implements ROLLBACK by first applying the undo log backwards, and then committing the transaction. The crash occurs in the commit.

            The assertion fails because mtr!=NULL (because trx->has_logged_or_recovered() holds in trx_commit()) but trx->has_logged() does not hold.
            This distinction was made in MDEV-12288:

            	/** @return whether any undo log has been generated */
            	bool has_logged() const
            	{
            		return(has_logged_persistent() || rsegs.m_noredo.undo);
            	}
             
            	/** @return whether any undo log has been generated or
            	recovered */
            	bool has_logged_or_recovered() const
            	{
            		return(has_logged() || rsegs.m_redo.old_insert);
            	}
            

            Because the rollback happens after upgrade, we indeed have insert_undo log attached to the transaction. This is a too strict debug assertion; it should check for has_logged_or_recovered() instead of has_logged().
            After relaxing the assertion, we hit another problem for this transaction: trx->rsegs.m_redo.undo is NULL, causing a SIGSEGV when it is being dereferenced:

            #0  0x0000000100cf5478 in trx_purge_add_update_undo_to_history (
                trx=0x7ffff03dc888, 
                undo_page=0x7fffea8b2000 "E\026", <incomplete sequence \330>, 
                mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0purge.cc:252
            #1  0x0000000100d1c354 in trx_write_serialisation_history (
                trx=0x7ffff03dc888, mtr=0x7fffe270b410)
                at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1451
            #2  0x0000000100d1dd36 in trx_commit_low (trx=0x7ffff03dc888, 
                mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1948
            #3  0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888)
                at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018
            #4  0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888)
                at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147
            

            marko Marko Mäkelä added a comment - The crash happens in the rollback of a recovered transaction: 2017-08-11 11:22:34 140736992429824 [Note] InnoDB: Rolling back trx with id 6568, 1 rows to undo … #3 0x00007ffff6550f12 in __GI___assert_fail ( assertion=0x10140f5f3 "!mtr == !trx->has_logged()", file=0x10140e758 "/mariadb/10.3/storage/innobase/trx/trx0trx.cc", line=1918, function=0x101411a00 <trx_commit_low(trx_t*, mtr_t*)::__PRETTY_FUNCTION__> "void trx_commit_low(trx_t*, mtr_t*)") at assert.c:101 #4 0x0000000100d1dc77 in trx_commit_low (trx=0x7ffff03dc888, mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1918 #5 0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018 #6 0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147 #7 0x0000000100d0f143 in trx_rollback_active (trx=0x7ffff03dc888) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:689 #8 0x0000000100d0f63a in trx_rollback_resurrected (trx=0x7ffff03dc888, all=1) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:773 #9 0x0000000100d0f91e in trx_rollback_or_clean_recovered (all=1) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:841 InnoDB implements ROLLBACK by first applying the undo log backwards, and then committing the transaction. The crash occurs in the commit. The assertion fails because mtr!=NULL (because trx->has_logged_or_recovered() holds in trx_commit()) but trx->has_logged() does not hold. This distinction was made in MDEV-12288 : /** @return whether any undo log has been generated */ bool has_logged() const { return(has_logged_persistent() || rsegs.m_noredo.undo); }   /** @return whether any undo log has been generated or recovered */ bool has_logged_or_recovered() const { return(has_logged() || rsegs.m_redo.old_insert); } Because the rollback happens after upgrade, we indeed have insert_undo log attached to the transaction. This is a too strict debug assertion; it should check for has_logged_or_recovered() instead of has_logged(). After relaxing the assertion, we hit another problem for this transaction: trx->rsegs.m_redo.undo is NULL, causing a SIGSEGV when it is being dereferenced: #0 0x0000000100cf5478 in trx_purge_add_update_undo_to_history ( trx=0x7ffff03dc888, undo_page=0x7fffea8b2000 "E\026", <incomplete sequence \330>, mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0purge.cc:252 #1 0x0000000100d1c354 in trx_write_serialisation_history ( trx=0x7ffff03dc888, mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1451 #2 0x0000000100d1dd36 in trx_commit_low (trx=0x7ffff03dc888, mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1948 #3 0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018 #4 0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888) at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.