[MDEV-13495] Crash in rollback of a recovered INSERT transaction after upgrade Created: 2017-08-10  Updated: 2017-08-13  Resolved: 2017-08-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.1
Fix Version/s: 10.3.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File data.tar.gz     HTML File threads1    
Issue Links:
Problem/Incident
is caused by MDEV-12288 Reset DB_TRX_ID when the history is r... Closed
Relates
relates to MDEV-13269 Upgrade tests for InnoDB undo log Closed

 Description   

2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Completed initialization of buffer pool
2017-08-10 17:48:57 140404724037376 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Trx id counter is 7424
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Upgrading redo log: 2*50331648 bytes; LSN=9779797
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Starting to delete and rewrite log files.
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
2017-08-10 17:48:57 140405466861632 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2017-08-10 17:48:58 140405466861632 [Note] InnoDB: New log files created, LSN=9779797
2017-08-10 17:48:58 140404715644672 [Note] InnoDB: Starting in background the rollback of recovered transactions
2017-08-10 17:48:58 140405466861632 [Note] InnoDB: 128 out of 128 rollback segments are active.
2017-08-10 17:48:58 140404715644672 [Note] InnoDB: Rolling back trx with id 6568, 1 rows to undo
2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-08-10 17:48:58 140405466861632 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
mysqld: /data/src/10.3/storage/innobase/trx/trx0trx.cc:1918: void trx_commit_low(trx_t*, mtr_t*): Assertion `!mtr == !trx->has_logged()' failed.
170810 17:48:58 [ERROR] mysqld got signal 6 ;

10.3 73297f532fba89e33785460448e93f87478c2505

#3  0x00007f3ff87f7ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000565165cfad75 in trx_commit_low (trx=0x7f3ff5406878, mtr=0x7f3fce337470) at /data/src/10.3/storage/innobase/trx/trx0trx.cc:1918
#5  0x0000565165cfafcc in trx_commit (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0trx.cc:2018
#6  0x0000565165cee5c2 in trx_rollback_finish (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:1147
#7  0x0000565165cecf5b in trx_rollback_active (trx=0x7f3ff5406878) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:689
#8  0x0000565165ced427 in trx_rollback_resurrected (trx=0x7f3ff5406878, all=1) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:773
#9  0x0000565165ced6d6 in trx_rollback_or_clean_recovered (all=1) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:841
#10 0x0000565165ced889 in trx_rollback_or_clean_all_recovered (arg=0x0) at /data/src/10.3/storage/innobase/trx/trx0roll.cc:881
#11 0x00007f3ffa73c494 in start_thread (arg=0x7f3fce338700) at pthread_create.c:333
#12 0x00007f3ff88b493f in clone () from /lib/x86_64-linux-gnu/libc.so.6

The attached datadir was created as a part of MDEV-13269 workflow.

  • 10.2 server (86f9b7714791) was started with --innodb-change-buffering=none --innodb-page-size=8K --innodb-compression-algorithm=zlib, otherwise defaults;
  • a number of tables was created and concurrent DML was executed in 4 threads for about a minute;
  • the server was SIGKILL-ed;
  • the server was started again with the same options plus --innodb-force-recovery=3;
  • the server was shut down normally.

Now, to reproduce the reported assertion failure, start 10.3 server on the provided datadir with --innodb-change-buffering=none --innodb-page-size=8K --innodb-compression-algorithm=zlib, the failure happens upon startup.

10.2 starts all right on this datadir.

Note: in the initial scenario, there was an intermediate step when the 10.3 server is started on the datadir with --innodb-read-only and then gets shut down normally. It starts all right, but the step does not change the outcome when the server is finally restarted in a normal mode, so it's omitted above.



 Comments   
Comment by Marko Mäkelä [ 2017-08-11 ]

The crash happens in the rollback of a recovered transaction:

2017-08-11 11:22:34 140736992429824 [Note] InnoDB: Rolling back trx with id 6568, 1 rows to undo
#3  0x00007ffff6550f12 in __GI___assert_fail (
    assertion=0x10140f5f3 "!mtr == !trx->has_logged()", 
    file=0x10140e758 "/mariadb/10.3/storage/innobase/trx/trx0trx.cc", 
    line=1918, 
    function=0x101411a00 <trx_commit_low(trx_t*, mtr_t*)::__PRETTY_FUNCTION__> "void trx_commit_low(trx_t*, mtr_t*)") at assert.c:101
#4  0x0000000100d1dc77 in trx_commit_low (trx=0x7ffff03dc888, 
    mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1918
#5  0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888)
    at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018
#6  0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888)
    at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147
#7  0x0000000100d0f143 in trx_rollback_active (trx=0x7ffff03dc888)
    at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:689
#8  0x0000000100d0f63a in trx_rollback_resurrected (trx=0x7ffff03dc888, all=1)
    at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:773
#9  0x0000000100d0f91e in trx_rollback_or_clean_recovered (all=1)
    at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:841

InnoDB implements ROLLBACK by first applying the undo log backwards, and then committing the transaction. The crash occurs in the commit.

The assertion fails because mtr!=NULL (because trx->has_logged_or_recovered() holds in trx_commit()) but trx->has_logged() does not hold.
This distinction was made in MDEV-12288:

	/** @return whether any undo log has been generated */
	bool has_logged() const
	{
		return(has_logged_persistent() || rsegs.m_noredo.undo);
	}
 
	/** @return whether any undo log has been generated or
	recovered */
	bool has_logged_or_recovered() const
	{
		return(has_logged() || rsegs.m_redo.old_insert);
	}

Because the rollback happens after upgrade, we indeed have insert_undo log attached to the transaction. This is a too strict debug assertion; it should check for has_logged_or_recovered() instead of has_logged().
After relaxing the assertion, we hit another problem for this transaction: trx->rsegs.m_redo.undo is NULL, causing a SIGSEGV when it is being dereferenced:

#0  0x0000000100cf5478 in trx_purge_add_update_undo_to_history (
    trx=0x7ffff03dc888, 
    undo_page=0x7fffea8b2000 "E\026", <incomplete sequence \330>, 
    mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0purge.cc:252
#1  0x0000000100d1c354 in trx_write_serialisation_history (
    trx=0x7ffff03dc888, mtr=0x7fffe270b410)
    at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1451
#2  0x0000000100d1dd36 in trx_commit_low (trx=0x7ffff03dc888, 
    mtr=0x7fffe270b410) at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:1948
#3  0x0000000100d1deca in trx_commit (trx=0x7ffff03dc888)
    at /mariadb/10.3/storage/innobase/trx/trx0trx.cc:2018
#4  0x0000000100d10822 in trx_rollback_finish (trx=0x7ffff03dc888)
    at /mariadb/10.3/storage/innobase/trx/trx0roll.cc:1147

Generated at Thu Feb 08 08:06:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.