[MDEV-13536] DB_TRX_ID is not actually being reset when the history is removed Created: 2017-08-15  Updated: 2017-12-07  Resolved: 2017-08-16

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.1
Fix Version/s: 10.3.1

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: performance, transactions

Issue Links:
Blocks
blocks MDEV-13697 DB_TRX_ID is not always reset when th... Closed
Problem/Incident
causes MDEV-13654 Various crashes due to DB_TRX_ID mism... Closed
causes MDEV-13820 trx_id_check() fails during row_log_t... Closed
is caused by MDEV-12288 Reset DB_TRX_ID when the history is r... Closed
Relates
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-8139 Fix scrubbing Closed
relates to MDEV-13559 encryption.innodb-redo-badkey failed ... Closed

 Description   

The purpose of MDEV-12288 is to reset the DB_TRX_ID column when the history is being removed. This is not taking place:

--source include/have_innodb.inc
CREATE TABLE t1(a INT PRIMARY KEY, b INT NOT NULL) ENGINE=InnoDB;
INSERT INTO t1 VALUES(1,2),(3,4);
UPDATE t1 SET b=-3 WHERE a=3;
# Initiate a full purge, which should reset all DB_TRX_ID.
SET GLOBAL innodb_fast_shutdown=0;
--source include/shutdown_mysqld.inc

After running this test, page 3 of the t1.ibd file (the clustered index root page) will still contain nonzero DB_TRX_ID values.
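
One way to check this by hand is to read page 3 of t1.ibd and list the DB_TRX_ID of each record. The following is only a rough sketch, not part of the fix or of the test suite. It assumes the default 16 KiB page size, an unencrypted and uncompressed file, ROW_FORMAT=COMPACT or DYNAMIC, a root page that is also a leaf, and the exact schema of t1 above (a 4-byte INT primary key with no nullable or variable-length columns, so DB_TRX_ID immediately follows the key). The helper names read_page and trx_ids_on_page are hypothetical.

```python
import struct

PAGE_SIZE = 16384  # assumption: default innodb_page_size
INFIMUM = 99       # origin of the infimum record on COMPACT/DYNAMIC pages
SUPREMUM = 112     # origin of the supremum record on COMPACT/DYNAMIC pages
PK_LEN = 4         # assumption: t1.a is a 4-byte INT primary key
TRX_ID_LEN = 6     # DB_TRX_ID is stored as 6 bytes, big-endian


def read_page(path, page_no):
    """Read one page from an .ibd file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(page_no * PAGE_SIZE)
        return f.read(PAGE_SIZE)


def trx_ids_on_page(page):
    """Return the DB_TRX_ID of every user record, in index order.

    Follows the singly linked record list from infimum to supremum.
    Only valid under the layout assumptions stated above.
    """
    ids = []
    rec = INFIMUM
    # A page cannot hold more records than bytes; this bounds the walk
    # in case the page is corrupted and the list contains a cycle.
    for _ in range(PAGE_SIZE):
        # The 2 bytes just before a record origin hold the signed
        # offset from this record to the next one.
        (rel,) = struct.unpack_from(">h", page, rec - 2)
        if rel == 0:
            raise ValueError("record list ends before supremum")
        rec = (rec + rel) % PAGE_SIZE
        if rec == SUPREMUM:
            return ids
        start = rec + PK_LEN
        ids.append(int.from_bytes(page[start:start + TRX_ID_LEN], "big"))
    raise ValueError("record list does not terminate")
```

With the server shut down, calling trx_ids_on_page(read_page("t1.ibd", 3)) in the data directory of the test database should return only zeros once the reset works as intended; any nonzero entry corresponds to the problem described here.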



 Comments   
Comment by Marko Mäkelä [ 2017-08-15 ]

bb-10.3-marko

Comment by Marko Mäkelä [ 2017-08-15 ]

jplindst, please run innodb.innodb_bug14147491 in the branch. It seems that some follow-up fix for MDEV-12253 is needed, so that the server will not crash when flagging a page corrupted.
I pushed a follow-up adjustment for some test failures, including a workaround for the innodb.innodb_bug14147491 failure.
It looks like some code changes will be necessary, because the system columns in two records are not being reset in innodb.table_flags.

Comment by Marko Mäkelä [ 2017-08-15 ]

It turned out that some undo log records were being omitted from the purge queue.
Because of this, among other things, the history for the SYS_TABLES records of SYS_DATAFILES and SYS_FOREIGN_COLS was not being reset. These records were being skipped, even though the records for SYS_TABLESPACES or SYS_FOREIGN (modified by the same transactions) were buffered for purge.

Comment by Jan Lindström (Inactive) [ 2017-08-16 ]

The last commit is OK to push. The first one still has a FIXME comment; has the code review been done?

Comment by Marko Mäkelä [ 2017-08-16 ]

There is one outstanding FIXME in the first commit: Carefully review the code to determine if the comment that we are removing from trx_purge_free_segment() is bogus, and if it is actually safe to remove the TRX_UNDO_DEL_MARKS field.
I did not observe a test failure related to this, but I do need to review this. We definitely do not want to have purge accessing a stale or freed undo log page.

All other FIXME comments have been addressed in subsequent commits to bb-10.3-marko. I will squash these commits once bb-10.3-marko is green.
The final push to 10.3 will consist of two commits:

  1. the first 3 commits squashed together (excluding the innodb.innodb_bug14147491 changes)
  2. innodb.innodb_bug14147491 and encryption test workarounds for MDEV-13542 (Crashing on a corrupted page is unhelpful)

Comment by Jan Lindström (Inactive) [ 2017-08-16 ]

ok to push.

Comment by Axel Schwenke [ 2017-08-22 ]

Note for benchmarks: the fix was implemented in commit 92f9be4; the reference should be the commit immediately before f4b379d. The workload should include read-only and read/write transactions, and possibly some queries that use secondary indexes.
