[MDEV-13603] innodb_fast_shutdown=0 may fail to purge all history Created: 2017-08-21  Updated: 2023-10-19  Resolved: 2018-04-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.3.6

Type: Bug Priority: Minor
Reporter: Alice Sherepa Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: upstream, upstream-5.5

Issue Links:
Duplicate
is duplicated by MDEV-14445 innodb.table_flags failed in buildbot... Closed
Problem/Incident
is caused by MDEV-11802 innodb.innodb_bug14676111 fails in bu... Closed
Relates
relates to MDEV-11802 innodb.innodb_bug14676111 fails in bu... Closed
relates to MDEV-13039 innodb_fast_shutdown=0 may fail to pu... Closed
relates to MDEV-17049 Enable --suite=innodb_undo on buildbot Closed
relates to MDEV-11802 innodb.innodb_bug14676111 fails in bu... Closed
relates to MDEV-13697 DB_TRX_ID is not always reset when th... Closed
relates to MDEV-14080 InnoDB shutdown sometimes hangs Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-ppc64le/builds/1363/steps/mtr/logs/stdio

innodb.dml_purge '64k,innodb'            w3 [ fail ]
        Test ended at 2017-08-20 16:06:45
 
CURRENT_TEST: innodb.dml_purge
--- /usr/share/mysql/mysql-test/suite/innodb/r/dml_purge.result	2017-08-20 14:24:53.000000000 -0400
+++ /dev/shm/var/3/log/dml_purge.reject	2017-08-20 16:06:45.122366993 -0400
@@ -15,8 +15,8 @@
  DB_ROLL_PTR=0x80000000000000,
  b=0x80000002)
 header=0x000018090074 (a=0x80000003,
- DB_TRX_ID=0x000000000000,
- DB_ROLL_PTR=0x80000000000000,
+ DB_TRX_ID=0x000000000506,
+ DB_ROLL_PTR=0x04000001370317,
  b=0x7ffffffd)
 header=0x030008030000 (a=0x73757072656d756d00)
 SELECT * FROM t1;
 
mysqltest: Result content mismatch



 Comments   
Comment by Elena Stepanova [ 2017-08-24 ]

No special environment required, reproducible by running the test with --repeat

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
innodb.dml_purge '16k,innodb'            [ pass ]   2713
innodb.dml_purge '16k,innodb'            [ pass ]   2722
innodb.dml_purge '16k,innodb'            [ pass ]   2619
innodb.dml_purge '16k,innodb'            [ pass ]   2615
innodb.dml_purge '16k,innodb'            [ fail ]
        Test ended at 2017-08-25 01:54:26
 
CURRENT_TEST: innodb.dml_purge
--- /data/bld/10.3-rel/mysql-test/suite/innodb/r/dml_purge.result	2017-08-16 16:56:44.000000000 +0300
+++ /data/bld/10.3-rel/mysql-test/suite/innodb/r/dml_purge.reject	2017-08-25 01:54:26.419881579 +0300
@@ -15,8 +15,8 @@
  DB_ROLL_PTR=0x80000000000000,
  b=0x80000002)
 header=0x000018090074 (a=0x80000003,
- DB_TRX_ID=0x000000000000,
- DB_ROLL_PTR=0x80000000000000,
+ DB_TRX_ID=0x000000000d0a,
+ DB_ROLL_PTR=0x070000013b0517,
  b=0x7ffffffd)
 header=0x030008030000 (a=0x73757072656d756d00)
 SELECT * FROM t1;
 
mysqltest: Result content mismatch

First happened on bb-10.3-marko e7b9c46c0436431f938ed6614f65cf85 on Aug 16th, since then has happened ~20 times on different 10.3-based trees.

Comment by Marko Mäkelä [ 2017-08-25 ]

I believe that this is caused by MDEV-11802. Given that this failure repeats so easily, it could help us fix MDEV-11802.

Comment by Marko Mäkelä [ 2017-08-28 ]

The test did not SET GLOBAL innodb_purge_rseg_truncate_frequency = 1
before the slow shutdown.

Apparently, a slow shutdown will not always run purge to completion.

Comment by Marko Mäkelä [ 2017-09-28 ]

Sporadic failures of this test still occur. Possibly related to this are Valgrind warnings that row_purge_reset_trx_id() is being invoked when node->row contains uninitialized bytes in the key. In the test that I am using (innodb.instant_alter from a MDEV-11369 development branch), it is the 2nd byte of the single 32-bit PRIMARY KEY column that is uninitialized.

Comment by Marko Mäkelä [ 2017-10-05 ]

Valgrind does not report any errors for innodb.dml_purge or innodb.instant_alter in the latest MDEV-11369 development version.
The reason for the remaining sporadic failure is unknown.

Comment by Marko Mäkelä [ 2017-10-05 ]

On innodb_fast_shutdown=0, we would certainly want everything to be purged.
I am unable to repeat the occasional remaining failure locally.

Comment by Alice Sherepa [ 2017-11-13 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-deb-artful-amd64/builds/95/steps/mtr/logs/stdio
innodb.dml_purge '16k,innodb' w2 [ fail ]

Comment by Marko Mäkelä [ 2018-02-08 ]

Also the test innodb.table_flags can occasionally fail due to the same problem:

CURRENT_TEST: innodb.table_flags
/usr/local/mariadb-10.3.5-linux-x86_64/bin/mysqltest: Error on delete of '/usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/var/4/tmp/table_flags/ibtmp1' (Errcode: 2 "No such file or directory")
--- /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.result	2018-02-07 17:51:05.000000000 +0200
+++ /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.reject	2018-02-08 02:06:12.954068170 +0200
@@ -72,8 +72,8 @@
  CLUSTER_NAME=NULL(0 bytes),
  SPACE=0x00000002)
 header=0x000048150310 (NAME='test/td',
- DB_TRX_ID=0x000000000000,
- DB_ROLL_PTR=0x80000000000000,
+ DB_TRX_ID=0x00000000000b,
+ DB_ROLL_PTR=0x85000001390110,
  ID=0x0000000000000012,
  N_COLS=0x80000001,
  TYPE=0x00000021,
@@ -82,8 +82,8 @@
  CLUSTER_NAME=NULL(0 bytes),
  SPACE=0x00000003)
 header=0x000058150200 (NAME='test/tp',
- DB_TRX_ID=0x000000000000,
- DB_ROLL_PTR=0x80000000000000,
+ DB_TRX_ID=0x00000000000f,
+ DB_ROLL_PTR=0x870000013b0110,
  ID=0x0000000000000014,
  N_COLS=0x80000001,
  TYPE=0x000009a1,
@@ -102,8 +102,8 @@
  CLUSTER_NAME=NULL(0 bytes),
  SPACE=0x00000001)
 header=0x000050150074 (NAME='test/tz',
- DB_TRX_ID=0x000000000000,
- DB_ROLL_PTR=0x80000000000000,
+ DB_TRX_ID=0x00000000000d,
+ DB_ROLL_PTR=0x860000013a0110,
  ID=0x0000000000000013,
  N_COLS=0x80000001,
  TYPE=0x00000023,
 
mysqltest: Result content mismatch

Comment by Marko Mäkelä [ 2018-04-08 ]

The problem appears to be that slow shutdown is not always running purge to completion. This can be repeated more easily by applying the patch from the description of MDEV-13697 to the test, and adjusting the result.

./mtr --mysqld=--debug=d,purge --mem --parallel=4 --repeat=10 innodb.dml_purge,4k{,,,}

When it fails, in the failed run there will be messages

?func: purge: reset DB_TRX_ID=0x…

corresponding to the DB_TRX_ID that are shown in the result difference.
These messages were issued after server restart and not during the slow shutdown, as purge is supposed to be run.

This can be repeated with different values of -mysqld=-innodb-purge-threads=1 to 32 (the default is 4).

If I change the test to avoid restart (use wait_all_purged.inc and FLUSH TABLE t1 FOR EXPORT) instead of performing the slow shutdown, it does not fail for me.

Comment by Marko Mäkelä [ 2018-04-09 ]

The problem turns out to be an incorrect check:

@@ -2463,12 +2459,7 @@ srv_purge_should_exit(ulint n_purged)
 		return(true);
 	}
 	/* Slow shutdown was requested. */
-	if (n_purged) {
-		/* The previous round still did some work. */
-		return(false);
-	}
-	/* Exit if there are no active transactions to roll back. */
-	return(trx_sys.any_active_transactions() == 0);
+	return !trx_sys.any_active_transactions() && !trx_sys.history_size();
 }
 
 /*********************************************************************//**

If we happened to have n_purged==0 while some transaction was still active, and then that transaction was added to the history list, we would prematurely stop the purge. It is more appropriate to first check for trx_sys.any_active_transactions() == 0 (that count can only decrease during shutdown) and then for trx_sys.history_size() == 0 (that count typically decreases, but can increase when the remaining active transactions are committed or rolled back). It does not make any sense to check n_purged at all.

Comment by Marko Mäkelä [ 2018-04-09 ]

The issue exists already in MariaDB 5.5:

		do {
			n_pages_purged = trx_purge(srv_purge_batch_size);
 
			n_total_purged += n_pages_purged;
 
		} while (n_pages_purged > 0 && !srv_fast_shutdown);

I think that it suffices to fix this in MariaDB 10.3 only. Thanks to MDEV-12289, MariaDB can upgrade from older versions without a slow shutdown.

Comment by Marko Mäkelä [ 2018-04-12 ]

The issue exists already in MySQL 3.23.49 (the first InnoDB version that I have access to). In the function srv_master_thread(), there are two loops that simply wait for n_pages_purged==0, without first waiting for the number of active transactions to reach 0.

Generated at Thu Feb 08 08:06:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.