[MDEV-13603] innodb_fast_shutdown=0 may fail to purge all history Created: 2017-08-21 Updated: 2023-10-19 Resolved: 2018-04-09 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Tests |
| Affects Version/s: | 5.5, 10.0, 10.1, 10.2, 10.3 |
| Fix Version/s: | 10.3.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alice Sherepa | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | upstream, upstream-5.5 | ||
| Issue Links: |
|
| Description |
|
|
| Comments |
| Comment by Elena Stepanova [ 2017-08-24 ] | |||||||||||||||||||||||||||||||||||||||
|
No special environment is required; the failure is reproducible by running the test with --repeat.
It first happened on bb-10.3-marko e7b9c46c0436431f938ed6614f65cf85 on Aug 16th, and since then it has happened ~20 times on different 10.3-based trees. | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-08-25 ] | |||||||||||||||||||||||||||||||||||||||
|
I believe that this is caused by | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-08-28 ] | |||||||||||||||||||||||||||||||||||||||
|
The test did not SET GLOBAL innodb_purge_rseg_truncate_frequency = 1. Apparently, a slow shutdown will not always run purge to completion. | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-09-28 ] | |||||||||||||||||||||||||||||||||||||||
|
Sporadic failures of this test still occur. Possibly related to this are Valgrind warnings that row_purge_reset_trx_id() is being invoked when node->row contains uninitialized bytes in the key. In the test that I am using (innodb.instant_alter from a | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-10-05 ] | |||||||||||||||||||||||||||||||||||||||
|
Valgrind does not report any errors for innodb.dml_purge or innodb.instant_alter in the latest | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-10-05 ] | |||||||||||||||||||||||||||||||||||||||
|
On innodb_fast_shutdown=0, we would certainly want everything to be purged. | |||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-11-13 ] | |||||||||||||||||||||||||||||||||||||||
|
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-artful-amd64/builds/95/steps/mtr/logs/stdio | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-02-08 ] | |||||||||||||||||||||||||||||||||||||||
|
Also the test innodb.table_flags can occasionally fail due to the same problem:
| |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-08 ] | |||||||||||||||||||||||||||||||||||||||
|
The problem appears to be that slow shutdown is not always running purge to completion. This can be repeated more easily by applying the patch from the description of
When it fails, the failed run will contain messages
corresponding to the DB_TRX_ID values that are shown in the result difference. This can be repeated with different values of - If I change the test to avoid the restart (using wait_all_purged.inc and FLUSH TABLE t1 FOR EXPORT instead of performing the slow shutdown), it does not fail for me. | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-09 ] | |||||||||||||||||||||||||||||||||||||||
|
The problem turns out to be an incorrect check:
If we happened to have n_purged==0 while some transaction was still active, and then that transaction was added to the history list, we would prematurely stop the purge. It is more appropriate to first check for trx_sys.any_active_transactions() == 0 (that count can only decrease during shutdown) and then for trx_sys.history_size() == 0 (that count typically decreases, but can increase when the remaining active transactions are committed or rolled back). It does not make any sense to check n_purged at all. | |||||||||||||||||||||||||||||||||||||||
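The race described above can be illustrated with a toy model. This is only a sketch, not InnoDB code: `ToyServer`, `slow_shutdown`, `exit_on_n_purged` and `exit_on_history` are hypothetical names invented for the illustration, and the shape of the old check is an assumption reconstructed from the prose, not a copy of the original source. The model assumes that a transaction can commit between a purge batch and the exit check, which is the window in which the n_purged-based decision goes wrong.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Toy stand-in for the state that a slow shutdown must drain (not InnoDB code)."""
    commit_at: list      # tick at which each still-active transaction commits
    history: int = 0     # committed but not yet purged history entries
    tick: int = 0

    def active(self) -> int:
        return len(self.commit_at)

    def purge_batch(self, batch: int = 10) -> int:
        """Purge up to `batch` history entries and return how many were purged."""
        n_purged = min(self.history, batch)
        self.history -= n_purged
        return n_purged

    def advance(self) -> None:
        """Move time forward; transactions that are due commit into the history."""
        self.tick += 1
        due = [t for t in self.commit_at if t <= self.tick]
        self.commit_at = [t for t in self.commit_at if t > self.tick]
        self.history += len(due)


def exit_on_n_purged(server: ToyServer, n_purged: int) -> bool:
    # Assumed shape of the flawed check: trust the count from the last batch.
    return n_purged == 0 and server.active() == 0


def exit_on_history(server: ToyServer, n_purged: int) -> bool:
    # Check described in the comment: active transactions first, then the
    # history size. n_purged is deliberately ignored.
    return server.active() == 0 and server.history == 0


def slow_shutdown(server: ToyServer, should_exit, max_rounds: int = 1000) -> int:
    """Run purge rounds until should_exit() says stop; return the leftover history."""
    for _ in range(max_rounds):
        n_purged = server.purge_batch()
        server.advance()  # a commit can land between the batch and the check
        if should_exit(server, n_purged):
            break
    return server.history


if __name__ == "__main__":
    print(slow_shutdown(ToyServer(commit_at=[1]), exit_on_n_purged))
    print(slow_shutdown(ToyServer(commit_at=[1]), exit_on_history))
```

With a single transaction that commits right after an empty purge batch, the n_purged-based check exits with one history entry left unpurged, while the check on the active transaction count and the history size runs one more round and leaves nothing behind.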
| Comment by Marko Mäkelä [ 2018-04-09 ] | |||||||||||||||||||||||||||||||||||||||
|
The issue exists already in MariaDB 5.5:
I think that it suffices to fix this in MariaDB 10.3 only. Thanks to | |||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-12 ] | |||||||||||||||||||||||||||||||||||||||
|
The issue exists already in MySQL 3.23.49 (the first InnoDB version that I have access to). In the function srv_master_thread(), there are two loops that simply wait for n_pages_purged==0, without first waiting for the number of active transactions to reach 0. |