[MDEV-11802] innodb.innodb_bug14676111 fails in buildbot due to InnoDB purge failing to start when there is work to do
Created: 2017-01-15 Updated: 2023-02-24 Resolved: 2018-11-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Tests |
| Affects Version/s: | 10.1 |
| Fix Version/s: | 10.2.7, 10.3.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Sprint: | 10.2.4-1, 10.2.4-2 |
| Description |
|
Started happening quite regularly in buildbot on 10.1 tree since January 5, 2017. |
| Comments |
| Comment by Marko Mäkelä [ 2017-01-19 ] | |||||||||||||||||
|
The test has not been changed recently. Also the InnoDB purge code has not been changed recently, as far as I can tell.
As the above comment shows, the tree size really should be 8 pages. I suspect that triggering the purge or waiting for the purge to finish does not work reliably. Similar problems have existed in the MySQL 5.7 tests, in particular innodb.index_merge_threshold (which is missing from MariaDB 10.2). Because this started on January 5, I think that this could be related to my attempt to address | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-09 ] | |||||||||||||||||
|
There was a previous attempt at fixing this in | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-09 ] | |||||||||||||||||
|
I was hoping that Apparently it is still possible that InnoDB purge sometimes gets stuck, or that wait_for_innodb_purge.inc is not working reliably. In Oracle MySQL 5.7, another test that fails randomly due to the same issue is innodb.index_merge_threshold (which is missing from MariaDB 10.2). There were attempts to ‘fix’ the failure by ‘kicking’ the purge threads, that is, by issuing DML operations on InnoDB tables other than the one being tested. I think that the proper fix is to ensure that purge does not get stuck in the first place. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-12 ] | |||||||||||||||||
|
The test gcol.innodb_virtual_debug_purge that was introduced in | |||||||||||||||||
| Comment by Elena Stepanova [ 2017-02-12 ] | |||||||||||||||||
|
FWIW, please note that so far gcol.innodb_virtual_debug_purge has been failing with a timeout only on embedded server (and it happens often enough). Here is a stack trace from a hanging test:
| |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-13 ] | |||||||||||||||||
|
elenst, I filed | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-16 ] | |||||||||||||||||
|
The function trx_purge_stop() is calling os_event_reset(purge_sys->event) before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set() call in srv_purge_coordinator_suspend() is protected by that X-latch. It would seem a good idea to protect both calls with purge_sys->latch. | |||||||||||||||||
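
The ordering concern above can be illustrated with a minimal sketch. This is not the InnoDB code: `ManualResetEvent`, `PurgeLike`, `coordinator_suspend()`, `stop_and_wait()` and `racy_stop_and_wait()` are hypothetical stand-ins built on the C++ standard library, and the interleaving shown is only one simplified way in which a reset that is not ordered against the set could lose a signal.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an os_event: a manual-reset event.
class ManualResetEvent {
public:
  void set() {
    std::lock_guard<std::mutex> g(m_);
    signaled_ = true;
    cv_.notify_all();
  }
  void reset() {
    std::lock_guard<std::mutex> g(m_);
    signaled_ = false;
  }
  // Returns true if the event was signaled before the timeout expired.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m_);
    return cv_.wait_for(lk, timeout, [this] { return signaled_; });
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Hypothetical model of the relevant purge_sys fields.
struct PurgeLike {
  std::mutex latch;        // stands in for purge_sys->latch
  ManualResetEvent event;  // stands in for purge_sys->event
  bool suspended = false;  // state published by the coordinator
  bool stop_requested = false;
};

// Coordinator side: publish the state and signal while holding the latch.
void coordinator_suspend(PurgeLike& p) {
  std::lock_guard<std::mutex> g(p.latch);
  p.suspended = true;
  p.event.set();
}

// Stopper side, latched variant: the reset and the state check share the
// critical section that also covers the set, so a set cannot be erased.
bool stop_and_wait(PurgeLike& p, std::chrono::milliseconds timeout) {
  {
    std::lock_guard<std::mutex> g(p.latch);
    p.stop_requested = true;
    if (p.suspended) return true;  // already suspended, nothing to wait for
    p.event.reset();
  }
  return p.event.wait_for(timeout);
}

// Deliberately broken variant for contrast: the reset happens outside the
// latch and can therefore erase a signal that was already delivered.
bool racy_stop_and_wait(PurgeLike& p, std::chrono::milliseconds timeout) {
  p.event.reset();  // not ordered against coordinator_suspend()
  {
    std::lock_guard<std::mutex> g(p.latch);
    p.stop_requested = true;
  }
  return p.event.wait_for(timeout);
}

// Both demos replay the same order: the coordinator signals first.
bool latched_demo() {
  PurgeLike p;
  coordinator_suspend(p);
  return stop_and_wait(p, std::chrono::milliseconds(50));
}

bool racy_demo() {
  PurgeLike p;
  coordinator_suspend(p);
  return racy_stop_and_wait(p, std::chrono::milliseconds(50));
}
```

In this model, `latched_demo()` sees the suspension, while `racy_demo()` erases the earlier signal and times out. The point is only that a set and a reset need a common lock to be ordered against each other; the real functions involve more state than is modelled here.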
| Comment by Marko Mäkelä [ 2017-02-16 ] | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-17 ] | |||||||||||||||||
|
Revision (2 commits, the first adjusting tests only) pushed to bb-10.0-marko (changes to tests) and bb-10.2-marko (changes to tests). Both versions fix the potential race in trx_purge_stop() by acquiring purge_sys->latch before signaling purge_sys->event. | |||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2017-02-20 ] | |||||||||||||||||
|
ok to push after documenting srv_buf_dump_event. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-20 ] | |||||||||||||||||
|
I also documented srv_monitor_event and srv_error_event in 10.0. I did not document or review the events related to mutexes, rw-locks and fulltext indexes. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-02-22 ] | |||||||||||||||||
|
The test innodb.innodb_bug14676111 failed on 10.1 on 64-bit Windows today. Unfortunately, either the purge can still remain stuck, or the test is badly written (it does not properly wait for the purge to run to completion). It seems that wait_innodb_all_purged.inc is attempting to wait for the actual removal, by waiting for the debug status variable INNODB_PURGE_TRX_ID_AGE to reach zero. I do not see anything obviously wrong in the instrumentation. | |||||||||||||||||
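
The waiting strategy described above (poll a counter until it reaches zero, and give up after a deadline) can be sketched as follows. This is an assumption-laden illustration, not the contents of wait_innodb_all_purged.inc: `wait_until_zero()` and its `read_age` callback are hypothetical names, and the callback merely stands in for reading INNODB_PURGE_TRX_ID_AGE.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls read_age() until it returns 0 or the timeout expires.
// Returns true if zero was observed, false on timeout.
bool wait_until_zero(const std::function<long()>& read_age,
                     std::chrono::milliseconds timeout,
                     std::chrono::milliseconds interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    if (read_age() == 0) return true;  // purge has caught up
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(interval);  // back off before polling again
  }
}
```

With a wait of this shape, a purge that never starts shows up as a timeout in the waiting test rather than as a wrong result, which matches the symptom reported in this ticket. It cannot distinguish a stuck purge from one that is merely slow.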
| Comment by Marko Mäkelä [ 2017-03-24 ] | |||||||||||||||||
|
The 5.5 test innodb.innodb_bug14676111 unnecessarily relies on InnoDB purge. It can simply do BEGIN;INSERT;ROLLBACK to achieve the same result synchronously (the rollback of an insert immediately removes the record from the index B-tree). | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-04-26 ] | |||||||||||||||||
|
This commit in bb-10.2-marko should fix the underlying issues. The fix is needed for the test innodb.truncate_purge_debug introduced in my clean-up of an Oracle bug fix and test. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-04-27 ] | |||||||||||||||||
|
This was a prerequisite for cleaning up a test case for a TRUNCATE performance fix that I merged from MySQL 5.7 as part of We might want to backport this fix to 10.0 and 10.1 later. The impact of this bug is that even when InnoDB is idle or running mostly read-only operations and it is able to purge old history (a sort of garbage collection), it is not doing so. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-05-10 ] | |||||||||||||||||
|
This issue is still not fully fixed. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2017-08-28 ] | |||||||||||||||||
|
Related note from | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-01 ] | |||||||||||||||||
|
Possibly related to this: MySQL Bug #75231 records are not purged after a delete operation | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-04 ] | |||||||||||||||||
|
While testing my fix for
| |||||||||||||||||
| Comment by Sergey Vojtovich [ 2018-01-25 ] | |||||||||||||||||
|
One of the possible culprits could be MVCC::view_open(), specifically this code:
While a thread is in this gap, a concurrent purge thread may clone a stale "oldest" view. That is, the purge thread won't be able to purge some newer committed transactions while we're in this gap.
| |||||||||||||||||
| Comment by Sergey Vojtovich [ 2018-01-25 ] | |||||||||||||||||
|
OTOH, according to gdb, innodb.innodb_bug14676111 never calls MVCC::view_open() with view != NULL, so this code should never be executed. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-04-09 ] | |||||||||||||||||
|
svoj, please note that I rewrote the test so that it does not rely on purge any more, but on rollback instead. I also worked around possible lost signals by making SHOW ENGINE INNODB STATUS trigger purge. To better analyze this problem, these changes should be reverted. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-05-17 ] | |||||||||||||||||
|
As noted in The fix of With | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-11-01 ] | |||||||||||||||||
|
With the current work-around (SHOW ENGINE INNODB STATUS will initiate a purge), this is not a practical problem any more. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2018-11-01 ] | |||||||||||||||||
|
The wait_all_purged.inc was introduced in | |||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-24 ] | |||||||||||||||||
|
I think that the original problem (that purge fails to remove some history of committed transactions) might be fixed by The CHECK TABLE…EXTENDED implemented in |