[MDEV-12091] Shutdown fails to wait for rollback of recovered transactions to finish Created: 2017-02-20 Updated: 2017-09-15 Resolved: 2017-03-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 5.5, 10.0, 10.1, 10.2 |
| Fix Version/s: | 10.1.23, 10.2.5, 10.0.31 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ASAN, event, race, shutdown, valgrind | ||
| Issue Links: |
|
| Sprint: | 10.2.5-1 |
| Description |
|
If InnoDB is still rolling back resurrected transactions while it is being shut down, it is possible that buf_flush_page_cleaner_thread() has already invoked os_event_free(buf_flush_event) while that event is still being signalled.
MariaDB 10.2 should not be able to trigger this error, because there it is srv_free() that invokes os_event_destroy(buf_flush_event). Nevertheless, we should ensure that the page cleaner does not exit while there are dirty pages in the buffer pool. We should also ensure that the cleanup of each subsystem is done in the main shutdown thread, after the threads of that subsystem have exited. Another bad example is btr_defragment_thread, which calls btr_defragment_shutdown() itself. Apparently the main thread does not wait for that shutdown to finish at all. |
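The ordering described above can be sketched in portable C++. This is only an illustration under stated assumptions, not the InnoDB code: `DemoEvent`, `ordered_shutdown_demo` and the counters are hypothetical names, and a `std::condition_variable` stands in for the real event object. The point is that the event is owned by the shutdown thread and is destroyed only after both the thread that signals it and the thread that waits on it have been joined.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an InnoDB event object (not the real os_event API).
struct DemoEvent {
    std::mutex mutex;
    std::condition_variable cond;
    bool is_set = false;

    void signal() {
        {
            std::lock_guard<std::mutex> guard(mutex);
            is_set = true;
        }
        cond.notify_all();
    }

    void wait_and_reset() {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [this] { return is_set; });
        is_set = false;
    }
};

// Returns the number of dirty pages left after an ordered shutdown.
int ordered_shutdown_demo(int pages_to_dirty) {
    DemoEvent flush_event;  // owned by the shutdown (calling) thread
    std::atomic<bool> shutting_down{false};
    std::atomic<int> dirty_pages{0};

    // Page cleaner: flushes on every signal and never frees the event itself.
    std::thread cleaner([&] {
        for (;;) {
            flush_event.wait_and_reset();
            // Read the flag before flushing, so that the final pass covers
            // every page dirtied before shutdown was requested.
            const bool last_pass = shutting_down.load();
            dirty_pages.store(0);  // "flush" all dirty pages
            if (last_pass) break;
        }
    });

    // Rollback thread: dirties pages and signals the cleaner.
    std::thread rollback([&] {
        for (int i = 0; i < pages_to_dirty; i++) {
            dirty_pages.fetch_add(1);
            flush_event.signal();
        }
    });

    rollback.join();            // 1. nothing dirties pages after this point
    shutting_down.store(true);  // 2. ask the cleaner for one final pass
    flush_event.signal();
    cleaner.join();             // 3. wait for the cleaner to exit
    return dirty_pages.load();  // 4. flush_event is destroyed only on return
}
```

Calling `ordered_shutdown_demo(1000)` should leave no dirty pages behind. If the cleaner destroyed `flush_event` itself on exit, the `signal()` calls in the other threads could touch freed memory, which is the kind of race this report describes.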
| Comments |
| Comment by Marko Mäkelä [ 2017-03-10 ] | |||||||
|
buf_flush_event exists only in the 10.1 InnoDB, not in XtraDB, and not in 10.0 or 10.2. I did not find any unsafe os_event_free() call other than this one. | |||||||
| Comment by Marko Mäkelä [ 2017-03-10 ] | |||||||
|
There is also the problem that we should stop dirtying pages before the last phases of shutdown:
We should definitely wait for the rollback thread to finish; otherwise some data might become corrupted. Only srv_fast_shutdown=2 (innodb_fast_shutdown=2, crash-like) can skip the wait. | |||||||
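A minimal sketch of that rule, assuming a simplified model in which the rollback thread undoes one record at a time. The function `shutdown_with_rollback` and the `abandon` flag are hypothetical and only illustrate the policy; they are not the actual fix.

```cpp
#include <atomic>
#include <thread>

// Hypothetical model: returns how many undo records are still pending when
// shutdown completes. fast_shutdown mirrors the innodb_fast_shutdown setting.
int shutdown_with_rollback(unsigned fast_shutdown, int undo_records) {
    std::atomic<int> pending{undo_records};
    std::atomic<bool> abandon{false};

    // Background rollback of recovered transactions.
    std::thread rollback([&] {
        while (pending.load() > 0 && !abandon.load()) {
            pending.fetch_sub(1);  // undo one record
        }
    });

    if (fast_shutdown == 2) {
        // Crash-like shutdown: do not wait for the rollback to complete.
        abandon.store(true);
    }
    // In every other mode the join below waits for the whole rollback.
    rollback.join();
    return pending.load();
}
```

With `fast_shutdown` set to 0 or 1 the function always returns 0, because the shutdown thread waits until every record has been undone. With 2 it may return any value from 0 to `undo_records`. In the real server the unfinished work would then be left for crash recovery at the next startup.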
| Comment by Marko Mäkelä [ 2017-03-10 ] | |||||||
|
bb-10.0-marko | |||||||
| Comment by Marko Mäkelä [ 2017-03-10 ] | |||||||
|
I was unable to deterministically trigger this in a small test. It could be doable with some additional instrumentation that would pause the background rollback thread. | |||||||
| Comment by Jan Lindström (Inactive) [ 2017-03-13 ] | |||||||
|
ok to push all versions. |