[MDEV-17813] Crash in instant ALTER TABLE due to purge concurrently emptying table Created: 2018-11-23 Updated: 2019-12-10 Resolved: 2018-11-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4.0 |
| Fix Version/s: | 10.4.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | corruption, crash, instant, purge | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
The following deterministic test case reproduces a problem that was reported in
This does not crash in 10.3, so
The crash looks like this:
In order to fix this, we must rebuild the ctx->instant_table metadata in innobase_instant_try() if ctx->old_table->is_instant() != ctx->old_was_instant. And while doing that, we must already be holding an exclusive latch on the leftmost leaf page, so that purge cannot empty the table. This implies that the following code has to be moved earlier in the function:
Furthermore, there appears to be a possible race condition earlier during the ALTER TABLE. Both during instant_alter_column_possible() and ctx->prepare_instant(), a purge thread could invoke dict_index_t::clear_instant_add() when the table becomes empty. That should not be a problem in Because of partitioned tables, I think that each inplace_alter function in ha_innobase must acquire and release index->lock independently. I do not see any way to avoid rebuilding the the ctx->instant_table metadata. |
| Comments |
| Comment by Marko Mäkelä [ 2018-11-23 ] | |||
|
Purge is not holding index->lock here, so I think that we must acquire the root page latch in order to protect index->n_core_fields and index->n_fields during instant ALTER TABLE:
| |||
| Comment by Marko Mäkelä [ 2018-11-23 ] | |||
|
Purge should never reduce index->n_fields or touch index->fields, because it should only empty the root page when !index->table->instant. I will add a debug assertion to row_purge_remove_clust_if_poss_low() to ensure this. The assertion did not fail when running tests in mysql-test-run. Therefore, the race condition should be limited to index->n_core_fields and index->n_core_null_bytes. It should suffice to copy these values from ctx->old_table to ctx->instant_table in innobase_instant_try() after acquiring the leftmost leaf page latch. There still is the question whether dict_index_t::clear_instant_alter() could clear index->table->instant in the rollback of a recovered instant ALTER TABLE transaction, while a new instant ALTER TABLE is executing on the same table. The answer is no, because InnoDB only allows at most one DDL transaction to be active at a time, and it would roll back any incomplete DDL transaction before letting the server accept any client connections. If this were not the case, we should acquire and release a shared or exclusive InnoDB table lock before invoking ctx->prepare_instant(), to ensure that no conflicting transactions exist. | |||
| Comment by Marko Mäkelä [ 2018-11-23 ] | |||
|
At the start of innobase_instant_try() we are updating the InnoDB internal data dictionary. At this time, it is not possible to hold any latch in the user table, so index->n_core_fields would necessarily be unprotected. But, we can acquire the page latch before updating the InnoDB dictionary and commit the mini-transaction before updating the data dictionary tables. Is there any possibility of race condition during dict_table_t::rollback_instant(), that is, when an instant ALTER TABLE operation is rolled back? Purge would remove the metadata record unless it was for the |