[MDEV-14693] XA: Assertion `!clust_index->online_log' failed in rollback_inplace_alter_table Created: 2017-12-17 Updated: 2018-05-07 Resolved: 2018-05-07 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.0, 10.1, 10.2, 10.3 |
| Fix Version/s: | 10.0.36, 10.1.33, 10.2.15, 10.3.7 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | affects-tests, online-ddl | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Sprint: | 10.2.14 | ||||||||||||||||||||||||||||
| Description |
|
Reproducible on all of 10.x. Not reproducible on MariaDB 5.5, MySQL 5.6.39, 5.7.21. |
| Comments |
| Comment by Marko Mäkelä [ 2018-03-07 ] | ||||||||||||||||||||||||||||||
|
The ALGORITHM=INPLACE code in InnoDB assumes that the table is exclusively locked both at ha_innobase::prepare_inplace_alter_table() and at ha_innobase::commit_inplace_alter_table(). As a work-around for missing WL#6049 (proper metadata lock coverage for FOREIGN KEY operations), InnoDB does try to acquire exclusive transactional table locks at commit. And that is the reason why the ALTER TABLE operation goes to rollback:
The above is the code in MariaDB 10.2. After the lock timeout, the SQL layer would invoke ha_innobase::commit_inplace_alter_table(commit=false), and during this operation, the assertion inside InnoDB would fail. This seems to demonstrate poor error handling in InnoDB. The online_log is actually owned by the ALTER TABLE operation, and InnoDB should free it on failure, instead of complaining that it exists. A similar situation should be possible when the table being ALTERed is being locked as a child or parent table of a FOREIGN KEY operation. I will try to create a test case for this. The original test case should be fixed as part of | ||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-03-07 ] | ||||||||||||||||||||||||||||||
|
I tested the FOREIGN KEY case, and there is no problem with that one: InnoDB will report an error after the exclusive lock acquisition failed at commit, and it would free the online_log as part of the rollback (below is the 10.2 version of the code):
So, this bug seems to share a common root cause with
| ||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2018-05-06 ] | ||||||||||||||||||||||||||||||
|
Yes and no. Yes, metadata lock would've avoided this crash by preventing ha_innobase::commit_inplace_alter_table from being executed at all. But the reason for this bug is different. It's a partitioned InnoDB table with two partitions. Online alter fails for some reason (deadlock in this particular case). The server needs to roll back the changes, it invokes ha_partition::commit_inplace_alter_table(commit=false). It invokes rollback_inplace_alter_table() for the first partition. It frees online log just fine, as you explained above. But then ha_partition invokes rollback_inplace_alter_table() for the second partition. Which does nothing because ctx->trx is NULL. And assert fires, as the online log for the second partition is not freed. | ||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-05-07 ] | ||||||||||||||||||||||||||||||
|
For partitioned tables, all changes should be committed or rolled back when invoking ha_innobase::commit_inplace_alter_table() on the first partition. The MySQL 5.6.11 change to implement this appears to be present in MariaDB 10.0. In 10.0, the test case causes an invocation of ha_innobase::commit_inplace_alter_table(commit=true). This call will fail due to row_merge_lock_table() failing to acquire an exclusive transactional lock on the second partition (p1). Then we get subsequent calls to ha_innobase::commit_inplace_alter_table(commit=false), to roll back the ALTER TABLE operation (one call for each partition). The error is caused because ha_innobase::commit_inplace_alter_table(commit=true) cleared the ctx->trx of other partitions than the first one, and rollback_inplace_alter_table() would skip some necessary steps. It seems simplest to move the freeing of ctx->trx to a later stage. Alternatively, we could make rollback_inplace_alter_table() process all partitions of the table, instead of rolling back changes to each partition in separate commits. I will write a DEBUG_SYNC test case so that the test will not have to be rewritten when the XA MDL bug is fixed. This should be repeatable by issuing a locking read on a partition while an online ALTER TABLE is running. | ||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-05-07 ] | ||||||||||||||||||||||||||||||
|
I wrote a DEBUG_SYNC test case, but it will always result in a MDL timeout, not InnoDB lock timeout. The InnoDB lock timeout can only be reached by exploiting bugs in the MDL. One bug would be the XA scenario in the bug report; another would be to exploit missing MDL coverage for certain FOREIGN KEY operations; however, FOREIGN KEY are not implemented for PARTITION tables.
| ||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-05-07 ] | ||||||||||||||||||||||||||||||
|
I also tried to trigger this with duplicate key error in a partitioned table, but it did not crash:
|