[MDEV-23514] SEGV storage/innobase/row/row0log.cc:863 in row_log_table_low | Created: 2020-08-19 | Updated: 2020-10-06 | Resolved: 2020-08-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2.34 |
| Fix Version/s: | 10.2.35 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | rr-profile-analyzed |
| Attachments: |
|
| Issue Links: |
|
| Description |
|
|
| Comments |
| Comment by Matthias Leich [ 2020-08-19 ] |
|
| Comment by Marko Mäkelä [ 2020-08-20 ] |
|
This is a race condition between a DML operation and a rollback of the ALTER TABLE. The log was freed here:
At this point of time, the dict_operation_lock is held by Thread 30 (which is rolling back the online ALTER TABLE). Before In row_log_table_low(), we have the following debug assertion:
In innobase_online_rebuild_log_free() we acquire that latch in exclusive mode. This proves that the field should be protected by dict_index_t::lock. The function row_undo_ins_remove_clust_rec() correctly re-checks the attribute while holding dict_index_t::lock, but row_undo_mod_clust() was missing that re-check. It also duplicates the above-mentioned debug assertion. I believe that the following should adequately fix this bug:
Online creation of secondary indexes should be unaffected by this.
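
To illustrate the pattern described in this comment (re-checking the attribute while holding the index latch), here is a minimal standalone C++17 sketch. It is not the InnoDB code or the actual patch. `Index`, `OnlineLog`, `log_change_if_online()` and `free_online_log()` are hypothetical names, `std::shared_mutex` only stands in for dict_index_t::lock, and the per-log mutex is an assumption of this model.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an online rebuild log (not InnoDB's real type).
struct OnlineLog {
    std::mutex mutex;                  // serializes appends from concurrent writers
    std::vector<std::string> records;  // logged changes
};

// Hypothetical stand-in for a clustered index that is being rebuilt online.
struct Index {
    std::shared_mutex lock;                 // models dict_index_t::lock
    std::unique_ptr<OnlineLog> online_log;  // non-null while a rebuild is in progress
};

// DML/undo side: the "is the rebuild still online?" check and the append both
// happen under the index latch, so the log cannot be freed in between.
// Returns true if the change was written to the log.
bool log_change_if_online(Index& index, const std::string& change) {
    std::shared_lock<std::shared_mutex> latch(index.lock);
    if (!index.online_log) {
        return false;  // the rebuild was rolled back; there is nothing to log to
    }
    std::lock_guard<std::mutex> guard(index.online_log->mutex);
    index.online_log->records.push_back(change);
    return true;
}

// ALTER TABLE rollback side: free the log only while holding the latch exclusively.
void free_online_log(Index& index) {
    std::unique_lock<std::shared_mutex> latch(index.lock);
    index.online_log.reset();
}
```

In this model, a writer that checks `online_log` before taking the latch and dereferences it afterwards could use freed memory, which is the kind of race described above. Checking under the latch closes that window.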
| Comment by Marko Mäkelä [ 2020-08-20 ] |
|
I think that secondary index creation is affected by this in a way that is much harder to reproduce. While row_merge_drop_indexes() would not invoke dict_index_remove_from_cache() on normal indexes as long as transactions exist that hold locks on the table, it would invoke that on FULLTEXT INDEX objects. Luckily, fulltext index creation is not allowed ONLINE (LOCK=NONE), and the ALTER TABLE operation would be unable to start in prepare_inplace_alter_table_dict() as long as any transactions (recovered or not) hold locks on the table:
Because we are skipping the table lock acquisition for online operations, we seem to have a race condition between the rollback of recovered transactions and the online creation of a secondary index. The dict_table_t::indexes could be modified while the rollback is executing. I think that the easiest way to block this race condition is to ensure that both trx_rollback_active() and innobase_rollback_by_xid() will hold dict_operation_lock at least in shared mode when processing each undo log record of a recovered transaction. To incur a minimum overhead when we are rolling back large transactions, we’d better acquire and release the latch at the low level, where we used to do it for all transactions before
It would be challenging to write a reasonably deterministic test case for this, because DEBUG_SYNC cannot be directly activated for background threads. For the failure that is revealed by MDEV-23514.tgz
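
The idea of holding dict_operation_lock in shared mode for each undo log record, rather than for the whole rollback, can be sketched roughly as follows. This is a standalone C++17 model under stated assumptions, not the InnoDB implementation. `Dictionary`, `rollback_undo_records()` and `add_index()` are hypothetical names, and the undo records are plain integers used only to drive the loop.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical model of a table's index list guarded by a global latch.
struct Dictionary {
    std::shared_mutex operation_lock;      // models dict_operation_lock
    std::vector<std::string> index_names;  // models dict_table_t::indexes
};

// Rollback side: take the latch in shared mode once per undo record, so that a
// large rollback never blocks DDL for longer than one record at a time.
// Returns how many index entries were visited, for demonstration only.
std::size_t rollback_undo_records(Dictionary& dict,
                                  const std::vector<int>& undo_records) {
    std::size_t index_visits = 0;
    for (int record : undo_records) {
        (void) record;  // a real rollback would apply this record to each index
        std::shared_lock<std::shared_mutex> latch(dict.operation_lock);
        index_visits += dict.index_names.size();  // index list is stable here
    }
    return index_visits;
}

// DDL side: changing the index list requires the latch in exclusive mode, so it
// can only run between two undo records, never while one is being processed.
void add_index(Dictionary& dict, const std::string& name) {
    std::unique_lock<std::shared_mutex> latch(dict.operation_lock);
    dict.index_names.push_back(name);
}
```

Acquiring and releasing the latch inside the loop trades a small per-record cost for bounded blocking of concurrent DDL, which matches the intent stated above of incurring a minimum overhead for large transactions.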
| Comment by Marko Mäkelä [ 2020-08-20 ] |
|
This regression was never merged to other branches and was never part of any release; it was only present in 10.2.