[MDEV-13051] MySQL#86607 InnoDB crash after failed ADD INDEX and table_definition_cache eviction Created: 2017-06-10 Updated: 2019-04-30 Resolved: 2017-10-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.0, 10.1, 10.2 |
| Fix Version/s: | 10.0.33, 10.1.29, 10.2.10, 10.3.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | upstream | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Description |
|
Re-filing the upstream bug on behalf of the original reporter Jianwei Zhao.
|
| Comments |
| Comment by Elena Stepanova [ 2017-06-10 ] | ||||||||||||
|
It fails for me easily with and without injection (maybe my machine is slow enough not to need an extra sleep). So, I recommend trying without injection first, and if it doesn't work, then apply it. On MySQL 5.7 (also without the injection) I'm getting the assertion failure:
On MariaDB so far it has always been SIGSEGV. | ||||||||||||
| Comment by jianwei zhao [ 2017-10-16 ] | ||||||||||||
|
Upload the fixed DIFF file from AliSQL. fix.diff | ||||||||||||
| Comment by Marko Mäkelä [ 2017-10-16 ] | ||||||||||||
|
jianwei zhao, thank you for reporting the bug that was introduced by me in MySQL 5.6.6. I believe that a simpler fix would be to compare table->n_ref_count to 0 instead of 1, because we already decremented the count, and we are interested in whether the last table handle is being released. I will try to rewrite the test case with DEBUG_SYNC, because sleeps can make regression tests nondeterministic and unnecessarily slow. | ||||||||||||
| Comment by jianwei zhao [ 2017-10-16 ] | ||||||||||||
|
Marko, table->n_ref_count == 0 can solve the problem, and looks like that it make more sense. | ||||||||||||
| Comment by Marko Mäkelä [ 2017-10-16 ] | ||||||||||||
|
With a patch applied, the test still occasionally fails.
What makes the difference between crash or no crash is table->drop_aborted. If it is set, we will obviously fail later in dict_table_remove_from_cache_low() when invoking row_merge_drop_indexes() to the drop SYS_INDEXES entries for the orphan indexes. Obviously, because the indexes were just freed by the same function. It looks like this has always been broken, and the bug was introduced in WL#5526 online ADD INDEX by me. | ||||||||||||
| Comment by Marko Mäkelä [ 2017-10-16 ] | ||||||||||||
|
I cannot find any interface where InnoDB would report the UT_LIST_GET_LEN(dict_sys->table_LRU) to the client. It is neither in SHOW ENGINE INNODB STATUS nor in SHOW VARIABLES (export_vars). Therefore, it will be impossible to write a deterministic test for this. Only the first part (error during ADD UNIQUE INDEX) can be deterministically triggered by using DEBUG_SYNC. | ||||||||||||
| Comment by jianwei zhao [ 2017-10-16 ] | ||||||||||||
|
I think the reason of occasional crash is that it released the dict_sys->mutex after drop_aborted is judged TRUE in dict_table_close(). then master thread dict_index_remove_from_cache_low() is prior of dict_table_try_drop_aborted(), so it crashed. as the comment described, the indexes should be dropped when dict_table_close(), but occasionally it is dropped by master thread. so another fix plan is that hold the dict_sys->mutex if drop_aborted== TRUE until dict_table_try_drop_aborted() complete. | ||||||||||||
| Comment by Marko Mäkelä [ 2019-04-30 ] | ||||||||||||
|
jianwei zhao, sorry, I missed your last comment. It looks like I pushed my fix around the same time as you wrote it. I am now revisiting this, because I believe that I solved the second problem by modifying dict_table_remove_from_cache_low(). That part is missing from the AliSQL bug fix. |