[MDEV-16515] InnoDB: Failing assertion: ++retries < 10000 in file dict0dict.cc line 2737 Created: 2018-06-18 Updated: 2020-07-20 Resolved: 2018-06-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.1 |
| Fix Version/s: | 10.1.35, 10.2.17, 10.3.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | regression | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Full threads are attached. Datadir, coredump, binary, and logs are available on perro. |
| Comments |
| Comment by Elena Stepanova [ 2018-06-25 ] |
|
It also happened in buildbot:
|
| Comment by Marko Mäkelä [ 2018-06-25 ] |
|
In the core dump, we have index->space=3426 and index->page=FIL_NULL, but the adaptive hash index is not empty:
The info->root_guess points to a different tablespace:
Assuming that the info->root_guess points to a page that was valid for the index, the issue would seem to be that table->space points to something else, and the call
in the loop would have no effect. In MariaDB before 10.2, TRUNCATE TABLE could reassign dict_table_t::space. row_truncate_table_for_mysql() does call buf_LRU_drop_page_hash_for_tablespace(table) before that, but it does not check whether all the pages were dropped; I/O-fixed or buffer-fixed pages would seem to be skipped. I will try to check the fil_system->space_list and the buffer pool contents to determine whether the problem could have been TRUNCATE TABLE. |
| Comment by Marko Mäkelä [ 2018-06-25 ] |
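The suspected failure mode can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyBlock`, `toy_drop_page_hash()`, `toy_ref_count()` and `toy_wait_until_empty()` are hypothetical names invented for illustration and are not InnoDB code. The model assumes that every block in the pool belongs to the one index of interest, that the drop routine skips fixed blocks, and that the retry loop looks pages up by the table's current tablespace ID.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy types; none of these names exist in InnoDB.
struct ToyBlock {
    std::uint32_t space;  // tablespace ID recorded in the buffer pool block
    bool hashed;          // block still carries adaptive hash index entries
    bool fixed;           // block is I/O-fixed or buffer-fixed right now
};

// Drop the hash entries of all unfixed blocks of one tablespace.
// Fixed blocks are silently skipped, which is the suspected gap.
inline unsigned toy_drop_page_hash(std::vector<ToyBlock>& pool,
                                   std::uint32_t space) {
    unsigned dropped = 0;
    for (ToyBlock& b : pool) {
        if (b.space == space && b.hashed && !b.fixed) {
            b.hashed = false;
            ++dropped;
        }
    }
    return dropped;
}

// Number of blocks that still reference the index (stand-in for ref_count).
inline unsigned toy_ref_count(const std::vector<ToyBlock>& pool) {
    unsigned n = 0;
    for (const ToyBlock& b : pool) n += b.hashed ? 1 : 0;
    return n;
}

// Retry until the hash index is empty. Returns false at the point where
// the real code would fail the "++retries < 10000" assertion.
inline bool toy_wait_until_empty(std::vector<ToyBlock>& pool,
                                 std::uint32_t table_space,
                                 unsigned max_retries = 10000) {
    unsigned retries = 0;
    while (toy_ref_count(pool) != 0) {
        if (++retries >= max_retries) return false;
        toy_drop_page_hash(pool, table_space);
    }
    return true;
}
```

In this model, if one block of tablespace 2828 is fixed while the truncate-time drop runs, it keeps its hash entries. Once the table is reassigned to tablespace 3426, every retry searches the wrong tablespace ID, so the loop can never make progress and runs out of retries.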
|
Only the tablespace 3426 exists:
I think that it is plausible that the tablespace (with original ID 3268) was truncated. The size is only 6 pages. The old root_guess page had PAGE_LEVEL=1 and 60 node pointers, and InnoDB tablespaces never shrink outside truncate, so the only way that fil_space_t::size can be only 6 (instead of something bigger than 60) should be TRUNCATE TABLE.
I found exactly 56 blocks in the buffer pool with block->index == index, matching info->ref_count. All block->page.space numbers point to yet another tablespace, 2828, which does not exist either. The tablespace ID and the page number in each block->frame match those of the block->page. So, it looks like the tablespace ID in the info->root_guess is incorrect (because the block was used for some other table that was later dropped or rebuilt), and the original tablespace ID before TRUNCATE TABLE was 2828, not 3268.
I think that the most plausible explanation for this failure is that TRUNCATE TABLE was executed, and it did not drop all of the adaptive hash index because some pages were I/O-fixed (likely for flushing dirty pages) or buffer-fixed at that time. |
| Comment by Marko Mäkelä [ 2018-06-25 ] |
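The buffer pool cross-check described in this comment can be sketched as a small audit routine. This is a hedged illustration, not the procedure actually used on the core dump: `ToyIndex`, `ToyPoolBlock`, `ToyAudit` and `toy_audit_index()` are hypothetical names, and the real structures carry far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for the index and buffer pool block descriptors.
struct ToyIndex {
    std::size_t ref_count;  // blocks that the hash index claims to reference
};

struct ToyPoolBlock {
    const ToyIndex* index;  // index whose hash entries point into this block
    std::uint32_t space;    // tablespace ID recorded for the block
};

struct ToyAudit {
    std::size_t blocks = 0;            // blocks found for the index
    std::set<std::uint32_t> spaces;    // distinct tablespace IDs seen
};

// Walk the pool, count the blocks that belong to the index, and collect
// the tablespace IDs they carry.
inline ToyAudit toy_audit_index(const std::vector<ToyPoolBlock>& pool,
                                const ToyIndex& index) {
    ToyAudit audit;
    for (const ToyPoolBlock& b : pool) {
        if (b.index == &index) {
            ++audit.blocks;
            audit.spaces.insert(b.space);
        }
    }
    return audit;
}
```

If the block count equals the index's ref_count and all blocks carry a single tablespace ID that differs from the table's current one, that is consistent with hash entries left over from before the tablespace ID was reassigned.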
|
In MariaDB 5.5, we would not drop the adaptive hash indexes at all during ALTER TABLE…DISCARD TABLESPACE or TRUNCATE TABLE; we would only warn that they are not empty. In MariaDB 10.0 we would call fil_discard_tablespace() in both cases. The active dropping was added recently in MariaDB 10.1.34 as part of the
This failure can be classified as a regression of that fix. I think that the most feasible solution is to fix this in 10.1+ only.
I believe that 10.2 could fail in a different way due to TRUNCATE TABLE. Because the tablespace ID would not change, the adaptive hash index would be dropped in the end in this scenario. But we could find false positives when searching for records through the adaptive hash index. |
| Comment by Marko Mäkelä [ 2018-07-31 ] |
|
As noted in |