[MDEV-25655] Atomic DDL: InnoDB: Cannot replay rename of tablespace, error Data structure corruption Created: 2021-05-11 Updated: 2021-05-12 Resolved: 2021-05-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | N/A |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
Two examples follow. I have a few more if needed.

Case 1

In the case above, the test was running concurrent DDL in 4 threads and was killed with SIGKILL in the process. The last statements executed in each thread, according to the general log, were

The logs, datadirs and rr profiles are available.

Case 2

In the case above, the test was running various DDL statements in a single thread using the PS protocol, was killed with SIGKILL while running a query, and failed with the error above upon recovery. I think I have seen similar failures on previous releases without atomic DDL, so this is possibly nothing new, but I would expect it to be gone with atomic DDL. Strangely, I did not see such failures in previous tests of atomic DDL, but now they have suddenly appeared in bulk. |
| Comments |
| Comment by Marko Mäkelä [ 2021-05-12 ] | ||||||||||||||||||||
|
I can repeat the recovery failure of atomic_ddl/
In the output produced by --debug=d,ib_log, we see quite a few records for that tablespace 81. The file test/table75_innodb_int_autoinc.ibd consists of 622,592 NUL bytes both before and after recovery. The file test/tt6.ibd initially also consists of only NUL bytes (212,992 of them), but after recovery (both in the original run and in my local rerun) it contains the tablespace ID 432.

One conceivable fix would be to allow the rename to happen, on the grounds that both files were garbage to begin with. I will try to determine the exact sequence of relevant events and come up with a less intrusive fix.

The last durable commit of the server before the SIGKILL was for DROP /* QNO 1101 CON_ID 28 */ TABLE `table53_innodb_int_autoinc. | ||||||||||||||||||||
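The NUL-byte and tablespace ID observations above can be checked outside the server with a short script. The following is a minimal Python sketch and not part of the original report: the helper names `is_all_nul` and `read_space_id` are hypothetical, and the byte offset 34 for the tablespace ID in the first page header is an assumption based on the usual InnoDB page header layout for unencrypted, uncompressed .ibd files.

```python
import struct

# Assumption: the 4-byte big-endian tablespace ID is stored at byte offset 34
# of the first page of an .ibd file. This offset is not taken from the report.
SPACE_ID_OFFSET = 34


def is_all_nul(path, chunk_size=1 << 20):
    """Return True if the file contains only NUL bytes (an empty file also counts)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            # bytes.count(0) counts NUL bytes; any other byte makes the counts differ.
            if chunk.count(0) != len(chunk):
                return False


def read_space_id(path):
    """Return the tablespace ID found in the first page header, or None if the file is too short."""
    with open(path, "rb") as f:
        f.seek(SPACE_ID_OFFSET)
        raw = f.read(4)
    if len(raw) < 4:
        return None
    return struct.unpack(">I", raw)[0]
```

A file that consists only of NUL bytes yields a tablespace ID of 0 from `read_space_id`, so it makes sense to call `is_all_nul` first and to treat a nonzero ID as meaningful only when that check returns False.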
| Comment by Marko Mäkelä [ 2021-05-12 ] | ||||||||||||||||||||
|
Before the SIGKILL, the server executed 2 related DDL statements (and many unrelated ones):
The RENAME was rolled back at the high level. In InnoDB, however, both the rename of table75_innodb_int_autoinc to tt6 and the rename of tt6 back to table75_innodb_int_autoinc were committed. At the time of the SIGKILL, both files contained only NUL bytes.
I tried a few things to ensure that both files would be recovered as filled with NUL bytes, but my reduced test did not fail for me:
| ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-05-12 ] | ||||||||||||||||||||
|
This is an anomaly of the refined | ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-05-12 ] | ||||||||||||||||||||
|
root cause analysis and fix (in the |