[MDEV-16124] fil_rename_tablespace() times out and crashes server during table-rebuilding ALTER TABLE Created: 2018-05-09 Updated: 2022-08-10 Resolved: 2018-06-06 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.0, 10.1, 10.2.14, 10.2, 10.3 |
| Fix Version/s: | 10.0.36, 10.1.34, 10.2.16, 10.3.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | Steve | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | upstream |
| Attachments: |
| Issue Links: |
| Description |
|
We are getting periodic crashes on our staging server while adding a column and an index to a table. There are about 200 databases on the server, and they are updated to our latest development schemas nightly, ready for testing. We typically get a crash every other day or so. The database that fails varies, but each typically has several hundred thousand rows. The table update that triggers the crash is always:
Immediately before this ALTER TABLE we update pretty much all the rows in the table, which I suspect may be relevant.
In the mysql error log we see:
followed by
The engine InnoDB status is then dumped; this is attached. We have experimented with setting the following, but this has had no effect.
|
| Comments |
| Comment by Steve [ 2018-05-09 ] |
|
The server crash occurs several minutes later:
Crash log attached. |
| Comment by Elena Stepanova [ 2018-06-01 ] |
|
Thanks for the report. Test case from https://bugs.mysql.com/bug.php?id=84762 (with minor adjustments)
(first modifier was added to make it also fail on 10.3).
|
| Comment by Marko Mäkelä [ 2018-06-05 ] |
|
By the design of ALTER TABLE…ALGORITHM=INPLACE, the post-commit adjustments to the InnoDB dictionary cache must succeed. Anything that can fail should be caught before the operation is committed inside InnoDB. I would primarily try to find out why any of these conditions hold for such a long time:
wlad, do we actually need this code? Is there really any problem with renaming a file on Windows while the same process is accessing it via open handles? If this retry logic can be removed, then I think the fault injection that elenst exercised can be removed as well. |
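The concern raised in this comment can be illustrated with a minimal sketch of a bounded retry loop. This is not the InnoDB code: `rename_with_retry`, `is_blocked`, `do_rename`, and the retry budget are hypothetical names and values chosen only for illustration. The point is that when the blocking condition outlasts the retry budget, the rename is abandoned, and if that happens after the operation has been committed, there is nothing left to roll back.

```python
import time


def rename_with_retry(is_blocked, do_rename, max_retries=5, delay=0.01, sleep=time.sleep):
    """Hypothetical sketch: retry a rename while a blocking condition holds.

    Returns True if the rename was performed, False if the retry budget
    ran out first (the "timeout" case).
    """
    for _ in range(max_retries):
        if not is_blocked():
            do_rename()
            return True
        sleep(delay)  # wait and hope the condition clears
    return False  # gave up; the caller cannot undo an earlier commit


if __name__ == "__main__":
    # A condition that never clears exhausts the retry budget.
    print(rename_with_retry(lambda: True, lambda: None, sleep=lambda s: None))
```

Removing the loop, as proposed above, only works if the rename itself can no longer be blocked, which is why the question about open handles on Windows matters.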
| Comment by Shaohua Wang [ 2022-08-10 ] |
|
Marko, I was finally able to find the root cause. Please refer to https://bugs.mysql.com/bug.php?id=108087. There is a workaround: turn off the change buffer. The fix is simple:
|
| Comment by Marko Mäkelä [ 2022-08-10 ] |
|
tiandiwonder, thank you. The I/O layer in MariaDB’s InnoDB storage engine is simpler than the one in MySQL, thanks to wlad. Your patch to fil_rename_tablespace() could fix the problem in MySQL. Our fix removed the retry loop from that function altogether and relaxed the file locking on Microsoft Windows so that InnoDB data files may be renamed while they are open for writing. That is, there is no need to fsync() and close() the file before renaming. Another anomaly due to suboptimal use of Microsoft Windows APIs was that synchronous and asynchronous I/O calls could not be used on the same file. So, InnoDB first opened files in synchronous mode, then closed and reopened them for asynchronous I/O. wlad simplified this in MariaDB 5.5 already. |
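The "rename while open for writing" behaviour described above can be tried out with a small sketch. It assumes a POSIX system, where renaming an open file is permitted; the file names `t1.ibd` and `t1_renamed.ibd` and the helper `rename_while_open` are hypothetical. On Microsoft Windows the same `os.rename()` call is expected to fail for a file opened through Python's `open()`, because, as far as I understand the Windows API, the handle would have to be opened with a sharing mode such as `FILE_SHARE_DELETE` for the rename to be allowed.

```python
import os
import tempfile


def rename_while_open(directory):
    """Write to a file, rename it while the handle is still open, keep writing.

    Returns the final contents read back through the new name.
    """
    old_path = os.path.join(directory, "t1.ibd")          # hypothetical name
    new_path = os.path.join(directory, "t1_renamed.ibd")  # hypothetical name
    with open(old_path, "wb") as f:
        f.write(b"page0")
        os.rename(old_path, new_path)  # no flush or close beforehand
        f.write(b"page1")              # the open handle follows the file
    with open(new_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(rename_while_open(d))
```

Both writes end up in the renamed file, so nothing has to be flushed and closed before the rename, which is the property the fix relies on.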