[MDEV-28415] ALTER TABLE on a large table hangs InnoDB Created: 2022-04-26  Updated: 2022-04-27  Resolved: 2022-04-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.44, 10.3.35, 10.4.25

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.5

Issue Links:
PartOf
is part of MDEV-28417 Merge new release of InnoDB 5.7.38 to... Closed
Relates
relates to MDEV-14550 Error log flood : "InnoDB: page_clea... Closed
relates to MDEV-16809 Allow full redo logging for ALTER TABLE Closed
relates to MDEV-23399 10.5 performance regression with IO-b... Closed

 Description   

MySQL 5.7.38 includes the following change without a test case:
Bug #33101844 CREATE FULLTEXT INDEX CRASHES SERVER FOR LARGE TABLE
As far as I understand the description, any CREATE INDEX or table-rebuilding ALTER TABLE operation could exhaust the buffer pool by unnecessarily holding exclusive latches on non-leaf index pages, preventing those pages from being written out.



 Comments   
Comment by Marko Mäkelä [ 2022-04-26 ]

The change from MySQL is triggering an assertion failure in mtr_t::memo_modify_page(), catching an attempted modification of a non-leaf page that is not associated with the mini-transaction. The change must be reworked.

Comment by Marko Mäkelä [ 2022-04-26 ]

The function PageBulk::release() is buffer-fixing the pages before releasing the exclusive latches on them, so even though the page latches will be released, the pages will be pinned in the buffer pool. They may be written out, but not relocated or evicted.
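
The distinction between a latched page and a pinned page can be illustrated with a minimal sketch. This is a toy model, not InnoDB code: the `Page` struct, its fields, and all function names are hypothetical. The only thing taken from the comment above is the order of operations (buffer-fix first, then release the exclusive latch).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a buffer pool page descriptor (not InnoDB's buf_page_t).
struct Page {
  bool x_latched = false;       // an exclusive page latch is held
  std::uint32_t fix_count = 0;  // number of buffer-fixes (pins) on the page
};

void buffer_fix(Page& p) { ++p.fix_count; }

void buffer_unfix(Page& p) {
  assert(p.fix_count > 0);
  --p.fix_count;
}

// Same order as described for PageBulk::release():
// pin the page first, then give up the exclusive latch.
void release_keep_pinned(Page& p) {
  assert(p.x_latched);
  buffer_fix(p);
  p.x_latched = false;
}

// In this model a flusher only needs the page not to be exclusively latched.
bool may_write_out(const Page& p) { return !p.x_latched; }

// Eviction or relocation additionally requires that nobody has pinned the page.
bool may_evict_or_relocate(const Page& p) {
  return !p.x_latched && p.fix_count == 0;
}
```

In this model, a page that went through `release_keep_pinned()` satisfies `may_write_out()` but not `may_evict_or_relocate()` until `buffer_unfix()` is called.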

Based on the limited information available, it seems possible to me that the MySQL 5.7 hang could have been fixed in MariaDB Server 10.5.7 by MDEV-23399, which made the checkpoint flushing (the single buf_flush_page_cleaner thread) skip pages on which a latch cannot be acquired immediately.

Comment by Marko Mäkelä [ 2022-04-26 ]

The MySQL commit message mentions that the page cleaner threads hang. That symptom is related to MDEV-14550, and it was fixed in MariaDB Server 10.5.7 by MDEV-23399.

In the end, I believe that it is simplest to remove the page latch wait from the page cleaners (essentially port a small part of MDEV-23399 to earlier versions) and leave the DDL code alone. In that way, there should be no risk of breaking the safety of DDL operations (MDEV-16809). That change should also help other scenarios where a large number of pages are concurrently exclusively latched.
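
The idea of removing the page latch wait from the page cleaner can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `Latch`, `Page`, `FlushResult` and `flush_pass()` are hypothetical names, and the latch is simplified to a single exclusive mode (real page latches also have shared modes).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical exclusive-only latch with a non-blocking acquire.
struct Latch {
  std::atomic<bool> held{false};
  bool try_lock() {
    bool expected = false;
    return held.compare_exchange_strong(expected, true);
  }
  void unlock() { held.store(false); }
};

// Hypothetical stand-in for a dirty page in the buffer pool.
struct Page {
  Latch latch;
  bool dirty = true;
};

struct FlushResult {
  std::size_t written = 0;  // pages flushed in this pass
  std::size_t skipped = 0;  // pages left for a later pass
};

// One pass of a flushing thread. It never waits for a page latch:
// a page whose latch cannot be acquired immediately is skipped.
FlushResult flush_pass(const std::vector<Page*>& pages) {
  FlushResult result;
  for (Page* p : pages) {
    assert(p != nullptr);
    if (!p->dirty) continue;
    if (!p->latch.try_lock()) {
      ++result.skipped;  // latched elsewhere, e.g. by a DDL operation
      continue;
    }
    // A real implementation would write the page to the data file here.
    p->dirty = false;
    ++result.written;
    p->latch.unlock();
  }
  return result;
}
```

With a blocking acquire in place of `try_lock()`, a single long-held latch would stall the whole pass; with the non-blocking variant the pass completes and the skipped pages are picked up once their latches are free.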
