[MDEV-13637] InnoDB change buffer housekeeping can cause redo log overrun and possibly deadlocks
Created: 2017-08-24  Updated: 2022-10-28  Resolved: 2017-08-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.0.33, 10.1.27, 10.2.9, 10.3.2

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: crash, create_table, deadlock, recovery

Issue Links:
Relates
relates to MDEV-25267 Reported latching order violation in ... Open
relates to MDEV-29905 Change buffer operations fail to chec... Closed
relates to MDEV-11634 Improve the InnoDB change buffer Closed
relates to MDEV-13485 MTR tests fail massively with --innod... Closed
relates to MDEV-14643 InnoDB: Failing assertion: !cursor->... Closed

 Description   

As reported in MDEV-13485, the function ibuf_remove_free_page() may be called while the caller is holding several mutexes or rw-locks. As a result, the housekeeping loop that calls it may cause performance glitches for operations that involve tables stored in the InnoDB system tablespace. Deadlocks are also theoretically possible.

The worst impact is that, because mutexes are being held, calls to log_free_check() must be skipped during this housekeeping. This means that the cyclic InnoDB redo log may be overwritten. If the server crashes while this is happening, it will be unable to recover.
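The overrun risk can be illustrated with a toy model of a cyclic log. This is only a sketch under stated assumptions, not InnoDB's actual data structures: ToyRedoLog, free_check(), append() and run_writes() are hypothetical names, and the "checkpoint" here simply advances a counter, whereas a real checkpoint requires flushing dirty pages first.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a cyclic redo log (hypothetical; not InnoDB code).
struct ToyRedoLog {
    std::uint64_t capacity;        // size of the circular log in bytes
    std::uint64_t checkpoint_lsn;  // records before this point may be overwritten
    std::uint64_t lsn;             // position of the next byte to be written

    // Bytes that can still be appended without overwriting records
    // that are newer than the latest checkpoint.
    std::uint64_t free_bytes() const {
        const std::uint64_t used = lsn - checkpoint_lsn;
        return used >= capacity ? 0 : capacity - used;
    }

    // Stand-in for a free-space check: if less than `margin` bytes
    // remain, advance the checkpoint before more log is written.
    void free_check(std::uint64_t margin) {
        assert(margin <= capacity);
        if (free_bytes() < margin) {
            checkpoint_lsn = lsn;
        }
    }

    // Append `bytes` of log. Returns false if the write wrapped around
    // and overwrote records that a crash recovery would still need.
    bool append(std::uint64_t bytes) {
        lsn += bytes;
        return lsn - checkpoint_lsn <= capacity;
    }
};

// Write five 30-byte records into a 100-byte log, with or without
// a free-space check before each write.
bool run_writes(bool with_free_check) {
    ToyRedoLog log{100, 0, 0};
    bool ok = true;
    for (int i = 0; i < 5; i++) {
        if (with_free_check) {
            log.free_check(30);
        }
        ok = log.append(30) && ok;
    }
    return ok;
}
```

In this model, run_writes(true) stays within the log because the checkpoint is advanced before space runs out, while run_writes(false) overruns the checkpoint on the fourth write, which corresponds to the unrecoverable state described above.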

The entry point to the problematic code is ibuf_free_excess_pages(). It would make sense to call it before acquiring any mutexes or rw-locks, in any DDL operation and in any 'pessimistic' operation that involves the system tablespace.
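The proposed ordering could be sketched as follows. This is a minimal illustration under assumptions, not the actual fix: ToyEngine, its members, and pessimistic_operation() are hypothetical stand-ins, and a single std::mutex represents whatever mutexes or rw-locks the real operation would acquire.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of an operation on the system tablespace.
struct ToyEngine {
    std::mutex latch;                // stand-in for any mutex or rw-lock
    bool latch_held = false;         // tracks whether this operation holds the latch
    int excess_pages = 3;            // free pages the housekeeping should release
    std::vector<std::string> trace;  // records the order of steps

    // Stand-in for the housekeeping entry point. It must run with no
    // latch held, so that a log free-space check is allowed before
    // each page is removed.
    void free_excess_pages() {
        assert(!latch_held);
        while (excess_pages > 0) {
            trace.push_back("free_check");
            trace.push_back("remove_free_page");
            --excess_pages;
        }
    }

    // Stand-in for a DDL or 'pessimistic' operation: housekeeping
    // first, then acquire the latch and do the actual work.
    void pessimistic_operation() {
        free_excess_pages();
        std::lock_guard<std::mutex> guard(latch);
        latch_held = true;
        trace.push_back("modify under latch");
        latch_held = false;
    }
};
```

Calling pessimistic_operation() on a fresh ToyEngine records three free_check/remove_free_page pairs followed by the latched modification, so no housekeeping step runs while the latch is held.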



 Comments   
Comment by Marko Mäkelä [ 2017-08-24 ]

bb-10.2-marko
I intend to backport this to 10.0 as well.

Comment by Marko Mäkelä [ 2017-08-24 ]

For the record, I cannot repeat the equivalent of the 10.2 test in 10.0:

./mtr --mysqld=--loose-innodb-sync-debug --mem --noreorder innodb_zip.innodb-zip,4k,innodb innodb_zip.wl5522_debug_zip,4k,innodb innodb_zip.wl5522_zip,4k,innodb
./mtr --mysqld=--innodb-page-size=4k --mysqld=--loose-innodb-sync-debug --mem --noreorder innodb_zip.innodb-zip innodb.innodb-wl5522-debug-zip innodb.innodb-wl5522-zip

The reason is that only innodb_zip.innodb-zip in 10.0 allows the innodb_page_size=4k combination.

Comment by Marko Mäkelä [ 2017-08-24 ]

bb-10.0-marko

Comment by Jan Lindström (Inactive) [ 2017-08-25 ]

10.0 is OK to push. For 10.2, can you try to avoid unnecessary code duplication?
