[MDEV-21974] InnoDB DML under backup locks make buffer pool usage grow permanently Created: 2020-03-18 Updated: 2022-07-15 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Backup, Storage Engine - InnoDB |
| Affects Version/s: | 10.4, 10.5 |
| Fix Version/s: | 10.4, 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
The test case above, if it is run with innodb_buffer_pool_size=128M (the default outside MTR), causes an InnoDB abort due to >95% buffer pool usage, even though it only performs short operations on a relatively small table. It happens reliably for me, but some timing issues are possible, so it may fail with 67% warnings instead.
It appears that by increasing innodb_lock_wait_timeout in the test, it can be made either to exhaust an arbitrarily large buffer pool or to fail with ER_LOCK_TABLE_FULL. For example, with innodb_buffer_pool_size=512M and innodb_lock_wait_timeout=180 it fails with 67% warnings, while innodb_buffer_pool_size=512M and innodb_lock_wait_timeout=240 causes ER_LOCK_TABLE_FULL (exact values may vary between machines and builds). `top` shows that the server's memory usage keeps growing for the whole time it waits for the locks, while CPU stays at a steady ~100%:
I couldn't replace the backup locks with regular locks in this scenario; however, it's possible that I just didn't find the right combination. |
| Comments |
| Comment by Marko Mäkelä [ 2020-03-18 ] |
|
The stack trace looks like it is performing a full rollback (savept=0). In that case, all explicit locks of the transaction should be released and freed from the buffer pool (BUF_BLOCK_MEMORY). A partial rollback (to the start of a row operation, to the start of a statement, or ROLLBACK TO SAVEPOINT) is not supposed to release explicit locks. The INSERT…SELECT must create a lot of explicit locks on the source table; there is no way around that. For DELETE and UPDATE we could theoretically use implicit locking (MDEV-16232), and for any remaining locking reads MDEV-16406 could help, but these would likely be far too big changes to implement in any GA release. I do not think that we can avoid calls that create new record lock bitmaps during rollback, because btr_compress() will merge pages and must adjust the explicit locks accordingly. It might be possible to improve the memory management. If mem_heap_t is being used, a notable problem with it is that it is not possible to free and reuse individual allocations: both mem_heap_empty() and mem_heap_free() are all-or-nothing operations. |
| Comment by Elena Stepanova [ 2020-03-18 ] |
|
Just to clarify, in case it wasn't completely clear from the description: the memory usage (the number of locks?) keeps growing even if we only increase the timeout, without increasing the number of INSERT .. SELECT attempts. The table itself is very small: only 32K of rather short rows. I don't know how to calculate the amount of memory needed for the locks, but if 256M are exhausted by only four timed-out INSERTs, something might be seriously wrong with the usage. (I did get the >95% abort for 256M as well; I didn't get it for 512M, because I hit ER_LOCK_TABLE_FULL instead.) What this may mean in practice is that users who have any DML executing in parallel with a backup will hit this problem, even without big queries, big data or high concurrency. |