[MDEV-29593] Purge misses a chance to free not-yet-reused undo pages Created: 2022-09-21 Updated: 2024-01-10 Resolved: 2023-04-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11 |
| Fix Version/s: | 11.1.1, 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | performance | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||
| Description |
|
While reviewing some code to prepare for In the case where the purge coordinator task invokes trx_purge_remove_log_hdr() outside of trx_purge_free_segment(), it would be possible to free the undo log page, provided that nothing was appended to the page after the purged records. Undo log pages may contain records for many unrelated transactions. If no "reuse of cached undo pages" took place, we could simply free the undo page. This could avoid some page writes and improve the performance of backup and recovery. It could also avoid unnecessary growth of the system tablespace or the temporary tablespace when they store undo log pages. |
| Comments |
| Comment by Marko Mäkelä [ 2023-03-24 ] | ||||||||||||||
|
The logic for this is a little tricky. I think that we can do this in the case when we cannot invoke trx_purge_free_segment() because the state is not TRX_UNDO_TO_PURGE but TRX_UNDO_CACHED and all other conditions (which were recently revised in In this case, we should find exactly one entry in rseg.undo_cached that points to the last record in the undo log page. That entry as well as the TRX_RSEG_UNDO_SLOTS pointer to the cached undo page will have to be removed. The rseg.undo_cached is also being used for temporary undo logs (see trx_undo_assign_low() and trx_undo_reuse_cached()), but there is no purge of history or MVCC for temporary tables. For temporary tables, we could modify trx_undo_commit_cleanup() so that it will never add anything to rseg.undo_cached and instead just free the undo page after commit. | ||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-27 ] | ||||||||||||||
|
I figured out that we can actually free the entire segment and not just the last page. My attempts at reproducing the impact of this change on the size of the undo log tablespaces have failed so far. For | ||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-21 ] | ||||||||||||||
|
According to one brave user who deployed on some replicas the version that I posted above, this seems to help prevent the undo tablespace growth and the need of innodb_undo_log_truncate=ON. In the early patch that I linked in my previous comment, there are two problems. It occasionally detects a missed opportunity to free undo pages:
This is a debug assertion, and it is there on purpose so that I would get examples of that. mleich provided a simple one. At the time the assertion fails, everything else except undo->hdr_offset matches. In the cached undo log, we have an undo->hdr_offset of 2796. The hdr_addr.boffset as well as the undo page header contents shows 120 at this time. When the 2796 was stored in undo->hdr_offset, the undo log records in the page had not been purged yet. The transaction for which the tentative reservation was made, was rolled back later, maybe due to a deadlock or a duplicate key error. I think that we must remove the condition undo->hdr_offset == hdr_addr.boffset. I need to think about the conditions on the page number. Maybe one of them would suffice. The debug assertion failure itself is ‘harmless’ here, that is, if we do not goto free_segment, the undo log segment will not be freed, just like the logic was before this fix. Another problem with the experimental patch was that the rseg_hdr at the end of the function is being restored in an unsafe way. If the block had been evicted from the buffer pool between the mtr.commit() and the reacquisition of the rseg_hdr->page.lock, we would continue operating on a totally wrong page. I think that this problem only affects systems with an extremely small buffer pool. | ||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-21 ] | ||||||||||||||
|
After analyzing a little more, I believe that the following check should be sufficient:
I analyzed a trace where undo->top_page_no was different. The last access to that page was in purge_coordinator_callback() in the same thread, in trx_purge_get_next_rec(). Before this part of the code, the |