[MDEV-29593] Purge misses a chance to free not-yet-reused undo pages - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4
Component/s: Storage Engine - InnoDB
Labels:
- performance

Description

While reviewing some code to prepare for ~~MDEV-24402~~, I came across the following.

In the case where the purge coordinator task invokes trx_purge_remove_log_hdr() outside of trx_purge_free_segment(), it would be possible to free the undo log page, provided that nothing was appended to the page after the purged records.

Undo log pages may contain records for many unrelated transactions. If no "reuse of cached undo pages" took place, we could simply free the undo page. This could avoid some page writes and improve the performance of backup and recovery. It could also avoid unnecessary growth of the system tablespace or the temporary tablespace when they store undo log pages.

Attachments

Issue Links

causes

MDEV-33137 Assertion `end_lsn == page_lsn' failed in recv_recover_page

Closed

is blocked by

MDEV-30671 innodb_undo_log_truncate=ON fails to wait for purge of transaction history

Closed

relates to

MDEV-31382 SET GLOBAL innodb_undo_log_truncate=ON does not free space when no undo logs exist

Closed

MDEV-21952 ibdata1 file size growing in MariaDB

Closed

MDEV-24402 CHECK TABLE may miss some cases of index inconsistencies

Closed

MDEV-26782 InnoDB temporary tablespace: reclaiming of free space does not work

Closed

MDEV-29401 InnoDB history list length increased in 10.6 compared to 10.5 for the same load

Closed

(2 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2023-03-24 10:37

The logic for this is a little tricky. I think that we can do this in the case when we cannot invoke trx_purge_free_segment() because the state is not TRX_UNDO_TO_PURGE but TRX_UNDO_CACHED and all other conditions (which were recently revised in ~~MDEV-30671~~) hold.

In this case, we should find exactly one entry in rseg.undo_cached that points to the last record in the undo log page. That entry as well as the TRX_RSEG_UNDO_SLOTS pointer to the cached undo page will have to be removed.

The rseg.undo_cached is also being used for temporary undo logs (see trx_undo_assign_low() and trx_undo_reuse_cached()), but there is no purge of history or MVCC for temporary tables. For temporary tables, we could modify trx_undo_commit_cleanup() so that it will never add anything to rseg.undo_cached and instead just free the undo page after commit.

Marko Mäkelä added a comment - 2023-03-24 10:37 The logic for this is a little tricky. I think that we can do this in the case when we cannot invoke trx_purge_free_segment() because the state is not TRX_UNDO_TO_PURGE but TRX_UNDO_CACHED and all other conditions (which were recently revised in MDEV-30671 ) hold. In this case, we should find exactly one entry in rseg.undo_cached that points to the last record in the undo log page. That entry as well as the TRX_RSEG_UNDO_SLOTS pointer to the cached undo page will have to be removed. The rseg.undo_cached is also being used for temporary undo logs (see trx_undo_assign_low() and trx_undo_reuse_cached() ), but there is no purge of history or MVCC for temporary tables. For temporary tables, we could modify trx_undo_commit_cleanup() so that it will never add anything to rseg.undo_cached and instead just free the undo page after commit.

Marko Mäkelä added a comment - 2023-03-27 14:44

I figured out that we can actually free the entire segment and not just the last page.

My attempts at reproducing the impact of this change on the size of the undo log tablespaces have failed so far. For ~~MDEV-26782~~ it was straightforward, because I tested speed and not the file size. I did measure some speed improvement for this change, but more testing will be needed.

Marko Mäkelä added a comment - 2023-03-27 14:44 I figured out that we can actually free the entire segment and not just the last page. My attempts at reproducing the impact of this change on the size of the undo log tablespaces have failed so far. For MDEV-26782 it was straightforward, because I tested speed and not the file size. I did measure some speed improvement for this change, but more testing will be needed.

Marko Mäkelä added a comment - 2023-04-21 07:12

According to one brave user who deployed on some replicas the version that I posted above, this seems to help prevent the undo tablespace growth and the need of innodb_undo_log_truncate=ON.

In the early patch that I linked in my previous comment, there are two problems. It occasionally detects a missed opportunity to free undo pages:

    case TRX_UNDO_CACHED:

      /* rseg.undo_cached must point to this page */

      trx_undo_t *undo= UT_LIST_GET_FIRST(rseg.undo_cached);

      for (; undo; undo= UT_LIST_GET_NEXT(undo_list, undo))

        if (undo->top_page_no == hdr_addr.page &&

            undo->hdr_page_no == hdr_addr.page &&

            undo->hdr_offset == hdr_addr.boffset)

          goto found_cached;

      ut_ad("inconsistent undo logs" == 0);

      break;

    found_cached:

      UT_LIST_REMOVE(rseg.undo_cached, undo);

…

      goto free_segment;

This is a debug assertion, and it is there on purpose so that I would get examples of that. mleich provided a simple one. At the time the assertion fails, everything else except undo->hdr_offset matches. In the cached undo log, we have an undo->hdr_offset of 2796. The hdr_addr.boffset as well as the undo page header contents shows 120 at this time. When the 2796 was stored in undo->hdr_offset, the undo log records in the page had not been purged yet. The transaction for which the tentative reservation was made, was rolled back later, maybe due to a deadlock or a duplicate key error.

I think that we must remove the condition undo->hdr_offset == hdr_addr.boffset. I need to think about the conditions on the page number. Maybe one of them would suffice.

The debug assertion failure itself is ‘harmless’ here, that is, if we do not goto free_segment, the undo log segment will not be freed, just like the logic was before this fix.

Another problem with the experimental patch was that the rseg_hdr at the end of the function is being restored in an unsafe way. If the block had been evicted from the buffer pool between the mtr.commit() and the reacquisition of the rseg_hdr->page.lock, we would continue operating on a totally wrong page. I think that this problem only affects systems with an extremely small buffer pool.

Marko Mäkelä added a comment - 2023-04-21 07:12 According to one brave user who deployed on some replicas the version that I posted above, this seems to help prevent the undo tablespace growth and the need of innodb_undo_log_truncate=ON . In the early patch that I linked in my previous comment, there are two problems. It occasionally detects a missed opportunity to free undo pages: case TRX_UNDO_CACHED: /* rseg.undo_cached must point to this page */ trx_undo_t *undo= UT_LIST_GET_FIRST(rseg.undo_cached); for (; undo; undo= UT_LIST_GET_NEXT(undo_list, undo)) if (undo->top_page_no == hdr_addr.page && undo->hdr_page_no == hdr_addr.page && undo->hdr_offset == hdr_addr.boffset) goto found_cached; ut_ad( "inconsistent undo logs" == 0); break ; found_cached: UT_LIST_REMOVE(rseg.undo_cached, undo); … goto free_segment; This is a debug assertion, and it is there on purpose so that I would get examples of that. mleich provided a simple one. At the time the assertion fails, everything else except undo->hdr_offset matches. In the cached undo log, we have an undo->hdr_offset of 2796. The hdr_addr.boffset as well as the undo page header contents shows 120 at this time. When the 2796 was stored in undo->hdr_offset , the undo log records in the page had not been purged yet. The transaction for which the tentative reservation was made, was rolled back later, maybe due to a deadlock or a duplicate key error. I think that we must remove the condition undo->hdr_offset == hdr_addr.boffset . I need to think about the conditions on the page number. Maybe one of them would suffice. The debug assertion failure itself is ‘harmless’ here, that is, if we do not goto free_segment , the undo log segment will not be freed, just like the logic was before this fix. Another problem with the experimental patch was that the rseg_hdr at the end of the function is being restored in an unsafe way. If the block had been evicted from the buffer pool between the mtr.commit() and the reacquisition of the rseg_hdr->page.lock , we would continue operating on a totally wrong page. I think that this problem only affects systems with an extremely small buffer pool.

Marko Mäkelä added a comment - 2023-04-21 10:00

After analyzing a little more, I believe that the following check should be sufficient:

      trx_undo_t *undo= UT_LIST_GET_FIRST(rseg.undo_cached);

      for (; undo; undo= UT_LIST_GET_NEXT(undo_list, undo))

        if (undo->hdr_page_no == hdr_addr.page)

          goto found_cached;

      ut_ad("inconsistent undo logs" == 0);

      break;

    found_cached:

I analyzed a trace where undo->top_page_no was different. The last access to that page was in purge_coordinator_callback() in the same thread, in trx_purge_get_next_rec(). Before this part of the code, the ~~MDEV-30671~~ logic is already checking that the entire undo log segment is safe to remove. Basically the only difference in this fix is that we allow not only fully processed TRX_UNDO_TO_PURGE but also fully-processed-and-not-yet-reused TRX_UNDO_CACHED undo log segments to be freed.

Marko Mäkelä added a comment - 2023-04-21 10:00 After analyzing a little more, I believe that the following check should be sufficient: trx_undo_t *undo= UT_LIST_GET_FIRST(rseg.undo_cached); for (; undo; undo= UT_LIST_GET_NEXT(undo_list, undo)) if (undo->hdr_page_no == hdr_addr.page) goto found_cached; ut_ad( "inconsistent undo logs" == 0); break ; found_cached: I analyzed a trace where undo->top_page_no was different. The last access to that page was in purge_coordinator_callback() in the same thread, in trx_purge_get_next_rec() . Before this part of the code, the MDEV-30671 logic is already checking that the entire undo log segment is safe to remove. Basically the only difference in this fix is that we allow not only fully processed TRX_UNDO_TO_PURGE but also fully-processed-and-not-yet-reused TRX_UNDO_CACHED undo log segments to be freed.

MariaDB Server

Purge misses a chance to free not-yet-reused undo pages

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration