MariaDB Server / MDEV-30753

Possible corruption due to trx_purge_free_segment()




      There is a potential problem if the server is killed while undo log pages are being freed. A kill at the critical point can be simulated with the following patch:

      diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
      index f273903ef93..e7f61162dcd 100644
      --- a/storage/innobase/trx/trx0purge.cc
      +++ b/storage/innobase/trx/trx0purge.cc
      @@ -365,6 +365,8 @@ void trx_purge_free_segment(mtr_t &mtr, trx_rseg_t* rseg, fil_addr_t hdr_addr)
       		       + block->frame, &mtr)) {
      +		log_write_up_to(mtr.commit_lsn(), true);
      +		abort();
       		rseg_hdr = trx_rsegf_get(rseg->space, rseg->page_no, &mtr);

      The following scenario would seem to be possible:

      1. InnoDB is killed between that point and the time when the mini-transaction of a subsequent trx_purge_remove_log_hdr() becomes durable.
      2. InnoDB is restarted, and the pages that were freed above are being allocated for something else (further undo log records, or data located in the system tablespace).
      3. Purge attempts to access an invalid page.
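
The scenario above can be illustrated with a toy model. This is only a sketch and not InnoDB code: `DurableState`, `Owner`, `simulate()` and the other names are hypothetical, and each helper stands in for one mini-transaction whose changes are assumed to be durable as soon as it returns.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model only: none of these types or functions exist in InnoDB.
enum class Owner { Free, UndoLog, Other };

struct DurableState {
  std::map<int, Owner> pages;             // page number -> current owner
  std::vector<std::vector<int>> history;  // pages of each undo log segment
};

// First "mini-transaction": free the pages of one history entry.
void free_segment_pages(DurableState &s, std::size_t idx) {
  for (int p : s.history[idx]) s.pages[p] = Owner::Free;
}

// Second "mini-transaction": unlink the entry from the history list.
void remove_from_history(DurableState &s, std::size_t idx) {
  s.history.erase(s.history.begin() + static_cast<std::ptrdiff_t>(idx));
}

// After restart, every free page is handed out to some other user.
void reallocate_free_pages(DurableState &s) {
  for (auto &p : s.pages)
    if (p.second == Owner::Free) p.second = Owner::Other;
}

// Purge follows the history list; count pages that are no longer undo pages.
int count_invalid_history_pages(const DurableState &s) {
  int bad = 0;
  for (const auto &segment : s.history)
    for (int p : segment)
      if (s.pages.at(p) != Owner::UndoLog) ++bad;
  return bad;
}

// Number of invalid pages that purge would encounter after restart.
int simulate(bool killed_between_mtrs) {
  DurableState s;
  s.pages = {{10, Owner::UndoLog}, {11, Owner::UndoLog}, {12, Owner::UndoLog}};
  s.history = {{10, 11}, {12}};
  free_segment_pages(s, 0);
  if (!killed_between_mtrs) remove_from_history(s, 0);
  reallocate_free_pages(s);
  return count_invalid_history_pages(s);
}
```

In this model, `simulate(true)` leaves the history list pointing at two pages that now belong to something else, while `simulate(false)` leaves no such reference.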

      The function trx_purge_free_segment() is also missing calls to log_free_check(), which means that an overrun of the redo log is possible, and the database might become impossible to recover if the server is killed while the function is being executed.
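
To make the redo log overrun concrete, here is a minimal sketch of a circular log with a fixed capacity. `ToyRedoLog`, `free_check()` and `log_overruns()` are hypothetical names; `free_check()` only imitates the role that log_free_check() is assumed to play here (making room by advancing the checkpoint before more log is written), not its actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy circular redo log: only records written since the last checkpoint
// must be retained, and they must fit within `capacity` bytes.
struct ToyRedoLog {
  std::uint64_t capacity;
  std::uint64_t checkpoint_lsn = 0;
  std::uint64_t lsn = 0;
  bool overrun = false;

  explicit ToyRedoLog(std::uint64_t cap) : capacity(cap) {}

  // Write `bytes` of log; overwriting records that have not been
  // checkpointed yet makes the log unusable for recovery.
  void append(std::uint64_t bytes) {
    lsn += bytes;
    if (lsn - checkpoint_lsn > capacity) overrun = true;
  }

  // Hypothetical stand-in for the role of log_free_check(): if the next
  // write of `margin` bytes would not fit, advance the checkpoint first.
  void free_check(std::uint64_t margin) {
    if ((lsn - checkpoint_lsn) + margin > capacity) checkpoint_lsn = lsn;
  }
};

// Free a segment in `steps` mini-transactions of `bytes_per_step` log each,
// optionally checking for free log space before every step.
bool log_overruns(std::uint64_t capacity, int steps,
                  std::uint64_t bytes_per_step, bool check_before_each_step) {
  ToyRedoLog log(capacity);
  for (int i = 0; i < steps; ++i) {
    if (check_before_each_step) log.free_check(bytes_per_step);
    log.append(bytes_per_step);
  }
  return log.overrun;
}
```

With a capacity of 100 bytes and 50 steps of 10 bytes each, the loop without the check overruns the log, whereas the loop that checks before every step does not.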

      There is a hint in the source code on how this could be fixed:

      	/* We may free the undo log segment header page; it must be freed
      	within the same mtr as the undo log header is removed from the
      	history list: otherwise, in case of a database crash, the segment
      	could become inaccessible garbage in the file space. */
      	trx_purge_remove_log_hdr(rseg_hdr, block, hdr_addr.boffset, &mtr);
      	do {
      		/* Here we assume that a file segment with just the header
      		page can be freed in a few steps, so that the buffer pool
      		is not flooded with bufferfixed pages: see the note in
      		fsp0fsp.cc. */
      	} while (!fseg_free_step(TRX_UNDO_SEG_HDR + TRX_UNDO_FSEG_HEADER
      				 + block->frame, &mtr));

      If we simply call trx_purge_remove_log_hdr() in the first mini-transaction, everything should be safe. The pages might not be easy to free afterwards if the server is killed before they are freed, but that is not a problem for those who use multiple innodb_undo_tablespaces and innodb_undo_log_truncate=ON.
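
The trade-off between the two orderings can be sketched by enumerating the possible kill points. This is a hypothetical model, not InnoDB code: `Order`, `Outcome` and `after_kill()` are illustrative names, and the operation is assumed to consist of exactly two mini-transactions, one that unlinks the undo log header from the history list and one that frees the pages.

```cpp
#include <cassert>

// Toy model only: these names do not exist in InnoDB.
enum class Order { FreeFirst, UnlinkFirst };

struct Outcome {
  bool dangling;  // history list still references pages that were freed
  bool leaked;    // pages are still allocated but no longer referenced
};

// State after a kill, given how many of the two mini-transactions
// (0, 1 or 2) had become durable before the server was killed.
Outcome after_kill(Order order, int committed_mtrs) {
  bool linked = true;     // history list still references the segment
  bool allocated = true;  // segment pages are still allocated
  for (int i = 0; i < committed_mtrs; ++i) {
    // The unlink step runs first only with Order::UnlinkFirst.
    const bool unlink_step = (order == Order::UnlinkFirst) == (i == 0);
    if (unlink_step)
      linked = false;
    else
      allocated = false;
  }
  return {linked && !allocated, !linked && allocated};
}
```

In this model a kill after the first mini-transaction yields a dangling reference with `Order::FreeFirst`, but only leaked pages with `Order::UnlinkFirst`. A leak wastes space, whereas a dangling reference can lead purge to an invalid page.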

      We could also try to free everything in a single mini-transaction, provided that there is sufficient capacity in the redo log and the buffer pool.


              marko Marko Mäkelä