MDEV-12288: Reset DB_TRX_ID when the history is removed, to speed up MVCC

Details

    • 10.3.1-2

    Description

      The InnoDB clustered index record system columns DB_TRX_ID and DB_ROLL_PTR are used by multi-versioning and for determining whether a record is implicitly locked. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55, respectively (the latter indicating a fresh insert).
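
      As a rough illustration of the reset values described above, the following C++ sketch models the two system columns as plain 64-bit integers. The struct and function names are hypothetical and are not InnoDB's actual definitions (the on-disk columns are narrower fixed-width fields); only the values 0 and 1<<55 come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory model of the two hidden system columns.
struct SysColumns {
    std::uint64_t db_trx_id;    // ID of the transaction that last modified the record
    std::uint64_t db_roll_ptr;  // pointer to the undo log record of that change
};

// Values from the description: 0 for DB_TRX_ID, and 1<<55 for DB_ROLL_PTR
// (the bit that marks a fresh insert).
constexpr std::uint64_t RESET_TRX_ID = 0;
constexpr std::uint64_t RESET_ROLL_PTR = std::uint64_t{1} << 55;

// True if the record carries the "history removed" marker values.
inline bool is_reset(const SysColumns& rec) {
    return rec.db_trx_id == RESET_TRX_ID && rec.db_roll_ptr == RESET_ROLL_PTR;
}

// Reset the columns once the history of the record is no longer needed.
inline void reset_sys_columns(SysColumns& rec) {
    rec.db_trx_id = RESET_TRX_ID;
    rec.db_roll_ptr = RESET_ROLL_PTR;
    assert(is_reset(rec));  // postcondition of the reset
}
```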

      When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present in the read view. There is no need to acquire the transaction system mutex to check whether the transaction exists, because no write can ever be performed by a transaction whose ID is 0.
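
      The fast path described above could look roughly like the following sketch. It is deliberately simplified: TrxSys, is_active() and writer_committed() are hypothetical stand-ins, and a real read view compares the ID against its own snapshot rather than only asking whether the writer is still active. The point is merely that an ID of 0 is answered without taking the mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>

using trx_id_t = std::uint64_t;

// Hypothetical stand-in for the transaction system and its mutex.
struct TrxSys {
    std::mutex mutex;
    std::unordered_set<trx_id_t> active;  // IDs of transactions still running
    unsigned lookups = 0;                 // mutex-protected lookups, counted for illustration

    bool is_active(trx_id_t id) {
        assert(id != 0);  // callers take the fast path for ID 0
        std::lock_guard<std::mutex> guard(mutex);
        ++lookups;
        return active.count(id) != 0;
    }
};

// Returns true if the last writer of the record is known to be finished.
inline bool writer_committed(TrxSys& sys, trx_id_t db_trx_id) {
    if (db_trx_id == 0) {
        // Fast path: the history was removed, and no transaction has ID 0,
        // so there is nothing to look up and no mutex to acquire.
        return true;
    }
    return !sys.is_active(db_trx_id);
}
```

      In this model, a call with DB_TRX_ID=0 leaves the lookups counter untouched, which is the contention that the change is meant to avoid.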

      The persistent InnoDB undo log is split into two parts: insert_undo and update_undo. The insert_undo log is discarded at transaction commit or rollback, and the update_undo log is processed by the purge subsystem. As part of this change, we must merge the two types of undo logs into one, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is ‘touched’.

      Upgrade considerations

      This will change the persistent InnoDB file formats, not only in the undo log and redo log, but also in the data files. There are some debug assertions that would not allow any record to contain DB_TRX_ID=0.

      A new redo log format tag must be introduced so that the writes of the system columns can be properly redo-logged. (See MDEV-11432, MDEV-11782.) This will prevent a startup of an older version with the new-version redo logs. We may also prevent a crash recovery of MariaDB 10.2 files with the newer version. (Crash recovery of files from 10.1 or earlier versions is already prevented in 10.2.)

      The undo log format will be changed as well. To be able to get rid of legacy code, InnoDB startup should detect whether any old-format undo logs are present. If so, startup will be refused, and the user must perform a slow shutdown (SET GLOBAL innodb_fast_shutdown=0) with the old server in order to empty the undo logs.
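
      The intended startup behaviour can be sketched as follows. This is only a model of the rule stated above, not the actual InnoDB startup code: UndoFormat, UndoLog, StartupResult and both functions are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class UndoFormat { Old, New };  // hypothetical format tag
struct UndoLog { UndoFormat format; };
enum class StartupResult { Ok, RefuseOldUndoLogs };

// Refuse startup if any old-format undo log is still present.
inline StartupResult check_undo_logs(const std::vector<UndoLog>& logs) {
    const bool has_old = std::any_of(
        logs.begin(), logs.end(),
        [](const UndoLog& log) { return log.format == UndoFormat::Old; });
    return has_old ? StartupResult::RefuseOldUndoLogs : StartupResult::Ok;
}

// Models a slow shutdown (innodb_fast_shutdown=0) of the old server,
// which empties the undo logs.
inline void slow_shutdown(std::vector<UndoLog>& logs) {
    logs.clear();
    assert(logs.empty());
}
```

      Under these assumptions, a data directory that still holds an old-format undo log is rejected, and the same directory is accepted once the old server has gone through a slow shutdown.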

      A proof-of-concept implementation for 10.2 consists of 4 consecutive commits. It does not include any of the upgrade logic described above.

          Activity

            marko Marko Mäkelä added a comment:

            I implemented a new redo log format version and the MLOG_ZIP_WRITE_TRX_ID record.
            While testing the latter, I found out that we are not resetting the DB_TRX_ID as often as I would expect. Some further revision will be needed. Maybe the upgrade compatibility changes broke it, or maybe it was not fully working. It is hard to test this, because the DB_TRX_ID column is hidden from the SQL layer.

            Nevertheless, the fields do get reset sometimes (during innodb_zip.bug56680 even for ROW_FORMAT=COMPRESSED tables). The implemented file format changes will allow the resetting to be improved later. So, I would push this now, before the file formats are frozen.

            marko Marko Mäkelä added a comment:

            When testing the recovery of the added MLOG_ZIP_WRITE_TRX_ID record, I noticed that the system columns are not being reset in every case, such as soon after committing an INSERT.

            I decided to push this nevertheless, so that we will have the necessary file format changes in place. The bug that the history is not always being reset can be fixed later when time permits.

            marko Marko Mäkelä added a comment:

            The resetting of the DB_TRX_ID column was fixed, and regression tests were added, in MDEV-13536 (DB_TRX_ID is not actually being reset when the history is removed).

            marko Marko Mäkelä added a comment:

            svoj noticed that the function lock_rec_convert_impl_to_expl() was unnecessarily looking up trx_id=0, and acquiring trx_sys->mutex when doing the futile lookup.
            The follow-up fix in 10.3.3 fixes this omission. The initial MDEV-12288 commit already included a corresponding fast-path for the secondary index lock check in the function row_vers_impl_x_locked_low().

            marko Marko Mäkelä added a comment:

            For the record: Due to this change, InnoDB moved to a single persistent undo log. By design, this ought to fix the upstream MySQL Bug #55283, which to my knowledge is still open. The bug should be present in all upstream InnoDB versions at least since MySQL 5.0, where the two-phase commit mechanism was introduced.

            People

              marko Marko Mäkelä