Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12288

Reset DB_TRX_ID when the history is removed, to speed up MVCC

Details

    • 10.3.1-2

    Description

      The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning and for determining if a record is implicitly locked. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert).

      When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0.

      The persistent InnoDB undo log is split into two parts: insert_undo and update_undo. The insert_undo log is discarded at transaction commit or rollback, and the update_undo log is processed by the purge subsystem. As part of this change, we must merge the two types of undo logs into one, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is ‘touched’.

      Upgrade considerations

      This will change the persistent InnoDB file formats, not only in the undo log and redo log, but also in the data files. There are some debug assertions that would not allow any record to contain DB_TRX_ID=0.

      A new redo log format tag must be introduced so that the writes of the system columns can be properly redo-logged. (See MDEV-11432, MDEV-11782.) This will prevent a startup of an older version with the new-version redo logs. We may also prevent a crash recovery of MariaDB 10.2 files with the newer version. (Crash recovery of files from 10.1 or earlier versions is already prevented in 10.2.)

      The undo log format will be changed as well. To be able to get rid of legacy code, InnoDB startup should detect if any old-format undo logs are present. If yes, startup will be refused, and the user must perform a slow shutdown (SET GLOBAL innodb_fast_shutdown=0) with the old server in order to empty the undo logs.

      A proof-of-concept implementation for 10.2 consists of 4 consecutive commits. It is missing any of the above-mentioned upgrade logic.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Attachment mysql-5.7.17-MDEV-12288.patch [ 43453 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä added a comment - - edited

            I attached experimental patches to port this to MySQL 5.7.17 for performance evaluation. The second patch also depends on a patch that is attached to MDEV-11585.
            In addition to these patches it could be useful to apply the patch attached to MDEV-12121.

            Based on a quick test with MySQL 5.7.17 and these patches, the setting internal_tmp_disk_storage_engine=MyISAM should be used when testing the patches related to temporary tables. (MariaDB does not support InnoDB as the optimizer-internal storage engine, and MDEV-11487 removed the InnoDB implementation of such tables, leaving only the user-visible possibility to CREATE TEMPORARY TABLE…ENGINE=InnoDB.)

            marko Marko Mäkelä added a comment - - edited I attached experimental patches to port this to MySQL 5.7.17 for performance evaluation. The second patch also depends on a patch that is attached to MDEV-11585 . In addition to these patches it could be useful to apply the patch attached to MDEV-12121 . Based on a quick test with MySQL 5.7.17 and these patches, the setting internal_tmp_disk_storage_engine=MyISAM should be used when testing the patches related to temporary tables. (MariaDB does not support InnoDB as the optimizer-internal storage engine, and MDEV-11487 removed the InnoDB implementation of such tables, leaving only the user-visible possibility to CREATE TEMPORARY TABLE…ENGINE=InnoDB.)
            marko Marko Mäkelä made changes -

            The following tasks are prerequisites for including the patch in a release:

            • Support upgrade from the old undo log format. (Recover transactions from both insert_undo and update_undo logs.)
            • Add consistency checks: Delete-marked records must never ever carry DB_TRX_ID=0 (except in secondary indexes, where DB_TRX_ID does not exist).
            marko Marko Mäkelä added a comment - The following tasks are prerequisites for including the patch in a release: Support upgrade from the old undo log format. (Recover transactions from both insert_undo and update_undo logs.) Add consistency checks: Delete-marked records must never ever carry DB_TRX_ID=0 (except in secondary indexes, where DB_TRX_ID does not exist).
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            bb-10.3-marko passes an upgrade test (see the commit message for details).

            The only thing that remains to be done is redo logging for setting DB_TRX_ID,DB_ROLL_PTR on ROW_FORMAT=COMPRESSED pages. We probably have to introduce a new redo log record type for that.

            marko Marko Mäkelä added a comment - bb-10.3-marko passes an upgrade test (see the commit message for details). The only thing that remains to be done is redo logging for setting DB_TRX_ID,DB_ROLL_PTR on ROW_FORMAT=COMPRESSED pages. We probably have to introduce a new redo log record type for that.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Firstly, changes look correct but I have to say most of the critical changes are on code that is not familiar to me in great detail. Secondly, if upgrade tests pass, this code also works, there could be a new test case where both insert and update undo records are produced to persistent storage and then see that crash recovery works correctly (on different page sizes including compressed row format). Ok to push.

            jplindst Jan Lindström (Inactive) added a comment - Firstly, changes look correct but I have to say most of the critical changes are on code that is not familiar to me in great detail. Secondly, if upgrade tests pass, this code also works, there could be a new test case where both insert and update undo records are produced to persistent storage and then see that crash recovery works correctly (on different page sizes including compressed row format). Ok to push.
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            Thank you for the review. I responded to your review comments, and will next implement the remaining redo log format changes.

            marko Marko Mäkelä added a comment - Thank you for the review . I responded to your review comments, and will next implement the remaining redo log format changes.
            marko Marko Mäkelä made changes -
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.3.1-2 [ 174 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower

            I implemented a new redo log format version and the MLOG_ZIP_WRITE_TRX_ID record.
            While testing the latter, I found out that we are not resetting the DB_TRX_ID as often as I would expect. Some further revision will be needed. Maybe the upgrade compatibility changes broke it, or maybe it was not fully working. It is hard to test this, because the DB_TRX_ID column is hidden from the SQL layer.

            Nevertheless, the fields do get reset sometimes (during innodb_zip.bug56680 even for ROW_FORMAT=COMPRESSED tables). The implemented file format changes will allow the resetting to be improved later. So, I would push this now, before the file formats are frozen.

            marko Marko Mäkelä added a comment - I implemented a new redo log format version and the MLOG_ZIP_WRITE_TRX_ID record . While testing the latter, I found out that we are not resetting the DB_TRX_ID as often as I would expect. Some further revision will be needed. Maybe the upgrade compatibility changes broke it, or maybe it was not fully working. It is hard to test this, because the DB_TRX_ID column is hidden from the SQL layer. Nevertheless, the fields do get reset sometimes (during innodb_zip.bug56680 even for ROW_FORMAT=COMPRESSED tables). The implemented file format changes will allow the resetting to be improved later. So, I would push this now, before the file formats are frozen.

            When testing the recovery of the added MLOG_ZIP_WRITE_TRX_ID record, I noticed that the system columns are not being reset in every case, such as soon after committing an INSERT.

            I decided to push this nevertheless, so that we will have the necessary file format changes in place. The bug that the history is not always being reset can be fixed later when time permits.

            marko Marko Mäkelä added a comment - When testing the recovery of the added MLOG_ZIP_WRITE_TRX_ID record, I noticed that the system columns are not being reset in every case, such as soon after committing an INSERT. I decided to push this nevertheless, so that we will have the necessary file format changes in place. The bug that the history is not always being reset can be fixed later when time permits.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.3.1 [ 22532 ]
            Fix Version/s 10.3 [ 22126 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            elenst Elena Stepanova made changes -
            elenst Elena Stepanova made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            The resetting of the DB_TRX_ID column was fixed and regression tests added in
            MDEV-13536 DB_TRX_ID is not actually being reset when the history is removed

            marko Marko Mäkelä added a comment - The resetting of the DB_TRX_ID column was fixed and regression tests added in MDEV-13536 DB_TRX_ID is not actually being reset when the history is removed
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            svoj noticed that the function lock_rec_convert_impl_to_expl() was unnecessarily looking up trx_id=0, and acquiring trx_sys->mutex when doing the futile lookup.
            The follow-up fix in 10.3.3 fixes this omission. The initial MDEV-12288 commit already included a corresponding fast-path for the secondary index lock check in the function row_vers_impl_x_locked_low().

            marko Marko Mäkelä added a comment - svoj noticed that the function lock_rec_convert_impl_to_expl() was unnecessarily looking up trx_id=0, and acquiring trx_sys->mutex when doing the futile lookup. The follow-up fix in 10.3.3 fixes this omission. The initial MDEV-12288 commit already included a corresponding fast-path for the secondary index lock check in the function row_vers_impl_x_locked_low().
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            For the record: Due to this change, InnoDB moved to a single persistent undo log. By design, this ought to fix the upstream MySQL Bug #55283, which to my knowledge is still open. The bug should be present in all upstream InnoDB versions at least since MySQL 5.0, where the two-phase commit mechanism was introduced.

            marko Marko Mäkelä added a comment - For the record: Due to this change, InnoDB moved to a single persistent undo log. By design, this ought to fix the upstream MySQL Bug #55283 , which to my knowledge is still open. The bug should be present in all upstream InnoDB versions at least since MySQL 5.0, where the two-phase commit mechanism was introduced.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 80015 ] MariaDB v4 [ 133182 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            marko Marko Mäkelä made changes -

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.