Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17598

InnoDB index option for per-record transaction ID

Details

    Description

      In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.

      In multi-versioning, if a secondary index record has been marked for deletion, or if PAGE_MAX_TRX_ID indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.

      As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.

      As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

      For clustered index records, determining whether a record is implicitly locked is easy: the hidden DB_TRX_ID column will belong to an active (or XA PREPARE; not committed) transaction. For secondary index records it is very expensive because there is no per-record DB_TRX_ID but only a PAGE_MAX_TRX_ID. Therefore the function row_vers_impl_x_locked() has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a blog post I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.

      If we had DB_TRX_ID in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.

      To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by repurposing the field PAGE_MAX_TRX_ID.

      1. PAGE_MAX_TRX_ID = 0: All records in the page are visible for all transactions.
      2. PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF: A DB_TRX_ID will be appended to all records, possibly multiple records per (key_cols,pk_cols) value.
      3. Else: There is no per-record versioning. (The old way.)

      With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Description In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.

            In multi-versioning, if a secondary index record has been marked for deletion, or if {{PAGE_MAX_TRX_ID}} indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.

            As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.

            As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

            For clustered index records, determining whether a record is implicitly locked is easy: the hidden {{DB_TRX_ID}} column will belong to an active (or {{XA PREPARE}}; not committed) transaction. For secondary index records it is very expensive because there is no per-record {{DB_TRX_ID}} but only a {{PAGE_MAX_TRX_ID}}. Therefore the function {{row_vers_impl_x_locked()}} has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a [blog post 2 years ago|https://www.percona.com/blog/2014/12/31/small-changes-impact-complex-systems-mysql-example/] I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.

            If we had {{DB_TRX_ID}} in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.

            An indexed column can be modified multiple times during a transaction. If we also include {{DB_ROLL_PTR}} in the secondary index records, MVCC could construct earlier versions of the secondary index record by fetching earlier undo log records.
            In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.

            In multi-versioning, if a secondary index record has been marked for deletion, or if {{PAGE_MAX_TRX_ID}} indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.

            As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.

            As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

            For clustered index records, determining whether a record is implicitly locked is easy: the hidden {{DB_TRX_ID}} column will belong to an active (or {{XA PREPARE}}; not committed) transaction. For secondary index records it is very expensive because there is no per-record {{DB_TRX_ID}} but only a {{PAGE_MAX_TRX_ID}}. Therefore the function {{row_vers_impl_x_locked()}} has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a [blog post|https://www.percona.com/blog/2014/12/31/small-changes-impact-complex-systems-mysql-example/] I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.

            If we had {{DB_TRX_ID}} in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.

            An indexed column can be modified multiple times during a transaction. If we also include {{DB_ROLL_PTR}} in the secondary index records, MVCC could construct earlier versions of the secondary index record by fetching earlier undo log records.

            To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by slightly repurposing the field {{PAGE_MAX_TRX_ID}}.
            # {{PAGE_MAX_TRX_ID = 0}}: All records in the page are visible for all transactions.
            # {{PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF}}: All records in the page have {{DB_TRX_ID,DB_ROLL_PTR}}.
            # Else: There is no per-record versioning. (The old way.)

            With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.
            marko Marko Mäkelä made changes -
            NRE Projects RM_105_CANDIDATE
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Virtual Columns [ 10803 ]
            Description In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.

            In multi-versioning, if a secondary index record has been marked for deletion, or if {{PAGE_MAX_TRX_ID}} indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.

            As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.

            As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

            For clustered index records, determining whether a record is implicitly locked is easy: the hidden {{DB_TRX_ID}} column will belong to an active (or {{XA PREPARE}}; not committed) transaction. For secondary index records it is very expensive because there is no per-record {{DB_TRX_ID}} but only a {{PAGE_MAX_TRX_ID}}. Therefore the function {{row_vers_impl_x_locked()}} has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a [blog post|https://www.percona.com/blog/2014/12/31/small-changes-impact-complex-systems-mysql-example/] I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.

            If we had {{DB_TRX_ID}} in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.

            An indexed column can be modified multiple times during a transaction. If we also include {{DB_ROLL_PTR}} in the secondary index records, MVCC could construct earlier versions of the secondary index record by fetching earlier undo log records.

            To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by slightly repurposing the field {{PAGE_MAX_TRX_ID}}.
            # {{PAGE_MAX_TRX_ID = 0}}: All records in the page are visible for all transactions.
            # {{PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF}}: All records in the page have {{DB_TRX_ID,DB_ROLL_PTR}}.
            # Else: There is no per-record versioning. (The old way.)

            With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.
            In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.

            In multi-versioning, if a secondary index record has been marked for deletion, or if {{PAGE_MAX_TRX_ID}} indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.

            As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.

            As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.

            For clustered index records, determining whether a record is implicitly locked is easy: the hidden {{DB_TRX_ID}} column will belong to an active (or {{XA PREPARE}}; not committed) transaction. For secondary index records it is very expensive because there is no per-record {{DB_TRX_ID}} but only a {{PAGE_MAX_TRX_ID}}. Therefore the function {{row_vers_impl_x_locked()}} has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a [blog post|https://www.percona.com/blog/2014/12/31/small-changes-impact-complex-systems-mysql-example/] I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.

            If we had {{DB_TRX_ID}} in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.

            To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by repurposing the field {{PAGE_MAX_TRX_ID}}.
            # {{PAGE_MAX_TRX_ID = 0}}: All records in the page are visible for all transactions.
            # {{PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF}}: A {{DB_TRX_ID}} will be appended to all records, possibly multiple records per (key_cols,pk_cols) value.
            # Else: There is no per-record versioning. (The old way.)

            With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Rank Ranked higher
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Rank Ranked lower
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.6 [ 24028 ]
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels lock performance purge ServiceNow lock performance purge
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow lock performance purge 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z lock performance purge
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z lock performance purge lock performance purge
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 90387 ] MariaDB v4 [ 130935 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 130638 106007
            Zendesk active tickets 201658
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              5 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.