In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.
In multi-versioning, if a secondary index record has been marked for deletion, or if PAGE_MAX_TRX_ID indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.
As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.
As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.
For clustered index records, determining whether a record is implicitly locked is easy: the hidden DB_TRX_ID column will belong to an active (or XA PREPARE; not committed) transaction. For secondary index records it is very expensive because there is no per-record DB_TRX_ID but only a PAGE_MAX_TRX_ID. Therefore the function row_vers_impl_x_locked() has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a blog post I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.
If we had DB_TRX_ID in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.
An indexed column can be modified multiple times during a transaction. If we also include DB_ROLL_PTR in the secondary index records, MVCC could construct earlier versions of the secondary index record by fetching earlier undo log records.
To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by slightly repurposing the field PAGE_MAX_TRX_ID.
- PAGE_MAX_TRX_ID = 0: All records in the page are visible for all transactions.
- PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF: All records in the page have DB_TRX_ID,DB_ROLL_PTR.
- Else: There is no per-record versioning. (The old way.)
With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.