Details
-
New Feature
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
In secondary indexes, InnoDB trades complexity for space, by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.
In multi-versioning, if a secondary index record has been marked for deletion, or if PAGE_MAX_TRX_ID indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.
As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.
As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.
For clustered index records, determining whether a record is implicitly locked is easy: the hidden DB_TRX_ID column will belong to an active (or XA PREPARE; not committed) transaction. For secondary index records it is very expensive because there is no per-record DB_TRX_ID but only a PAGE_MAX_TRX_ID. Therefore the function row_vers_impl_x_locked() has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' of a busy server. These operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a blog post I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.
If we had DB_TRX_ID in InnoDB secondary index records, the existence implicit locks could be detected directly based on that field, just like in clustered indexes.
To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by repurposing the field PAGE_MAX_TRX_ID.
- PAGE_MAX_TRX_ID = 0: All records in the page are visible for all transactions.
- PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF: A DB_TRX_ID will be appended to all records, possibly multiple records per (key_cols,pk_cols) value.
- Else: There is no per-record versioning. (The old way.)
With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.
Attachments
Issue Links
- blocks
-
MDEV-11634 Improve the InnoDB change buffer
- Closed
- relates to
-
MDEV-22367 Remove write support for ROW_FORMAT=COMPRESSED
- Closed
-
MDEV-23571 InnoDB does not raise a warning if ALTER TABLE index operations are optimized away
- Open
-
MDEV-31589 Secondary index MVCC access is unnecessarily inefficient
- Stalled
-
MDEV-5800 indexes on virtual (not materialized) columns
- Closed
-
MDEV-11215 Several locks taken to same record inside a transaction.
- Stalled
-
MDEV-11655 Transactional data dictionary
- Open
-
MDEV-11658 Simpler, faster IMPORT of InnoDB tables
- Open
-
MDEV-14341 Allow LOCK=NONE in table-rebuilding ALTER when indexed virtual columns exist
- Open
-
MDEV-15140 Implement Partial / Filtered Indexes
- Open
-
MDEV-16962 Assertion `!error || !ot_ctx.can_recover_from_failed_open()' failed in open_purge_table upon concurrent ALTER and FLUSH
- Closed
-
MDEV-20301 InnoDB's MVCC has O(N^2) behaviors
- Closed
-
MDEV-20973 Allow write operations to an InnoDB tablespace that is locked with "FLUSH TABLES ... FOR EXPORT"
- Open
-
MDEV-22361 Cross-engine foreign keys support
- Open
-
MDEV-22363 Reimplement the InnoDB virtual column code
- Open
-
MDEV-25599 innodb_debug_sync for mariadb 10.5+
- Closed
-
MDEV-32050 UNDO logs still growing for write-intensive workloads
- Closed
-
MDEV-33067 SCN(Sequence Commit Number) based MVCC
- Open
-
MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size
- Closed