Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
- None
Description
In secondary indexes, InnoDB trades complexity for space by not keeping track of version history for each secondary index record separately. This adds complexity to three interdependent areas: multi-version concurrency control (MVCC), purge of transaction history, and implicit locking.
In multi-versioning, if a secondary index record has been marked for deletion, or if PAGE_MAX_TRX_ID indicates that the page where the record resides has been recently modified, InnoDB has to look up the record in the clustered index and then painstakingly construct each available version of the record to find out if the secondary index record exists in the current read view.
As noted in MDEV-16962, the purge of history must ensure that a delete-marked secondary index record is not visible in any MVCC view. If the secondary index comprises virtual columns, then the matching of clustered index record versions and the secondary index record may have to evaluate virtual column values.
As noted in MDEV-11215, a record is implicitly locked by a transaction if it was written or modified by that transaction. If a conflicting lock request arrives from another transaction, that transaction may convert the implicit lock request to an explicit one (on behalf of the lock-owner transaction) before creating its own lock object for starting the lock wait.
For clustered index records, determining whether a record is implicitly locked is easy: the hidden DB_TRX_ID column will belong to an active (or XA PREPARE, in any case not yet committed) transaction. For secondary index records it is very expensive, because there is no per-record DB_TRX_ID, only a PAGE_MAX_TRX_ID. Therefore the function row_vers_impl_x_locked() has to look up the matching version of the record in the clustered index before we can determine whether the secondary index record is implicitly locked. This and the similarly slow MVCC logic can lead to a 'death spiral' on a busy server: these operations slow down the purge of transaction history, and they become slower as the history grows. In my response to a blog post (https://www.percona.com/blog/2014/12/31/small-changes-impact-complex-systems-mysql-example/) I mentioned Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS.
If we had DB_TRX_ID in InnoDB secondary index records, the existence of implicit locks could be detected directly based on that field, just like in clustered indexes.
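To make the contrast concrete, here is a minimal standalone sketch (not InnoDB code; the names trx_id_set, is_implicitly_locked and uncommitted are made up for illustration) of what the implicit-lock check could reduce to once every secondary index record carries DB_TRX_ID:

#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for "the set of transaction IDs that have not
// committed yet" (active transactions plus those in XA PREPARE state).
using trx_id_set = std::unordered_set<std::uint64_t>;

// With a per-record DB_TRX_ID in the secondary index, the implicit-lock
// check becomes a single membership test: the record is implicitly locked
// iff the transaction that last wrote it has not committed.
bool is_implicitly_locked(std::uint64_t rec_trx_id, const trx_id_set& uncommitted)
{
  return uncommitted.count(rec_trx_id) != 0;
}

// Today, lacking rec_trx_id, row_vers_impl_x_locked() must instead locate
// the matching clustered index record and walk its version history.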
To reduce the space overhead, we could introduce a change to the secondary index leaf page header, allowing the following choices. This could be implemented by repurposing the field PAGE_MAX_TRX_ID.
- PAGE_MAX_TRX_ID = 0: All records in the page are visible for all transactions.
- PAGE_MAX_TRX_ID = 0xFFFFFFFFFFFFFFFF: A DB_TRX_ID will be appended to all records, possibly multiple records per (key_cols,pk_cols) value.
- Else: There is no per-record versioning. (The old way.)
With this, we could gradually convert secondary index pages to the new format as the pages are being modified. No change to the metadata would be needed; all secondary indexes would be in this hybrid format. Tables could be imported to old MariaDB versions, but secondary indexes would have to be dropped and re-created if any page contains the extra fields.
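A rough standalone sketch of the proposed three-way interpretation of the page header field (the enum and the function are illustrative only, not actual InnoDB definitions):

#include <cstdint>

// The 8-byte PAGE_MAX_TRX_ID value in a secondary index leaf page header
// would select one of three page formats.
enum class sec_page_format {
  all_visible,  // PAGE_MAX_TRX_ID == 0: every record is visible to everyone
  per_record,   // PAGE_MAX_TRX_ID == all-ones: DB_TRX_ID appended to records
  legacy        // anything else: the old format with a page-level watermark
};

sec_page_format classify_page(std::uint64_t page_max_trx_id)
{
  if (page_max_trx_id == 0)
    return sec_page_format::all_visible;
  if (page_max_trx_id == UINT64_MAX)
    return sec_page_format::per_record;
  return sec_page_format::legacy;
}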
Attachments
Issue Links
- blocks
  - MDEV-11634 Improve the InnoDB change buffer (Closed)
- relates to
  - MDEV-22367 Remove write support for ROW_FORMAT=COMPRESSED (Closed)
  - MDEV-23571 InnoDB does not raise a warning if ALTER TABLE index operations are optimized away (Open)
  - MDEV-31589 Secondary index MVCC access is unnecessarily inefficient (Stalled)
  - MDEV-5800 indexes on virtual (not materialized) columns (Closed)
  - MDEV-11215 Several locks taken to same record inside a transaction (Stalled)
  - MDEV-11655 Transactional data dictionary (Open)
  - MDEV-11658 Simpler, faster IMPORT of InnoDB tables (Open)
  - MDEV-14341 Allow LOCK=NONE in table-rebuilding ALTER when indexed virtual columns exist (Open)
  - MDEV-15140 Implement Partial / Filtered Indexes (Open)
  - MDEV-16962 Assertion `!error || !ot_ctx.can_recover_from_failed_open()' failed in open_purge_table upon concurrent ALTER and FLUSH (Closed)
  - MDEV-20301 InnoDB's MVCC has O(N^2) behaviors (Closed)
  - MDEV-20973 Allow write operations to an InnoDB tablespace that is locked with "FLUSH TABLES ... FOR EXPORT" (Open)
  - MDEV-22361 Cross-engine foreign keys support (Open)
  - MDEV-22363 Reimplement the InnoDB virtual column code (Open)
  - MDEV-25599 innodb_debug_sync for mariadb 10.5+ (Closed)
  - MDEV-32050 UNDO logs still growing for write-intensive workloads (Closed)
  - MDEV-33067 SCN(Sequence Commit Number) based MVCC (Open)
  - MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size (Closed)
Activity
Because multi-versioning in secondary indexes is based on duplicating the records, it does not appear necessary to store an undo log pointer (DB_ROLL_PTR) in secondary index records. In the clustered index, we must store it, because all versions of a record share a single primary key.
If we always stored a per-record transaction identifier in secondary index leaf pages, the access patterns that are explained in a lengthy comment at the start of the file read0read.cc would be greatly simplified. We would never have to visit the clustered index to check whether a secondary index record is visible or implicitly locked.
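As an illustration, here is a simplified standalone model of a read view (the struct read_view_model and its fields only mimic the idea behind InnoDB's ReadView; they are not the real data structure). With a per-record DB_TRX_ID, secondary index visibility would reduce to a call like view.sees(rec_trx_id), with no clustered index lookup and no version reconstruction:

#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified model of an MVCC read view: changes by transactions older than
// up_limit_id are visible; changes by transactions at or above low_limit_id
// are not; in between, visibility depends on whether the transaction was
// still open when the view was created.
struct read_view_model
{
  std::uint64_t up_limit_id;
  std::uint64_t low_limit_id;
  std::vector<std::uint64_t> open_at_creation; // sorted transaction IDs

  bool sees(std::uint64_t trx_id) const
  {
    if (trx_id < up_limit_id)
      return true;
    if (trx_id >= low_limit_id)
      return false;
    return !std::binary_search(open_at_creation.begin(),
                               open_at_creation.end(), trx_id);
  }
};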
When InnoDB evaluates indexed virtual column values
The code that evaluates indexed virtual columns (MDEV-5800) in InnoDB is causing hard-to-fix problems, such as MDEV-16962. A sign of virtual column evaluation is the function innodb_find_table_for_vc(). It is being called from innobase_allocate_row_for_vcol(), which in turn is called due to the following:
Note: row_vers_impl_x_locked_low() (checking whether a secondary index record is implicitly locked) will retrieve the virtual column information from undo log records and does not have to evaluate anything. All the undo log record types TRX_UNDO_UPD_EXIST_REC, TRX_UNDO_UPD_DEL_REC, TRX_UNDO_DEL_MARK_REC will contain the values of all indexed virtual columns.
Indexed virtual column evaluation that cannot be avoided
Until MDEV-22361 moves the FOREIGN KEY processing to the SQL layer, InnoDB must be able to evaluate indexed virtual column values when a CASCADE or SET NULL operation would change the values of base columns.
ALTER TABLE…ADD INDEX or CREATE INDEX on virtual columns must be able to evaluate the column values. The calling thread is necessarily holding MDL, and the THD and both old and new TABLE are available.
When an INSERT (or UPDATE of a PRIMARY KEY) replaces a delete-marked record, that record must either have been deleted by the current transaction, or a DELETE must have been executed by a previously committed transaction. Either way, the delete-marked record may be accessible to old MVCC read views. The TRX_UNDO_UPD_DEL_REC must store both the old and new values of all indexed columns. The information is strictly needed on ROLLBACK if the current transaction executed both the DELETE and the INSERT (or UPDATE).
Eliminating indexed virtual column evaluation in MVCC and Purge
It turns out that we will only need a trivial check of the per-record transaction identifier in the secondary index record against the current read view or the clustered index record.
Let us prove this for row_vers_build_clust_v_col() in more detail. The following quote from the lengthy comment in read0read.cc is relevant:
The function row_purge_poss_sec() invokes row_vers_old_has_index_entry(true, …) on a clustered index record rec and a secondary index tuple ientry that has been constructed from rec (non-virtual column values) and the undo log record (indexed virtual column values). So, both rec and ientry are valid. At this point, we are not holding a secondary index page latch; only the clustered index leaf page is latched. The aim is to check whether any visible (active) version of rec matches ientry.
If we have a transaction identifier in the secondary index record, this matching also becomes a simple matter of comparing the DB_TRX_ID in the available clustered index record versions with the one in the secondary index record.
Furthermore, we can avoid constructing the exact clustered index record value for each active version and converting it into secondary index record values. We will merely have to determine the DB_TRX_ID of each active version of rec by dereferencing the chain of DB_ROLL_PTR.
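A sketch of that simplification, using a toy version-chain model (clust_version_model and version_chain_contains are invented for illustration; in InnoDB the "prev" step would be performed by dereferencing DB_ROLL_PTR into the undo log):

#include <cstdint>

// Toy model of a clustered index record and its older versions.
struct clust_version_model
{
  std::uint64_t trx_id;              // DB_TRX_ID of this version
  const clust_version_model* prev;   // older version, or nullptr if none
};

// For row_purge_poss_sec(), the question "does any still-needed version of
// rec match the secondary index record?" could become a plain DB_TRX_ID
// comparison along the version chain; no secondary index tuple needs to be
// rebuilt and no virtual columns need to be evaluated.
bool version_chain_contains(const clust_version_model* newest,
                            std::uint64_t sec_rec_trx_id)
{
  for (const clust_version_model* v = newest; v; v = v->prev)
    if (v->trx_id == sec_rec_trx_id)
      return true;
  return false;
}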
In other words, we can greatly simplify the function trx_undo_prev_version_build(). Let us consider the following example:
--source include/have_innodb.inc
disconnect mvcc;
Let us assume that the INSERT was transaction identifier 100 and the UPDATE was 101. The database contents before the disconnect mvcc will be as follows:
With a per-record transaction identifier in the secondary index, we will have 2 records and can filter out those transaction identifiers that do not match the visible clustered index record version.
Without a per-record transaction identifier, we are forced to construct the secondary index record from the visible clustered index record version. With the per-record transaction identifier, we can avoid that as follows:
MVCC (SELECT b FROM t)
Purge (row_purge_poss_sec(), after disconnect mvcc)
At the end of the purge (if it completes before DROP TABLE t), the table will contain the following:
(MDEV-12288)
Compatibility notes
Versions of InnoDB that precede this change would treat secondary index leaf pages that use the two new special PAGE_MAX_TRX_ID values as corrupted. Hence, tables can be exported to old versions only after dropping all secondary indexes.
We might want to retain support for the old format (without per-record transaction identifiers in secondary indexes) and remove that support in a later major release. Until the support is removed, this would allow users to upgrade without having to rebuild secondary indexes (or entire tables).
We could also require that all secondary indexes be rebuilt on upgrade. Any secondary index leaf page where PAGE_MAX_TRX_ID is not one of our two special values would be treated as corrupted: the secondary indexes would not be updated, and user transactions would be aborted.